Data Activation in the Modern Data Stack
Tools and technologies that enable activating or taking action on data
In this issue, I’m covering the technologies that make customer data available in downstream tools where data is eventually activated.
The activation layer of the modern data stack is my favorite since it allows you to take action on the data – in the tools you depend on – to build personalized, data-powered experiences.
You finally get to go beyond looking at dashboards, put data to meaningful use, and, in the process, do more impactful work.
With so many companies innovating and building products to activate data, it’s not straightforward to ascertain which of the processes, tools, and technologies should fall under data activation.
After talking to many founders and giving it a lot of thought, here’s what I believe the activation layer should comprise:
All the technologies that make customer data available in downstream tools – CDI (Customer Data Infrastructure), CDP (Customer Data Platform), Reverse ETL (or operational analytics), and whatever comes next.
All the downstream tools where data is eventually activated – sales, marketing, advertising, and support tools.
It’s worth noting that each of these technologies has certain pros and cons, and many companies end up combining multiple solutions to cater to the needs of different teams.
CDI and CDP
There is some confusion between CDI and CDP primarily because a CDP is essentially a CDI that also has identity resolution and audience-building capabilities.
The core premise of a CDI is to track customer data from first-party data sources – your website and apps – and sync the data to a data warehouse as well as to third-party tools (downstream systems where data is eventually activated).
Segment, through its range of products, makes it easy to understand the differences between CDI and CDP. Connections, Segment’s core product, is a CDI solution whereas Personas is Segment’s CDP offering that is sold as an add-on. mParticle too offers its CDI solution without CDP capabilities.
Other CDI solutions include RudderStack, Snowplow, Jitsu, MetaRouter, and Freshpaint. They all have different capabilities and support different destinations but all of them can be used to track data and sync it to a data warehouse. Some of these also support downstream tools as destinations where data is activated.
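To make the core premise of a CDI concrete, here is a minimal Python sketch of the fan-out behavior: one tracked event is delivered both to a warehouse and to a downstream tool. The `Tracker` class, its `track` method, and the two in-memory lists are hypothetical stand-ins of my own and not the API of any vendor named above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# An event is a plain dict; a destination is any callable that accepts one.
Event = dict
Destination = Callable[[Event], None]


@dataclass
class Tracker:
    """Hypothetical CDI client: builds events and fans them out to destinations."""

    destinations: list = field(default_factory=list)

    def track(self, user_id: str, name: str, properties: Optional[dict] = None) -> Event:
        event = {
            "user_id": user_id,
            "event": name,
            "properties": properties or {},
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        # Every registered destination receives the same event.
        for send in self.destinations:
            send(event)
        return event


# In-memory stand-ins for a warehouse table and a downstream marketing tool.
warehouse_rows = []
marketing_tool_inbox = []

tracker = Tracker(destinations=[warehouse_rows.append, marketing_tool_inbox.append])
tracker.track("user_1", "Signed Up", {"plan": "free"})
tracker.track("user_1", "Upgraded", {"plan": "pro"})

print(len(warehouse_rows), len(marketing_tool_inbox))
```

The point of the sketch is that the source is instrumented once, and adding a destination means registering one more callable rather than adding another tracking snippet to the website or app.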
To summarize, both CDI and CDP solutions enable data activation by syncing data to downstream systems like sales, marketing, advertising, and support tools where data is activated.
CDPs, however, are preferred by less-technical folks because they can build audiences and sync them to downstream tools using the visual interface a CDP offers. Doing so enables them to move fast without relying on engineering or data teams.
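For readers curious about what identity resolution and audience building amount to underneath the visual interface, here is a simplified Python sketch. The event shapes, the `identify` convention, and the audience rule are assumptions I made up for illustration; real CDPs resolve identities with far more sophisticated logic.

```python
from collections import defaultdict

# Hypothetical raw events: an anonymous visitor is linked to a known user
# whenever an event carries both an anonymous ID and a user ID.
events = [
    {"anonymous_id": "anon_a", "user_id": None, "event": "Viewed Pricing"},
    {"anonymous_id": "anon_a", "user_id": "u1", "event": "identify"},
    {"anonymous_id": "anon_b", "user_id": None, "event": "Viewed Pricing"},
    {"anonymous_id": "anon_c", "user_id": "u2", "event": "identify"},
    {"anonymous_id": "anon_c", "user_id": "u2", "event": "Started Trial"},
]

# Step 1: identity resolution. Map each anonymous ID to a known user ID.
known = {e["anonymous_id"]: e["user_id"] for e in events if e["user_id"]}

# Step 2: build one profile per resolved identity, collecting event names.
profiles = defaultdict(set)
for e in events:
    identity = known.get(e["anonymous_id"], e["anonymous_id"])
    if e["event"] != "identify":
        profiles[identity].add(e["event"])

# Step 3: audience building. Identified users who viewed pricing
# but have not started a trial.
identified = set(known.values())
audience = sorted(
    identity
    for identity, seen in profiles.items()
    if identity in identified
    and "Viewed Pricing" in seen
    and "Started Trial" not in seen
)
print(audience)  # ['u1']
```

Note that the pricing-page view by `anon_a` happened before the visitor was identified, yet it still counts toward the profile of `u1`. That stitching is what separates a CDP from a CDI in the distinction drawn above.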
CDPs are Going Beyond Infrastructure
It’s worth mentioning that while CDP vendors have largely been focused on collecting and moving data, some of them also allow you to activate the data and orchestrate campaigns across multiple channels such as email and SMS.
Exponea is one such CDP that has inbuilt functionality to build personalized, data-led experiences. After its acquisition by Twilio, Segment is moving in this direction as well and already integrates deeply with Twilio for SMS and Twilio-owned SendGrid for emails.
This is a natural expansion for CDPs and I believe more vendors will either build or buy activation products going forward.
Reverse ETL or Operational Analytics
The rapid adoption of cloud data warehouses (like Snowflake, BigQuery, and Redshift) has given rise to Reverse ETL – a new paradigm in data integration that enables activating data that is already stored in the data warehouse.
Companies with dedicated data teams are investing heavily in consolidating all customer data in the data warehouse using a combination of CDI solutions and ELT tools (like Fivetran, Airbyte, and Meltano).
And now Reverse ETL tools like Census, Grouparoo, and Hightouch make it easy to build data models (or audiences) on top of the data stored in the warehouse, sync those models to the downstream tools they integrate with, and trigger workflows in those tools. While some of these tools offer a visual interface to query data and build audiences, the primary method is SQL.
Companies that already have a data warehouse are discovering the benefits of Reverse ETL by syncing modeled data from the warehouse to downstream tools (instead of syncing raw data directly from the source).
However, as mentioned earlier, each of these solutions has some pros and cons. Adopting a Reverse ETL solution requires an in-house data team to do the following:
Maintain a data warehouse and ensure that data is clean and modeled (often using a transformation tool like dbt)
Track data from first-party apps and store it in the warehouse (using a CDI tool)
Ingest data into the warehouse from third-party tools (using an ELT tool)
And finally, write SQL to define the models that are synced from the warehouse to downstream tools (using a Reverse ETL tool)
There are obvious benefits to this approach – the biggest being data ownership and the flexibility to switch tools when needed.
However, maintaining such a data stack requires significant investment that is often not feasible for early-stage startups or even mid-size companies that are not in the business of selling technology and don’t have dedicated engineering or data teams.
The Unusual Suspects
There are a couple of unusual suspects that I think are worth mentioning when talking about data activation.
Heap and Amplitude, popular Product Analytics (PA) tools, recently launched Act and Recommend respectively – new data activation products to enable their users to go beyond analysis.
Normally, you’d build cohorts in a PA tool to analyze the data and then build the same segments in a CDP or in your engagement tools, whereas now you can do it all in one integrated system.
But that’s not all – with two-way integrations with engagement tools, you can now analyze your campaign metrics directly inside the product analytics tool, enabling you to measure the true impact of your engagement campaigns.
While data people are bound to have reservations about this approach because point-to-point integrations are likely to cause data woes (lovingly referred to as data spaghetti) in the long run, this new capability is truly exciting for product and growth people.
It is a common struggle for them to go beyond measuring performance via vanity metrics such as email open rates and measure the true impact of those emails – whether or not someone performed the desired action inside the app after opening (or not opening) emails from a particular campaign.
That said, you can still measure the impact of your engagement activities by combining product data with engagement data, but doing so requires additional tools and talent.
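To illustrate the difference between a vanity metric and true impact, here is a small Python sketch that combines engagement data (who opened an email) with product data (who performed the desired action in the app). All of the data and names are made up for illustration.

```python
# Hypothetical engagement data exported from an email tool.
campaign_recipients = {
    "u1": {"opened": True},
    "u2": {"opened": True},
    "u3": {"opened": False},
    "u4": {"opened": False},
}

# Hypothetical product data: users who performed the desired in-app action
# after the campaign was sent.
performed_action = {"u1", "u3"}


def conversion_rate(opened: bool) -> float:
    """Share of recipients in the opened (or unopened) group who performed the action."""
    group = [u for u, r in campaign_recipients.items() if r["opened"] is opened]
    if not group:
        return 0.0
    return sum(u in performed_action for u in group) / len(group)


open_rate = sum(r["opened"] for r in campaign_recipients.values()) / len(campaign_recipients)
print(f"open rate: {open_rate:.0%}")
print(f"conversion among openers: {conversion_rate(True):.0%}")
print(f"conversion among non-openers: {conversion_rate(False):.0%}")
```

In this toy data set, recipients who opened the email converted at the same rate as those who did not, which the open rate alone would never reveal. A comparison like this is only a correlation, so a proper holdout group would be needed before crediting the campaign with any lift.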
While these new capabilities indicate that Product Analytics tools are treading into CDP waters, I believe they are simply fulfilling the needs of their core user personas – growth and product people – and are unlikely to serve other use cases that CDPs are really good at.