Meandering the Modern Data Landscape

Tools to Collect Data From Primary Sources

A primary or first-party data source is the core product that runs on your own proprietary code: websites, web and mobile apps, and IoT devices.

Arpit Choudhury
Mar 26, 2021

This issue covers tools and technologies you can use to collect data from your primary or first-party data sources.

It’s good to keep in mind that the terms tools and technologies are not used interchangeably – tools are specific products that fall under one or more technologies or categories.  

Tracking tools fall under two main categories – Customer Data Platform or CDP (which does a lot more than tracking) and Customer Data Infrastructure or CDI (which is purpose-built for tracking).

Customer Data Platform (CDP)

A CDP is an all-in-one solution that not only collects data from primary or first-party sources (website, web app, and mobile apps) but can also ingest data from secondary or third-party sources (external tools used for sales, marketing, advertising, etc.).

More importantly, a CDP’s core capability is identity resolution which makes it possible to create user segments or audiences by combining customer data from multiple sources and syncing those segments to external tools.
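To make identity resolution concrete, here is a minimal sketch in Python. It assumes the simplest possible rule: any identifiers seen together (say, an anonymous cookie ID and an email address) belong to the same person. The class name, the identifier formats, and the matching rule are all illustrative assumptions, not how any particular CDP implements it.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity resolution: identifiers seen together merge into one profile."""

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Register unseen identifiers as their own root.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[identifier] != root:
            self._parent[identifier], identifier = root, self._parent[identifier]
        return root

    def link(self, *identifiers):
        """Record that these identifiers belong to the same person."""
        roots = [self._find(i) for i in identifiers]
        for other in roots[1:]:
            self._parent[other] = roots[0]

    def profiles(self):
        """Return the resolved profiles as a list of identifier sets."""
        groups = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


graph = IdentityGraph()
graph.link("anon:a1", "email:ada@example.com")   # signup on the website
graph.link("email:ada@example.com", "crm:42")    # record from a sales tool
graph.link("anon:b7", "device:ios-9")            # unrelated mobile visitor
resolved = graph.profiles()                      # two profiles: one per person
```

Each resolved profile can then be used as the unit for building segments: the website visitor and the CRM record end up in the same profile because they share an email address.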

Below, I have only mentioned horizontal CDPs, which are industry-agnostic. There are also many vertical CDPs that cater to the needs of specific industries such as SaaS or ecommerce, and they are definitely worth exploring if you’re looking to invest in one.

Segment

Segment is by far the most popular CDP vendor on the market, which led to its acquisition by Twilio for $3.2B. What’s interesting, and not very well known, is that Segment offers multiple products: Personas is its CDP offering, sold as an add-on to its core product, Connections.

Most people refer to Connections when they talk about Segment and it has long been the go-to tracking solution for companies of all sizes. So even if you don’t need CDP capabilities, Segment Connections can take care of your tracking needs.

Segment also offers a data governance tool called Protocols, which is likewise sold as an add-on to Connections.

mParticle

mParticle is one of the most popular horizontal CDPs on the market. Its core offering is a CDI solution that collects data and syncs it to third-party tools and data warehouses. Its CDP capabilities (audience building and identity resolution) are available as add-ons, along with data governance tools.

Other popular CDPs include Treasure Data, Tealium, and Lytics.

Customer Data Infrastructure (CDI)

CDI is not yet a well-defined category and is used in various contexts today. However, tracking is a core component of Customer Data Infrastructure, and therefore, it makes sense to categorize purpose-built tracking tools as CDI.
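As a rough illustration of what tracking means at this layer, the sketch below builds one event record and fans it out to every registered destination. The `Tracker` class, its method names, and the field names are hypothetical and only loosely modeled on the shape of a typical track call; they are not the API of any tool mentioned in this issue.

```python
import uuid
from datetime import datetime, timezone


class Tracker:
    """Hypothetical CDI client: builds event records and fans them out."""

    def __init__(self):
        self._destinations = []

    def add_destination(self, send):
        # `send` is any callable that accepts one event dict, for example
        # a warehouse loader or a wrapper around a marketing tool's API.
        self._destinations.append(send)

    def track(self, user_id, event, properties=None):
        record = {
            "message_id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "event": event,
            "properties": dict(properties or {}),
        }
        # Every destination receives the same record.
        for send in self._destinations:
            send(record)
        return record


warehouse_rows = []  # stand-in for a data warehouse table
tool_inbox = []      # stand-in for a third-party tool

tracker = Tracker()
tracker.add_destination(warehouse_rows.append)
tracker.add_destination(tool_inbox.append)
tracker.track("user_123", "Signed Up", {"plan": "free"})
```

The point of the design is that the product code calls `track` once, and adding or removing a destination never touches that call site.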

It’s helpful to keep in mind that all the CDI solutions mentioned here are, one way or the other, alternatives to Segment Connections.

So here are various CDI solutions ordered by popularity and relevance:

Segment Connections

As mentioned above, Connections is Segment’s CDI offering and is available as a standalone product, allowing you to use it even if you don’t need a full-blown CDP (Segment Personas).

mParticle Standard

As mentioned above, you can use mParticle’s Standard edition which is its CDI offering.

RudderStack

RudderStack also offers multiple products, but its core product is very similar to Segment Connections. Unlike Segment, however, RudderStack is open source, and you can choose to self-host it instead of opting for the managed solution. RudderStack supports all popular data warehouses as well as a growing library of third-party tools as destinations.

Snowplow 

Snowplow calls itself a behavioral data collection platform. It is also open source and offers a cloud version. Snowplow’s approach is different in that it only syncs data to data warehouses and doesn’t support any other cloud destinations.

Also, implementing Snowplow requires expertise in Snowplow’s own technology stack, so it is only suitable if your company has a dedicated data team.

Jitsu

Jitsu is another open source CDI solution that positions itself as a Segment alternative. Jitsu supports major data warehouses and a handful of external tools as destinations.

Avo

Avo is not exactly positioned as a CDI solution and is focused on solving the collaboration and governance problems around tracking.

Avo enables you to create a smart tracking plan using an easy-to-use interface and then generates code for your developers to implement tracking based on the events and properties defined. This gives product managers more control over the taxonomy and eliminates the scope for errors during implementation.
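The idea of a tracking plan can be sketched as a small schema plus a check that runs before an event is sent. The plan format, event names, and the `validate_event` helper below are assumptions made for illustration; they do not reflect Avo’s actual plan format or the code it generates.

```python
# Hypothetical tracking plan: event name -> required properties and their types.
TRACKING_PLAN = {
    "Signed Up": {"plan": str, "referrer": str},
    "Order Completed": {"order_id": str, "total": float},
}


def validate_event(event, properties, plan=TRACKING_PLAN):
    """Return a list of problems; an empty list means the event matches the plan."""
    if event not in plan:
        return [f"unknown event: {event!r}"]
    expected = plan[event]
    problems = []
    for name, expected_type in expected.items():
        if name not in properties:
            problems.append(f"missing property: {name!r}")
        elif not isinstance(properties[name], expected_type):
            problems.append(
                f"wrong type for {name!r}: expected {expected_type.__name__}"
            )
    for name in properties:
        if name not in expected:
            problems.append(f"unexpected property: {name!r}")
    return problems


ok = validate_event("Signed Up", {"plan": "free", "referrer": "newsletter"})
typo = validate_event("signed_up", {"plan": "free"})
bad_type = validate_event("Order Completed", {"order_id": "A1", "total": "19.99"})
```

Here `ok` is an empty list, while `typo` and `bad_type` each report one problem: a misspelled event name and a total sent as a string. These are the kinds of implementation errors a shared plan is meant to catch.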

Iteratively (acquired by Amplitude)

Iteratively is an alternative to Avo and they both have a similar approach to tracking. They both offer limited integrations but integrate with Segment as a destination, allowing you to further send data to all the destinations supported by Segment.

This might seem odd, as Iteratively and Avo are also alternatives to Segment Connections, but it makes sense to use them in conjunction with Connections (or RudderStack) until they expand their destination libraries. In fact, Avo now has an official partnership with RudderStack.

Avo and Iteratively also support custom destinations, allowing you to integrate with internal tools as well. 

MetaRouter

MetaRouter is a relatively new solution that is focused on server-side tracking and offers private cloud and on-premise installations which are ideal for larger companies with stricter norms for data privacy, security, and compliance.

Freshpaint 

Freshpaint offers a hybrid solution: besides tracking data via code (which is always recommended), you can set up auto-tracking that gathers data without code (also known as implicit tracking). The hybrid approach brings more flexibility to teams with limited engineering resources.
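The difference between the two modes can be sketched as follows. The sketch assumes one common autocapture pattern, where raw interactions are recorded generically and named afterwards. The data shapes and function names are hypothetical, and this is not a description of Freshpaint’s internals.

```python
# Raw interactions, as an autocapture snippet might record them
# without any per-event code.
raw_interactions = [
    {"action": "click", "selector": "#signup-button", "user_id": "u1"},
    {"action": "click", "selector": "#footer-link", "user_id": "u1"},
    {"action": "pageview", "selector": "/pricing", "user_id": "u2"},
]

# Names assigned afterwards (for example by a product manager), with no code change.
EVENT_DEFINITIONS = {
    ("click", "#signup-button"): "Signup Button Clicked",
    ("pageview", "/pricing"): "Pricing Page Viewed",
}


def name_events(interactions, definitions):
    """Turn raw interactions into named events, dropping anything undefined."""
    named = []
    for item in interactions:
        event = definitions.get((item["action"], item["selector"]))
        if event is not None:
            named.append(
                {"event": event, "user_id": item["user_id"], "source": "auto"}
            )
    return named


def track(user_id, event, **properties):
    """Explicit, code-based tracking: the developer picks the name and properties."""
    return {
        "event": event,
        "user_id": user_id,
        "properties": properties,
        "source": "code",
    }


# Hybrid: auto-captured events plus one event tracked explicitly in code.
events = name_events(raw_interactions, EVENT_DEFINITIONS)
events.append(track("u1", "Order Completed", order_id="A1", total=19.99))
```

Note the trade-off the sketch exposes: the auto-captured events carry only what the page revealed, while the explicit `track` call can attach business properties such as the order total.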

That’s all!

It’s good to keep in mind that whether you opt for a CDP or not, you definitely need a CDI to collect behavioral data from your primary data sources.
