This guide covers the technologies you can use to collect data from your secondary data sources — third-party SaaS tools that power customer experiences.
But first, let’s look at why you need to collect data from secondary sources and how this data complements the data collected from primary data sources (the core product experience powered by proprietary code).
Why collect data from secondary sources
Third-party tools (secondary data sources) store data as objects (contacts, leads, messages, campaigns, payments, etc.), which is why this data is referred to as object data. However, some third-party tools also store behavioral data, also known as event data.
Whether data is stored as objects or events, some common use cases of collecting data from secondary sources are as follows:
- Data from the CRM provides context about customers and the accounts they belong to
- Data from marketing, advertising, and support tools helps understand how users engage with your brand across multiple touchpoints
- Data from payment processing services provides insights into how users transact with your product
Data collected from primary and secondary sources is complementary: the two need to be combined to understand the customer journey, derive insights, and drive action based on those insights.
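To make the idea of combining the two kinds of data concrete, here is a minimal Python sketch. It assumes a handful of made-up CRM contacts (object data) and product events (event data) that share a user ID; the field names and the events_per_account helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical object data from a CRM (secondary source), keyed by user ID.
crm_contacts = {
    "u1": {"account": "Acme", "plan": "enterprise"},
    "u2": {"account": "Globex", "plan": "free"},
}

# Hypothetical event data from the core product (primary source).
product_events = [
    {"user_id": "u1", "event": "report_created"},
    {"user_id": "u1", "event": "report_shared"},
    {"user_id": "u2", "event": "report_created"},
]


def events_per_account(contacts, events):
    """Count product events per account by joining events to CRM contacts."""
    counts = defaultdict(int)
    for event in events:
        contact = contacts.get(event["user_id"])
        # Events from users missing in the CRM are grouped under "unknown".
        account = contact["account"] if contact else "unknown"
        counts[account] += 1
    return dict(counts)


summary = events_per_account(crm_contacts, product_events)
print(summary)
```

Neither dataset alone can answer "which accounts are most active in the product"; the answer only appears once the event data is joined to the object data. In practice this join would usually be a SQL query in the warehouse rather than application code.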
Tools and Technologies
Data from secondary sources is stored alongside data from primary sources in a data warehouse such as Snowflake, AWS Redshift, or Google BigQuery.
The process of extracting data from third-party tools and loading the data into a warehouse is referred to as data ingestion. The tools that enable data ingestion are referred to as ETL or ELT tools.
ETL and ELT
ELT stands for Extract, Load, Transform, which refers to extracting data from the data sources, loading the data into a data warehouse, and then transforming or wrangling the data to derive insights and drive action.
Note that ELT tools only take care of the E (extraction) and the L (loading). The T (transformation) takes place in the warehouse, and you may or may not use an external tool for it.
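The three steps can be sketched roughly as follows. This is a simplified illustration, not how any particular EL(T) tool works: an in-memory SQLite database stands in for the warehouse, the extract function returns hard-coded records instead of calling a real SaaS API, and all table and function names are assumptions.

```python
import sqlite3


def extract():
    """Stand-in for calling a SaaS API; returns hypothetical CRM contacts."""
    return [
        {"id": 1, "email": "Ana@Example.com", "plan": "pro"},
        {"id": 2, "email": "bo@example.com", "plan": "free"},
    ]


def load(conn, records):
    """Load the records as-is into a raw table, with no reshaping."""
    conn.execute("CREATE TABLE raw_contacts (id INTEGER, email TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO raw_contacts VALUES (:id, :email, :plan)", records
    )


def transform(conn):
    """Transform inside the warehouse, using SQL over the raw table."""
    conn.execute(
        "CREATE TABLE contacts AS "
        "SELECT id, lower(email) AS email, plan FROM raw_contacts"
    )


conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for a warehouse
load(conn, extract())               # E and L: what an EL(T) tool handles
transform(conn)                     # T: happens later, in the warehouse
rows = conn.execute("SELECT id, email, plan FROM contacts ORDER BY id").fetchall()
print(rows)
```

The raw table is kept untouched and the cleaned table is derived from it, which is why the transformation step can be changed or rerun later without extracting the data again.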
Popular EL(T) tools include Fivetran, Stitch, and Matillion, as well as open-source alternatives like Airbyte and Meltano.
These tools support a wide range of data sources including third-party SaaS tools, databases (like MySQL and PostgreSQL), as well as cloud storage services (like Amazon S3 and Google Cloud Storage).
The tools mentioned here all have certain strengths and weaknesses, especially in regard to the connectors or data sources they support. Additionally, their pricing models vary and what you end up paying might be different even with the same number of data sources and the same volume of data.
It’s also helpful to know that EL(T) tools are often referred to as ETL tools since ETL (Extract, Transform, Load) is the older paradigm under which data had to be transformed before being loaded into a data warehouse.
Even though ETL has largely been displaced by EL(T), many EL(T) tools continue to be referred to as ETL tools, and many ETL tools now call themselves EL(T) tools.
In an attempt to keep things simple, Fivetran calls itself a data integration tool. However, this further complicates the matter because data integration goes beyond ETL or EL(T), encompassing Reverse ETL, iPaaS, and CDI/CDP.
2023 Update: Fivetran no longer calls itself a data integration tool – it has repositioned itself as a data movement platform.
Customer Data Infrastructure (CDI)
CDI offerings from Segment, mParticle, or RudderStack can extract data from a variety of cloud applications and store the data in a data warehouse.
However, CDI is not purpose-built to ingest data from third-party tools into a data warehouse — its core utility is to collect behavioral data from primary data sources (core product experience powered by proprietary code).
ELT tools, on the other hand, are purpose-built to ingest data into data warehouses and therefore offer more robust integrations, faster syncing capabilities, and other advanced functionality.
iPaaS-based Integration Tools
iPaaS (integration platform as a service) solutions such as Tray, Workato, Integromat (Make), or Zapier can also be used to extract data from third-party SaaS tools and load the data into data warehouses.
However, iPaaS tools are designed to perform actions (such as loading data) based on a trigger (an event such as a new contact being created in the CRM) and are better suited to automating workflows than to ingesting data into a warehouse.
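The trigger-and-action model can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the Workflow class, the trigger names, and the list standing in for a warehouse table are all hypothetical and do not reflect the API of any iPaaS product.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Workflow:
    """Minimal trigger-action registry, loosely modeled on the iPaaS idea."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, trigger: str, action: Handler) -> None:
        """Register an action to run whenever the named trigger fires."""
        self._handlers.setdefault(trigger, []).append(action)

    def fire(self, trigger: str, payload: dict) -> int:
        """Run every action registered for the trigger; return how many ran."""
        actions = self._handlers.get(trigger, [])
        for action in actions:
            action(payload)
        return len(actions)


warehouse_rows: List[dict] = []  # stand-in for a warehouse table

workflow = Workflow()
workflow.on("contact.created", warehouse_rows.append)

# Only events that have a registered trigger ever reach the "warehouse".
workflow.fire("contact.created", {"id": 1, "email": "ana@example.com"})
workflow.fire("contact.deleted", {"id": 1})
print(warehouse_rows)
```

Each record arrives one event at a time, and only for triggers someone remembered to wire up. That works well for automating workflows, but it leaves historical backfills and complete syncs to you, which is the gap purpose-built ELT tools are meant to cover.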