Link copied to clipboard.

ETL vs. ELT: Data Ingestion Tools

Tools to collect data from secondary data sources.

Created :  
May 29, 2023
Created :  
April 1, 2021
|
Updated :  
September 25, 2023
time illustration svg
(#)
Minutes
(#)
Minutes

This guide covers the technologies you can use to collect data from your secondary data sources — third-party SaaS tools that power customer experiences.

But first, let’s look at why you need to collect data from secondary sources (third-party tools that power customer experiences) and how this data complements the data collected from primary data sources (core product experience powered by proprietary code).

Why collect data from secondary sources

Data from third-party apps (secondary source) is referred to as object data since data is stored as objects (contacts, leads, messages, campaigns, etc).

Third-party tools or secondary data sources store data as objects (contacts, leads, messages, campaigns, payments, etc) and this data is referred to as object data. However, some third-party tools also store behavioral data or event data.

Whether data is stored as objects or events, some common use cases of collecting data from secondary sources are as follows:

  • Data from the CRM provides context about customers and the accounts they belong to
  • Data from marketing, advertising, and support tools helps understand how users engage with your brand across multiple touchpoints
  • Data from payment processing services provides insights into how users transact with your product

Data collected from primary and secondary sources are complementary as the two need to be combined to understand the customer journey, derive insights, and drive action based on those insights.

Tools and Technologies

Data from secondary sources is stored alongside data from primary sources in a data warehouse such as Snowflake, AWS Redshift, or Google BigQuery.

The process of extracting data from third-party tools and loading the data into a warehouse is referred to as data ingestion. And the tools that enable data ingestion are referred to as ETL or ELT.

ETL and ELT

ELT stands for Extract, Load, Transform which refers to extracting data from the data sources, loading the data into a data warehouse, and then transforming or wrangling the data to derive insights and drive action.

It’s interesting to note that ELT tools only take care of the E (extraction) and L (loading) whereas the T (transformation) takes place in the warehouse for which you may or may not use an external tool.

Popular EL(T) tools include Fivetran, Stitch, and, Matillion as well as open-source alternatives like Airbyte and Meltano.

These tools support a wide range of data sources including third-party SaaS tools, databases (like MySQL and PostgreSQL), as well as cloud storage services (like Amazon S3 and Google Cloud Storage).

The tools mentioned here all have certain strengths and weaknesses, especially in regard to the connectors or data sources they support. Additionally, their pricing models vary and what you end up paying might be different even with the same number of data sources and the same volume of data.

It’s also helpful to know that EL(T) tools are often referred to as ETL tools since ETL (Extract, Transform, Load) is the older paradigm under which data had to be transformed before being loaded into a data warehouse.

 ETL vs. ELT: How data flows
ETL vs. ELT: How data flows

Even though ETL has largely been displaced by EL(T) (new paradigm), many EL(T) tools continue to be referred to as ETL tools (old paradigm) and many ETL tools are now calling themselves EL(T) tools (new paradigm).

In an attempt to keep things simple, Fivetran calls itself a data integration tool. However, this further complicates the matter because data integration goes beyond ETL or EL(T), encompassing Reverse ETL, iPaaS, and CDI/CDP.

2023 Update: Fivetran no longer calls itself a data integration tool – it has repositioned itself as a data movement platform.

Customer Data Infrastructure (CDI)

CDI offerings from Segment and mParticle, or RudderStack have the ability to extract data from a variety of cloud applications and store the data in a data warehouse.

However, CDI is not purpose-built to ingest data from third-party tools into a data warehouse — its core utility is to collect behavioral data from primary data sources (core product experience powered by proprietary code).

ELT tools, on the other hand, are purpose-built to ingest data into data warehouses and therefore offer more robust integrations, faster syncing capabilities, and other advanced functionality.

iPaaS-based Integration Tools

iPaaS (integration platform as a service) solutions such as Tray, Workato, Integromat (Make), or Zapier can also be used to extract data from third-party SaaS tools and load the data into data warehouses.

However, iPaaS tools are designed to perform actions (such as loading data) based on a trigger (an event such as a new contact being created in the CRM) and are more suited to automate workflows rather than ingest data in a warehouse.

ABOUT THE AUTHOR
Arpit Choudhury

As the founder and operator of databeats, Arpit has made it his mission to beat the gap between data people and non-data people for good.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Explore the full series:
No items found.
No items found.
Join the community

It's time to come together

Welcome to the community!
Oops! Your data didn't make it to our database – can you try again?

line

thick-line

red-line

thick-red-line