Building your open source Airbyte integrations from source to destination
In today’s data-driven world, integrating various platforms and databases is crucial. With tools like Airbyte offering many ETL and integration connectors, data synchronization can be efficiently automated.
But what is Airbyte, and how do its connectors play a pivotal role in data orchestration?
What is Airbyte?
Airbyte is a data integration framework designed to streamline ETL processes. With an extensive list of pre-built Airbyte connectors and the flexibility to create custom ones, it has gained traction among data enthusiasts and businesses.
An Airbyte integration has two core components: the data source and the destination.
Airbyte Integrations and Connectors
Airbyte’s strength lies in its vast array of ETL connectors, which fall into two categories:
Data Source Connectors: These are responsible for fetching data from various sources. Source connectors cover a broad spectrum, from databases to SaaS platforms, ensuring data from almost any source can be integrated.
Destination Connectors: After fetching data, you need to send it somewhere. Whether cloud storage, a data warehouse, or another database, Airbyte’s destination connectors ensure data lands where you need it.
Connecting the two creates an integration “pipeline”, moving data from the source to a target destination.
Airbyte Integrations: Bring Your Own App
Airbyte connectors follow the “Bring Your Own App” (BYOA) model. A “Bring Your Own App” is an app that the customer creates, typically upstream at the data source. For example, if you want to use the open-source Amazon Selling Partner API connector, Amazon requires you to “register” an app before you can access a seller or vendor account and sync its data. Without access to the source, there is no data to sync, and each source system defines what data is available and the protocols for accessing it.
Given the nature of “BYOA,” the configuration of sources and destinations is vital.
Airbyte Connector Config Files
One requirement for running Airbyte pipelines successfully is sorting out your source and destination configurations.
The configuration specifications for Airbyte connectors can be challenging to locate. They are often nested deep within the Airbyte repository or spread across different directories.
Thankfully, many source and destination configs are available on GitHub via the open source Airbridge project.
Go to the Airbridge project on GitHub. The project contains two folders:
- Data Sources: config/reference/source
- Data Destinations: config/reference/destination
Select the data source or destination configuration that matches your needs. You are welcome to browse the hundreds of pre-generated source and destination configs.
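As a rough illustration of browsing those folders locally, the sketch below searches a directory of reference configs by keyword. It assumes you have cloned the Airbridge repository yourself; the function name `find_reference_configs` is hypothetical, and the only file naming it relies on is that the reference configs are `.json` files whose names contain the connector name.

```python
from pathlib import Path


def find_reference_configs(reference_dir, keyword):
    """Return the .json files in reference_dir whose name contains the keyword."""
    # Normalize "Bing Ads" to "bing_ads" so it can match snake_case file names.
    needle = keyword.lower().replace(" ", "_")
    return sorted(
        path
        for path in Path(reference_dir).glob("*.json")
        if needle in path.name.lower()
    )
```

For example, calling `find_reference_configs("airbridge/config/reference/source", "Bing Ads")` from the directory that holds your clone would list any matching source configs. The local path is an assumption about where you cloned the project.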
Example Source Connectors
- Sales Connectors: These connectors focus on integrating data from sales platforms like Salesforce, HubSpot, and Pipedrive. They help businesses streamline their sales data, ensuring seamless syncing across platforms and aiding in sales analytics.
- Ads Connectors: Geared towards marketers and advertising professionals, Ads connectors allow data integration from advertising platforms such as TikTok Marketing, Google Ads, Facebook Ads, and Bing Ads. This ensures that advertising metrics, campaign data, and performance indicators are consolidated and easily accessible.
- CRM Connectors: CRM (Customer Relationship Management) connectors cater to platforms like Salesforce Pardot, Microsoft Dynamics, NetSuite, and Freshsales. These connectors are pivotal for businesses aiming to have a unified view of customer interactions, ensuring that data from CRM platforms is synced with other data sources.
- Finance & Accounting Connectors: These connectors are designed to integrate data from financial and accounting platforms such as QuickBooks, Xero, and Stripe. They are essential for businesses to maintain accurate financial records, consolidate transaction data, and ensure financial compliance.
What is in a source config? Here is an example config for TikTok Marketing.
{
  "credentials": {
    "auth_type": "optional_value",
    "app_id": "optional_value",
    "secret": "optional_value",
    "access_token": "optional_value",
    "advertiser_id": "optional_value"
  },
  "start_date": "optional_value",
  "end_date": "optional_value",
  "attribution_window": "optional_value",
  "include_deleted": "optional_value"
}
The Airbyte connector source config file for TikTok Marketing can be found here:
Here is an example connector config for Bing Ads.
{
  "auth_method": "optional_value",
  "tenant_id": "optional_value",
  "client_id": "required_value",
  "client_secret": "optional_value",
  "refresh_token": "required_value",
  "developer_token": "required_value",
  "reports_start_date": "required_value",
  "lookback_window": "optional_value"
}
The config file for Bing Ads can be found here:
airbridge/config/reference/source/source_bing_ads_source_spec.json at dev · openbridge/airbridge
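The reference configs mark each field with a `required_value` or `optional_value` placeholder. Before running a pipeline, it can help to confirm that no required placeholder is left unfilled. The following is a minimal sketch of such a check, not part of Airbyte or Airbridge; the function name `unfilled_fields` and the sample values (`my-client-id`, the date) are made up for illustration.

```python
import json


def unfilled_fields(config, placeholders=("required_value",), prefix=""):
    """Return dotted paths of fields that still hold a template placeholder."""
    missing = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as "credentials".
            missing.extend(unfilled_fields(value, placeholders, f"{path}."))
        elif isinstance(value, str) and value in placeholders:
            missing.append(path)
    return missing


# A partially filled Bing Ads config; the filled-in values are illustrative only.
bing_ads = json.loads("""
{
  "client_id": "my-client-id",
  "refresh_token": "required_value",
  "developer_token": "required_value",
  "reports_start_date": "2023-01-01",
  "lookback_window": "optional_value"
}
""")

print(unfilled_fields(bing_ads))
```

Here the check reports `refresh_token` and `developer_token`, the two required fields that still hold placeholders. Passing `placeholders=("required_value", "optional_value")` would also flag optional fields that were never set or removed.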
Here is an example Airbyte connector config for HubSpot.
{
  "start_date": "required_value",
  "credentials": {
    "credentials_title": "optional_value",
    "client_id": "optional_value",
    "client_secret": "optional_value",
    "refresh_token": "optional_value",
    "access_token": "optional_value"
  }
}
The config file for HubSpot can be found here: https://github.com/openbridge/airbridge/blob/dev/config/reference/source/source_hubspot_source_spec.json
Catalog Configuration: Source and destination integration connector definitions
One of the critical components of Airbyte’s configuration is the catalog.json file, which defines the streams (i.e., tables, collections, or any other structured data) that a specific source connector can read from.
This file plays an integral role in the ETL (Extract, Transform, Load) process, as it defines the schema for the data that will be extracted from a source and loaded into a destination.
To find the catalog for a connector, navigate to that source’s folder on GitHub.
For example, if you are interested in Chargebee, go to source-chargebee/integration_tests/. In that folder, you will find the configured_catalog.json.
Typically, no changes to the catalog are needed, and for most users leaving it as-is is sufficient. If you are comfortable customizing the file, check the docs for each source to learn which changes apply.
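To make the idea of a catalog more concrete, here is a minimal sketch that reads a configured catalog and trims it to a subset of streams. It assumes the layout commonly seen in configured_catalog.json files: a top-level `streams` list where each entry wraps a `stream` definition alongside `sync_mode` and `destination_sync_mode`. The stream names and sync modes in the sample are placeholders, not taken from the real Chargebee catalog, and the helper names are hypothetical.

```python
import json

# Illustrative catalog only: stream names and sync modes are placeholders.
SAMPLE_CATALOG = json.loads("""
{
  "streams": [
    {
      "stream": {"name": "customer", "json_schema": {}, "supported_sync_modes": ["full_refresh"]},
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    },
    {
      "stream": {"name": "invoice", "json_schema": {}, "supported_sync_modes": ["full_refresh"]},
      "sync_mode": "full_refresh",
      "destination_sync_mode": "append"
    }
  ]
}
""")


def summarize_streams(catalog):
    """List (stream name, sync mode, destination sync mode) for each stream."""
    return [
        (entry["stream"]["name"], entry["sync_mode"], entry["destination_sync_mode"])
        for entry in catalog["streams"]
    ]


def select_streams(catalog, keep):
    """Return a new catalog limited to the named streams; the input is not modified."""
    wanted = set(keep)
    return {
        "streams": [
            entry for entry in catalog["streams"]
            if entry["stream"]["name"] in wanted
        ]
    }


trimmed = select_streams(SAMPLE_CATALOG, ["invoice"])
print(summarize_streams(trimmed))
```

Trimming the stream list is one of the simpler customizations; anything beyond that, such as changing sync modes, depends on what the specific source supports.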
Airbyte Connectors: Destinations
The term “destination” in the context of Airbyte Integrations refers to the location where source data will land. This could be a specific cloud database, data warehouse, data lake, or another storage platform.
Here are some of the more popular destinations:
- BigQuery: This is Google Cloud’s fully managed, low-cost analytics database. With the BigQuery destination connector, users can push data directly into BigQuery datasets.
- Amazon Redshift: Redshift is Amazon’s data warehousing solution that’s a part of the Amazon Web Services (AWS) ecosystem. Airbyte’s Redshift destination connector facilitates the transfer of data into Redshift clusters.
- Snowflake: Snowflake is a cloud-based data warehousing platform. The Airbyte Snowflake destination connector ensures data from various sources can be consolidated within Snowflake.
- Amazon S3: Amazon S3 (Simple Storage Service) is a scalable object storage service from AWS. With the Airbyte S3 destination connector, users can save data in various formats (like CSV, JSON, Parquet) directly into S3 buckets.
The following is an example config for Amazon Web Services S3:
{
  "access_key_id": "optional_value",
  "secret_access_key": "optional_value",
  "s3_bucket_name": "required_value",
  "s3_bucket_path": "required_value",
  "s3_bucket_region": "required_value",
  "format": {
    "format_type": "optional_value",
    "compression_codec": "optional_value",
    "flattening": "optional_value",
    "compression": {
      "compression_type": "optional_value"
    },
    "block_size_mb": "optional_value",
    "max_padding_size_mb": "optional_value",
    "page_size_kb": "optional_value",
    "dictionary_page_size_kb": "optional_value",
    "dictionary_encoding": "optional_value"
  },
  "s3_endpoint": "optional_value",
  "s3_path_format": "optional_value",
  "file_name_pattern": "optional_value"
}
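One way to turn a reference template like this into a working config is to supply real values and drop any optional fields you do not set. The sketch below shows that approach under stated assumptions: `build_config` is a hypothetical helper, the template is a shortened copy of the S3 example, and the bucket name, path, region, and format type are illustrative values rather than recommendations.

```python
def build_config(template, values):
    """Fill a reference template with real values, dropping unset optional fields."""
    config = {}
    for key, placeholder in template.items():
        if isinstance(placeholder, dict):
            # Build nested sections and keep them only if something was set.
            nested = build_config(placeholder, values.get(key, {}))
            if nested:
                config[key] = nested
        elif key in values:
            config[key] = values[key]
        elif placeholder == "required_value":
            raise ValueError(f"missing required field: {key}")
    return config


# Shortened copy of the S3 template, for illustration.
s3_template = {
    "access_key_id": "optional_value",
    "s3_bucket_name": "required_value",
    "s3_bucket_path": "required_value",
    "s3_bucket_region": "required_value",
    "format": {
        "format_type": "optional_value",
        "compression": {"compression_type": "optional_value"},
    },
    "s3_endpoint": "optional_value",
}

# Illustrative values only.
s3_config = build_config(
    s3_template,
    {
        "s3_bucket_name": "example-bucket",
        "s3_bucket_path": "airbyte/raw",
        "s3_bucket_region": "us-east-1",
        "format": {"format_type": "JSONL"},
    },
)
print(s3_config)
```

The resulting config keeps the three required bucket fields plus the one format option that was supplied, and omits every optional field left unset. A missing required field raises an error early instead of surfacing later as a failed sync.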
Airbyte Development Kits
While Airbyte offers a plethora of pre-built connectors, there will always be niche platforms or databases that aren’t covered out-of-the-box. For example, there is no open-source Hotjar connector (yet). This is where connector development kits come into play.
Airbyte Custom Connector Development
You can create custom connectors tailored to your specific data sources or destinations using the provided Connector Development Kit (CDK). Whether you’re looking to integrate with a proprietary system or a lesser-known platform, the Airbyte CDKs offer a path to get there.
Connector Development | Airbyte Documentation
Airbyte Cloud: The Open Source Option
Although not every part of Airbyte is open source, many of its source and destination connectors are. This mix of open-source and proprietary elements calls for a clear distinction for users who want an Airbyte Cloud deployment.
For those eager to utilize the open-source facets of Airbyte in cloud platforms such as AWS, Google Cloud, and Azure, recognizing the differences between these open-source and proprietary components is vital.
The following are two open source projects for running an open source version of Airbyte in the cloud:
- GitHub - openbridge/airbridge: Airbridge: Configuration-Driven Airbyte Cloud Data Integration Pipelines
- tap-airbyte-wrapper - Meltano Hub
Airbyte Connectors: Data Source, Destination, and Catalog Configuration was originally published in Openbridge on Medium, where people are continuing the conversation by highlighting and responding to this story.