Quickly run configuration-driven Airbyte connectors via Airbridge, an open-source, Docker-based service.
Earlier this year the Meltano community released the tap-airbyte-wrapper. This is what open source is all about! The Meltano Airbyte wrapper takes advantage of Docker, translating output from a source into a Singer-compatible format. It is a perfect fit for someone invested in the Meltano ecosystem.
While we were inspired by the effort, there was one hitch for us. We wanted something that works for data-curious spelunkers, data engineers, and developers so they can explore configuration-driven data pipelines without relying on Meltano.
Keeping things lean and simple, with a focus on the truly open source aspects of Airbyte, was important to us. As a result, we built Airbridge as a pure Python+Docker integration tool that orchestrates data ingestion pipelines from Airbyte data sources like Stripe, Facebook, or Google to Airbyte destination connectors like an S3 data lake, Redshift, Snowflake, or BigQuery.
Project Framework Approach
While Airbridge shares a common core (Python, Docker…) with Meltano, unlike the Meltano wrapper, Airbridge does not require the Meltano SDK. And unlike the Meltano wrapper, which transforms Airbyte output to the Singer spec, Airbridge retains the raw data source output in an unmodified state. Nothing against Singer-formatted output; we just wanted to keep the Airbyte output in raw form.
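To make 'raw form' concrete: Airbyte connectors emit newline-delimited JSON protocol messages (RECORD, STATE, LOG, and so on) on stdout. Here is a minimal Python sketch of reading a saved run's output; the file path is hypothetical:

import json

# Each line of raw Airbyte output is one protocol message, for example:
# {"type": "RECORD", "record": {"stream": "charges", "data": {...}, "emitted_at": 1690000000000}}
with open("tmp/mydataoutput/raw_output.jsonl") as f:  # hypothetical output path
    for line in f:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            record = message["record"]
            print(record["stream"], record["data"])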
The following are the guiding principles we set forth at the start of the effort:
🐳 Use Docker-Driven Workflows: Airbridge must follow a Docker-centric methodology that capitalizes on the ready-to-go source and destination images Airbyte publishes on Docker Hub. This facilitates a streamlined setup procedure, directing focus where it is needed most: crafting seamless data flows.
🐍 Leverage Python: Anchor the work in the well-traveled, standards-based Python ecosystem. Airbridge offers a clean, swift, and modular data flow, fostering easy setup and modification.
🔗 Seamless Airbyte Sources and Destinations Integration: Airbridge must orchestrate the essential resources to seamlessly sync source connectors to data destinations.
🔄 Automated State Management for Hassle-Free Operations: Provide automated state tracking for each run. Use a simple yet potent solution to ensure uniformity and dependability across runs, curbing errors and conserving time (see the sketch after this list).
🔓 Open-Source and Community-Driven: Airbridge must be released under the MIT license.
📦 No Bloat, Just Efficiency: No unnecessary wrappers; Airbridge should concentrate on delivering a lean, efficient tool that caters to the core needs of developers, fostering seamless data integration pipelines.
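On the state management principle: in the Airbyte protocol, a source emits STATE messages as checkpoints during a sync, and an orchestrator can persist the latest checkpoint to resume incrementally on the next run. The following is a minimal, hypothetical Python sketch of that idea, not Airbridge's actual implementation; the paths are placeholders:

import json

# Scan a run's raw output and keep the last STATE checkpoint (hypothetical paths).
latest_state = None
with open("tmp/mydataoutput/raw_output.jsonl") as f:
    for line in f:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "STATE":
            latest_state = message["state"]

if latest_state is not None:
    with open("tmp/mydataoutput/state.json", "w") as f:
        json.dump(latest_state, f)
# Feeding state.json back to the source on the next run resumes from this checkpoint.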
Getting Started: Where can I deploy Airbridge?
Deploy it in the cloud, locally, or somewhere in between. Setting up Airbridge in your cloud computing environment, whether AWS, Azure, or Google Cloud, is a breeze.
Before you begin, ensure that you have the following prerequisites installed on your system:
- Python: Verify or install it using the command: python --version
- Git: Verify or install it using the command: git --version
- Docker: Verify or install it using the command: docker --version
If any of the above is not installed, please install it before proceeding.
Step 1: Create a local project directory and navigate into it:
mkdir airbridge_project && cd airbridge_project
Step 2: Clone the project from Github and navigate into the project directory:
git clone https://github.com/openbridge/airbridge.git && cd airbridge
Step 3: Set up a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Unix or MacOS
# .\venv\Scripts\activate # On Windows
Step 4: Install the project dependencies. The project includes a pyproject.toml file, so you can use Poetry to manage the dependencies. First, install Poetry using the following command:
pip install poetry
Now, install the project dependencies using poetry:
poetry install
Step 5: Start the Docker daemon if it is not already running. You can do this via the Docker desktop application or by using the following command, depending on your OS:
systemctl start docker # You might need sudo privileges
Step 6: Once you have your Airbyte source and destination configs ready, you can run Airbridge. Here is an example command using Stripe as a source and Amazon S3 as a destination (a sketch of what those config files might contain follows the parameter explanations below):
poetry run main -i airbyte/source-stripe -w airbyte/destination-s3 -s /airbridge/env/stripe-source-config.json -d /airbridge/env/s3-destination-config.json -c /airbridge/env/stripe-catalog.json -o /airbridge/tmp/mydataoutput/
Explanation of the command parameters:
- -i: Input source (Stripe in this case)
- -w: Destination (Amazon S3 in this case)
- -s: Path to the source configuration file
- -d: Path to the destination configuration file
- -c: Path to the catalog configuration file
- -o: Output directory
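What do the source and destination config files contain? Each Airbyte connector defines its own configuration spec. The sketch below writes plausible Stripe and S3 configs as an illustration only; the field names are drawn from the connectors' published specs at the time of writing and may change, so verify them against each connector's documentation before use:

import json

# Illustrative only: field names may differ in your connector version.
stripe_source_config = {
    "account_id": "acct_XXXXXXXX",        # placeholder
    "client_secret": "sk_live_XXXXXXXX",  # placeholder
    "start_date": "2023-01-01T00:00:00Z",
}

s3_destination_config = {
    "s3_bucket_name": "my-data-lake",     # placeholder
    "s3_bucket_path": "stripe",
    "s3_bucket_region": "us-east-1",
    "access_key_id": "AKIAXXXXXXXX",      # placeholder
    "secret_access_key": "XXXXXXXX",      # placeholder
    "format": {"format_type": "JSONL"},
}

with open("stripe-source-config.json", "w") as f:
    json.dump(stripe_source_config, f, indent=2)
with open("s3-destination-config.json", "w") as f:
    json.dump(s3_destination_config, f, indent=2)

The catalog file passed with -c is typically produced by running the source connector's discover command against your source config and then selecting the streams you want to sync.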
Comprehensive Airbyte install guides are at your disposal, guiding you to get your Airbyte connector and data destination up and running. Visit the project here: https://github.com/openbridge/airbridge
License
Airbridge is released under the MIT license. Access the code and kickstart your efforts.
Community
For enthusiasts eager to delve deeper, the Airbridge and Airbyte GitHub spaces hold a treasure trove of resources where developers can contribute to the projects or build knowledge from the community. Whether you are a novice or a veteran, your expertise will find appreciation and acknowledgment.
As with most efforts like these, we cordially invite you to join a collaborative expedition as the work moves forward. Community perspectives, feedback, and involvement are crucial in sculpting a tool that aligns with an open source project's aspirations.
Frequently Asked Questions (FAQs)
What does Airbyte do?
Airbyte is a data integration platform that facilitates the extraction, transformation, and loading (ETL) of data from various sources to destinations of your choice. It helps automate the data pipeline creation process, allowing businesses and developers to focus on deriving insights from the data rather than grappling with data sync challenges.
How does Airbyte work?
Airbyte operates through a series of connectors that enable pipelines between different data sources and destinations. These connectors are pre-configured to work with specific platforms, simplifying the data extraction and loading process. Airbyte also offers a graphical user interface (UI) where users can configure and manage their data pipelines easily. Under the hood, it utilizes Docker containers to manage and isolate different data connectors, ensuring a robust and scalable data solution.
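To make the Docker mechanics concrete: every Airbyte connector image responds to a standard set of commands (spec, check, discover, read). As a minimal sketch, assuming Docker is running and the image can be pulled, you can ask a connector to describe its own configuration:

import json
import subprocess

# Illustrative: run the Stripe source image and print its configuration spec.
result = subprocess.run(
    ["docker", "run", "--rm", "airbyte/source-stripe", "spec"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.splitlines():
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON log lines
    if message.get("type") == "SPEC":
        print(json.dumps(message["spec"]["connectionSpecification"], indent=2))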
What is Airbyte used for?
Airbyte is primarily used for data integration and ETL processes. It allows organizations and developers to create data pipelines easily, facilitating data movement from various sources, like databases, APIs, and flat files, to different destinations, including data warehouses, data lakes, or other business intelligence tools. This helps consolidate data in a centralized location, making it easier to analyze and derive valuable business insights.
Is Airbyte open-source?
Yes, parts of Airbyte are open-source. Some features of Airbyte operate under the MIT license, allowing users to use, modify, and distribute the software freely. Airbridge focuses on those parts of Airbyte that are open source under the MIT license. Open source fosters a community where developers can contribute to the project, enhancing its functionality and robustness. The Airbridge source code is publicly available on GitHub, encouraging transparency and continuous improvement.
Airbridge will not support any parts of Airbyte that fall under what the Open Source Initiative calls a “fauxpen” source license.
Does it require the Airbyte UI?
No, the Airbyte UI is not needed. The Airbyte UI offers a user-friendly environment where you can configure data connectors, set up data pipelines, and monitor the status of your data integrations. However, for users comfortable with command-line operations, Airbridge allows configuring and managing data pipelines without the UI.
What is the Airbyte Pricing for running open source?
Airbyte Cloud costs $2.50 per credit. If you replicate 100M rows of data per month, Airbyte Cloud will cost over $14,000 for the year. However, if you run Airbyte in a self-managed context, your costs are infrastructure and management time. You just need to make sure you are compliant with Airbyte's licenses.