No-Code DAGs (DAG Factory)

This section references code that is available on the Advanced Topics main page under DAG Factory Setup.

One key advantage of Airflow is the robust set of open-source extensions and operators maintained by the Airflow community. One of the most popular use cases for Airflow is automating the ingest-transform-load/move pattern for onboarding data. Many enterprises are looking for a no-code solution that gives their business users the ability to configure and run their own pipelines.

The Airflow community offers several solutions to this problem; one popular approach is the dag-factory library.

We will be using the dag-factory library for this example. Check out the library’s GitHub repository here.

The model is simple:

  1. Define DAGs as YAML files.
  2. Create boilerplate DAG .py files that build DAGs from the source YAML configuration (a sketch of each piece follows below).
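
To make the model concrete, here is a minimal sketch of what a dag-factory YAML config can look like. The DAG name, owner, and bash commands below are illustrative placeholders rather than the workshop's actual config, which is explored in the yaml section:

    # Hypothetical dag-factory config: one DAG with two dependent bash tasks
    example_yaml_dag:
      default_args:
        owner: 'workshop'
        start_date: 2020-01-01
      schedule_interval: '@daily'
      description: 'A DAG defined entirely in YAML'
      tasks:
        say_hello:
          operator: airflow.operators.bash_operator.BashOperator
          bash_command: 'echo hello'
        say_goodbye:
          operator: airflow.operators.bash_operator.BashOperator
          bash_command: 'echo goodbye'
          dependencies: [say_hello]

Each top-level key becomes a DAG, each entry under tasks becomes a task, and the dependencies list wires tasks together.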

DAG Factory Setup

We will need some simple code to run our examples and an environment to test in. You can use the Airflow Environment and S3 bucket that you created in the lab, or create a new one by following the steps in the link here.

There is some example code to put into your S3 bucket, which you can grab here.

Unzip the package and note the directory structure:

dags/
    example_dag_factory.py
    example_dag_factory.yml
    print_hello.py

You will need to add these DAGs to the existing dags folder in your S3 bucket. If you already have a requirements.txt, add the line below to that file. If not, copy the requirements.txt file from the zip into the requirements folder in your S3 bucket:

dag-factory==0.7.2
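
If you prefer the command line to the S3 console, an upload along these lines should work (the bucket name is a placeholder for your own):

    # Copy the example DAG files into the dags/ prefix (replace the bucket name)
    aws s3 cp dags/ s3://your-mwaa-bucket/dags/ --recursive

    # Copy the requirements file to the location your environment expects
    aws s3 cp requirements.txt s3://your-mwaa-bucket/requirements.txt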

If you edit the requirements.txt, you will need to update your Airflow environment.
To perform an update, go to the Amazon MWAA console, select the environment, click Edit, select the new version of the requirements file, then click Next, Next, and Save.
The environment update will take 5-10 minutes to complete.
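
If you prefer to script the update, the AWS CLI can do the same thing; a sketch, assuming an environment named MyAirflowEnvironment and versioning enabled on the bucket (both are placeholders for your own setup):

    # Point the environment at the new version of the requirements file
    aws mwaa update-environment \
        --name MyAirflowEnvironment \
        --requirements-s3-object-version <your-new-version-id>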

This ensures the dag-factory Python package is installed and available when your DAGs are parsed and run by the scheduler.

Your dags/ prefix should now look like this (assuming you don’t have other DAGs):

[Screenshot: S3 bucket dags/ prefix]

And finally, your MWAA S3 configuration should look like this:

[Screenshot: MWAA environment S3 configuration]


Let’s take a look at our example files:

dags/
    example_dag_factory.py
    example_dag_factory.yml
    print_hello.py
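
Of the three, print_hello.py is the simplest: it defines the Python callable that the YAML config points at. A minimal sketch of what such a file might contain (the workshop's actual contents are covered in the python section):

    # print_hello.py - a callable the YAML config can reference by file path and name
    def print_hello():
        print('hello')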

We’ll explore these files in depth in the python and yaml sections. For now, note the location of these files relative to each other.
This location is important when including your own Airflow code: many libraries require an absolute path to the dags folder itself, and on Amazon MWAA this folder is found at:

/usr/local/airflow/dags/
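
The boilerplate DAG file typically loads its config from this absolute path. A minimal sketch, assuming the file names above and following the pattern from the dag-factory README:

    # example_dag_factory.py - boilerplate that turns the YAML config into DAGs
    from airflow import DAG  # imported so the scheduler recognizes this as a DAG file
    import dagfactory

    # Absolute path to the YAML config inside the MWAA environment
    dag_factory = dagfactory.DagFactory('/usr/local/airflow/dags/example_dag_factory.yml')

    # Register each generated DAG in this module's namespace so the scheduler can find it
    dag_factory.generate_dags(globals())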

We will see this path again as we explore the two files (config and Python) in the following sections. To see the DAGs in action, explore the Admin UI section.