Operators and Plugins

Operators

Operators form the core of Airflow tasks. They define a single unit of work within the Airflow model and are the gateway for customizing Airflow to run your business logic. The operator library is a key component of what makes Airflow a compelling choice as an orchestration engine.

The Airflow community provides hundreds of pre-built operators, and subclassing an operator to create your own is straightforward.

The official Airflow documentation goes into some depth on how operators work and can be found below:

At a high level, operators are Python classes. To create a custom Airflow operator, you subclass Airflow's BaseOperator class.

In this section of the awsairflowlib/operators/aws_glue_crawler_operator.py code (in the package you will download as part of the workshop), we import BaseOperator from the Airflow library and subclass it to define our operator:

[Image: Airflow Glue Example]
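As a rough illustration of that pattern (not the exact workshop file), a subclass might start like this; the class name and the crawler_name parameter are assumptions for the sketch:

    from airflow.models import BaseOperator

    class AWSGlueCrawlerOperator(BaseOperator):
        """Runs an AWS Glue crawler as an Airflow task (illustrative sketch)."""

        def __init__(self, crawler_name, **kwargs):
            super().__init__(**kwargs)
            self.crawler_name = crawler_name  # name of the Glue crawler to trigger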

To supply the business logic that we want our operator to run, we add an execute method to our class:

[Image: Airflow Glue Example]
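Continuing the same sketch, an execute method might look like the following; the boto3 Glue calls (start_crawler, get_crawler) are real APIs, but the polling logic is an illustrative assumption rather than the workshop's exact code:

    import time
    import boto3

    class AWSGlueCrawlerOperator(BaseOperator):  # continued from the sketch above

        def execute(self, context):
            """Start the Glue crawler and block until it returns to the READY state."""
            glue = boto3.client('glue')
            self.log.info('Starting Glue crawler %s', self.crawler_name)
            glue.start_crawler(Name=self.crawler_name)
            time.sleep(10)  # give the crawler a moment to leave the READY state
            # Poll until the crawler finishes; a production operator would add
            # timeouts and surface crawler failures.
            while glue.get_crawler(Name=self.crawler_name)['Crawler']['State'] != 'READY':
                time.sleep(30)
            return self.crawler_name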

This gives Airflow enough information to create a task for us whenever our operator is used in our DAG definitions.
You can also use custom operators in your DAG factories (covered in Advanced Topics), allowing your data engineering team to provide low-level Python code that can be used and graphed by business users who have little or no Python experience.
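For reference, wiring such an operator into a DAG might look like the sketch below; the DAG id, schedule, and crawler name are placeholders, and the class name follows the sketch above rather than the workshop's exact code:

    from datetime import datetime
    from airflow import DAG
    from awsairflowlib.operators.aws_glue_crawler_operator import AWSGlueCrawlerOperator

    with DAG(dag_id='glue_crawler_example',          # placeholder DAG id
             start_date=datetime(2021, 1, 1),
             schedule_interval=None,
             catchup=False) as dag:

        crawl_raw_data = AWSGlueCrawlerOperator(
            task_id='crawl_raw_data',
            crawler_name='my-raw-data-crawler',      # hypothetical crawler name
        )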


Plugins

Plugins can be thought of as a set of operators, hooks, and sensors that we'd like to import and treat as a group. In the Lab guide, we zipped up the following objects (a sketch of how such a group can be registered follows the list):

Operators
  • aws_copy_s3_to_redshift.py
  • aws_glue_crawler_operator.py
  • aws_glue_job_operator.py
Sensors
  • aws_glue_job_sensor.py
Hooks
  • aws_glue_crawler_hook.py
  • aws_glue_job_hook.py
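
As referenced above, one way such a group can be registered is through Airflow's plugin interface. The sketch below assumes class names derived from the file names in the list; the actual names in the workshop package may differ:

    from airflow.plugins_manager import AirflowPlugin

    # Assumed class names based on the file names above (illustrative only).
    from awsairflowlib.operators.aws_copy_s3_to_redshift import S3ToRedshiftOperator
    from awsairflowlib.operators.aws_glue_crawler_operator import AWSGlueCrawlerOperator
    from awsairflowlib.operators.aws_glue_job_operator import AWSGlueJobOperator
    from awsairflowlib.sensors.aws_glue_job_sensor import AwsGlueJobSensor
    from awsairflowlib.hooks.aws_glue_crawler_hook import AwsGlueCrawlerHook
    from awsairflowlib.hooks.aws_glue_job_hook import AwsGlueJobHook

    class AwsAirflowLibPlugin(AirflowPlugin):
        """Groups the custom operators, sensors, and hooks into one plugin."""
        name = 'aws_airflow_lib'
        operators = [S3ToRedshiftOperator, AWSGlueCrawlerOperator, AWSGlueJobOperator]
        sensors = [AwsGlueJobSensor]
        hooks = [AwsGlueCrawlerHook, AwsGlueJobHook]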

You will pass all of these plugins to MWAA as a packaged .zip file by specifying the location of the zip file in Amazon S3.
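For illustration, that could be done through the MWAA API, sketched here with boto3; the environment name, object key, and version id are placeholders:

    import boto3

    mwaa = boto3.client('mwaa')
    mwaa.update_environment(
        Name='my-mwaa-environment',                # placeholder environment name
        PluginsS3Path='plugins.zip',               # key relative to the environment's S3 bucket
        PluginsS3ObjectVersion='example-version',  # supplied because the bucket is versioned
    )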