S3 Sensor

We will use the S3PrefixSensor from Apache Airflow's s3_prefix_sensor module to add a task to the DAG that waits for objects to become available in S3 before the next set of tasks runs.

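# Import path for Airflow 1.10.x; Airflow 2 moved this sensor into the Amazon provider package.  
from airflow.sensors.s3_prefix_sensor import S3PrefixSensor  

# Wait until at least one object exists under the prefix.  
# S3_BUCKET_NAME and dag are assumed to be defined earlier in the DAG file.  
s3_sensor = S3PrefixSensor(  
  task_id='s3_sensor',  
  bucket_name=S3_BUCKET_NAME,  
  prefix='data/raw/green',  
  dag=dag  
)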

With this, we have set up the data pipeline so that its downstream tasks run only after files arrive under the data/raw/green prefix in the S3 bucket.
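
To make the waiting behavior more concrete, here is a minimal standard-library sketch of the polling idea behind a prefix sensor. It is not Airflow's implementation: the function names, the list_keys callable (a hypothetical stand-in for an S3 listing call), and the poke_interval and timeout values are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterable


def prefix_has_objects(keys: Iterable[str], prefix: str) -> bool:
    """Return True if at least one object key starts with the prefix."""
    return any(key.startswith(prefix) for key in keys)


def wait_for_prefix(
    list_keys: Callable[[], Iterable[str]],
    prefix: str,
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until an object appears under the prefix or the timeout elapses.

    list_keys is a hypothetical stand-in for a call that lists the keys
    in the bucket. Returns True on success and False on timeout.
    """
    waited = 0.0
    while True:
        # One "poke": check whether anything exists under the prefix yet.
        if prefix_has_objects(list_keys(), prefix):
            return True
        # Give up once the accumulated wait reaches the timeout.
        if waited >= timeout:
            return False
        # Otherwise pause before the next check.
        sleep(poke_interval)
        waited += poke_interval
```

For example, calling wait_for_prefix(lambda: ["data/raw/green/trips.csv"], "data/raw/green") returns True on the first check without sleeping. One difference worth noting: this sketch returns False when the timeout is reached, whereas an Airflow sensor fails the task in that case, so downstream tasks do not run.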