Next, we will import the job script into Glue via the AWS Console. To complete this activity, follow the steps below:
Edit the Glue job script nyc_raw_to_transform.py (previously downloaded) to set the bucket name to airflow-yourname-bucket in the data sink (last) step.
If you haven’t downloaded the scripts previously, you can download the package from here.
Copy the modified job script to the S3 path s3://airflow-yourname-bucket/scripts/glue/. You will need to create the glue folder under scripts if it wasn’t created during setup.
Log in to the AWS Glue Console.
Add Job by providing the following details
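If you would rather script the upload and job creation than click through the console, the steps above can be sketched roughly with boto3. This is only an illustration, not part of the workshop package: the function name upload_script_and_create_job is hypothetical, the "glueetl" (Spark ETL) job type and Python 3 are assumptions about this job, and the clients are passed in so nothing touches AWS until you call it yourself.

```python
import os


def upload_script_and_create_job(s3_client, glue_client, local_path,
                                 bucket, job_name, role_name):
    """Upload a Glue script to S3 and register a Glue job that points at it.

    s3_client and glue_client are expected to be boto3 clients
    (boto3.client("s3") and boto3.client("glue")).
    """
    script_name = os.path.basename(local_path)
    key = f"scripts/glue/{script_name}"

    # S3 has no real folders: writing to this key is what makes the
    # "glue" folder appear under "scripts".
    s3_client.upload_file(local_path, bucket, key)

    script_uri = f"s3://{bucket}/{key}"
    glue_client.create_job(
        Name=job_name,
        Role=role_name,
        Command={
            "Name": "glueetl",          # assumption: a Spark ETL job
            "ScriptLocation": script_uri,
            "PythonVersion": "3",       # assumption: Python 3 script
        },
    )
    return script_uri


# Example call (requires boto3 and valid AWS credentials):
#   import boto3
#   upload_script_and_create_job(
#       boto3.client("s3"), boto3.client("glue"),
#       "nyc_raw_to_transform.py", "airflow-yourname-bucket",
#       "nyc_raw_to_transform", "AWSGlueServiceRoleDefault")
```

The job name and role here match the ones used by the DAG task in this section; any other job settings you chose in the console would need to be added to the create_job call.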
Using the custom Glue operator and hook, the DAG task can be written as shown below to invoke an existing Glue job.
glue_task = AWSGlueJobOperator(
    task_id="glue_task",
    job_name="nyc_raw_to_transform",
    iam_role_name="AWSGlueServiceRoleDefault",
    dag=dag,
)
The task above runs the Glue job and waits for it to complete before triggering the next step in the data pipeline.
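To make the "run and wait" behaviour concrete, here is a minimal sketch of the kind of polling loop such an operator might perform through the Glue API. It is not the custom operator's actual code: the function name run_glue_job_and_wait, the 30-second polling interval, and the set of states treated as terminal are assumptions for illustration. The client is passed in, so it can be a boto3 Glue client (boto3.client("glue")) or a stub.

```python
import time

# Job run states after which polling stops (assumed set for this sketch).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_glue_job_and_wait(glue_client, job_name, poll_seconds=30,
                          sleep=time.sleep):
    """Start an existing Glue job and block until the run finishes.

    Returns the run id on success and raises RuntimeError otherwise,
    which is what lets a scheduler decide whether to run the next task.
    """
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]

    while True:
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            break
        # Still starting or running: wait before asking again.
        sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(
            f"Glue job {job_name} run {run_id} ended in state {state}")
    return run_id
```

Raising on any non-successful end state is the design choice that matters here: in a pipeline, a failed Glue run should fail the task so that downstream steps do not start on incomplete data.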