S3

Setting up the S3 Buckets

  1. Go to S3 Console

  2. Click on Create bucket

  3. Enter a name for the bucket - airflow-yourname-bucket

    Replace yourname in the bucket name above with a short, easy-to-remember version of your name or initials.

  4. Leave all other settings as-is and click on Create bucket

  5. Click on the newly created bucket from the S3 List

    Note down the Amazon Resource Name (ARN)
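The bucket name and ARN can be sanity-checked locally before you click through the console. A minimal Python sketch; the name `airflow-jdoe-bucket` is a placeholder, and the regex covers only the core S3 naming rules (the full rules also exclude IP-address-like names, among others):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    # Core S3 rules: 3-63 characters, lowercase letters, digits, dots and
    # hyphens, starting and ending with a letter or digit (a simplification).
    return (3 <= len(name) <= 63
            and re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name) is not None)

def bucket_arn(bucket: str) -> str:
    # An S3 bucket ARN is derived purely from the bucket name, so it can be
    # reconstructed later if you forget to note it down.
    return f"arn:aws:s3:::{bucket}"

print(is_valid_bucket_name("airflow-jdoe-bucket"))  # True
print(bucket_arn("airflow-jdoe-bucket"))            # arn:aws:s3:::airflow-jdoe-bucket
```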

  6. Click on Create folder

  7. Enter dags for the Folder name, and click on Create folder

  8. Click on Create folder once more

  9. Enter data for the Folder name, and click on Create folder

  10. Click on Create folder again

  11. Enter scripts for the Folder name, and click on Create folder

    • Under the scripts folder, create two more folders - glue and emr

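For repeatability, the folder layout from steps 6-11 can also be created programmatically. In S3 a console "folder" is just a zero-byte object whose key ends in `/`. A sketch written against a boto3-style client; the client and bucket name are assumptions, not part of the original steps:

```python
# Folder keys matching the layout created in the console steps above.
FOLDERS = ["dags/", "data/", "scripts/", "scripts/glue/", "scripts/emr/"]

def create_folders(s3_client, bucket, folders=FOLDERS):
    for key in folders:
        # A zero-byte object with a trailing "/" shows up as a folder
        # in the S3 console.
        s3_client.put_object(Bucket=bucket, Key=key)
```

With boto3 installed and AWS credentials configured, this would be called as `create_folders(boto3.client("s3"), "airflow-yourname-bucket")`.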
  12. Download the plugins that will be used later in the data pipeline from here

  13. Click on Create folder again

  14. Enter plugins for the Folder name, and click on Create folder

  15. Upload the downloaded zip file awsairflowlib.zip to the plugins folder in your S3 bucket

  16. Create a new file called requirements.txt on your local machine, and add the following lines to it

psycopg2
fsspec
s3fs
pandas
sagemaker==v1.72
dag-factory==0.7.2

  17. Click on Create folder once more

  18. Enter requirements for the Folder name, and click on Create folder

  19. Upload the requirements.txt file to the requirements folder in your S3 bucket
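The two uploads above (the awsairflowlib.zip plugins file and requirements.txt) can likewise be scripted. A sketch against a boto3-style client; the local file paths and the client itself are assumptions:

```python
def upload_support_files(s3_client, bucket):
    # Local file names match the files created/downloaded in the steps above;
    # adjust the paths if you saved them elsewhere.
    s3_client.upload_file("awsairflowlib.zip", bucket, "plugins/awsairflowlib.zip")
    s3_client.upload_file("requirements.txt", bucket, "requirements/requirements.txt")
```

As with the folder sketch, a real run would pass `boto3.client("s3")` and your own bucket name.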

After all the folders are set up, you should see the following structure in the S3 console

Amazon S3 setup

Next, we will proceed to create the Managed Apache Airflow (MWAA) environment.