Add Dependencies

Before we add new dependencies, let’s check that our MWAA environment connected to the CodeArtifact repository and installed the packages defined in requirements.txt.

If you take a closer look at requirements.txt, the first line points to codeartifact.txt, which should now contain the correct --index-url for our private PyPI repository in CodeArtifact. The Lambda function generated this --index-url during the deployment. It tells pip to install packages, in our case the numpy library, from the CodeArtifact repository.

-r /usr/local/airflow/dags/codeartifact.txt
numpy==1.20.3
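
To make the contents of codeartifact.txt more concrete, here is a minimal sketch of how such an --index-url line could be assembled. The Lambda function's source is not shown in this section, so the function name and parameters below are hypothetical; the URL layout follows the pip endpoint pattern that CodeArtifact uses, where the user name is aws and the password is the authorization token.

```python
def build_index_url_line(token: str, domain: str, owner: str,
                         region: str, repository: str) -> str:
    """Return a pip option line pointing at a CodeArtifact PyPI endpoint.

    Hypothetical helper for illustration; not the workshop's Lambda code.
    `token` is assumed to come from the CodeArtifact GetAuthorizationToken API,
    and `owner` is the AWS account ID that owns the domain.
    """
    host = f"{domain}-{owner}.d.codeartifact.{region}.amazonaws.com"
    # pip authenticates with user "aws" and the token as the password.
    return f"--index-url https://aws:{token}@{host}/pypi/{repository}/simple/"
```

Because the authorization token expires, a line like this has to be regenerated and written to codeartifact.txt periodically, which is the job of the Lambda function.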

Navigate to MWAA in the AWS Console, open the mwaa_codeartifact_env environment, and go to the Monitoring section. We will now check the Airflow scheduler logs to confirm that the scheduler connected to the CodeArtifact repository and installed numpy.

In the scheduler logs, we can see that the scheduler successfully connected to the CodeArtifact repository and installed the numpy library.

You can also open the Airflow UI and run the example_dag DAG, which prints a simple array of zeros with numpy:

# mwaa-ca-content-bucket/dags/tutorial.py

import numpy as np
...
def example_with_numpy(**kwargs):
    print(np.zeros(5))

Add new Python dependencies

To install additional Python dependencies in your MWAA environment, update the requirements.txt file and upload it to the S3 bucket. For the changes to take effect, you need to update your MWAA environment by selecting the new version of requirements.txt. You can do so in the AWS Console or via the AWS CLI.

Upload requirements.txt with new Python dependencies:

aws s3 cp mwaa-ca-bucket-content/requirements.txt s3://YOUR-BUCKET-NAME/

To list the versions of requirements.txt, run:

aws s3api list-object-versions --bucket YOUR-BUCKET-NAME --prefix requirements.txt

Finally, update your MWAA environment with a new version of requirements.txt:

aws mwaa update-environment --name mwaa_codeartifact_env --requirements-s3-object-version OBJECT_VERSION

If you build your own Python packages, you could also make updating requirements.txt and the MWAA environment part of your release pipeline.
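
The three CLI steps above map onto boto3 calls, so a pipeline step could be sketched roughly as follows. This is an illustration under assumptions, not part of the workshop code: the function names are hypothetical, the clients are passed in by the caller, and the bucket is assumed to have versioning enabled.

```python
from typing import Any, Optional


def latest_version_id(listing: dict, key: str) -> Optional[str]:
    """Return the VersionId marked IsLatest for `key` in a
    list_object_versions response, or None if there is none."""
    for version in listing.get("Versions", []):
        # Prefix matching can return other keys, so compare the key exactly.
        if version.get("Key") == key and version.get("IsLatest"):
            return version.get("VersionId")
    return None


def roll_out_requirements(s3: Any, mwaa: Any, bucket: str, env_name: str,
                          local_path: str = "requirements.txt",
                          key: str = "requirements.txt") -> str:
    """Upload requirements.txt and point the MWAA environment at the new version.

    `s3` and `mwaa` are expected to be boto3 clients for the "s3" and "mwaa"
    services (hypothetical wiring; create them in your pipeline code).
    """
    # Step 1: upload the file (same effect as `aws s3 cp`).
    s3.upload_file(local_path, bucket, key)

    # Step 2: find the version that was just created.
    listing = s3.list_object_versions(Bucket=bucket, Prefix=key)
    version_id = latest_version_id(listing, key)
    if version_id is None:
        raise RuntimeError(f"No latest version found for {key} in {bucket}")

    # Step 3: update the environment to use that version.
    mwaa.update_environment(Name=env_name,
                            RequirementsS3ObjectVersion=version_id)
    return version_id
```

In a pipeline you would call it with real clients, for example roll_out_requirements(boto3.client("s3"), boto3.client("mwaa"), "YOUR-BUCKET-NAME", "mwaa_codeartifact_env"). Passing the clients in keeps the function easy to exercise with stand-in objects before it touches a real environment.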

Clean up

To destroy all resources created for this project, run the destroy rule:

# from the root directory

$ make destroy

The AWS CDK CLI will ask for your permission to destroy the CDK stacks. When asked, confirm with y and press Enter.


Conclusion

In this module, we created a private PyPI repository in AWS CodeArtifact with a connection to an external source. Then we provisioned a private MWAA environment without a connection to the public internet and used VPC endpoints to connect to AWS CodeArtifact. We also created a Lambda function to keep the authorization token for our private PyPI repository up to date.