Private PyPI repository

In this module we will learn how to set up a private PyPI repository for Amazon MWAA, using AWS CodeArtifact as the package repository. This lets you avoid giving MWAA internet access through a NAT gateway just to install the required dependencies, which reduces the overall infrastructure cost. You can also use the AWS CodeArtifact repository to publish your own private libraries.

Solution

The solution that we will deploy uses the following AWS services: AWS Lambda, AWS CodeArtifact, Amazon S3, and Amazon MWAA.

An AWS Lambda function runs every 10 hours to obtain an authorization token for AWS CodeArtifact. Because CodeArtifact authorization tokens are valid for at most 12 hours, this schedule refreshes the token before it expires. The function uses the token to build an index URL for the PyPI remote repository in CodeArtifact, saves the generated index URL to a codeartifact.txt file, and uploads that file to an Amazon S3 bucket. At runtime, MWAA fetches the DAGs and codeartifact.txt from the bucket, connects to the CodeArtifact repository, and installs the Python dependencies.
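The Lambda step can be sketched roughly as follows. This is a minimal, hypothetical handler, not the workshop's actual function: the environment variable names (CA_DOMAIN, CA_DOMAIN_OWNER, CA_REPOSITORY, MWAA_BUCKET, INDEX_FILE_KEY), their default values, and the object key dags/codeartifact.txt are all assumptions. The boto3 calls it uses (get_authorization_token, get_repository_endpoint, put_object) are real; 43200 seconds is the maximum token lifetime CodeArtifact allows. The index URL follows the form pip uses for CodeArtifact, with "aws" as the user name, the token as the password, and "simple/" appended to the repository endpoint.

```python
"""Hypothetical Lambda handler that refreshes the CodeArtifact index URL for MWAA."""
import os
from urllib.parse import quote

# Assumed configuration: names and defaults are illustrative, not from the workshop.
DOMAIN = os.environ.get("CA_DOMAIN", "my-domain")
DOMAIN_OWNER = os.environ.get("CA_DOMAIN_OWNER", "111122223333")
REPOSITORY = os.environ.get("CA_REPOSITORY", "pypi-store")
BUCKET = os.environ.get("MWAA_BUCKET", "my-mwaa-bucket")
KEY = os.environ.get("INDEX_FILE_KEY", "dags/codeartifact.txt")


def build_index_url(endpoint: str, token: str) -> str:
    """Embed the token in a CodeArtifact PyPI endpoint and point pip at /simple/."""
    scheme, rest = endpoint.split("://", 1)
    if not rest.endswith("/"):
        rest += "/"
    # pip authenticates with the fixed user name "aws" and the token as password.
    return f"{scheme}://aws:{quote(token, safe='')}@{rest}simple/"


def handler(event=None, context=None, codeartifact=None, s3=None):
    """Fetch a fresh token, build the index URL, and upload codeartifact.txt to S3."""
    if codeartifact is None or s3 is None:
        import boto3  # imported lazily so build_index_url works without boto3

        codeartifact = codeartifact or boto3.client("codeartifact")
        s3 = s3 or boto3.client("s3")

    # Request the longest-lived token (12 hours) so a 10-hour schedule never lapses.
    token = codeartifact.get_authorization_token(
        domain=DOMAIN, domainOwner=DOMAIN_OWNER, durationSeconds=43200
    )["authorizationToken"]
    endpoint = codeartifact.get_repository_endpoint(
        domain=DOMAIN, domainOwner=DOMAIN_OWNER, repository=REPOSITORY, format="pypi"
    )["repositoryEndpoint"]

    # The file holds a single pip option line that requirements.txt can include.
    body = f"--index-url {build_index_url(endpoint, token)}\n"
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=body.encode("utf-8"))
    return {"bucket": BUCKET, "key": KEY}
```

In Lambda the function is invoked as handler(event, context) and falls back to boto3 clients; the optional client arguments only exist so the logic can be exercised with stubs. On the MWAA side, requirements.txt could pull in the generated option with a line such as -r /usr/local/airflow/dags/codeartifact.txt, assuming the file is uploaded under the DAGs prefix that MWAA syncs to /usr/local/airflow/dags. The 10-hour schedule itself could be an Amazon EventBridge rule with the expression rate(10 hours).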

The diagram below shows an architectural overview of this solution:


Now, let’s build this!