In this module we will learn how to use Amazon Managed Workflows for Apache Airflow (MWAA) to develop Machine Learning (ML) workflows, also called pipelines. An ML workflow orchestrates a sequence of tasks such as data collection, transformation, training, testing, and evaluation of an ML model to achieve a business outcome.
Business requirement: Build a model to predict the fare amount for a taxi ride in New York City.
In this exercise we’ll use the same “NYC taxi ride” dataset and MWAA environment that we created and used earlier in the workshop.
We’ll start by exploring the data, transforming it, and training a model on it. We’ll fit the model using an Amazon SageMaker managed training cluster, then deploy it to an endpoint to perform batch predictions on the test data set. All of these tasks will be plugged into a workflow that is orchestrated and automated through the MWAA integration with Amazon SageMaker.
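To make the ordering of these steps concrete, here is a minimal sketch in plain Python that models them as a dependency graph and resolves a valid run order. It is not Airflow or SageMaker code: the task names (`explore_data`, `transform_data`, `train_model`, `batch_predict`) and the `run_order` helper are hypothetical, chosen only to mirror the steps described above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names (not from the workshop code).
# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "explore_data": set(),
    "transform_data": {"explore_data"},
    "train_model": {"transform_data"},
    "batch_predict": {"train_model"},
}


def run_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(" >> ".join(order))
```

An orchestrator such as Airflow does the same kind of dependency resolution for us, and in an Airflow DAG the dependencies between tasks are typically declared with the `>>` operator.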
The diagram below shows the ML workflow we’ll implement:
The workflow performs the following tasks:
Amazon SageMaker operators are custom operators available with Airflow that allow it to communicate with Amazon SageMaker and perform the following ML tasks:
Okay, with that background, it’s time to build!