Today, we’re pleased to announce Amazon SageMaker Inference Recommender — a brand-new Amazon SageMaker Studio capability that automates load testing and optimizes model performance across machine learning (ML) instances. Ultimately, it reduces the time it takes to get ML models from development to production and optimizes the costs associated with their operation.
Until now, no service has provided MLOps Engineers with a means to pick the optimal ML instances for their model. To optimize costs and maximize instance utilization, MLOps engineers would have to use their experience and intuition to select an ML instance type that would serve them and their model well, given the requirements to run them. Moreover, given the vast array of ML instances available, and the practically infinite nuances of each model, choosing the right instance type could take more than a few attempts to get it right. SageMaker Inference Recommender now gives MLOps engineers recommendations for the best available instance type to run their model. Once an instance has been selected, their model can be instantly deployed to the selected instance type with only a few clicks. Gone are the days of writing custom scripts to run performance benchmarks and load testing.
For MLOps engineers who want to get data on how their model will perform ahead of pushing to a production environment, SageMaker Inference Recommender also lets them run a load test against their model in a simulated environment. Ahead of deployment, they can specify parameters, such as required throughput, sample payloads, and latency constraints, and test their model against these constraints on a selected set of instances. This lets MLOps engineers gather data on how well their model will perform in the real world, thereby enabling them to feel confident in pushing it to production—or highlighting potential issues that must be addressed before putting it out into the world.
SageMaker Inference Recommender has even more tricks up its sleeve to make the lives of MLOps engineers easier and make sure that their models continue to operate optimally. MLOps Engineers can use SageMaker Inference Recommender benchmarking features to perform custom load tests that estimate model performance when accessed under load in a production environment given certain requirements. Results from these tests can be loaded with either SageMaker Studio or the AWS SDK or AWS CLI, giving the engineers an overview of model performance, comparisons of numerous configurations, and the ability to share the results with any stakeholders.
Find Out More
Get started with Amazon SageMaker Inference Recommender through Amazon SageMaker Studio, AWS SDKs and CLI. Amazon SageMaker Inference Recommender is available in all AWS commercial regions where SageMaker is available except the AWS China Regions.