Cloud computing, including big data and machine learning (ML) tools such as Amazon Athena and Amazon SageMaker, is becoming increasingly accessible and practical for businesses across many industry sectors. This is driving gains in resource efficiency by encouraging data analytics and data-driven decision-making in operations, predictive maintenance, and planning. However, the rapid evolution of IT has also created skills gaps: analysts might lack expertise with data science tooling, while data scientists might lack the domain knowledge needed to interpret industry-specific data.
To help close this gap, Amazon introduced SageMaker Canvas, which gives domain experts a no-code interface for building and deploying analytics and ML models. This blog post shows how to use SageMaker Canvas to select appropriate features in your data and then train a predictive model for anomaly detection.
SageMaker Canvas targets business use cases such as forecasting, regression, and classification. The goal here is to identify abnormal data points in data from industrial machines. Most machine data, such as temperature readings, describes normal operation and is therefore of limited value for decision-making; the rare abnormal readings are what matter. By training a model to recognize these relevant deviations, you can predict malfunctions or unusual operating conditions, so that engineers learn about potential faults or opportunities for improvement early.
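The post does not spell out the exact rule for deciding that a reading is abnormal. As a minimal sketch, assuming the Canvas model is a regression model trained on mostly normal data that predicts one measurement (for example, temperature) from the other sensor readings, a reading can be flagged when it deviates from the model's prediction by more than a chosen tolerance. The `Reading` type, the tolerance, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sensor record: the measured value and the model's prediction for it."""
    actual: float
    predicted: float


def is_anomalous(reading: Reading, tolerance: float) -> bool:
    """Flag a reading whose actual value deviates from the prediction by more than `tolerance`.

    Assumption: the model was trained on mostly normal operating data, so a large
    residual (actual minus predicted) suggests the machine is not behaving normally.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(reading.actual - reading.predicted) > tolerance


def flag_anomalies(readings: list[Reading], tolerance: float) -> list[bool]:
    """Apply the residual check to a batch of readings."""
    return [is_anomalous(r, tolerance) for r in readings]


if __name__ == "__main__":
    # Hypothetical temperature readings (degrees C) paired with model predictions.
    batch = [Reading(70.2, 70.0), Reading(71.0, 70.5), Reading(85.3, 70.8)]
    print(flag_anomalies(batch, tolerance=5.0))  # prints [False, False, True]
```

In practice the tolerance would be chosen from the model's typical error on normal data, so that ordinary prediction noise is not reported as an anomaly.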
Four key steps are involved in this process:
1. The domain expert creates the initial model, including data analysis and feature curation using SageMaker Canvas.
2. The same expert shares the model through the Amazon SageMaker Model Registry or deploys it directly as a real-time endpoint.
3. An MLOps expert then builds the inference infrastructure. This includes translating the model's prediction into an anomaly indicator and writing code that runs inside an AWS Lambda function.
4. A serverless application that needs anomaly detection calls the Lambda function to get the result.
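Steps 3 and 4 can be sketched as a Python Lambda handler that calls the deployed endpoint through the boto3 `sagemaker-runtime` client's `invoke_endpoint` and turns the prediction into an anomaly flag. This is an illustration under several assumptions, not the author's implementation: the endpoint name, the event shape (`features` and `actual`), the tolerance, and the plain-text numeric response format are all hypothetical and should be checked against your own endpoint. The optional `client` argument lets the logic be tested without AWS credentials.

```python
import json
from typing import Any, Optional

# Hypothetical values: replace with your own endpoint name and tolerance.
ENDPOINT_NAME = "canvas-anomaly-endpoint"
TOLERANCE = 5.0

_client = None


def _get_client():
    """Create the SageMaker runtime client once and reuse it across warm invocations."""
    global _client
    if _client is None:
        import boto3  # imported lazily so the logic can be tested without boto3 installed
        _client = boto3.client("sagemaker-runtime")
    return _client


def predict(features: list[float], client: Any) -> float:
    """Send one CSV row of features to the endpoint and parse a single numeric prediction."""
    payload = ",".join(str(x) for x in features).encode("utf-8")
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    # Assumption: the endpoint answers with the predicted value as plain text, e.g. b"70.8".
    return float(response["Body"].read().decode("utf-8").strip())


def lambda_handler(event: dict, context: Any, client: Optional[Any] = None) -> dict:
    """Translate the model's prediction into an anomaly indicator for the calling application."""
    client = client if client is not None else _get_client()
    # Hypothetical event shape: {"features": [...], "actual": <measured value>}.
    features = [float(x) for x in event["features"]]
    actual = float(event["actual"])
    predicted = predict(features, client)
    anomaly = abs(actual - predicted) > TOLERANCE
    return {
        "statusCode": 200,
        "body": json.dumps({"actual": actual, "predicted": predicted, "anomaly": anomaly}),
    }
```

Keeping the threshold logic in the Lambda function, rather than in the model, lets the MLOps expert adjust the tolerance without retraining or redeploying the Canvas model.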
The modeling process in SageMaker Canvas is broken down into several steps:
1. The domain expert uploads relevant data.
2. The expert selects columns containing characteristic measurements to be used in the final model.
3. The domain expert applies data transformations in SageMaker Canvas.
4. The model is trained, tuned, and evaluated.
5. The expert deploys the model as an endpoint.
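In SageMaker Canvas, steps 2 and 3 happen in the UI without any code. For readers who want to see what they amount to, the following stdlib-only Python sketch selects a few measurement columns from a hypothetical sensor export and applies one simple transformation, dropping rows with missing or non-numeric values, before writing a CSV that could be uploaded. The column names and sample data are hypothetical, and the transformations Canvas offers may differ.

```python
import csv
import io


def select_features(csv_text: str, columns: list[str]) -> list[dict[str, float]]:
    """Keep only the chosen measurement columns and drop rows with missing or non-numeric values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    rows = []
    for record in reader:
        try:
            rows.append({c: float(record[c]) for c in columns})
        except (TypeError, ValueError):
            # Skip incomplete rows: empty strings raise ValueError, short rows yield None (TypeError).
            continue
    return rows


def to_csv(rows: list[dict[str, float]], columns: list[str]) -> str:
    """Serialize the cleaned rows back to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical export: the second row is missing its temperature reading.
    data = (
        "timestamp,temperature,rpm,vibration\n"
        "2023-01-01T00:00,70.1,1500,0.02\n"
        "2023-01-01T00:01,,1510,0.03\n"
        "2023-01-01T00:02,70.4,1495,0.02\n"
    )
    cleaned = select_features(data, ["temperature", "rpm"])
    print(to_csv(cleaned, ["temperature", "rpm"]))
```

Checking the header up front makes a misspelled column name fail loudly instead of silently discarding every row.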
The author of this blog post, Helge Aufderheide, encourages more domain experts to experiment with ML models using their existing knowledge, without needing additional data science training. SageMaker Canvas currently offers a 2-month free tier followed by pay-as-you-go pricing, which makes it accessible and cost-effective for many businesses.