The growing influence of artificial intelligence (AI) in large organizations makes managing AI platforms a significant challenge. The platform has to be scalable and operationally efficient while meeting organizational compliance and security standards. Amazon SageMaker Studio offers a comprehensive set of capabilities for machine learning (ML) practitioners and data scientists. These include a fully managed AI development environment with an integrated development environment (IDE), which simplifies the end-to-end ML workflow. Collaborative capabilities such as real-time co-editing and notebook sharing within a team support smooth teamwork, while scalable, high-performance training caters to large datasets. With built-in security, cost-effectiveness, and a range of pre-built tools, SageMaker Studio is a powerful platform for accelerating AI projects and empowering data scientists at every level of expertise.
Deutsche Bahn, a leading transportation company in Germany, is a prime example of an organization using SageMaker Studio. With operations across 130 countries and a workforce of over 300,000, Deutsche Bahn has been at the forefront of adopting AI, using SageMaker Studio as a key AI platform. A dedicated AI platform team manages and operates the SageMaker Studio platform, and multiple data analytics teams within the organization use it to develop, train, and run various analytics and ML activities.
The AI platform team’s main objective is to ensure seamless access to Workbench services and SageMaker Studio for all Deutsche Bahn teams and projects, with a focus on data scientists and ML engineers. The platform helps Deutsche Bahn realize a range of use cases, from railway maintenance and forecasting to future applications in generative AI.
The architecture at Deutsche Bahn consists of a central platform account managed by a platform team that oversees infrastructure and operations for SageMaker Studio. Resources are grouped by SageMaker domains, each consisting of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and various security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. From an infrastructure perspective, the VPC has no outbound internet connectivity to ensure security and compliance. For high availability, multiple identical private isolated subnets are provisioned.
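To make the domain layout above more concrete, the following is a minimal sketch of how the parameters for one SageMaker domain might be assembled before being passed to the SageMaker `CreateDomain` API. It is not Deutsche Bahn's actual provisioning code. The function name `build_domain_request`, the `studio-<team>` naming scheme, and all IDs are hypothetical; the parameter names (`AppNetworkAccessType`, `SubnetIds`, and so on) are those of the `CreateDomain` API. Setting `AppNetworkAccessType` to `VpcOnly` routes Studio traffic through the specified VPC, which matches a VPC with no outbound internet connectivity.

```python
from __future__ import annotations

from typing import Any


def build_domain_request(
    team: str,
    vpc_id: str,
    subnet_ids: list[str],
    execution_role_arn: str,
) -> dict[str, Any]:
    """Assemble keyword arguments for SageMaker CreateDomain.

    This only builds a dictionary; no AWS call is made here.
    """
    # Assumption: "high availability" means at least two private subnets.
    if len(subnet_ids) < 2:
        raise ValueError("provide at least two private subnets")

    return {
        # Hypothetical naming scheme, not taken from the source.
        "DomainName": f"studio-{team}",
        # IAM authentication mode, used for client separation per domain.
        "AuthMode": "IAM",
        # Keep Studio traffic inside the VPC (no direct internet access).
        "AppNetworkAccessType": "VpcOnly",
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # Role that Studio users in this domain assume by default.
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
    }


if __name__ == "__main__":
    request = build_domain_request(
        team="analytics",
        vpc_id="vpc-0123456789abcdef0",  # placeholder ID
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholder IDs
        execution_role_arn="arn:aws:iam::111122223333:role/example-domain-role",
    )
    print(request["DomainName"], request["AppNetworkAccessType"])
```

With boto3 installed and credentials configured, such a dictionary could be passed as `boto3.client("sagemaker").create_domain(**request)`. In an AWS CDK setup the same values would instead be supplied as properties of a domain construct.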
In addition to the SageMaker domain, a customized AWS Identity and Access Management (IAM) role, Amazon Simple Storage Service (Amazon S3) bucket, customer-managed key, and other AWS resources are provisioned during the deployment process. Client separation is implemented at the level of SageMaker domains using IAM authentication mode. A domain-specific IAM role is attached to each domain, granting data scientists the ability to perform various activities such as running processing jobs, tuning jobs, and creating models.
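As an illustration of what a domain-specific role could look like, the sketch below builds two IAM policy documents as plain Python dictionaries: a trust policy that lets SageMaker assume the role, and a permissions policy covering the activities named above plus access to the domain's S3 bucket. The source does not publish the actual policies, so the helper names, statement IDs, bucket name, and the exact set of actions are assumptions. A production policy would typically scope `Resource` more tightly and add permissions for the customer-managed key.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy allowing the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_domain_policy(bucket_name: str) -> dict:
    """Permissions policy for one domain (illustrative, deliberately broad)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Processing jobs, tuning jobs, and model creation.
                "Sid": "AllowMlJobs",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateProcessingJob",
                    "sagemaker:CreateHyperParameterTuningJob",
                    "sagemaker:CreateModel",
                ],
                # Assumption: narrowed to specific ARNs or tags in practice.
                "Resource": "*",
            },
            {
                # Read/write access limited to this domain's own bucket.
                "Sid": "AllowDomainBucket",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name for one domain.
    print(json.dumps(build_domain_policy("example-domain-bucket"), indent=2))
```

Generating one such policy per domain, each pointing at that domain's own bucket, is one way to keep teams from reading each other's data while sharing a single AWS account.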
In conclusion, Deutsche Bahn used SageMaker Studio to revamp its AI platform, resulting in a scalable, automated, and manageable solution that supports its diverse data analytics teams. The infrastructure features a central platform account, a self-service domain ordering process, and infrastructure provisioning using AWS CDK. It has enabled Deutsche Bahn to build a robust platform for its AI initiatives, serving over 100 developers and managing 20 SageMaker domains within a single AWS account.