Advances in artificial intelligence (AI) and machine learning (ML) are transforming the financial industry, enabling use cases such as fraud detection and creditworthiness assessment. Developing these models requires access to large, disparate datasets, such as data from credit decision engines and customer transactions, which are typically stored in a centralized Amazon Simple Storage Service (Amazon S3) location. The challenge is managing secure, compliant access to that data for the data scientists who build the models.
Amazon S3 Access Points simplify and secure data access at scale for applications that use shared Amazon S3 datasets. Each access point has a unique hostname and its own permissions and network controls, which are enforced on every request made through it. This makes application-specific permissions on a shared dataset easier to manage. An access point can also be restricted to a virtual private cloud (VPC), which supports secure data transfers over a private network, firewalling of the data, and testing of new access control policies.
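As a minimal sketch of the VPC restriction described above, the following builds the request parameters for a VPC-only access point and derives the ARN that identifies it. The account ID, Region, bucket name, access point name, and VPC ID are placeholder assumptions, not values from the blog.

```python
def access_point_arn(region: str, account_id: str, name: str) -> str:
    """Return the ARN that identifies an S3 access point."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def vpc_access_point_request(account_id: str, name: str, bucket: str, vpc_id: str) -> dict:
    """Build parameters for an access point that only accepts requests from one VPC."""
    return {
        "AccountId": account_id,  # account that owns the bucket (Account B)
        "Name": name,
        "Bucket": bucket,
        "VpcConfiguration": {"VpcId": vpc_id},  # restricts the access point to this VPC
    }


# Placeholder values, assumed for illustration only.
request = vpc_access_point_request(
    account_id="222222222222",
    name="shared-data-ap",
    bucket="shared-datasets-bucket",
    vpc_id="vpc-0123456789abcdef0",
)
arn = access_point_arn("us-east-1", request["AccountId"], request["Name"])
print(arn)
```

The dictionary keys mirror the parameters of the `create_access_point` operation on the boto3 `s3control` client, so with boto3 installed and credentials for Account B, one way to apply it would be `boto3.client("s3control").create_access_point(**request)`.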
This blog post explains how to configure S3 Access Points to enable cross-account access from an Amazon SageMaker notebook. The example uses two hypothetical accounts: Account A, where data scientists develop models in a SageMaker notebook, and Account B, which holds the required datasets in an S3 bucket.
The process involves three main steps: configuring Account A (VPC, security group, and SageMaker notebook settings), configuring Account B (S3 bucket, access point, and bucket policy settings), and setting up AWS Identity and Access Management (IAM) permissions and policies in Account A. Repeat the process for each SageMaker account that needs access to Account B’s shared dataset.
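The three policy documents implied by these steps can be sketched as follows. This is an illustration under stated assumptions, not the blog's exact policies: the role ARN, account IDs, bucket name, access point name, and the choice of `s3:GetObject` and `s3:ListBucket` as the granted actions are all hypothetical. The bucket policy uses the documented pattern of delegating access control to access points owned by the bucket owner's account.

```python
import json

# Hypothetical identifiers, assumed for illustration only.
ACCOUNT_A_ROLE_ARN = "arn:aws:iam::111111111111:role/SageMakerNotebookRole"
ACCOUNT_B_ID = "222222222222"
BUCKET = "shared-datasets-bucket"
ACCESS_POINT_ARN = f"arn:aws:s3:us-east-1:{ACCOUNT_B_ID}:accesspoint/shared-data-ap"

READ_ACTIONS = ["s3:GetObject", "s3:ListBucket"]


def access_point_resources(ap_arn: str) -> list:
    """The access point itself (for listing) and the objects reached through it."""
    return [ap_arn, f"{ap_arn}/object/*"]


def bucket_policy(bucket: str, owner_account_id: str) -> dict:
    """Account B bucket policy: delegate access control to the owner's access points."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {
                "StringEquals": {"s3:DataAccessPointAccount": owner_account_id}
            },
        }],
    }


def access_point_policy(ap_arn: str, principal_arn: str) -> dict:
    """Account B access point policy: allow the Account A role to read through it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": READ_ACTIONS,
            "Resource": access_point_resources(ap_arn),
        }],
    }


def notebook_role_policy(ap_arn: str) -> dict:
    """Account A identity policy attached to the SageMaker notebook's IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": READ_ACTIONS,
            "Resource": access_point_resources(ap_arn),
        }],
    }


policies = {
    "bucket_policy": bucket_policy(BUCKET, ACCOUNT_B_ID),
    "access_point_policy": access_point_policy(ACCESS_POINT_ARN, ACCOUNT_A_ROLE_ARN),
    "notebook_role_policy": notebook_role_policy(ACCESS_POINT_ARN),
}
print(json.dumps(policies, indent=2))
```

Cross-account access needs an allow on both sides, which is why the access point policy in Account B and the identity policy in Account A name the same actions and resources. Granting access to another SageMaker account would mean adding its role to the access point policy, or creating a separate access point for it, without touching the bucket policy.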
To validate the solution, run tests from the SageMaker notebook that check whether objects can be listed and accessed through the S3 access point. After testing, delete the resources to avoid incurring additional costs.
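A validation check of this kind might look like the sketch below, which lists a few objects through the access point and then fetches the metadata of the first one. The function name and the access point ARN in the usage note are hypothetical; the client is passed in so the function works with any object exposing the S3 `list_objects_v2` and `head_object` operations, such as a boto3 S3 client.

```python
def validate_access_point(s3_client, access_point_arn: str, max_keys: int = 5) -> dict:
    """List objects through an access point and check that the first one is reachable.

    The access point ARN is passed where a bucket name would normally go.
    """
    listing = s3_client.list_objects_v2(Bucket=access_point_arn, MaxKeys=max_keys)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]

    first_object_size = None
    if keys:
        # head_object confirms read access without downloading the object body.
        head = s3_client.head_object(Bucket=access_point_arn, Key=keys[0])
        first_object_size = head["ContentLength"]

    return {"keys": keys, "first_object_size": first_object_size}
```

From the notebook in Account A, assuming boto3 is available, this could be called as `validate_access_point(boto3.client("s3"), "arn:aws:s3:us-east-1:222222222222:accesspoint/shared-data-ap")`. An access-denied error from either call would suggest that one of the policies or the VPC restriction is not configured as intended.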
The blog thus demonstrates how S3 Access Points can enable cross-account access to large, shared datasets from SageMaker notebooks while managing access at scale.
Finally, the authors of the blog are introduced: Kiran Khambete, a Senior Technical Account Manager at AWS; Ankit Soni, a Principal Engineer at NatWest Group; and Kesaraju Sai Sandeep, a Cloud Engineer specializing in Big Data Services at AWS.