This post explains how to fine-tune large language models (LLMs) to better adapt them to specific domains or tasks, using Amazon SageMaker and MLflow. When working with LLMs, customers have varied requirements, such as choosing a suitable pre-trained foundation model (FM) or customizing an existing model for a specific task. Using Amazon SageMaker with MLflow can greatly simplify and streamline the workflow for running LLM fine-tuning and evaluation experiments: MLflow provides efficient experiment tracking, model versioning, and deployment, supporting reproducibility, while SageMaker Pipelines orchestrates data preparation, model fine-tuning, and model evaluation.

The fine-tuning process may involve one or more experiments, each requiring multiple iterations with different combinations of datasets, hyperparameters, prompts, and fine-tuning techniques. MLflow logs dataset information alongside key metrics, so experiments remain traceable and reproducible across runs. After the fine-tuning and evaluation steps, the best model can be selected, registered, and deployed.

The post also walks through a step-by-step solution for model fine-tuning, evaluation, and deployment, providing source code and detailed instructions. With this workflow, SageMaker and MLflow let users manage hundreds of experiments and easily track and compare model performance.
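
To make the tracking idea concrete, here is a minimal sketch of how a single fine-tuning run might log its dataset, hyperparameters, and evaluation metrics to MLflow. It assumes MLflow 2.x and a SageMaker managed MLflow tracking server; the tracking-server ARN, experiment name, dataset file, model identifier, and metric values are illustrative placeholders, not values from this post's solution.

```python
import mlflow
import pandas as pd

# Point the MLflow client at a SageMaker managed MLflow tracking server
# (hypothetical ARN shown for illustration).
mlflow.set_tracking_uri(
    "arn:aws:sagemaker:us-east-1:123456789012:mlflow-tracking-server/my-server"
)
mlflow.set_experiment("llm-fine-tuning")

# Hypothetical fine-tuning dataset in JSON Lines format.
train_df = pd.read_json("train.jsonl", lines=True)

with mlflow.start_run(run_name="qlora-run-1"):
    # Log the dataset so each run records exactly which data it was trained on.
    mlflow.log_input(mlflow.data.from_pandas(train_df, name="train"), context="training")

    # Log the hyperparameters for this fine-tuning iteration.
    mlflow.log_params({
        "base_model": "meta-llama/Llama-2-7b",
        "epochs": 3,
        "learning_rate": 2e-4,
    })

    # ... run fine-tuning and evaluation here ...

    # Log evaluation metrics so runs can be compared side by side in the MLflow UI.
    mlflow.log_metrics({"eval_loss": 1.23, "rouge_l": 0.41})
```

Because every run carries its dataset, parameters, and metrics, the MLflow UI can be used to filter and compare runs when selecting the best candidate model to register and deploy.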