In June 2024, data and AI company Databricks made three major announcements that drew attention across the data science and engineering communities. The company introduced advancements designed to streamline the user experience, improve data management, and simplify data engineering workflows.
The first major development is the new generation of Databricks Notebooks. Redesigned around data-centric authoring, the Notebook offers a cleaner, more user-friendly interface and features for improved data analysis. Upgrades include integrated AI assistance, performance improvements, and new functionality such as the interactive Results Table and Python-focused features that boost productivity.
Accompanying these updates is AI-powered authoring, which brings more effective coding practices directly into the Notebook. The Inline Assistant, side-panel Assistant chat, and Assistant Autocomplete embed AI capabilities in Notebook cells to help users write code accurately and quickly.
Predictive Optimization is the second development. It automatically optimizes data layout to improve query performance and reduce storage costs. The AI-powered service evaluates each table's layout, properties, and query performance to determine which optimizations are worth running, and it continuously adapts as data usage patterns change.
Pilot customers reported concrete gains: energy company Plenitude saw a 26% reduction in storage costs, consumer electronics maker Anker reported a 2x improvement in query performance, and AI data annotation platform Toloka AI found the service more efficient and cost-effective.
Databricks promises further enhancements to Predictive Optimization and aims to enable it by default across all Unity Catalog managed tables, so storage and data layout are optimized automatically without manual tuning.
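For teams that want the feature before it is on by default, enablement is a one-line setting. Below is a minimal sketch of what this looks like from a notebook, assuming the ALTER CATALOG / ALTER SCHEMA syntax in Databricks' Unity Catalog documentation; the `main` catalog and `sales` schema are hypothetical placeholders, and `spark` is the SparkSession that Databricks notebooks provide automatically.

```python
# Minimal sketch: enabling Predictive Optimization from a Databricks notebook.
# Catalog and schema names are hypothetical; `spark` is the built-in session.

# Enable Predictive Optimization for every managed table in a catalog.
spark.sql("ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION")

# Schemas inherit the catalog-level setting; this makes the inheritance explicit.
spark.sql("ALTER SCHEMA main.sales INHERIT PREDICTIVE OPTIMIZATION")

# Inspect the catalog's effective configuration.
spark.sql("DESCRIBE CATALOG EXTENDED main").show(truncate=False)
```

Once enabled, no further action is required: the service decides when to run maintenance operations on each table based on its observed usage.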
The company’s third announcement was Databricks LakeFlow, a unified solution for building and operating data pipelines. The offering comprises LakeFlow Connect, LakeFlow Pipelines, and LakeFlow Jobs, each covering a distinct stage of the pipeline lifecycle.
LakeFlow Connect focuses on reliable, efficient data ingestion from operational databases into the lakehouse. LakeFlow Pipelines builds on the Delta Live Tables framework, letting teams write business logic while the platform handles orchestration and incremental processing. LakeFlow Jobs, in turn, orchestrates and monitors production workloads spanning data ingestion, pipelines, notebooks, SQL queries, machine learning training, model deployment, and inference.
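Because LakeFlow Pipelines builds on Delta Live Tables, the flavor of its programming model can be illustrated with the existing DLT Python API. The following is a minimal sketch rather than Databricks' shipped code: the table names, storage path, and quality rule are hypothetical, and `spark` is the session a DLT pipeline provides.

```python
import dlt
from pyspark.sql import functions as F

# Sketch of a two-table Delta Live Tables pipeline, the framework
# LakeFlow Pipelines builds on. Names and paths are hypothetical.

@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
def orders_raw():
    # Auto Loader ("cloudFiles") discovers and reads new files incrementally.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/orders_landing")  # hypothetical path
    )

@dlt.table(comment="Cleaned orders with a basic quality check applied.")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows failing the rule
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

Because DLT infers the dependency between orders_raw and orders_clean from the function references, orchestration and incremental processing come for free, which is the behavior LakeFlow Pipelines carries forward.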
Built on the Databricks Data Intelligence Platform, LakeFlow integrates AI-powered intelligence, robust data governance, and serverless compute. Together, these announcements are expected to deliver significant benefits to data-driven professionals and to reinforce Databricks’ position as a leader in the field.