Over the past few years, generative AI has become popular for its ability to produce realistic text and images. But text and images are only a fraction of the data generated today. Every interaction with a medical system or software application, and every environmental event such as a storm affecting a flight, creates data. Using generative AI to synthesize data around these scenarios helps organizations manage them more effectively, particularly when the available data is limited or sensitive.

DataCebo, an MIT spinout, has been doing just that. It offers a software system named the Synthetic Data Vault (SDV), which generates synthetic data that organizations can use to test software applications and train machine learning models. Data scientists use this open-source library to generate synthetic tabular data.

Much of its popularity comes from its approach to software testing. SDV can create synthetic data that matches the statistical properties of real data, so organizations can use it in place of sensitive information to check how new software performs. SDV has also been used to synthesize medical records for predicting health outcomes in patients with cystic fibrosis, and to build a flight simulator that lets airlines plan for rare weather events.

The SDV is also valuable in software testing. Developers have generally written scripts by hand to create synthetic data. With the generative models SDV offers, they can fit a model to collected sample data, then sample a large volume of synthetic data or create specific scenarios to test their applications. This saves time and helps maintain privacy and regulatory compliance when the real data is sensitive.
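To make the fit-then-sample workflow concrete, here is a minimal sketch in plain Python. It does not use SDV and is not SDV's API: the class `ToyTabularSynthesizer`, its `fit` and `sample` methods, and the `fixed` argument are hypothetical names chosen for illustration. It also assumes each column can be modeled independently (a Gaussian for numeric columns, observed frequencies for categorical ones), which discards the cross-column relationships a real synthesizer is meant to preserve.

```python
import random
import statistics
from collections import Counter


class ToyTabularSynthesizer:
    """Toy fit-then-sample model for illustration; NOT the SDV API.

    Assumption: columns are modeled independently, so any correlation
    between columns in the real data is lost.
    """

    def fit(self, rows):
        # rows: a non-empty list of dicts that all share the same keys
        self.columns = {}
        for name in rows[0]:
            values = [row[name] for row in rows]
            if all(isinstance(v, (int, float)) for v in values):
                # Numeric column: keep the mean and standard deviation
                self.columns[name] = (
                    "numeric",
                    statistics.mean(values),
                    statistics.pstdev(values),
                )
            else:
                # Categorical column: keep the observed category counts
                counts = Counter(values)
                self.columns[name] = (
                    "categorical",
                    list(counts.keys()),
                    list(counts.values()),
                )
        return self

    def sample(self, num_rows, fixed=None, seed=None):
        # fixed: optional dict of column values to force in every row,
        # a crude stand-in for generating one specific test scenario
        rng = random.Random(seed)
        out = []
        for _ in range(num_rows):
            row = {}
            for name, (kind, a, b) in self.columns.items():
                if kind == "numeric":
                    row[name] = rng.gauss(a, b)
                else:
                    row[name] = rng.choices(a, weights=b, k=1)[0]
            if fixed:
                row.update(fixed)
            out.append(row)
        return out


# A tiny, made-up "real" sample standing in for collected data
real = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 29, "plan": "basic"},
]

synth = ToyTabularSynthesizer().fit(real)
synthetic = synth.sample(1000, seed=0)  # large volume
premium_only = synth.sample(5, fixed={"plan": "premium"}, seed=1)  # one scenario
```

The point of the sketch is the shape of the workflow: a small real sample goes in, a model is fit once, and any number of synthetic rows (or rows pinned to one scenario) come out without copying a real record.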

DataCebo’s next step is to advance the field of synthetic enterprise data, meaning data generated from user behavior on large companies’ software applications. This type of data is not universally available, and DataCebo’s software helps learn its unique patterns, which in turn helps refine algorithms.

DataCebo also recently added new features to SDV, including the SDMetrics library, a set of tools for assessing the “realism” of generated data, and SDGym, a way to compare models’ performance. The company believes synthetic data from generative models will make data work more transparent and responsible. According to DataCebo, almost 90% of enterprise operations can be performed with synthetic data.
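One simple way to put a number on “realism” for a single numeric column is to compare the empirical distributions of the real and synthetic values. The sketch below is plain Python and is not taken from SDMetrics; the function name `ks_complement` is chosen here for illustration. It computes one minus the two-sample Kolmogorov-Smirnov statistic, so 1.0 means the two empirical distributions are identical and 0.0 means they do not overlap at all. It assumes a single column is being scored in isolation, which says nothing about relationships between columns.

```python
import bisect


def ks_complement(real_values, synthetic_values):
    """Return 1 - (two-sample Kolmogorov-Smirnov statistic), in [0, 1].

    Illustrative helper only; not an SDMetrics function.
    """
    real_sorted = sorted(real_values)
    synth_sorted = sorted(synthetic_values)
    max_gap = 0.0
    # Both empirical CDFs are step functions that change only at data
    # points, so the largest gap is found by checking those points.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect.bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect.bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Made-up values for illustration
same = ks_complement([1, 2, 3], [1, 2, 3])
disjoint = ks_complement([1, 2, 3], [10, 11, 12])
partial = ks_complement([1, 2, 3, 4], [3, 4, 5, 6])
```

A score like this covers only one narrow aspect of quality, which is why a dedicated evaluation library is useful for a fuller picture.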
