Ghana, currently ranked the 27th most polluted country in the world, is addressing its critical air pollution problem through technological solutions. This includes the adoption of low-cost air sensor technologies evaluated and supported by the Sensor Evaluation and Training Centre for West Africa (Afri-SET). Afri-SET helps governments and civil society organizations manage air pollution. Recently, Tech to the Rescue, a non-profit organization, partnered with AWS to hold the world’s largest Air Quality Hackathon, where participants developed solutions to combat air pollution.
This article highlights one of the top three solutions from the hackathon, which uses generative AI to standardize air quality data collected from diverse low-cost sensors across Africa. At present, Afri-SET must manually integrate data from sensors produced by different manufacturers into a unified platform, which is a resource-intensive process. The aim is to automate this data integration so that the approach can scale across West Africa. The essential criteria for the solution include cloud hosting, automated data ingestion, format flexibility, data preservation, and cost-effectiveness.
The proposed solution uses Anthropic’s Claude 2.1 foundation model via Amazon Bedrock to generate Python code, which then converts input data into a unified data format. Large language models (LLMs) are used to reason over text and generate code, transforming sensor data files that do not conform to a universal standard into a format usable for downstream calibration and analysis.
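As a rough illustration of the code-generation step, the sketch below builds a prompt from a sample of a raw sensor file and sends it to Claude 2.1 through the Amazon Bedrock runtime API. The prompt wording, the target column names (`timestamp`, `device_id`, `pm25`), and the helper names `build_prompt` and `generate_transform_code` are assumptions made for illustration, not the hackathon team's actual implementation. The `client` argument is assumed to be a `boto3.client("bedrock-runtime")` object.

```python
import json

# Hypothetical unified schema; the article does not specify the real one.
TARGET_COLUMNS = ["timestamp", "device_id", "pm25"]

MODEL_ID = "anthropic.claude-v2:1"  # Claude 2.1 on Amazon Bedrock


def build_prompt(sample_rows: str) -> str:
    """Build a Claude text-completion prompt asking for a transform function."""
    instructions = (
        "Write a Python function transform(rows) that converts rows of the "
        "sensor file sampled below into dictionaries with the keys "
        f"{', '.join(TARGET_COLUMNS)}. Return only code.\n\n"
        f"Sample:\n{sample_rows}"
    )
    # Claude 2.x text completions expect Human/Assistant turn markers.
    return f"\n\nHuman: {instructions}\n\nAssistant:"


def generate_transform_code(client, sample_rows: str) -> str:
    """Invoke the model once and return the generated source code as text."""
    body = json.dumps({
        "prompt": build_prompt(sample_rows),
        "max_tokens_to_sample": 1024,
        "temperature": 0.0,  # favor repeatable output for code generation
    })
    response = client.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(response["body"].read())["completion"]
```

With AWS credentials and Bedrock model access configured, this could be called as `generate_transform_code(boto3.client("bedrock-runtime"), sample_text)`. Passing the client in as a parameter keeps the function easy to exercise with a stub.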
The solution reads raw data files, checks whether the device type (or data format) is recognized, and, if so, retrieves and executes the previously generated Python code to transform the data. If no code is available, new code is generated, reviewed for efficiency, and then stored in a repository for future use. The transformed data is then stored in Amazon S3 and can be published to OpenAQ, allowing other organizations to use the calibrated air quality data.
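The workflow above can be sketched as a small dispatch routine. This is a minimal sketch under stated assumptions: the in-memory `code_repository` dictionary stands in for the real code repository, the header-row `format_signature` heuristic is a hypothetical way to recognize a data format, and the `generate_code` and `approve` callables stand in for the LLM call and the review step. Storage in Amazon S3 and publishing to OpenAQ are omitted.

```python
import csv
import io
from typing import Callable, Dict, List

# In-memory stand-in for the code repository described in the article.
code_repository: Dict[str, str] = {}


def format_signature(raw_text: str) -> str:
    """Identify a data format by its header row (an assumed heuristic)."""
    header = raw_text.splitlines()[0]
    return ",".join(col.strip().lower() for col in header.split(","))


def ingest(raw_text: str,
           generate_code: Callable[[str], str],
           approve: Callable[[str], bool]) -> List[dict]:
    """Transform a raw CSV file, generating code only for unseen formats."""
    signature = format_signature(raw_text)
    if signature not in code_repository:
        source = generate_code(raw_text)      # e.g. an LLM call
        if not approve(source):               # review before first use
            raise ValueError("generated code was rejected by the reviewer")
        code_repository[signature] = source   # store for future reuse
    namespace: dict = {}
    # Executing stored code is only reasonable because it was reviewed first.
    exec(code_repository[signature], namespace)
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    return namespace["transform"](rows)
```

In this sketch, a second file with the same header reuses the stored code, so `generate_code` runs only once per format.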
The proposed solution improves resource and cost efficiency by invoking the LLM only when a new data format is detected. It also includes a human-in-the-loop mechanism in which human review is needed only when a new data format is detected, which avoids overloading Afri-SET’s resources. Moreover, the solution reduces the time for data engineering work from months to days.
The data integration made possible by this solution enables expanded air quality monitoring, which can drive data-informed legislation and support community-led initiatives. Organizations like Afri-SET can use this technology to foster a cleaner and healthier environment. AWS is committed to addressing poor air quality through technological solutions.