Machine learning models are widely used in smart devices such as smartphones, powering features like keyboard autocorrect and chatbot responses. However, fine-tuning these models requires considerable computational resources and transfers of data to and from cloud servers, which can raise both energy and security concerns. A team of researchers from MIT and the MIT-IBM Watson AI Lab has developed a technique that allows these models to be updated directly on ‘edge devices’ such as smartphones, rather than relying on the cloud. They call this technique PockEngine.
In essence, PockEngine identifies which parts of a machine learning model need to be updated to improve accuracy, and focuses computation on those parts alone. This not only reduces the computational demand but also speeds up the updating process significantly, up to 15 times faster on some hardware. Moreover, the method does not compromise the accuracy of the models and, in fact, improves the quality of complex responses from AI chatbots.
Machine learning models work in a way loosely similar to the human brain, with ‘neurons’ processing data to make predictions. As information passes through layers of these neurons, the network arrives at a prediction or outcome, much as different parts of the brain process an image before you recognise what you are looking at. Updating these models keeps them accurate and responsive, but it is usually a heavy computational task: fine-tuning repeats a cycle in which the model makes predictions, the error is fed back through every layer, and the parameters are updated.
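To make that cycle concrete, here is a minimal fine-tuning loop in PyTorch. It is a generic illustration rather than PockEngine’s code; the toy model, random data, and learning rate are placeholders. The point is that every step runs a forward pass, a backward pass through every layer, and an update to every parameter, which is what makes conventional on-device training expensive.

```python
import torch
from torch import nn, optim

# A tiny stand-in for the layered networks described above.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy data; on a real device this would be data gathered locally.
inputs = torch.randn(8, 16)
targets = torch.randint(0, 2, (8,))

for step in range(100):
    optimizer.zero_grad()
    predictions = model(inputs)           # forward pass through every layer
    loss = loss_fn(predictions, targets)
    loss.backward()                       # feedback (gradients) flows back through every layer
    optimizer.step()                      # every parameter in the model is updated
```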
However, PockEngine optimises this process by recognising that not all neurons, or even whole layers of neurons, need to be updated. The system measures how much each layer contributes to accuracy improvement and automatically determines which layers, and what proportion of each layer, actually need to be updated.
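A rough sketch of this idea in PyTorch might look like the following. The per-layer scores and the threshold are made-up placeholders standing in for PockEngine’s actual contribution analysis, which the article does not detail; the sketch only shows the mechanism of freezing low-contribution layers so they are skipped during fine-tuning.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

# Hypothetical per-layer contribution scores, e.g. the accuracy gain observed
# when fine-tuning only that layer on a calibration set. Placeholder values.
layer_scores = {0: 0.01, 2: 0.40, 4: 0.55}
THRESHOLD = 0.05  # arbitrary cut-off for this illustration

for idx, layer in enumerate(model):
    params = list(layer.parameters())
    if not params:
        continue  # parameter-free layers such as ReLU have nothing to update
    keep = layer_scores.get(idx, 0.0) >= THRESHOLD
    for p in params:
        p.requires_grad_(keep)  # frozen layers get no gradients and no updates

# Only the unfrozen parameters are handed to the optimiser, so both the
# backward pass and the update step shrink accordingly.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```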
Further adding to its efficiency, PockEngine performs this analysis at ‘compile time’ – that is, while the model is being prepared for deployment rather than while it is running on the device. This shifts the heavy work off the device, reducing the computational load during actual use and improving overall speed. As a result, PockEngine outperforms other approaches, running up to 15 times faster without compromising accuracy.
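Conceptually, that split between compile time and runtime could look like the sketch below, again in PyTorch and again a simplified illustration rather than PockEngine’s implementation: the decision about which layers to train is fixed once, off the device, and the device then only runs the lightweight, pre-pruned update step. The function name and the chosen layer index are hypothetical.

```python
import torch
from torch import nn


def build_update_step(model, trainable_layer_ids, lr=0.01):
    """Run once ahead of deployment ('compile time' in the article's terms):
    fix which layers may change and package the training step, so no per-layer
    analysis happens on the device itself."""
    for idx, layer in enumerate(model):
        for p in layer.parameters():
            p.requires_grad_(idx in trainable_layer_ids)

    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=lr
    )
    loss_fn = nn.CrossEntropyLoss()

    def step(inputs, targets):
        # Runtime on the edge device: only this lightweight, pre-pruned step runs.
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()

    return step


# Prepared off-device; the model and the chosen layer index are placeholders.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
update = build_update_step(model, trainable_layer_ids={2})

# On-device fine-tuning then reduces to repeated calls such as:
# loss = update(batch_inputs, batch_targets)
```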
When applied to the popular AI model Llama-V2, PockEngine’s benefits were evident. The fine-tuned system correctly answered a complex question that the non-updated model could not, and the time taken for each iteration of the fine-tuning process was reduced from seven seconds to less than one second. This has far-reaching implications for the efficiency of AI and for AI-driven features on edge devices, such as voice and image recognition, with potential applications across numerous industries.
As well as assisting chatbots and autocorrect features, this approach could reduce the cost of maintaining and optimising larger AI models on cloud servers. PockEngine’s contribution to tackling efficiency challenges in AI is likely to interest companies across many sectors, including those running AI in cloud-based applications. The work is supported by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT-Amazon Science Hub, the National Science Foundation, and the Qualcomm Innovation Fellowship.