Meta has developed a machine-learning (ML) model to improve the efficiency and reliability of real-time communication (RTC) across its apps. The ML-based approach responds to the limitations of existing bandwidth estimation (BWE) and congestion control methods, such as the Google Congestion Controller (GCC) used in WebRTC, which relies on hand-tuned parameters and struggles to optimize the user experience across varied network conditions.
The existing GCC-based BWE module at Meta is complex, with many parameters and actions that depend heavily on specific network conditions. Tuning so many parameters by hand often forces a trade-off between quality and reliability: enhancing one can compromise the other.
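To make the trade-off concrete, here is a minimal sketch of a loss-based rate update with hand-tuned constants. The default thresholds (10% and 2% loss, a 1.05 increase factor, a 0.5 backoff weight) follow the loss-based controller described in the GCC IETF draft; the names `LossBasedParams` and `update_rate` are illustrative, and nothing here reflects Meta's production values or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossBasedParams:
    # Hand-tuned constants. Defaults follow the loss-based controller in the
    # GCC IETF draft; they are NOT Meta's production settings.
    high_loss: float = 0.10  # above this loss ratio, back off
    low_loss: float = 0.02   # below this loss ratio, probe upward
    increase: float = 1.05   # multiplicative increase factor
    backoff: float = 0.5     # how strongly observed loss reduces the rate


def update_rate(rate_kbps: float, loss_ratio: float,
                p: LossBasedParams = LossBasedParams()) -> float:
    """Return the next send-rate estimate given the observed loss ratio."""
    if loss_ratio > p.high_loss:
        return rate_kbps * (1.0 - p.backoff * loss_ratio)
    if loss_ratio < p.low_loss:
        return rate_kbps * p.increase
    return rate_kbps  # between the two thresholds: hold the current rate


if __name__ == "__main__":
    rate = 1000.0
    for loss in (0.0, 0.05, 0.20):
        rate = update_rate(rate, loss)
        print(f"loss={loss:.2f} -> rate={rate:.1f} kbps")
```

With the default thresholds, a link showing 5% random (non-congestive) loss holds its rate. Lowering `high_loss` to 0.03 would react faster to real congestion but would also cut the rate on that same lossy link, which is the kind of quality-versus-reliability tension a single hand-tuned setting cannot resolve.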
To remedy this, Meta’s approach has two key components, offline ML model learning and parameter tuning, which are designed to streamline network performance. The model uses time-series data from actual production calls and simulations, which allows it to recognize different network conditions and optimize parameters accordingly. Its architecture, which combines Long Short-Term Memory (LSTM) layers for processing sequential data with dense layers for non-time-series data, enables it to classify the network environment accurately and adapt its behavior in response.
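The two-branch shape described above can be sketched in a few lines of dependency-free Python: an LSTM cell consumes the per-interval time series, a dense layer handles the non-time-series inputs, and a softmax over the concatenated features yields a network-type probability. Everything specific here is an assumption made for illustration: the class labels, the feature choices, the layer sizes, and the random untrained weights (biases are omitted for brevity). It shows the data flow only, not Meta's model.

```python
import math
import random

CLASSES = ["normal", "congested", "random_loss"]  # hypothetical labels


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyClassifier:
    """Untrained LSTM + dense classifier; weights are random placeholders."""

    def __init__(self, n_seq, n_static, hidden=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One weight matrix per LSTM gate: input, forget, output, candidate.
        self.gates = {g: rand_matrix(rng, hidden, n_seq + hidden) for g in "ifoc"}
        self.w_static = rand_matrix(rng, hidden, n_static)
        self.w_out = rand_matrix(rng, len(CLASSES), 2 * hidden)

    def lstm(self, sequence):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for x in sequence:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
            f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
            o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
            g = [math.tanh(a) for a in matvec(self.gates["c"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # final hidden state summarizes the sequence

    def predict(self, sequence, static):
        seq_features = self.lstm(sequence)
        # Dense layer with ReLU for the non-time-series inputs.
        static_features = [max(0.0, a) for a in matvec(self.w_static, static)]
        logits = matvec(self.w_out, seq_features + static_features)
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
        total = sum(exps)
        return {name: e / total for name, e in zip(CLASSES, exps)}


if __name__ == "__main__":
    model = TinyClassifier(n_seq=3, n_static=2)
    # Assumed per-interval features: (loss ratio, RTT in seconds, receive rate in Mbps).
    seq = [(0.00, 0.05, 1.2), (0.02, 0.08, 1.0), (0.10, 0.20, 0.6)]
    static = [1.0, 0.0]  # assumed one-hot connection type
    print(model.predict(seq, static))
```

In practice a model like this would be built and trained offline in an ML framework on labeled call data; the hand-rolled cell is used here only so the structure is visible without extra dependencies.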
Integrating machine learning into this system has brought significant improvements in reliability and quality metrics across different network conditions. One example is the model’s capacity to predict congestion under low-bandwidth scenarios. This ability enables real-time optimization, preventing issues like video freezing or dropped connections. The model also detects and makes adjustments for random packet loss, bolstering network resilience.
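One way to picture how a network-type prediction could drive these adjustments is a lookup from predicted class to a parameter set, as in the sketch below. The class names, parameter names, numeric values, and the confidence fallback are all hypothetical; the source does not describe how Meta maps predictions to parameters.

```python
# Hypothetical parameter sets, one per predicted network type.
PARAMS_BY_NETWORK = {
    "normal": {"high_loss": 0.10, "backoff": 0.5},
    # Back off earlier and harder when congestion is predicted.
    "congested": {"high_loss": 0.05, "backoff": 0.7},
    # Tolerate more loss when it is predicted to be random, not congestive.
    "random_loss": {"high_loss": 0.20, "backoff": 0.3},
}
DEFAULT_NETWORK = "normal"


def select_params(probs, min_confidence=0.6):
    """Pick the parameter set for the most likely network type.

    Falls back to the default set when the top prediction is below
    min_confidence or names an unknown class (an assumed safety choice).
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p < min_confidence or label not in PARAMS_BY_NETWORK:
        label = DEFAULT_NETWORK
    return label, PARAMS_BY_NETWORK[label]


if __name__ == "__main__":
    prediction = {"normal": 0.1, "congested": 0.1, "random_loss": 0.8}
    print(select_params(prediction))
```

Falling back to the default parameters on a low-confidence prediction keeps the system no worse than the untuned baseline when the classifier is unsure.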
Further improvements have been seen in user experience, due in part to a reduction in connection drop rates. Video quality has also improved, with a noteworthy decrease in the frequency of video freezing. These gains illustrate the advantage of machine learning over traditional hand-tuned rules, particularly when it comes to monitoring and targeting specific network conditions and updating the system's behavior swiftly and efficiently.
In conclusion, Meta has made significant strides in addressing the challenges of bandwidth estimation and congestion control in real-time communication applications. The ML-based approach, using time-series data and offline parameter tuning, has shown substantial improvements in reliability, quality, and user engagement metrics across a range of network conditions. However, the success of this solution depends heavily on the quality of the data used and on its accurate labeling, which underscores the vital role of accurate training data. The work points to a future in which machine learning may be used to address a diverse range of network challenges effectively and efficiently.