Audio deepfakes have recently been in the news, particularly for their negative impacts, such as fraudulent robocalls impersonating Joe Biden that encouraged people not to vote. Such malicious uses could disrupt political campaigns and financial markets and enable identity theft. However, Nauman Dawalatabad, a postdoc at MIT, argues that deepfakes also have potential upsides, which he discusses in an interview with MIT News.
Dawalatabad sheds light on the ethical considerations surrounding deepfakes, arguing that obscuring the source speaker’s identity is essential for protecting individual privacy. He notes that speech carries sensitive information such as age, gender, accent, health status, and even indicators of possible future health conditions. His recent research shows that dementia can be accurately detected from speech, underscoring the need for technology that prevents the inadvertent disclosure of such private data.
The use of audio deepfakes in harmful activities such as spear-phishing attacks calls for effective countermeasures and detection techniques. Dawalatabad describes two main approaches to detecting fake audio: artifact detection and liveness detection. The former identifies inconsistencies or artifacts left behind by the generative models that produce the audio, while the latter exploits the difficulty AI systems have in perfectly replicating inherent qualities of natural speech, such as breathing patterns, intonation, and rhythm. In addition, techniques such as audio watermarking can help trace an audio clip’s origin and deter tampering.
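To make the artifact-detection idea concrete, here is a minimal, hypothetical sketch in Python. It is not Dawalatabad’s method: the heuristic (measuring frame-to-frame variation in the upper mel-spectrogram bands, on the assumption that some vocoders leave unnaturally smooth high-frequency energy), the placeholder file names, and the single-feature classifier are all illustrative assumptions, not a real detector.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def spectral_artifact_score(path, sr=16000):
    """Crude artifact cue: variability of the upper mel bands over time.

    Hypothetical heuristic for illustration only; low variability in the
    high-frequency bands is treated here as weak evidence of synthetic speech.
    """
    audio, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    upper = log_mel[48:, :]  # top quarter of the mel bands
    return float(np.mean(np.var(np.diff(upper, axis=1), axis=1)))

# Placeholder file lists -- in practice these would come from a labeled
# corpus of genuine and AI-generated speech.
real_paths = ["real_clip_01.wav", "real_clip_02.wav"]
fake_paths = ["fake_clip_01.wav", "fake_clip_02.wav"]

# Train a toy binary classifier (0 = genuine, 1 = fake) on the heuristic feature.
X = [[spectral_artifact_score(p)] for p in real_paths + fake_paths]
y = [0] * len(real_paths) + [1] * len(fake_paths)
clf = LogisticRegression().fit(X, y)

# Score a new clip: probability, under this toy model, that it is synthetic.
print(clf.predict_proba([[spectral_artifact_score("unknown_clip.wav")]])[0, 1])
```

Real detectors replace the single handcrafted feature with learned spectro-temporal representations and a deep classifier trained on large labeled corpora, but the overall pipeline, feature extraction followed by real-versus-fake classification, is the same.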
Despite the potential for misuse, Dawalatabad also outlines the benefits of deepfakes. In healthcare, for example, the technology can preserve the privacy of patients and doctors during cognitive health-care interviews. It can also help people with speech impairments regain the ability to communicate effectively.
Looking at the future relationship between AI and audio perception, Dawalatabad is optimistic. He believes the rapid pace of research and development in the field holds significant promise. With insights from psychoacoustics, advances in augmented and virtual reality, and increasingly sophisticated generative models, he sees immense potential for this technology to transform healthcare, entertainment, education, and more. Despite the risks posed by AI-generated deepfakes, Dawalatabad maintains that this research trajectory is poised to benefit society.