Audio deepfakes, although often associated with unethical practices, have legitimate uses that can benefit society, suggests postdoc Nauman Dawalatabad in a Q&A with MIT News. He highlights the need for technology that protects the sensitive information carried in speech patterns, such as age, gender, and health conditions, arguing that obscuring a speaker’s identity in audio deepfakes is both a technical challenge and a moral obligation for privacy.
Although audio deepfakes scale easily in spear-phishing attacks, enabling the spread of misinformation, identity theft, and content tampering, strategies exist to counter their misuse. One is fake detection, which takes two forms: artifact detection, which identifies anomalies in the generated signal, and liveness detection, which exploits nuances of natural speech that AI cannot yet replicate accurately. Another is audio watermarking, which embeds encrypted identifiers in the original audio to trace its origin and deter tampering, offering a proactive defense against deepfake misuse.
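To make the watermarking idea concrete, here is a minimal Python sketch that hides an identifier in the least-significant bits of 16-bit PCM samples. This toy scheme is purely illustrative and not Dawalatabad's method: the function names are invented for this example, and real watermarking systems use encrypted, robust encodings designed to survive compression and re-recording, which an LSB approach does not.

```python
import numpy as np

def embed_watermark(samples: np.ndarray, watermark_bits: list[int]) -> np.ndarray:
    """Hide watermark bits in the least-significant bit of 16-bit PCM samples.

    Toy illustration only: one watermark bit overwrites the LSB of each of
    the first len(watermark_bits) samples, an inaudible change at 16-bit depth.
    """
    marked = samples.copy()
    for i, bit in enumerate(watermark_bits):
        # Clear the sample's LSB, then set it to the watermark bit.
        marked[i] = (int(marked[i]) & ~1) | bit
    return marked

def extract_watermark(samples: np.ndarray, n_bits: int) -> list[int]:
    """Read the watermark back from the LSBs of the first n_bits samples."""
    return [int(s) & 1 for s in samples[:n_bits]]

# Example: tag a short synthetic clip with an 8-bit identifier.
rng = np.random.default_rng(0)
audio = (rng.standard_normal(1000) * 1000).astype(np.int16)
identifier = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_watermark(audio, identifier)
assert extract_watermark(marked, 8) == identifier
```

The design point the example illustrates is that the identifier rides inside the signal itself rather than in metadata, so it travels with the audio wherever it is copied; production schemes trade the simplicity shown here for robustness to editing and re-encoding.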
Highlighting potential benefits, Dawalatabad suggests that beyond their use in entertainment, audio deepfakes hold promise for the healthcare and education sectors. In healthcare, anonymizing patient and doctor voices in interviews could enable the global sharing of crucial medical data for research while preserving privacy. For people with speech impairments, deepfake-based voice restoration offers hope of improved communication and quality of life.
Expressing optimism about audio generative AI models, Dawalatabad points to the possibility of audio experiences of unprecedented realism in augmented and virtual reality. The sheer pace of research and development in the field promises to expand and refine these technologies, contributing significantly to sectors such as healthcare, entertainment, and education, and showcasing the positive trajectory of audio deepfakes despite their inherent risks.