Nauman Dawalatabad, a postdoctoral researcher, discusses the concerns and potential benefits of audio deepfake technology in a Q&A with MIT News. He addresses the ethical considerations around concealing a source speaker’s identity in audio deepfakes, noting that speech carries a wealth of sensitive personal information beyond identity and content, such as age, gender and indications of health. Protecting this private data in our digital world is not just a technical challenge but a moral obligation, he states.
Dawalatabad also explores the use of audio deepfakes in spear-phishing attacks and how to counteract the associated risks. Misuse of deepfake technology ranges from disinformation and fake news to identity theft, privacy infringement and malicious alteration of content. Because deepfake audio is easy and cheap to generate online, there is a pressing need for robust countermeasures. Approaches to detecting fake audio generally fall into two broad categories: artifact detection and liveness detection. The former identifies anomalies introduced by generative models, while the latter relies on natural qualities of human speech that AI finds difficult to replicate accurately. In addition, audio watermarking embeds an encrypted identifier in the audio so that its origin can be traced and tampering deterred.
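To make the watermarking idea concrete, here is a minimal sketch of one classic technique, a keyed spread-spectrum watermark. It is an illustration only and is not taken from the Q&A: the function names, the `strength` parameter and the detection threshold are all assumptions, and the "identifier" is simply a pseudo-random sequence derived from a secret key rather than a true encrypted payload.

```python
import math
import random


def keyed_sequence(key, n):
    """Pseudo-random sequence of +1.0/-1.0 values derived from a secret integer key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed_watermark(samples, key, strength=0.05):
    """Add a low-amplitude keyed sequence to the audio samples (hypothetical helper)."""
    seq = keyed_sequence(key, len(samples))
    return [s + strength * w for s, w in zip(samples, seq)]


def detect_watermark(samples, key, strength=0.05):
    """Correlate the audio with the keyed sequence; a high score suggests the mark is present."""
    seq = keyed_sequence(key, len(samples))
    score = sum(s * w for s, w in zip(samples, seq)) / len(samples)
    # Assumed decision rule: halfway between the expected scores with and without the mark.
    return score > strength / 2


# Toy host signal: one second of a 440 Hz tone sampled at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
marked = embed_watermark(tone, key=1234)

print(detect_watermark(marked, key=1234))  # check with the correct key
print(detect_watermark(tone, key=1234))    # check audio that was never marked
print(detect_watermark(marked, key=9999))  # check with the wrong key
```

Only someone holding the key can regenerate the sequence and test for it, which is what lets a mark of this kind point back to an origin. Production systems would also have to keep the mark inaudible and make it survive compression and editing, neither of which this sketch attempts.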
However, Dawalatabad emphasizes that the technology’s significant potential for good should not be overlooked, particularly in healthcare and education. He mentions that his ongoing work involves anonymizing patient and doctor voices in cognitive healthcare interviews so that critical medical data can be shared globally for research while patient privacy is preserved. Moreover, the ability to restore voices opens new doors for people with speech impairments, enhancing their ability to communicate and improving their quality of life.
Looking ahead, the postdoc is optimistic about the relationship between AI and audio perception, particularly in psychoacoustics, the study of how sound is perceived, and its capacity to elevate augmented and virtual reality experiences. He concludes that, despite the risks, the capacity of audio-generative AI models to transform sectors such as healthcare, entertainment and education points toward a positive trajectory for the technology’s development. With continued advances, he suggests, the technology could become more refined and find broader applications for societal benefit.