Audio deepfakes, or AI-generated audio, have lately been in the limelight due to harmful deception by ill-intentioned individuals. Cases such as robocalls impersonating political figures, spear-phishers tricking people into revealing personal information, and actors' voices being replicated without their consent have surfaced in the media. While these negative instances have been widely publicized, MIT News' interview with postdoc Nauman Dawalatabad emphasizes the benefits and potential opportunities that audio deepfakes could offer.
One of the significant ethical considerations surrounding audio deepfakes is protecting the privacy of the source speaker. Beyond revealing a speaker's identity, speech can carry sensitive attributes such as age, health condition, gender, and accent. According to Dawalatabad, it is necessary to develop technologies that prevent unauthorized disclosure of such information, thereby respecting an individual's privacy in the digital age.
The challenges posed by audio deepfakes, especially in spear-phishing attacks, are substantial because deepfake audio is now easy to create. Nevertheless, artifact detection and liveness detection systems may help counter such attacks. Artifact detection looks for the subtle distortions that AI introduces when generating sound, although this approach may become less effective as generative models improve. Liveness detection instead leverages attributes of natural, live speech that AI still struggles to emulate. Additionally, methods like audio watermarking can help authenticate genuine audio and deter tampering.
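The basic idea behind audio watermarking can be illustrated with a toy spread-spectrum scheme: a low-amplitude pseudo-random pattern, derived from a secret key, is mixed into the signal, and detection correlates the signal against that keyed pattern. This is a minimal sketch for intuition only; the function names, parameters, and detection threshold below are illustrative assumptions, not techniques described in the interview, and production watermarking systems are far more sophisticated.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> np.ndarray:
    """Mix a low-amplitude pseudo-random pattern (derived from `key`) into the audio."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape)
    return audio + strength * pattern

def detect_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> bool:
    """Correlate the signal with the keyed pattern; a marked signal scores
    near `strength`, while unmarked audio scores near zero."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape)
    score = float(np.dot(audio, pattern)) / audio.size
    return score > strength / 2  # illustrative decision threshold

# Toy demo: one second of a 440 Hz tone at 16 kHz
audio = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
marked = embed_watermark(audio, key=42)
```

Because the pattern is keyed, only a party holding the key can verify the mark, and an unmarked signal (or the wrong key) yields a near-zero correlation.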
Despite their potential for misuse, audio deepfakes can be extraordinarily beneficial across sectors such as healthcare and education. For instance, anonymizing the voices of patients and doctors in mental health interviews can foster global information sharing while preserving privacy. The technology also offers hope to people with speech disabilities.
Dawalatabad envisions this era of pervasive technology having a profoundly positive impact on society. He predicts that AI will continue to advance how we perceive and experience audio, particularly through psychoacoustics, the study of how humans perceive sound. He also envisages the rapid expansion and refinement of AI models, promising breakthrough innovations in augmented and virtual reality and expanding their applications in sectors like healthcare, entertainment, and education. Despite the potential threats, the prospect of AI revolutionizing so many domains underscores the value of ongoing research into these technologies.