Scientists from Tel Aviv University and The Open University in Israel have developed DiffMoog, which they describe as the first comprehensive differentiable modular synthesizer. Designed for automated sound matching, that is, reproducing a given audio input, the synthesizer is built to be used inside machine learning and neural network pipelines for sound synthesis.
DiffMoog offers an array of features commonly found in commercial synthesizers, such as modulation capabilities, envelope shapers, filters, and low-frequency oscillators (LFOs). It also integrates both frequency modulation (FM) and subtractive synthesis techniques. Because it can be integrated into neural networks, DiffMoog allows sound matching to be automated, although optimizing the synthesizer's parameters for an accurate match remains challenging.
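The building blocks listed above can be illustrated with a minimal sketch in plain Python. It is not DiffMoog's code or API: the function names, the sample rate, and every parameter value are assumptions chosen for illustration. An FM oscillator supplies the tone, a one-pole low-pass filter stands in for the subtractive stage, and a piecewise-linear ADSR envelope shapes the amplitude.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate, not taken from the source


def fm_oscillator(n_samples, carrier_hz, mod_hz, mod_index):
    """Sine carrier whose phase is modulated by a second sine (basic FM)."""
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        modulator = math.sin(2 * math.pi * mod_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + mod_index * modulator))
    return out


def one_pole_lowpass(signal, alpha):
    """Simple low-pass filter: each output moves a fraction alpha toward the input."""
    out, prev = [], 0.0
    for x in signal:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def adsr(n_samples, attack, decay, sustain_level, release):
    """Piecewise-linear ADSR envelope; segment lengths are given in samples."""
    sustain_len = max(n_samples - attack - decay - release, 0)
    env = [i / attack for i in range(attack)]
    env += [1.0 - (1.0 - sustain_level) * i / decay for i in range(decay)]
    env += [sustain_level] * sustain_len
    env += [sustain_level * (1.0 - i / release) for i in range(release)]
    return env[:n_samples]


# Hypothetical patch: all values below are illustrative only.
n = SAMPLE_RATE // 4
raw = fm_oscillator(n, carrier_hz=220.0, mod_hz=110.0, mod_index=2.0)
filtered = one_pole_lowpass(raw, alpha=0.2)
env = adsr(n, attack=200, decay=400, sustain_level=0.6, release=800)
note = [e * x for e, x in zip(env, filtered)]
```

In a sound-matching setting, values such as `carrier_hz`, `mod_index`, and the envelope timings are the kind of parameters a network would have to predict.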
Differentiable digital signal processing (DDSP) incorporates signal processing modules as differentiable operations in neural networks. This enables backpropagation, the gradient-based procedure used to adjust a network's internal parameters, to pass through the signal processing itself. Differentiable methods have found value in audio effects applications, including automating DJ transitions and enhancing automatic multitrack mixing. DiffMoog’s differentiable and modular design therefore extends the capabilities of such systems to a full synthesizer.
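To make the idea of a differentiable signal-processing operation concrete, here is a minimal sketch in plain Python. It assumes a single sine oscillator with one learnable parameter, its amplitude, and uses a hand-derived gradient in place of the automatic differentiation a deep learning framework would provide. The names and values are illustrative and are not taken from DiffMoog.

```python
import math

SAMPLE_RATE = 16000  # assumed for illustration
N_SAMPLES = 1600
FREQ_HZ = 440.0


def sine(n_samples, freq_hz):
    """Unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(n_samples)]


def mse_and_grad(amplitude, carrier, target):
    """Mean squared error between amplitude * carrier and target,
    together with its derivative with respect to amplitude."""
    loss = 0.0
    grad = 0.0
    for s, y in zip(carrier, target):
        err = amplitude * s - y
        loss += err * err
        grad += 2.0 * err * s  # d(err^2)/d(amplitude)
    n = len(carrier)
    return loss / n, grad / n


carrier = sine(N_SAMPLES, FREQ_HZ)
target = [0.8 * s for s in carrier]  # the "unknown" amplitude to recover is 0.8

amplitude = 0.1      # initial guess
learning_rate = 0.5
for _ in range(50):
    loss, grad = mse_and_grad(amplitude, carrier, target)
    amplitude -= learning_rate * grad  # plain gradient descent step

final_loss, _ = mse_and_grad(amplitude, carrier, target)
```

Amplitude is a deliberately easy parameter because the loss is a smooth bowl around the correct value. Frequency is harder, and that is where the gradient difficulties discussed in this article arise.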
The researchers integrated DiffMoog into an end-to-end sound-matching framework and released it as an open-source platform. The framework uses an encoder network together with a signal-chain loss, both of which contribute to the project’s novelty. Nevertheless, some issues persisted, including difficulties in frequency estimation and the need for further optimization work. Notably, the study showed the effectiveness of the Wasserstein loss at estimating frequencies, a departure from other common approaches.
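A toy example suggests why a Wasserstein-style loss can help with frequency estimation. The sketch below assumes idealized spectra with a single peak each and compares the one-dimensional Wasserstein distance, computed as the summed absolute difference between cumulative distributions, with a squared L2 distance. It is not the loss implementation used in the study, and the function names are hypothetical.

```python
def peak_spectrum(peak_bin, n_bins=64):
    """Idealized magnitude spectrum with all energy in one frequency bin."""
    spec = [0.0] * n_bins
    spec[peak_bin] = 1.0
    return spec


def normalize(spec):
    """Scale a non-negative spectrum so that it sums to one."""
    total = sum(spec)
    return [v / total for v in spec]


def wasserstein_1d(p, q):
    """1-D Wasserstein-1 distance in units of bins:
    the sum of absolute differences between the two cumulative sums."""
    p, q = normalize(p), normalize(q)
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q)
    return dist


def squared_l2(p, q):
    """Bin-by-bin squared error between two spectra."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


target = peak_spectrum(10)
near = peak_spectrum(12)   # predicted peak 2 bins away
far = peak_spectrum(40)    # predicted peak 30 bins away

w_near, w_far = wasserstein_1d(target, near), wasserstein_1d(target, far)
l2_near, l2_far = squared_l2(target, near), squared_l2(target, far)
```

Here the Wasserstein distance grows with how far the predicted peak is from the target (2 bins versus 30 bins), while the squared L2 distance is identical in both cases because the peaks do not overlap. A loss that cannot tell a near miss from a far miss gives the optimizer no direction to move in.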
While the technology shows significant potential for sound synthesis, the researchers acknowledge that accurately replicating sounds still poses major hurdles. They propose the Wasserstein distance as a way to alleviate gradient difficulties in frequency estimation. Suggested directions for further work include improved audio loss functions, alternative neural network architectures, and refined optimization techniques.
In sum, the DiffMoog synthesizer is a promising tool for advancing differentiable synthesis and audio production. Accurately reproducing common sounds remains challenging, but the approaches suggested in the study may help move this area of research forward.