Classical GAS - Classical methods for Generative Audio Synthesis
Audio synthesis, C/C++ programming, generative models, frugal AI, deep learning
Annecy, France.
Signal-based audio synthesis methods such as those described in [1] and [2] have seen renewed interest since the introduction of the Differentiable Digital Signal Processing (DDSP) package [3]. DDSP uses a neural architecture to produce the control parameters of a signal-based synthesis method. The original DDSP implementation performs successfully on a variety of audio synthesis tasks (synthesis, transfer, ...) using an autoencoder and a sinusoids-plus-noise generative model. It offers an expressive generative model without compromising explainability and control, as well as an opportunity for real-time sound synthesis.
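As an illustration of the sinusoids-plus-noise idea, here is a minimal sketch in plain Python. It is not the DDSP package API: the function name `harmonic_plus_noise` and all of its parameters are hypothetical. The control parameters (fundamental frequency, harmonic amplitudes, noise gain) are set by hand and held constant, whereas in a DDSP-style model they would be predicted over time by a neural network. The noise is left unfiltered for brevity.

```python
import math
import random


def harmonic_plus_noise(f0, harmonic_amps, noise_gain,
                        duration=0.5, sample_rate=16000, seed=0):
    """Render a static harmonic-plus-noise tone as a list of float samples.

    f0            -- fundamental frequency in Hz (assumed constant here)
    harmonic_amps -- amplitude of each harmonic, starting at the fundamental
    noise_gain    -- peak amplitude of the additive white-noise component
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    n_samples = int(duration * sample_rate)
    nyquist = sample_rate / 2
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # Harmonic part: sinusoids at integer multiples of f0.
        # Partials at or above the Nyquist frequency are dropped to avoid aliasing.
        harmonic = sum(
            amp * math.sin(2 * math.pi * (k + 1) * f0 * t)
            for k, amp in enumerate(harmonic_amps)
            if (k + 1) * f0 < nyquist
        )
        # Noise part: plain uniform white noise (no spectral shaping in this sketch).
        noise = noise_gain * rng.uniform(-1.0, 1.0)
        samples.append(harmonic + noise)
    return samples


if __name__ == "__main__":
    tone = harmonic_plus_noise(440.0, [0.5, 0.25], 0.1)
    print(len(tone), "samples rendered")
```

Because every operation in the loop is a sum or product of smooth functions of the controls, the same computation can be written in an automatic-differentiation framework, which is what allows the control network to be trained through the synthesizer.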
In contrast, diffusion models lack interpretability and intuitive control, require intensive training, and suffer from long inference times. Yet training a DDSP-inspired generative model remains challenging due to the unstable gradients produced by signal-based audio synthesis methods. Notably, the success of such training depends on (i) the signal-based synthesis method, (ii) the control neural architecture, and (iii) the training methodology.
In [4], a GAN-based synthesizer (StyleWaveGAN, SWG) has shown promising results on percussion synthesis with improved expressivity in control, opening the way to instrument synthesis using DDSP-inspired models.
During this internship, a study of DDSP-based audio generative models will be performed. After a comprehensive study of DDSP-inspired architectures on piano and drum datasets [6,7], the student will extend the work proposed in [4] to more complex signal-based synthesizers. In particular, the scalability of controls to achieve a perceptual control method similar to [5] will be studied. The internship will result in the implementation of a DDSP-based virtual instrument for a Digital Audio Workstation (DAW) or an embedded system.
The candidate should be enrolled in an M2 (second-year Master's) or engineering degree in one or more of the following fields: signal and image processing, computer science, embedded systems. The candidate should have strong programming abilities as well as good written and oral communication skills. A strong interest in and/or experience with audio signal processing will be appreciated.
The position can start any time from February 2025, for a duration of up to 6 months. The candidate will be based in Annecy. The internship will be hosted in the LISTIC laboratory, with regular meetings and exchanges with researchers from the project.
Send a detailed CV and motivation letter to antoine.lavault@univ-smb.fr and yassine.mhiri@univ-smb.fr
[1] Serra, Xavier. “Musical Sound Modeling with Sinusoids plus Noise.” (1997).
[2] Chowning, John. “The Synthesis of Complex Audio Spectra by Means of Frequency Modulation.” Journal of The Audio Engineering Society 21 (1973): 526-534.
[3] Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, Adam Roberts. "DDSP: Differentiable Digital Signal Processing." International Conference on Learning Representations. 2020.
[4] Antoine Lavault. Generative Adversarial Networks for Synthesis and Control of Drum Sounds, Sorbonne Université, 2023.
[5] Antoine Lavault, Axel Roebel, Matthieu Voiry. "StyleWaveGAN: Style-Based Synthesis of Drum Sounds with Extensive Controls Using Generative Adversarial Networks." 19th Sound and Music Computing Conference, 2022.
[6] Gillet, Olivier and Gaël Richard. “ENST-Drums: an extensive audio-visual database for drum signals processing.” International Society for Music Information Retrieval Conference (2006).
[7] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." International Conference on Learning Representations, 2019.
[8] Renault, Lenny et al. “DDSP-Piano: A Neural Sound Synthesizer Informed by Instrument Knowledge.” Journal of the Audio Engineering Society (2023).
(c) GdR IASIS - CNRS - 2024.