The cocktail party problem has been a challenging research topic for decades. It requires recovering each speaker's clean speech from a mixture of overlapping speech, even under noisy and reverberant conditions. Recently, machine-learning-based approaches have made significant progress in separating speech mixtures, especially monaural mixtures. The Time-domain Audio Separation Network (TasNet) made it possible to separate speech mixtures directly in the time domain rather than through a time-frequency representation. Building on TasNet and its successors, the Wavesplit model has achieved state-of-the-art performance on various benchmarks, including those with noisy and reverberant conditions. This presentation will introduce these models.
Machine learning techniques can also be applied to Head-Related Transfer Function (HRTF) interpolation, which is essential for constructing 3D audio. Each individual's HRTF differs slightly, and measuring every individual's full HRTF is not feasible. A more practical approach is to interpolate HRTFs from a limited set of existing measurements. We approach this problem through the lens of functional separation of variables for partial differential equations, and we propose a new neural network model based on ordinary differential equations to investigate it.
Longfei Yan is pursuing a dual PhD at Victoria University of Wellington and the Australian National University, focusing on machine-learning-based signal separation. He graduated from Victoria University of Wellington with first-class Honours, majoring in Computer Science. His research interests include deep learning, audio processing, data mining, and convex optimization.