Gao S, Wu X, Qu T.
DOA-Informed Self-Supervised Learning Method for Sound Source Enhancement, in
the AES 156th Convention. Madrid, Spain; 2024:10683.
Abstract: Multi-channel sound source enhancement methods have made great progress in recent years, especially when combined with learning-based algorithms. However, the performance of these techniques is limited by the completeness of the training dataset and may degrade in mismatched environments. In this paper, we propose a Reconstruction-Model-based Self-supervised Learning (RMSL) method for sound source enhancement. A reconstruction module integrates the estimated target signal and noise components to regenerate the multi-channel mixed signals, and it is connected with a separation model to form a closed loop. The separation model can then be optimized by continuously iterating the separation-reconstruction process. We use the separation error, the reconstruction error, and the signal-noise independence error as loss functions in the self-supervised learning process. The method is applied to a state-of-the-art sound source separation model (ADL-MVDR) and evaluated under different scenarios. Experimental results demonstrate that the proposed method improves the performance of the ADL-MVDR algorithm under different numbers of sound sources, bringing about 0.5 dB to 1 dB of Si-SNR gain while maintaining good clarity and intelligibility in practical applications.
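The Si-SNR gains reported above refer to the scale-invariant signal-to-noise ratio commonly used to score enhancement quality. A minimal NumPy sketch of that metric (not the authors' implementation; the function name and `eps` guard are illustrative):

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Remove the mean, a common convention before computing Si-SNR.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) /
                         (np.dot(noise, noise) + eps))
```

Because the target is a projection, rescaling the estimate leaves the score unchanged, which is what "scale-invariant" refers to.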
Ge Z, Li L, Qu T.
A Hybrid Time and Time-frequency Domain Implicit Neural Representation for Acoustic Fields, in
the AES 156th Convention. Madrid, Spain; 2024:Express paper 196.
Abstract: Creating an immersive scene relies on detailed spatial sound. Traditional methods, which store impulse responses at probe points, require substantial storage, while geometry-based simulations struggle with complex acoustic effects. Neural-based methods are now improving accuracy while greatly reducing storage requirements. In our study, we propose a hybrid time and time-frequency domain strategy to model the time series of Ambisonic acoustic fields. The network excels at generating high-fidelity time-domain impulse responses at arbitrary source-receiver positions by learning a continuous representation of the acoustic field. Our experimental results demonstrate that the proposed model outperforms baseline methods in various aspects of sound representation and rendering for different source-receiver positions.
Qian Y, Qu T, Tang W, Chen S, Shen W, Guo X, Chai H.
Automotive acoustic channel equalization method using convex optimization in modal domain, in
the AES 156th Convention. Madrid, Spain; 2024:11696.
Abstract: Automotive audio systems often suffer from sub-optimal sound quality due to the intricate acoustic properties of car cabins. Acoustic channel equalization methods are generally employed to improve sound reproduction quality in such environments. In this paper, we propose an acoustic channel equalization method using convex optimization in the modal domain. The modal-domain representation is used to model the whole sound field to be equalized and is integrated into the convex formulation of the acoustic channel reshaping problem. To further control pre-ringing artifacts, a temporal window function modified according to the backward masking effect of the human auditory system is applied during equalizer design. Objective and subjective experiments in a real automotive cabin show that the proposed method enhances spatial robustness and avoids audible pre-ringing artifacts.
Wu D, Wu X, Qu T.
Exploiting Motion Information in Sound Source Localization and Tracking, in
the AES 156th Convention. Madrid, Spain; 2024:10687.
Abstract: Deep neural networks can be employed to estimate the direction of arrival (DOA) of individual sound sources from audio signals. Existing methods mostly estimate the DOA of each source on individual frames, without exploiting the motion information of the sources. This paper proposes a method for estimating source trajectories that leverages the differentials of the trajectories across different time scales. Additionally, a neural network is employed to refine trajectories that are wrongly estimated, especially for low-energy sound sources. Experimental evaluations conducted on a simulated dataset validate that the proposed method achieves more precise localization and tracking performance and suffers less interference when the sound source energy is low.
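As a rough illustration of "differentials of trajectories across different time scales", the sketch below computes velocity-like finite differences of a per-frame DOA trajectory at several frame lags. The feature layout and the set of scales are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def multiscale_differentials(doa_traj, scales=(1, 2, 4)):
    """
    doa_traj: (T, 2) array of per-frame (azimuth, elevation) estimates.
    Returns k-frame finite differences of the trajectory, normalized by
    the lag k, as simple velocity-like motion features per time scale.
    """
    feats = []
    for k in scales:
        d = np.zeros_like(doa_traj)
        d[k:] = doa_traj[k:] - doa_traj[:-k]   # difference over a lag of k frames
        feats.append(d / k)                    # normalize by lag -> per-frame velocity
    return np.concatenate(feats, axis=1)       # shape (T, 2 * len(scales))
```

For a source moving at constant angular velocity, every scale yields the same per-frame velocity once its lag has elapsed, while noisy single-frame estimates average out at the longer lags.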
Yuan Z, Gao S, Wu X, Qu T.
Spatial Covariant Matrix based Learning for DOA Estimation in Spherical Harmonics Domain, in
the AES 156th Convention. Madrid, Spain; 2024:10701.
Abstract: Direction of arrival (DoA) estimation in complex environments is a challenging task. Traditional methods break down under low signal-to-noise ratio (SNR) and reverberant conditions, while data-driven methods lack generalization to unseen data types. In this paper we propose a robust DoA estimation approach that combines the two. To focus on spatial information modeling, the proposed method directly uses the compressed covariance matrix of the first-order Ambisonics (FOA) signal as input, and only white noise is used during training. To adapt to the different characteristics of FOA signals in different frequency bands, our method estimates the DoA in each frequency band with a dedicated model, and the subband results are then integrated. Experiments carried out on both simulated and measured datasets show the superiority of the proposed method over existing baselines under complex conditions, as well as its scalability to unseen data types.
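A hedged sketch of what a compressed FOA covariance input per subband might look like: since the 4x4 spatial covariance of the W, X, Y, Z channels is Hermitian, keeping only the real and imaginary parts of its upper triangle loses no information. The function name and band layout are illustrative, not the authors' code:

```python
import numpy as np

def foa_band_covariances(foa_stft, band_edges):
    """
    foa_stft: complex array of shape (4, F, T) -- STFT of the W, X, Y, Z
              channels of a first-order Ambisonics recording.
    band_edges: list of (f_lo, f_hi) bin ranges defining the subbands.
    Returns one compressed covariance vector per band.
    """
    features = []
    for f_lo, f_hi in band_edges:
        X = foa_stft[:, f_lo:f_hi, :].reshape(4, -1)  # stack bins and frames
        R = (X @ X.conj().T) / X.shape[1]             # 4x4 spatial covariance
        iu = np.triu_indices(4)                       # upper triangle: 10 entries
        features.append(np.concatenate([R[iu].real, R[iu].imag]))
    return np.stack(features)                         # (num_bands, 20)
```

Each 20-dimensional vector could then feed the per-band model described in the abstract, with the subband outputs integrated afterwards.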
Wu D, Wu X, Qu T.
A Hybrid Deep-Online Learning Based Method for Active Noise Control in Wave Domain, in
International Conference on Acoustics, Speech and Signal Processing (ICASSP). COEX, Seoul, Korea; 2024.