Research Output

2025
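Qu Tianshu, Wu Xihong. Application of higher-order sound field recording and reproduction based on spherical microphone arrays in film audio production. 现代电影技术. 2025;(2):4-11.
With the growing demand in film for an extremely immersive audiovisual experience, immersive sound field recording and reproduction technologies are becoming increasingly important. This paper addresses sound field recording and reproduction in film audio production and introduces Higher Order Ambisonics (HOA) analysis based on spherical microphone arrays. It presents solutions to the low-frequency noise and high-frequency aliasing problems in the spherical harmonic decomposition of spherical microphone arrays, and to the limited-order problem in binaural reproduction. The study shows that the proposed solutions can give audiences a more realistic and more immersive sound field reproduction and improve the viewing experience, and that they have broad application prospects in film audio production.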
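Qu Tianshu, Wu Xihong, Wu Donghang; 2025. Room geometry inference method based on localization of the direct sound source and first-order reflection sound sources. China patent CN 202510556943.0.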
Wu D, Wu X, Qu T. Leveraging Sound Source Trajectories for Universal Sound Separation. IEEE Transactions on Audio, Speech, and Language Processing [Internet]. 2025;33:2337-2348.
Existing methods utilizing spatial information for sound source separation require prior knowledge of the direction of arrival (DOA) of the source or utilize estimated but imprecise localization results, which impairs the separation performance, especially when the sound sources are moving. In fact, sound source localization and separation are interconnected problems, that is, sound source localization facilitates sound separation while sound separation contributes to refined source localization. This paper proposes a method utilizing the mutual facilitation mechanism between sound source localization and separation for moving sources. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source from the audio mixture based on the source signal envelope estimation. These tracking results may lack sufficient accuracy. The second stage involves mutual facilitation: Sound separation is conducted using preliminary sound source tracking results. Subsequently, sound source tracking is performed on the separated signals, thereby refining the tracking precision. The refined trajectories further improve separation performance. This mutual facilitation process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results based on the refined tracking trajectories and multi-channel separation outputs. Simulation experiments conducted under reverberant conditions and with moving sound sources demonstrate that the proposed method can achieve more accurate separation based on refined tracking results.
2024
Wu D, Wu X, Qu T. A Hybrid Deep-Online Learning Based Method for Active Noise Control in Wave Domain, in International Conference on Acoustics, Speech and Signal Processing (ICASSP). COEX, Seoul, Korea; 2024:1301-1305.
The traditional feedback Active Noise Control (ANC) algorithms are built upon linear filters, which leads to reduced performance when dealing with real-world noise. Deep learning-based feedback ANC algorithms have been proposed to overcome this problem. However, methods relying on pre-trained neural networks exhibit performance degradation when encountering noise from scenes unseen in the training dataset. This paper proposes a hybrid deep-online learning based spatial ANC system which combines online learning with pre-trained deep neural networks. The proposed method can keep the performance on noise from the trained scenes while improving the performance of cancelling noise from new scenes. Additionally, by incorporating wave domain decomposition, this paper achieves noise cancellation over a control spatial region. Simulation experiments validate the effectiveness of the combination of online learning and deep learning in handling previously unseen noise. Furthermore, the efficiency of wave domain decomposition in spatial noise cancellation is also verified.
Qian Y, Qu T, Tang W, Chen S, Shen W, Guo X, Chai H. Automotive acoustic channel equalization method using convex optimization in modal domain, in the AES 156th Convention. Madrid, Spain; 2024:11696.
Automotive audio systems often face sub-optimal sound quality due to the intricate acoustic properties of car cabins. Acoustic channel equalization methods are generally employed to improve sound reproduction quality in such environments. In this paper, we propose an acoustic channel equalization method using convex optimization in the modal domain. The modal domain representation is used to model the whole sound field to be equalized. Besides integrating it into the convex formulation of the acoustic channel reshaping problem, to further control the prering artifacts, the temporal window function modified according to the backward masking effect of the human auditory system is used during equalizer design. Objective and subjective experiments in a real automotive cabin proved that the proposed method enhances spatial robustness and avoids the audible prering artifacts.
Gao S, Wu X, Qu T. DOA-Informed Self-Supervised Learning Method for Sound Source Enhancement, in the AES 156th Convention. Madrid, Spain; 2024:10683.
The multiple-channel sound source enhancement methods have made great progress in recent years, especially when combined with the learning-based algorithms. However, the performance of these techniques is limited by the completeness of the training dataset, which may degrade in mismatched environments. In this paper, we propose a reconstruction Model based Self-supervised Learning (RMSL) method for sound source enhancement. A reconstruction module is used to integrate the estimated target signal and noise components to regenerate the multi-channel mixed signals, and it is connected with a separating model to form a closed loop. In this case, the optimization of the separation model can be achieved by continuously iterating the separation-reconstruction process. We use the separation error, the reconstruction error, and the signal-noise independence error as loss functions in the self-supervised learning process. This method is applied to the state-of-the-art sound source separation model (ADL-MVDR) and evaluated under different scenarios. Experimental results demonstrate that the proposed method can improve the performance of the ADL-MVDR algorithm under different numbers of sound sources, bringing about 0.5 dB to 1 dB Si-SNR gain, while maintaining good clarity and intelligibility in practical application.
Wu D, Wu X, Qu T. Exploiting Motion Information in Sound Source Localization and Tracking, in the AES 156th Convention. Madrid, Spain; 2024:10687.
Deep neural networks can be employed for estimating the direction of arrival (DOA) of individual sound sources from audio signals. Existing methods mostly focus on estimating the DOA of each source on individual frames, without utilizing the motion information of the sources. This paper proposes a method for estimating trajectories of sources, leveraging the differential of trajectories across different time scales. Additionally, a neural network is employed for enhancing wrongly estimated trajectories, especially for sound sources with low energy. Experimental evaluations conducted on a simulated dataset validate that the proposed method achieves more precise localization and tracking performance and encounters less interference when the sound source energy is low.
Ge Z, Li L, Qu T. A Hybrid Time and Time-frequency Domain Implicit Neural Representation for Acoustic Fields, in the AES 156th Convention. Madrid, Spain; 2024:Express paper 196.
Creating an immersive scene relies on detailed spatial sound. Traditional methods, using probe points for impulse responses, need lots of storage. Meanwhile, geometry-based simulations struggle with complex sound effects. Now, neural-based methods are improving accuracy and slashing storage needs. In our study, we propose a hybrid time and time-frequency domain strategy to model the time series of Ambisonic acoustic fields. The network excels in generating high-fidelity time-domain impulse responses at arbitrary source-receiver positions by learning a continuous representation of the acoustic field. Our experimental results demonstrate that the proposed model outperforms baseline methods in various aspects of sound representation and rendering for different source-receiver positions.
Yuan Z, Gao S, Wu X, Qu T. Spatial Covariant Matrix based Learning for DOA Estimation in Spherical Harmonics Domain, in the AES 156th Convention. Madrid, Spain; 2024:10701.
Direction of arrival (DoA) estimation in complex environments is a challenging task. The traditional methods suffer from invalidity under low signal-to-noise ratio (SNR) and reverberation conditions, and the data-driven methods lack generalization to unseen data types. In this paper we propose a robust DoA estimation approach by combining the two methods above. To focus on spatial information modeling, the proposed method directly uses the compressed covariance matrix of the first-order ambisonics (FOA) signal as input, while only white noise is used during training. To adapt to different characteristics of FOA signals in different frequency bands, our method estimates DoA in different frequency bands by particular models, and the subband results are finally integrated together. Experiments are carried out on both simulated and measured datasets, and the results show the superiority of the proposed method over existing baselines under complex conditions and the scalability for unseen data types.
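Qu Tianshu, Wu Xihong, Qian Yufan; 2024. A vehicle cabin acoustic channel equalization method based on modal decomposition and convex optimization. China patent CN 202410663433.9.
Qu Tianshu, Ge Zhongshu; 2024. Indoor sound field representation method and time-domain representation model for indoor sound fields. China patent CN 202410741409.2.
Qu Tianshu, Ge Zhongshu; 2024. Indoor sound field representation method and hybrid time-domain and time-frequency-domain model framework. China patent CN 202410741392.0.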
2023
Yuan Z, Wu D, Wu X, Qu T. Sound event localization and detection based on iterative separation in embedding space, in 2023 6th International Conference on Information Communication and Signal Processing (ICICSP). Xian, China; 2023:455-459.
Ge Z, Tian P, Li L, Qu T. Rendering Near-field Point Sound Sources Through an Iterative Weighted Crosstalk Cancellation Method, in Audio Engineering Society Convention 154. Helsinki, Finland; 2023:10649.
Wang Y, Lan Z, Wu X, Qu T. TT-Net: Dual-Path Transformer Based Sound Field Translation in the Spherical Harmonic Domain, in International Conference on Acoustics, Speech and Signal Processing (ICASSP). Rhodes Island, Greece; 2023:1-5.
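Qu Tianshu, Wu Xihong, Wang Yiwen; 2023. A multi-point sampling sound field reconstruction method based on dual-path self-attention learning. China patent CN 202310667120.6.
Qu Tianshu, Wu Xihong, Ge Zhongshu; 2023. A near-field sound source reproduction method based on a loudspeaker array. China patent CN 202310532598.8.
Qu Tianshu, Wu Xihong, Wu Donghang; 2023. A spatial active noise control method based on deep learning and cylindrical harmonic decomposition. China patent CN ZL202310955389.4.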
Gao S, Wu X, Qu T. A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2023;31:2100-2110.
In a reverberant environment, interferences such as reflections and background noise can degrade the perception of the sound source signal. Although the DNN-based methods have made a tremendous breakthrough in addressing this issue, the performance of these models is highly dependent on the completeness of the training dataset, which will limit its generalization under unknown environments. In this article, we propose a physical model-based self-supervised learning (PMSSL) method to realize the DNN model optimization under unknown scenarios. This method incorporates a room reverberation physical model into the sound source enhancement model optimization process, realizing the self-learning of the DNN model under physical constraints. In this process, the time-frequency characteristics of the input signal and the spatial feature of the reverberation environment are utilized for parameter optimization, improving the adaptability of the DNN model under unknown scenarios. Experimental results based on simulated and measured data prove that the proposed method can obtain much more accurate source signal enhancement results compared with the pre-trained models, verifying its effectiveness and adaptability in new environments.
2022
Qu T, Xu J, Yuan Z, Wu X. Higher order ambisonics compression method based on autoencoder, in Audio Engineering Society Convention 153. online; 2022:Express paper 9.
The compression of three-dimensional sound field signals has always been a very important issue. Recently, an Independent Component Analysis (ICA) based Higher Order Ambisonics (HOA) compression method introduced blind source separation to address the discontinuity between frames in the existing Singular Value Decomposition (SVD) based methods. However, ICA is weak at modeling reverberant environments, and its target is not to recover the original signal. In this work, we replace ICA with an autoencoder to further improve the above method's ability to cope with reverberation conditions and to ensure the unanimous optimization of both separation and recovery by a reconstruction loss. We constructed a dataset with simulated and recorded signals, and verified the effectiveness of our method through objective and subjective experiments.
