Research Output by Type: Journal Articles

2026

You Y, Qian Y, Qu T, Wang B, Lv X. Spherical Harmonic Beamforming–Based Ambisonics Encoding Method in Frequency and Time Domain. Journal of the Audio Engineering Society [Internet]. 2026;74(6):417-429.

Implementing Higher-Order Ambisonics (HOA) on consumer devices is hindered by their sparse, irregular microphone arrays, which challenge conventional methods with issues like spatial aliasing and ill-conditioning. This paper proposes a unified Spherical Harmonic Beamforming (SHB-AE) framework that recasts HOA encoding as a spatial filtering problem, enabling robust, signal-independent solutions. We develop two approaches: a frequency-domain (FD) method with compensation for high-frequency artifacts, and a time-domain (TD) method that holistically optimizes broadband FIR filters for enhanced stability. The framework is inherently scalable, allowing on-demand order expansion. Using a measured smartphone array, comprehensive objective and subjective evaluations demonstrate the clear superiority of the TD method. It excels in signal fidelity, spatial accuracy, and temporal consistency, outperforming baseline and FD approaches. The TD method also maintains its advantage in adverse conditions, showing remarkable robustness against noise, reverberation, and multi-source environments. It provides a practical, high-performance pathway for enabling high-fidelity spatial audio capture on ubiquitous consumer devices without requiring complex signal analysis or large datasets.

2025

Qian Y, Wu X, Qu T. Automotive sound field reproduction using deep optimization with spatial domain constraint. The Journal of the Acoustical Society of America [Internet]. 2025;158(4):3063-3077.

Sound field reproduction with undistorted sound quality and precise spatial localization is desirable for automotive audio systems. However, the complexity of the automotive cabin acoustic environment often necessitates a trade-off between sound quality and spatial accuracy. To overcome this limitation, we propose Spatial Power Map Net, a learning-based sound field reproduction method that improves both sound quality and spatial localization in complex environments. We introduce a spatial power map constraint, which characterizes the angular energy distribution of the reproduced field using beamforming. This constraint guides energy toward the intended direction to enhance spatial localization, and is integrated into a multi-channel equalization framework to also improve sound quality under reverberant conditions. To address the resulting non-convexity, deep optimization that uses neural networks to solve optimization problems is employed for filter design. Both in situ objective and subjective evaluations confirm that our method enhances sound quality and improves spatial localization within the automotive cabin. Furthermore, we analyze the influence of different audio materials and the arrival angles of the virtual sound source in the reproduced sound field, investigating the potential underlying factors affecting these results.

Gao S, Wang Y, Yuan Z, Wu X, Qu T. Joint Estimation of Sound Source Position and Room Boundaries Using a Multitask Deep Neural Network Model. Journal of the Audio Engineering Society [Internet]. 2025;73(10):633-647.

Conventional room geometry blind inference techniques with acoustic signals often rely on prior knowledge, such as source signals or source positions, limiting their applicability when the sound source is unknown. To solve this problem, the authors propose a novel multitask deep neural network (DNN) model that jointly estimates sound source localization and room geometry using signals captured by a spherical microphone array. Considering the coupling between sound source content and environmental parameters in reverberation signals, extracted early reflection direction and delay information is used as the network input to estimate spatial parameters, ensuring independence from the sound source signal. The proposed model employs a hierarchical architecture with dedicated subnetworks to process direction-of-arrival (DOA) and time-difference-of-arrival features, followed by a shared fusion module that exploits geometric constraints between source and boundary positions. Compared with traditional methods, this model requires less prior environmental information and performs sound source localization and room geometry inference with single-position sound field measurements. Experimental results from simulations and real measurements demonstrate the method’s effectiveness and precision compared with conventional approaches across various scenarios.

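Qu T, Wu X. Application of Spherical Microphone Array-Based Higher-Order Sound Field Recording and Reproduction in Film Audio Production [in Chinese]. 现代电影技术. 2025;(2):4-11.

As film increasingly demands a fully immersive audiovisual experience, immersive sound field recording and reproduction technologies are becoming more important. Focusing on sound field recording and reproduction in film audio production, this paper introduces Higher Order Ambisonics (HOA) analysis based on spherical microphone arrays. It presents solutions to the low-frequency noise and high-frequency aliasing problems in the spherical harmonic decomposition of spherical microphone arrays, and to the limited-order problem in binaural reproduction. The study shows that the proposed solutions give audiences a more realistic and more immersive reproduced sound field, improve the viewing experience, and have broad application prospects in film audio production.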

Wu D, Wu X, Qu T. Leveraging Sound Source Trajectories for Universal Sound Separation. IEEE Transactions on Audio, Speech, and Language Processing [Internet]. 2025;33:2337-2348.

Existing methods utilizing spatial information for sound source separation require prior knowledge of the direction of arrival (DOA) of the source or utilize estimated but imprecise localization results, which impairs the separation performance, especially when the sound sources are moving. In fact, sound source localization and separation are interconnected problems, that is, sound source localization facilitates sound separation while sound separation contributes to refined source localization. This paper proposes a method utilizing the mutual facilitation mechanism between sound source localization and separation for moving sources. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source from the audio mixture based on the source signal envelope estimation. These tracking results may lack sufficient accuracy. The second stage involves mutual facilitation: Sound separation is conducted using preliminary sound source tracking results. Subsequently, sound source tracking is performed on the separated signals, thereby refining the tracking precision. The refined trajectories further improve separation performance. This mutual facilitation process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results based on the refined tracking trajectories and multi-channel separation outputs. Simulation experiments conducted under reverberant conditions and with moving sound sources demonstrate that the proposed method can achieve more accurate separation based on refined tracking results.

2023

Gao S, Wu X, Qu T. A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2023;31:2100-2110.

In a reverberant environment, interferences such as reflections and background noise can degrade the perception of the sound source signal. Although the DNN-based methods have made a tremendous breakthrough in addressing this issue, the performance of these models is highly dependent on the completeness of the training dataset, which will limit its generalization under unknown environments. In this article, we propose a physical model-based self-supervised learning (PMSSL) method to realize the DNN model optimization under unknown scenarios. This method incorporates a room reverberation physical model into the sound source enhancement model optimization process, realizing the self-learning of the DNN model under physical constraints. In this process, the time-frequency characteristics of the input signal and the spatial feature of the reverberation environment are utilized for parameter optimization, improving the adaptability of the DNN model under unknown scenarios. Experimental results based on simulated and measured data prove that the proposed method can obtain much more accurate source signal enhancement results compared with the pre-trained models, verifying its effectiveness and adaptability in new environments.

2022

Wang C, Wang Z, Xie B, Shi X, Yang P, Liu L, Qu T, Qin Q, Xing Y, Zhu W, et al. Binaural processing deficit and cognitive impairment in Alzheimer's disease. Alzheimer's & Dementia: The Journal of the Alzheimer's Association. 2022;28(6):1085-1099.

Gao S, Lin J, Wu X, Qu T. Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process. IEEE/ACM Transactions on Audio, Speech, and Language Processing [Internet]. 2022;30:1124-1135.

2021

Ge Z, Li L, Qu T. Partially Matching Projection Decoding Method Evaluation Under Different Playback Conditions. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2021;29:1411-1423.

Fan L, Kong L, Li L, Qu T. Sensitivity to a break in interaural correlation in frequency-gliding noises. Front. Psychol. - Perception Science. 2021.

2020

Zhang M, Ge Z, Liu T, Wu X, Qu T. Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020;28:785-797.

2019

Huang Q, Liu T, Wu X, Qu T. A Generative Adversarial Net-based Bandwidth Extension Method for Audio Compression. J. Audio Eng. Soc. 2019;67(12):986-993.

To reduce the burden of storing and transmitting audio signals, they are often compressed with a lossy single-channel code. Because the high-frequency components are effectively truncated when using a low bitrate encoder, listeners may experience the sound as being uncomfortable, muffled, or dull. To compensate for the perceived degradation, bandwidth extension technology can be used to regenerate the missing high frequencies from the low-frequency components during the decoding process. In this paper the authors propose a bandwidth extension method based on Generative Adversarial Networks (GAN), which is used to estimate the relationship between the MDCT spectrum in the high-frequency part and the low-frequency part. It is evaluated by a discriminant network in the GAN to get a more natural result. A complete audio coding system was built by using AAC Low Complex as the single-channel core encoder with the proposed bandwidth extension method. To evaluate the audio quality decoded by the new system, a subjective evaluation experiment was carried out using the HE-AAC as the baseline system with the MUSHRA experimental method.

2017

Gao Y, Wang Q, Ding Y, Wang C, Li H, Wu X, Qu T, Li L. Selective Attention Enhances Beta-Band Cortical Oscillation to Speech under “Cocktail-Party” Listening Conditions. Frontiers in Human Neuroscience. 2017;11:Article 34.

2015

Kong LZ, Xie ZL, Lu LX, Qu TS, Wu XH, Yan J, Li L. Similar impacts of the interaural delay and interaural correlation on binaural gap detection. PLOS ONE. 2015;10(6):e0126342.

2014

Lei M, Luo L, Qu TS, Jia HX, Li L. Perceived location specificity in perceptual separation-induced but not fear conditioning-induced enhancement of prepulse inhibition in rats. Behavioural Brain Research. 2014;269:87-94.

Gao YY, Cao SY, Qu TS, Wu XH, Li HF, Zhang JS, Li L. Voice-associated static face image releases speech from informational masking. PsyCh Journal. 2014;3:113-120.

吴玺宏, 吕振扬, 高源, 曲天书. 近场结构化头相关传递函数的测量与分析. 数据采集与处理. 2014;29(2):180-185.

2013

Qu TS, Cao SY, Chen X, Huang Y, Li L, Wu XH, Schneider BA. Aging Effects on Detection of Spectral Changes Induced by a Break in Sound Correlation. Ear and Hearing. 2013;34(3):280-287.

2012

He WX, Gao Y, Qu TS. Introduction to AVS Audio Lossless Coding/Decoding Standard. Multimedia communications technical committee. 2012;7(2):21-24.

2011

Huang Y, Li JY, Zou XF, Qu TS, Wu XH, Mao LH, Wu YH, Li L. Perceptual Fusion Tendency of Speech Sounds. Journal of Cognitive Neuroscience. 2011;23(4):1003-1014.

Qu Tianshu

Peking University, School of Intelligence Science and Technology; National Key Laboratory of General Artificial Intelligence. Associate Professor, Doctoral Supervisor
