Research Output by Type: Journal Articles

2025
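Qu T, Wu X. Application of higher-order sound field recording and reproduction based on spherical microphone arrays in film audio production [in Chinese]. 现代电影技术. 2025;(2):4-11.
As films increasingly aim for a fully immersive audiovisual experience, immersive sound field recording and reproduction technologies are becoming more important. Focusing on sound field recording and reproduction in film audio production, this paper introduces Higher Order Ambisonics (HOA) analysis techniques based on spherical microphone arrays. It presents solutions to the low-frequency noise and high-frequency aliasing problems in the spherical harmonic decomposition of spherical microphone array signals, and to the order limitation in binaural reproduction. The study shows that the proposed solutions give audiences a more realistic and more immersive sound field reproduction, improve the viewing experience, and have broad application prospects in film audio production.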
Wu D, Wu X, Qu T. Leveraging Sound Source Trajectories for Universal Sound Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2025;33:2337-2348.
Existing methods utilizing spatial information for sound source separation require prior knowledge of the direction of arrival (DOA) of the source or utilize estimated but imprecise localization results, which impairs the separation performance, especially when the sound sources are moving. In fact, sound source localization and separation are interconnected problems, that is, sound source localization facilitates sound separation while sound separation contributes to refined source localization. This paper proposes a method utilizing the mutual facilitation mechanism between sound source localization and separation for moving sources. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source from the audio mixture based on the source signal envelope estimation. These tracking results may lack sufficient accuracy. The second stage involves mutual facilitation: Sound separation is conducted using preliminary sound source tracking results. Subsequently, sound source tracking is performed on the separated signals, thereby refining the tracking precision. The refined trajectories further improve separation performance. This mutual facilitation process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results based on the refined tracking trajectories and multi-channel separation outputs. Simulation experiments conducted under reverberant conditions and with moving sound sources demonstrate that the proposed method can achieve more accurate separation based on refined tracking results.
2023
Gao S, Wu X, Qu T. A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2023;31:2100-2110.
In a reverberant environment, interferences such as reflections and background noise can degrade the perception of the sound source signal. Although the DNN-based methods have made a tremendous breakthrough in addressing this issue, the performance of these models is highly dependent on the completeness of the training dataset, which will limit its generalization under unknown environments. In this article, we propose a physical model-based self-supervised learning (PMSSL) method to realize the DNN model optimization under unknown scenarios. This method incorporates a room reverberation physical model into the sound source enhancement model optimization process, realizing the self-learning of the DNN model under physical constraints. In this process, the time-frequency characteristics of the input signal and the spatial feature of the reverberation environment are utilized for parameter optimization, improving the adaptability of the DNN model under unknown scenarios. Experimental results based on simulated and measured data prove that the proposed method can obtain much more accurate source signal enhancement results compared with the pre-trained models, verifying its effectiveness and adaptability in new environments.
2022
Wang C, Wang Z, Xie B, Shi X, Yang P, Liu L, Qu T, Qin Q, Xing Y, Zhu W, et al. Binaural processing deficit and cognitive impairment in Alzheimer's disease. Alzheimer's & Dementia: The Journal of the Alzheimer's Association. 2022;28(6):1085-1099.
Gao S, Lin J, Wu X, Qu T. Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2022;30:1124-1135.
2021
Ge Z, Li L, Qu T. Partially Matching Projection Decoding Method Evaluation Under Different Playback Conditions. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2021;29:1411-1423.
Fan L, Kong L, Li L, Qu T. Sensitivity to a break in interaural correlation in frequency-gliding noises. Front. Psychol. - Perception Science. 2021.
2020
Zhang M, Ge Z, Liu T, Wu X, Qu T. Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020;28:785-797.
2019
Huang Q, Liu T, Wu X, Qu T. A Generative Adversarial Net-based Bandwidth Extension Method for Audio Compression. J. Audio Eng. Soc. 2019;67(12):986-993.
To reduce the burden of storing and transmitting audio signals, they are often compressed with a lossy single-channel code. Because the high-frequency components are effectively truncated when using a low bitrate encoder, listeners may experience the sound as being uncomfortable, muffled, or dull. To compensate for the perceived degradation, bandwidth extension technology can be used to regenerate the missing high frequencies from the low-frequency components during the decoding process. In this paper the authors propose a bandwidth extension method based on Generative Adversarial Networks (GAN), which is used to estimate the relationship between the MDCT spectrum in the high-frequency part and the low-frequency part. It is evaluated by a discriminant network in the GAN to get a more natural result. A complete audio coding system was built by using AAC Low Complex as the single-channel core encoder with the proposed bandwidth extension method. To evaluate the audio quality decoded by the new system, a subjective evaluation experiment was carried out using the HE-AAC as the baseline system with the MUSHRA experimental method.
2017
Gao Y, Wang Q, Ding Y, Wang C, Li H, Wu X, Qu T, Li L. Selective Attention Enhances Beta-Band Cortical Oscillation to Speech under “Cocktail-Party” Listening Conditions. Frontiers in Human Neuroscience. 2017;11:Article 34.
2015
Kong LZ, Xie ZL, Lu LX, Qu TS, Wu XH, Yan J, Li L. Similar impacts of the interaural delay and interaural correlation on binaural gap detection. PLOS ONE. 2015;10(6):e0126342.
2014
Lei M, Luo L, Qu TS, Jia HX, Li L. Perceived location specificity in perceptual separation-induced but not fear conditioning-induced enhancement of prepulse inhibition in rats. Behavioural Brain Research. 2014;269:87-94.
Gao YY, Cao SY, Qu TS, Wu XH, Li HF, Zhang JS, Li L. Voice-associated static face image releases speech from informational masking. PsyCh Journal. 2014;3:113-120.
Wu XH, Lü ZY, Gao Y, Qu TS. Measurement and analysis of near-field structured head-related transfer functions [in Chinese]. 数据采集与处理. 2014;29(2):180-185.
2013
Qu TS, Cao SY, Chen X, Huang Y, Li L, Wu XH, Schneider BA. Aging Effects on Detection of Spectral Changes Induced by a Break in Sound Correlation. Ear and Hearing. 2013;34(3):280-287.
2012
He WX, Gao Y, Qu TS. Introduction to AVS Audio Lossless Coding/Decoding Standard. Multimedia communications technical committee. 2012;7(2):21-24.
2011
Huang Y, Li JY, Zou XF, Qu TS, Wu XH, Mao LH, Wu YH, Li L. Perceptual Fusion Tendency of Speech Sounds. Journal of Cognitive Neuroscience. 2011;23(4):1003-1014.
2010
Qu TS, Cao SW, Wu XH. Relationship between Distance and Binaural Cues on Sound Source Localization. Acta Scientiarum Naturalium Universitatis Pekinensis. 2010;46(06):901-906.
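Qu TS, He WX, Gao Y, Zhang B, Wu XH. A lossless audio coding and decoding method based on the lifting wavelet transform [in Chinese]. 电声技术. 2010;34(12):65-68.
Yang XH, Shu HY, Qu TS, Zhang T, Dou WB. An audio codec framework from lossy to lossless [in Chinese]. 电声技术. 2010;34(12):60-64.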
