Abstract:
Existing methods that use spatial information for sound source separation either require prior knowledge of each source's direction of arrival (DOA) or rely on estimated but imprecise localization results, which impairs separation performance, especially when the sound sources are moving. Sound source localization and separation are in fact interconnected problems: localization facilitates separation, while separation in turn enables more refined localization. This paper proposes a method for moving sources that exploits this mutual facilitation between sound source localization and separation. The proposed method comprises three stages. The first stage is initial tracking, which tracks each sound source in the audio mixture based on an estimate of the source signal envelope. These initial tracking results may lack sufficient accuracy. The second stage is mutual facilitation: sound separation is first conducted using the preliminary tracking results, and sound source tracking is then performed on the separated signals, which refines the tracking precision. The refined trajectories in turn improve separation performance, and this mutual facilitation process can be iterated multiple times. In the third stage, a neural beamformer estimates precise single-channel separation results from the refined tracking trajectories and the multi-channel separation outputs. Simulation experiments under reverberant conditions with moving sound sources demonstrate that the proposed method achieves more accurate separation by relying on the refined tracking results.
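The control flow of the three stages can be sketched as follows. This is only a minimal illustration of the iteration structure, not the authors' implementation. The function name `run_pipeline` and the four callables (`initial_track`, `separate`, `track`, `beamform`) are hypothetical placeholders for the components named in the abstract, and the default iteration count is an assumption.

```python
from typing import Any, Callable, Tuple


def run_pipeline(
    mixture: Any,
    initial_track: Callable[[Any], Any],
    separate: Callable[[Any, Any], Any],
    track: Callable[[Any], Any],
    beamform: Callable[[Any, Any], Any],
    num_iterations: int = 2,  # assumed default; the abstract only says "multiple times"
) -> Tuple[Any, Any]:
    """Sketch of the three-stage flow; all callables are hypothetical stand-ins."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")

    # Stage 1: initial (possibly inaccurate) tracking from the mixture.
    trajectories = initial_track(mixture)

    # Stage 2: mutual facilitation. Separation uses the current trajectories,
    # then tracking on the separated signals refines those trajectories.
    separated = None
    for _ in range(num_iterations):
        separated = separate(mixture, trajectories)
        trajectories = track(separated)

    # Stage 3: the beamformer combines the multi-channel separation outputs
    # with the refined trajectories to produce single-channel estimates.
    outputs = beamform(separated, trajectories)
    return outputs, trajectories
```

Passing the components as callables keeps the sketch independent of any particular tracker, separation network, or beamformer. Whether a final separation pass is rerun with the last refined trajectories before beamforming is not specified in the abstract; the sketch does not do so.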