Date Presented:
15-17 June 2024
Abstract:
Creating an immersive scene relies on detailed spatial sound. Traditional methods, which sample impulse responses at probe points, require substantial storage, while geometry-based simulations struggle to capture complex acoustic effects. Neural methods have recently improved accuracy while sharply reducing storage requirements. In this study, we propose a hybrid time and time-frequency domain strategy for modeling the time series of Ambisonic acoustic fields. By learning a continuous representation of the acoustic field, the network generates high-fidelity time-domain impulse responses at arbitrary source-receiver positions. Our experimental results demonstrate that the proposed model outperforms baseline methods in various aspects of sound representation and rendering across different source-receiver positions.