Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training

Citation:

Zhong Y, Zhao X, Zhang L, Song X, Jiang T. Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training. In: Proceedings of the 33rd ACM International Conference on Multimedia. Dublin, Ireland; 2025:7453-7462.

Abstract:

Due to the high cost and small scale of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent Blind IQA (BIQA) methods. Traditional deep learning-based methods emphasize visual information to capture quality features, while recent developments in Vision-Language Models (VLMs) demonstrate strong potential for learning generalizable representations through textual information. However, applying VLMs to BIQA poses three major challenges: (1) how to make full use of multi-modal information; (2) prompt engineering for appropriate quality descriptions is extremely time-consuming; and (3) how to use mixed data for joint training to enhance the generalization of VLM-based BIQA models. To this end, we propose a Multi-modal BIQA method with Prompt learning, named MMP-IQA. For (1), we propose a conditional fusion module to better exploit cross-modal information; by jointly adjusting visual and textual features, our model captures quality information with stronger representation ability. For (2), we model the quality prompt's context words as learnable vectors that are adaptively updated during training for superior performance. For (3), we jointly train a linearity-induced quality evaluator, a relative quality evaluator, and a dataset-specific absolute quality evaluator. In addition, we propose a dual automatic weight adjustment strategy that adaptively balances the loss weights across datasets and among the different losses within each dataset. Extensive experiments demonstrate the superior performance of MMP-IQA.
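To give a concrete sense of the prompt-learning component described in the abstract, the following is a minimal sketch of modeling a quality prompt's context words as learnable vectors against a frozen CLIP-like text encoder. The class name `LearnableQualityPrompt`, the encoder interfaces, and the five-level quality vocabulary are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableQualityPrompt(nn.Module):
    """Learnable prompt context for quality prediction (illustrative sketch)."""

    def __init__(self, text_encoder, level_embeds, n_ctx=8, embed_dim=512,
                 level_values=(1.0, 2.0, 3.0, 4.0, 5.0)):
        super().__init__()
        self.text_encoder = text_encoder                     # frozen CLIP-like text encoder (assumed interface)
        # Trainable context vectors shared by every quality-level prompt;
        # they replace hand-crafted wording such as "a photo of ... quality".
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, embed_dim))
        # Fixed token embeddings of the quality-level words, e.g. "bad" ... "perfect".
        self.register_buffer("level_embeds", level_embeds)   # (n_levels, n_tok, embed_dim)
        self.register_buffer("level_values", torch.tensor(level_values))

    def forward(self, image_features):
        n_levels = self.level_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_levels, -1, -1)
        # Prepend the learnable context to each quality-level word embedding.
        prompts = torch.cat([ctx, self.level_embeds], dim=1)
        text_features = self.text_encoder(prompts)           # (n_levels, embed_dim)
        # Cosine similarity between image features and each quality prompt.
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        logits = 100.0 * image_features @ text_features.t()  # (batch, n_levels)
        # Expected quality score: probability-weighted sum over level values.
        return (logits.softmax(dim=-1) * self.level_values).sum(dim=-1)
```

In the full method, the resulting text features would also interact with the conditional fusion module and feed the jointly trained quality evaluators; the sketch isolates only the adaptive prompt-learning step.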

Access link