Zhong Y, Zhao X, Zhang L, Song X, Jiang T.
Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training, in
Proceedings of the 33rd ACM International Conference on Multimedia. Dublin, Ireland; 2025:7453-7462.
Abstract: Due to the high cost and small scale of Image Quality Assessment (IQA) datasets, achieving robust generalization remains challenging for prevalent Blind IQA (BIQA) methods. Traditional deep learning-based methods emphasize visual information to capture quality features, while recent developments in Vision-Language Models (VLMs) demonstrate strong potential for learning generalizable representations through textual information. However, applying VLMs to BIQA poses three major challenges: (1) making full use of the multi-modal information; (2) the prompt engineering needed to find appropriate quality descriptions is extremely time-consuming; (3) using mixed data for joint training to enhance the generalization of VLM-based BIQA models. To this end, we propose a multi-modal BIQA method with prompt learning, named MMP-IQA. For (1), we propose a conditional fusion module to better exploit cross-modal information: by jointly adjusting visual and textual features, our model captures quality information with stronger representation ability. For (2), we model the quality prompt's context words as learnable vectors that are adaptively updated during training for superior performance. For (3), we jointly train a linearity-induced quality evaluator, a relative quality evaluator, and a dataset-specific absolute quality evaluator. In addition, we propose a dual automatic weight adjustment strategy to adaptively balance the loss weights between different datasets and among the various losses within the same dataset. Extensive experiments illustrate the superior effectiveness of MMP-IQA.
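The learnable prompt context described in (2) follows the general idea of replacing hand-written prompt words with trainable embeddings. Below is a minimal NumPy sketch of that idea; the embedding size, pooling, and five quality-level tokens are illustrative assumptions, not the paper's actual architecture or parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 16   # toy embedding size (assumption; real VLMs use far larger dims)
N_CTX = 4        # number of learnable context tokens (assumption)

# Learnable context vectors shared across quality levels; in training these
# would be updated by gradient descent instead of hand-crafted prompt words.
ctx = rng.normal(size=(N_CTX, EMBED_DIM))

# Fixed word embeddings for quality-level tokens (hypothetical vocabulary).
quality_tokens = {q: rng.normal(size=(EMBED_DIM,))
                  for q in ["bad", "poor", "fair", "good", "perfect"]}

def encode_prompt(ctx, token_vec):
    """Toy text encoder: mean-pool the context tokens plus the quality token."""
    seq = np.vstack([ctx, token_vec[None, :]])
    v = seq.mean(axis=0)
    return v / np.linalg.norm(v)

def predict_quality(image_feat, ctx):
    """Softmax over cosine similarities to each quality prompt, then take
    the expectation over quality levels 1..5 to get a scalar score."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    sims = np.array([encode_prompt(ctx, t) @ image_feat
                     for t in quality_tokens.values()])
    probs = np.exp(sims) / np.exp(sims).sum()
    levels = np.arange(1, len(quality_tokens) + 1)
    return float(probs @ levels)

image_feat = rng.normal(size=(EMBED_DIM,))  # stand-in for a visual feature
score = predict_quality(image_feat, ctx)
```

Because the score is an expectation of a softmax distribution over levels 1 through 5, it always lies strictly inside that range; training would backpropagate a quality loss into `ctx` rather than into the fixed token embeddings.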
Zhong Y, Yang C, Zhao S, Jiang T.
Semi-Supervised Blind Quality Assessment with Confidence-quantifiable Pseudo-label Learning for Authentic Images, in
Proceedings of the 42nd International Conference on Machine Learning. Vancouver, Canada: PMLR 267; 2025.
Abstract: This paper presents CPL-IQA, a novel semi-supervised blind image quality assessment (BIQA) framework for authentic distortion scenarios. To address the challenge of limited labeled data in the IQA area, our approach leverages confidence-quantifiable pseudo-label learning to effectively utilize unlabeled authentically distorted images. The framework operates through a preprocessing stage and two training phases: it first converts MOS labels to vector labels via entropy minimization, then runs an iterative process that alternates between model training and label optimization. The key innovations of CPL-IQA are a manifold assumption-based label optimization strategy and a confidence learning method for pseudo-labels, which enhance reliability and mitigate outlier effects. Experimental results demonstrate the framework's superior performance on real-world distorted image datasets, offering a more standardized semi-supervised learning paradigm without requiring additional supervision or network complexity.
Liu Z, Qiao L, Chu X, Ma L, Jiang T.
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation, in
IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, June 11-15. Nashville, TN, USA: Computer Vision Foundation / IEEE; 2025:20254–20264.