Research Outputs by Type: Journal Articles

2025
Huang Y, Liao X, Liang J, Shi B, Xu Y, Le Callet P. Detail-Preserving Diffusion Models for Low-Light Image Enhancement. IEEE Transactions on Circuits and Systems for Video Technology. 2025;35:3396–3409. Abstract:
Existing diffusion models for low-light image enhancement typically incrementally remove noise introduced during the forward diffusion process using a denoising loss, with the process being conditioned on input low-light images. While these models demonstrate remarkable abilities in generating realistic high-frequency details, they often struggle to restore fine details that are faithful to the input. To address this, we present a novel detail-preserving diffusion model for realistic and faithful low-light image enhancement. Our approach integrates a size-agnostic diffusion process with a reverse process reconstruction loss, significantly enhancing the fidelity of enhanced images to their low-light counterparts and enabling more accurate recovery of fine details. To ensure the preservation of region- and content-aware details, we employ an efficient noise estimation network with a simplified channel-spatial attention mechanism. Additionally, we propose a multiscale ensemble scheme to maintain detail fidelity across diverse illumination regions. Comprehensive experiments on eight benchmark datasets demonstrate that our method achieves state-of-the-art results compared to over twenty existing methods in terms of both perceptual quality (LPIPS) and distortion metrics (PSNR and SSIM). The code is available at: https://github.com/CSYanH/DePDiff.
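As a rough illustration of the training objective behind conditional diffusion models of this kind, the sketch below shows a standard denoising step with the low-light image concatenated as conditioning. The network name, tensor shapes, and conditioning-by-concatenation choice are assumptions for illustration only; the authors' actual model and losses are in the repository linked above.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(eps_net, x_normal, x_lowlight, alphas_cumprod):
    """One conditional DDPM training step (illustrative sketch, not the released code).

    eps_net        : noise-prediction network, conditioned on the low-light input
    x_normal       : ground-truth normal-light image batch, shape (B, C, H, W)
    x_lowlight     : paired low-light image batch used as the condition
    alphas_cumprod : 1-D tensor of cumulative noise-schedule products, length T
    """
    b = x_normal.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x_normal.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    noise = torch.randn_like(x_normal)
    # Forward diffusion: corrupt the target image at a random timestep t.
    x_t = a_bar.sqrt() * x_normal + (1.0 - a_bar).sqrt() * noise

    # Condition the denoiser on the low-light observation by channel concatenation.
    pred_noise = eps_net(torch.cat([x_t, x_lowlight], dim=1), t)

    # Plain denoising loss; the paper adds a reverse-process reconstruction loss on top.
    return F.mse_loss(pred_noise, noise)
```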
2024
Hong Y, Chang Y, Liang J, Ma L, Huang T, Shi B. Light Flickering Guided Reflection Removal. International Journal of Computer Vision (IJCV). 2024. Abstract:
When photographing through a piece of glass, reflections usually degrade the quality of captured images or videos. In this paper, by exploiting periodically varying light flickering, we investigate the problem of removing strong reflections from contaminated image sequences or videos with a unified capturing setup. We propose a learning-based method that utilizes short-term and long-term observations of mixture videos, exploiting one-sided contextual clues in the fluctuant components and brightness-consistent clues in the consistent components to achieve layer separation and flickering removal, respectively. A dataset containing synthetic and real mixture videos with light flickering is built for network training and testing. The effectiveness of the proposed method is demonstrated by comprehensive evaluation on synthetic and real data, by its application to video flickering removal, and by an exploratory experiment on high-speed scenes.
2023
Zhou C, Teng M, Han J, Liang J, Xu C, Cao G, Shi B. Deblurring Low-Light Images with Events. International Journal of Computer Vision (IJCV). 2023;131:1284–1298. Abstract:
Modern image-based deblurring methods usually show degenerate performance in low-light conditions, since such images are typically dominated by poorly visible dark regions with only a few saturated bright regions, leaving limited effective features to extract for deblurring. In contrast, event cameras can trigger events with a very high dynamic range and low latency; they hardly suffer from saturation and naturally encode dense temporal information about motion. However, in low-light conditions existing event-based deblurring methods become less robust, since the events triggered in dark regions are often severely contaminated by noise, leading to inaccurate reconstruction of the corresponding intensity values. Moreover, because they directly adopt the event-based double integral (EDI) model to perform pixel-wise reconstruction, they can only handle the low-resolution grayscale active pixel sensor images provided by the DAVIS camera, which cannot meet the requirements of everyday photography. In this paper, to apply events to deblurring low-light images robustly, we propose a unified two-stage framework along with a motion-aware neural network tailored to it, reconstructing the sharp image under the guidance of high-fidelity motion clues extracted from events. We also build an RGB-DAVIS hybrid camera system to demonstrate that, owing to the natural advantages of the two-stage framework, our method can deblur high-resolution RGB images. Experimental results show our method achieves state-of-the-art performance on both synthetic and real-world images.
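For reference, the event-based double integral (EDI) model mentioned above relates a blurry frame to the latent sharp image through the events; the formulation below follows the standard EDI notation rather than this paper's, and is included only as background.

```latex
% Event-based double integral (EDI) model (standard form, background only):
% a blurry frame B is the temporal average of the latent image L(t) over an
% exposure interval of length T, and the events e(s), with contrast threshold c,
% relate L(t) to the latent sharp image L(f) at reference time f.
\begin{align}
  B &= \frac{1}{T}\int_{f-\frac{T}{2}}^{f+\frac{T}{2}} L(t)\,\mathrm{d}t,
  \qquad
  L(t) = L(f)\,\exp\!\Big(c\int_{f}^{t} e(s)\,\mathrm{d}s\Big),\\
  L(f) &= B \Big/ \frac{1}{T}\int_{f-\frac{T}{2}}^{f+\frac{T}{2}}
          \exp\!\Big(c\int_{f}^{t} e(s)\,\mathrm{d}s\Big)\mathrm{d}t .
\end{align}
```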
2022
Song Y, Wang J, Ma L, Yu J, Liang J, Yuan L, Yu Z. MARN: Multi-level Attentional Reconstruction Networks for Weakly Supervised Video Temporal Grounding. Neurocomputing. 2022;554:126625. Abstract:
Video temporal grounding is a challenging task in computer vision that involves localizing a video segment semantically related to a given query from a set of videos and queries. In this paper, we propose a novel weakly-supervised model called the Multi-level Attentional Reconstruction Networks (MARN), which is trained on video-sentence pairs. During the training phase, we leverage the idea of attentional reconstruction to train an attention map that can reconstruct the given query. At inference time, proposals are ranked by their attention scores to localize the most suitable segment. In contrast to previous methods, MARN effectively aligns video-level supervision with proposal scoring, thereby reducing the training-inference discrepancy. In addition, we incorporate a multi-level framework that encompasses both proposal-level and clip-level processes. The proposal-level process generates and scores variable-length time sequences, while the clip-level process generates and scores fixed-length time sequences to refine the predicted proposal scores in both training and testing. To improve the feature representation of the video, we propose a novel representation mechanism that utilizes intra-proposal information and adopts 2D convolution to extract inter-proposal clues for learning reliable attention maps. By accurately representing these proposals, we can better align them with the textual modality and thus facilitate model learning. Our proposed MARN is evaluated on two benchmark datasets, and extensive experiments demonstrate its superiority over existing methods.
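A minimal sketch of the attentional-reconstruction idea described above, assuming proposal features and a query embedding are already extracted: an attention branch scores proposals, the attention-weighted video feature is trained to reconstruct the query, and at inference the top-scoring proposal is returned. All module names and the MSE reconstruction loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionalReconstruction(nn.Module):
    """Illustrative weakly supervised grounding head (not the authors' code).

    Proposals are scored by an attention branch; the attention-weighted video
    feature is asked to reconstruct the query embedding, so only video-sentence
    pairs (no temporal annotations) are needed for training.
    """

    def __init__(self, vid_dim, txt_dim, hidden=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(vid_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        self.reconstruct = nn.Linear(vid_dim, txt_dim)

    def forward(self, proposal_feats, query_emb):
        # proposal_feats: (B, N, vid_dim) features of N candidate segments
        # query_emb:      (B, txt_dim)    sentence embedding of the query
        attn = torch.softmax(self.score(proposal_feats).squeeze(-1), dim=1)  # (B, N)
        pooled = torch.einsum('bn,bnd->bd', attn, proposal_feats)            # (B, vid_dim)
        recon = self.reconstruct(pooled)                                     # (B, txt_dim)
        loss = nn.functional.mse_loss(recon, query_emb)  # reconstruction objective
        best = attn.argmax(dim=1)                        # inference: top-scoring proposal
        return loss, attn, best
```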
Liang J, Xu Y, Quan Y, Shi B, Ji H. Self-Supervised Low-Light Image Enhancement Using Discrepant Untrained Network Priors. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). 2022;32:7332–7345. Abstract:
This paper proposes a deep learning method for low-light image enhancement, which exploits the generation capability of Neural Networks (NNs) while requiring no training samples except the input image itself. Based on the Retinex decomposition model, the reflectance and illumination of a low-light image are parameterized by two untrained NNs. The ambiguity between the two layers is resolved by the discrepancy between the two NNs in terms of architecture and capacity, while the complex noise with spatially-varying characteristics is handled by an illumination-adaptive self-supervised denoising module. The enhancement is done by jointly optimizing the Retinex decomposition and the illumination adjustment. Extensive experiments show that the proposed method not only outperforms existing non-learning-based and unsupervised-learning-based methods, but also competes favorably with some supervised-learning-based methods in extreme low-light conditions.
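A minimal sketch of fitting two untrained networks to a single image under the Retinex model I = R ⊙ L, in the spirit of the deep image prior. The smoothness term, gamma-based illumination adjustment, and optimizer settings are illustrative assumptions; the paper's discrepant architectures and illumination-adaptive denoising module are omitted.

```python
import torch
import torch.nn as nn

def enhance_single_image(low, reflect_net, illum_net, steps=2000, gamma=0.4, lr=1e-3):
    """Self-supervised Retinex-style enhancement of one low-light image (sketch).

    low         : (1, 3, H, W) low-light image in [0, 1]
    reflect_net : untrained network predicting reflectance R in [0, 1]
    illum_net   : untrained (smaller) network predicting illumination L in [0, 1]
    Both networks are fitted to this single image only; no training data is used.
    """
    params = list(reflect_net.parameters()) + list(illum_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        R = torch.sigmoid(reflect_net(low))   # reflectance layer
        L = torch.sigmoid(illum_net(low))     # illumination layer (1 or 3 channels)
        recon = R * L                         # Retinex model: I = R * L
        loss = nn.functional.mse_loss(recon, low)
        # Encourage a spatially smooth illumination map (simple TV prior as a stand-in).
        loss = loss + 0.1 * (L.diff(dim=-1).abs().mean() + L.diff(dim=-2).abs().mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        R = torch.sigmoid(reflect_net(low))
        L = torch.sigmoid(illum_net(low))
        # Brighten by gamma-adjusting the illumination before recomposition.
        return (R * L.clamp(min=1e-3).pow(gamma)).clamp(0, 1)
```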
2021
Liang J, Wang J, Quan Y, Chen T, Liu J, Ling H, Xu Y. Recurrent Exposure Generation for Low-Light Face Detection. IEEE Transactions on Multimedia. 2021;24:1609–1621.
2020
Yang W, Yuan Y, Ren W, Liu J, Scheirer WJ, Wang Z, Zhang T, Zhong Q, Xie D, Pu S, et al. Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study. IEEE Transactions on Image Processing (TIP). 2020;29:5737–5752. Abstract:
Existing enhancement methods are empirically expected to help the end high-level computer vision task; however, that is observed not always to be the case in practice. We focus on object and face detection in poor-visibility environments caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG2+ challenge Track 2 competition in IEEE CVPR 2019, aiming to evoke a comprehensive discussion and exploration of whether and how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. To the best of our knowledge, this is the first and currently the largest effort of its kind. Baseline results obtained by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovation. Thanks to broad participation from the research community, we are able to analyze representative team solutions, striving to better identify the strengths and limitations of existing approaches as well as future directions.
2019
Liang J, Xu Y, Bao C, Quan Y, Ji H. Barzilai–Borwein-based Adaptive Learning Rate for Deep Learning. Pattern Recognition Letters (PRL). 2019;128:197–203. Abstract:
The learning rate is arguably the most important hyper-parameter to tune when training a neural network. As manually setting the right learning rate remains a cumbersome process, adaptive learning rate algorithms aim to automate it. Motivated by the success of the Barzilai–Borwein (BB) step-size method in many gradient descent methods for solving convex problems, this paper investigates the potential of the BB method for training neural networks. With strong motivation from the related convergence analysis, the BB method is generalized to an adaptive learning rate for mini-batch gradient descent. The experiments showed that, in contrast to many existing methods, the proposed BB method is highly insensitive to the initial learning rate, especially in terms of generalization performance. The BB method also showed advantages in both learning speed and generalization performance over other available methods.
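For context, the classical Barzilai–Borwein step size is computed from differences of consecutive iterates and gradients; the sketch below plugs it into plain mini-batch gradient descent with simple clipping. This is only an illustration of the BB rule, not necessarily the exact scheme the paper proposes.

```python
import torch

def bb_minibatch_gd(params, loss_fn, steps=100, lr0=0.01, lr_min=1e-6, lr_max=1.0):
    """Mini-batch gradient descent with a Barzilai-Borwein step size (illustrative sketch).

    params  : list of parameter tensors with requires_grad=True
    loss_fn : callable returning the scalar loss on the current mini-batch
    """
    prev_x, prev_g, lr = None, None, lr0
    for _ in range(steps):
        for p in params:
            p.grad = None                       # reset gradients
        loss_fn().backward()
        x = torch.cat([p.detach().flatten() for p in params])
        g = torch.cat([p.grad.flatten() for p in params])
        if prev_x is not None:
            s, y = x - prev_x, g - prev_g
            denom = torch.dot(y, y)
            if denom > 0:
                # BB2 step size |s^T y| / (y^T y), clipped to a safe range.
                lr = float((torch.dot(s, y).abs() / denom).clamp(lr_min, lr_max))
        prev_x, prev_g = x, g
        with torch.no_grad():
            for p in params:
                p -= lr * p.grad                # gradient descent step
    return lr
```

In practice this logic would be wrapped in a torch.optim.Optimizer subclass so it can drop into an existing training loop.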