Publications

2016
WC; *Tian Y; Wang Y; Huang T. Fixed-point Gaussian Mixture Model for Analysis-Friendly Surveillance Video Coding. Computer Vision and Image Understanding. 2016;142(1):65-79.
2015
Li, Bing; *Duan L-Y; Lin C-W; Huang T; Gao W. Depth-Preserving Warping for Stereo Image Retargeting. IEEE Transactions on Image Processing. 2015;24(9):2811-2826. SCI citations: 38.
The popularity of stereo images and various display devices poses the need of stereo image retargeting techniques. Existing warping-based retargeting methods can well preserve the shape of salient objects in a retargeted stereo image pair. Nevertheless, these methods often incur depth distortion, since they attempt to preserve depth by maintaining the disparity of a set of sparse correspondences, rather than directly controlling the warping. In this paper, by considering how to directly control the warping functions, we propose a warping-based stereo image retargeting approach that can simultaneously preserve the shape of salient objects and the depth of 3D scenes. We first characterize the depth distortion in terms of warping functions to investigate the impact of a warping function on depth distortion. Based on the depth distortion model, we then exploit binocular visual characteristics of stereo images to derive region-based depth-preserving constraints which directly control the warping functions so as to faithfully preserve the depth of 3D scenes. Third, with the region-based depth-preserving constraints, we present a novel warping-based stereo image retargeting framework. Since the depth-preserving constraints are derived regardless of shape preservation, we relax the depth-preserving constraints to fulfill a tradeoff between shape preservation and depth preservation. Finally, we propose a quad-based implementation of the proposed framework. The results demonstrate the efficacy of our method in both depth and shape preservation for stereo image retargeting.
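The depth-preservation requirement in the abstract has a compact algebraic core. The sketch below uses assumed notation (w_L, w_R for the left/right warping functions, d for disparity); it illustrates the general idea, not the paper's exact formulation:

```latex
% Disparity of a correspondence (x_L, x_R) in the input pair:
\[ d = x_L - x_R \]
% Depth is preserved when the warped pair keeps that disparity:
\[ w_L(x_L) - w_R(x_R) = x_L - x_R \]
% Relaxed, region-based form: penalize deviation rather than
% enforce equality, leaving slack for shape preservation:
\[ E_{\mathrm{depth}} = \sum_{i} \bigl\| \bigl(w_L(x_L^{i}) - w_R(x_R^{i})\bigr) - d^{i} \bigr\|^{2} \]
```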
*Duan, Ling-Yu; Lin J; Wang Z; Huang T; Gao W. Weighted Component Hashing of Binary Aggregated Descriptors for Fast Visual Search. IEEE Transactions on Multimedia. 2015;17(6):828-842. SCI citations: 31.
Towards low bit rate mobile visual search, recent works have proposed to aggregate the local features and compress the aggregated descriptor (such as the Fisher vector or the vector of locally aggregated descriptors) for low-latency query delivery as well as moderate search complexity. Even though Hamming distance can be computed very fast, the computational cost of exhaustive linear search over the binary descriptors grows linearly with either the length of a binary descriptor or the number of database images. In this paper, we propose a novel weighted component hashing (WeCoHash) algorithm for long binary aggregated descriptors to significantly improve search efficiency over a large-scale image database. The proposed WeCoHash addresses two essential issues in hashing algorithms: "what to hash" and "how to search." "What to hash" is tackled by a hybrid approach, which utilizes both image-specific component (i.e., visual word) redundancy and bit dependency within each component of a binary aggregated descriptor to produce discriminative hash values for bucketing. "How to search" is tackled by an adaptive relevance weighting based on the statistics of hash values. Extensive comparison results have shown that WeCoHash is at least 20 times faster than linear search and 10 times faster than locality-sensitive hashing (LSH) while maintaining comparable search accuracy. In particular, the WeCoHash solution has been adopted by the emerging MPEG Compact Descriptors for Visual Search (CDVS) standard to significantly speed up the exhaustive search of the binary aggregated descriptors.
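As background for the bucket-then-rerank idea described above, here is a minimal, hypothetical sketch (class and parameter names are invented; this is not the WeCoHash algorithm itself): descriptors are split into per-component bit patterns, each pattern keys a hash bucket, and candidates are re-ranked by full Hamming distance.

```python
import numpy as np
from collections import defaultdict

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two binary (0/1) vectors."""
    return int(np.count_nonzero(a != b))

class ComponentHashIndex:
    """Bucket long binary descriptors by per-component bit patterns.

    Each descriptor is split into fixed-width components (one per
    visual word); a component's bit pattern is its bucket key, so a
    query only touches descriptors sharing some component exactly.
    """

    def __init__(self, n_components: int, bits_per_component: int):
        self.n, self.w = n_components, bits_per_component
        self.tables = [defaultdict(list) for _ in range(n_components)]
        self.db = {}

    def _keys(self, desc: np.ndarray):
        for c in range(self.n):
            yield c, desc[c * self.w:(c + 1) * self.w].tobytes()

    def add(self, doc_id, desc: np.ndarray):
        self.db[doc_id] = desc
        for c, key in self._keys(desc):
            self.tables[c][key].append(doc_id)

    def query(self, desc: np.ndarray, top_k: int = 5):
        # Gather candidates from matching buckets, then re-rank the
        # (much smaller) candidate set by full Hamming distance.
        cands = set()
        for c, key in self._keys(desc):
            cands.update(self.tables[c].get(key, ()))
        return sorted(cands, key=lambda i: hamming(desc, self.db[i]))[:top_k]
```

A real system would additionally weight buckets by their statistics, as the paper's adaptive relevance weighting does; this sketch only shows the exact-match bucketing skeleton.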
Chen, Jie; *Duan L-Y; Gao F; CJ; KAHTC. A Low Complexity Interest Point Detector. IEEE Signal Processing Letters. 2015;22(2):172-176. SCI citations: 7.
Interest point detection is a fundamental approach to feature extraction in computer vision tasks. To handle the scale invariance, interest points usually work on the scale-space representation of an image. In this letter, we propose a novel block-wise scale-space representation to significantly reduce the computational complexity of an interest point detector. Laplacian of Gaussian (LoG) filtering is applied to implement the block-wise scale-space representation. Extensive comparison experiments have shown the block-wise scale-space representation enables the efficient and effective implementation of an interest point detector in terms of memory and time complexity reduction, as well as promising performance in visual search.
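For readers unfamiliar with LoG scale-space detection, a plain (non-block-wise) sketch of the pipeline follows; the sigma values and threshold are illustrative assumptions, and the paper's block-wise representation is precisely the complexity reduction this naive version lacks.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_scale_space(image: np.ndarray, sigmas=(1.6, 2.26, 3.2)):
    """Stack of scale-normalized LoG responses (one per sigma)."""
    img = image.astype(float)
    return np.stack([s**2 * gaussian_laplace(img, s) for s in sigmas])

def detect_points(stack: np.ndarray, thresh: float = 0.03):
    """Keep points whose |response| dominates a 3x3x3 space-scale
    neighborhood (borders skipped); returns (x, y, scale_index)."""
    pts = []
    ns, h, w = stack.shape
    for k in range(1, ns - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = abs(stack[k, y, x])
                nbhd = np.abs(stack[k-1:k+2, y-1:y+2, x-1:x+2])
                if v >= thresh and v == nbhd.max():
                    pts.append((x, y, k))
    return pts
```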
*Li, Jia; Duan L-Y; Chen X; Huang T; Tian Y. Finding the Secret of Image Saliency in the Frequency Domain. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015;37(12):2428-2440. SCI citations: 55.
There are two sides to every story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to popping-out targets and suppressing distractors. Toward this end, this paper tentatively explores the secret of image saliency in the frequency domain. From the results obtained in several qualitative and quantitative experiments, we find that the secret of visual saliency may mainly hide in the phases of intermediate frequencies. To explain this finding, we reinterpret the concept of discrete Fourier transform from the perspective of template-based contrast computation and thus develop several principles for designing the saliency detector in the frequency domain. Following these principles, we propose a novel approach to design the saliency detector under the assistance of prior knowledge obtained through both unsupervised and supervised learning processes. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
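The finding that saliency hides mainly in the phase spectrum is easy to probe with the classic phase-only transform. A minimal sketch follows; this is the well-known baseline, not the paper's learned detector:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def phase_saliency(gray: np.ndarray, blur_sigma: float = 3.0):
    """Flatten the amplitude spectrum, keep only the phase, invert,
    and smooth: residual energy concentrates on salient regions."""
    f = np.fft.fft2(gray.astype(float))
    recon = np.fft.ifft2(np.exp(1j * np.angle(f)))  # unit amplitude
    sal = gaussian_filter(np.abs(recon) ** 2, blur_sigma)
    return sal / sal.max()
```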
Huang; Chen JL; SF; *Tian Y; T-in X. Image Saliency Estimation via Random Walk Guided by Informativeness and Latent Signal Correlations. Signal Processing: Image Communication. 2015;38:3-14.
Tian Y; Li J; Yu S; Huang T. Learning Complementary Saliency Priors for Foreground Object Segmentation in Complex Scenes. International Journal of Computer Vision. 2015;111(2):153-170. SCI citations: 32.
Tian Y; MQ; Huang T. TASC: A Transformation-Aware Soft Cascading Approach for Multimodal Video Copy Detection. ACM Transactions on Information Systems. 2015;33(2): Article 7, 34 pages.
Yao YZ; *QH; LQ; SZ; XL; XS; H. Strategy for Aesthetic Photographing Recommendation via Collaborative Composition Model. IET Computer Vision. 2015;9(5):691-698.
Peng, Peixi (PhD student); *Tian Y; Wang Y; Li J; Huang T. Robust Multiple Cameras Pedestrian Detection with Multi-view Bayesian Network. Pattern Recognition. 2015;48(5):1760-1772.
Liu H; Ma B; *Qin L; Pang J; Zhang C; Huang Q. Set-Label Modeling and Deep Metric Learning on Person Re-Identification. Neurocomputing. 2015;151:1283-1292.
2014
Ji, Rongrong; *Duan L-Y; Chen J; Huang T; Gao W. Mining compact bag-of-patterns for low bit rate mobile visual search. IEEE Transactions on Image Processing. 2014;23(7):3099-3113. SCI citations: 52.
Visual patterns, i.e., high-order combinations of visual words, contribute to a discriminative abstraction of the high-dimensional bag-of-words image representation. However, the existing visual patterns are built upon the 2D photographic concurrences of visual words, which is ill-posed compared with their real-world 3D concurrences, since words from different objects or different depths might be incorrectly bound into an identical pattern. On the other hand, designing compact descriptors from the mined patterns is left open. To address both issues, in this paper, we propose a novel compact bag-of-patterns (CBoP) descriptor with an application to low bit rate mobile landmark search. First, to overcome the ill-posed 2D photographic configuration, we build up a 3D point cloud from the reference images of each landmark, so that more accurate pattern candidates can be extracted from the 3D concurrences of visual words. A novel gravity distance metric is then proposed to mine discriminative visual patterns. Second, we obtain a compact image description by introducing the CBoP descriptor, which is computed by sparse coding over the mined visual patterns, maximally reconstructing the original bag-of-words histogram with a minimum coding length. We developed a low bit rate mobile landmark search prototype, in which the CBoP descriptor is directly extracted and sent from the mobile end to reduce the query delivery latency. The CBoP performance is quantified on several large-scale benchmarks with comparisons to state-of-the-art compact descriptors, topic features, and hashing descriptors. We report accuracy comparable to the million-scale bag-of-words histogram at a much higher compression rate (approximately 100 bits) than the state-of-the-art bag-of-words compression schemes.
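The sparse-coding step in CBoP can be sketched generically: choose a few patterns from a dictionary whose weighted sum best reconstructs the bag-of-words histogram. The snippet below uses orthogonal matching pursuit as a stand-in objective (the paper's actual formulation also minimizes coding length); all names are assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_pattern_code(bow: np.ndarray, patterns: np.ndarray, k: int = 8):
    """Sparse-code a BoW histogram over a pattern dictionary.

    patterns: (n_words, n_patterns) matrix with unit-norm columns.
    Returns indices and weights of at most k selected patterns; the
    (index, weight) pairs would form the compact descriptor.
    """
    coef = orthogonal_mp(patterns, bow, n_nonzero_coefs=k)
    idx = np.flatnonzero(coef)
    return idx, coef[idx]
```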
*Gao, Wen; Huang T; RC; DW; CX. IEEE standards for advanced audio and video coding in emerging applications. Computer. 2014;47(5):81-83.
The IEEE audio- and video-coding standards family includes updated tools that can be configured to serve new applications, such as surveillance, Internet, and intelligent systems video.
Duan, Ling-Yu; Lin J; Chen J; Huang T; Gao W. Compact descriptors for visual search. IEEE MultiMedia. 2014;21(3):30-40.
To ensure application interoperability, the MPEG Working Group has made great efforts toward standardizing visual search technologies. Moreover, extraction and transmission of compact descriptors are valuable for next-generation mobile visual search applications. This article reviews the significant progress of MPEG Compact Descriptors for Visual Search (CDVS) in standardizing technologies that will enable efficient and interoperable design of visual search applications. In addition, the article presents the location-search- and recognition-oriented data collection and benchmark used under the MPEG CDVS evaluation framework.
Huang, Tiejun; Dong S; *Tian Y. Representing Visual Objects in HEVC Coding Loop. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2014;4(1):5-16. SCI citations: 5.
Different from the previous video coding standards that employ fixed-size coding blocks (i.e., macroblocks), the latest high efficiency video coding (HEVC) standard introduces a quadtree structure to represent variable-size coding blocks in the coding loop. The main objective of this study is to investigate a novel way to reuse these variable-size blocks to represent the foreground objects in the picture. Towards this end, this paper proposes three methods: flagging the blocks lying in the object regions (flagging compression blocks, FCB); adding an object tree in each Coding Tree Unit to describe the objects' shape within it (additional object tree, AOT); and confining the block-splitting procedure to fit the object shape (confining by shape, CBS). Among them, FCB and CBS add a flag bit to the syntax description of each block to indicate whether it lies in an object region, while AOT adds a separate quadtree to represent the objects. For all these methods, the additional bits are then fed into the HEVC entropy coding module for compression. As such, the representation of visual objects in the pictures can be implemented in the HEVC coding loop by reusing the variable-size blocks and entropy coding, without additional coding tools. The experiments on six manually segmented HEVC testing sequences (three in 1080p and three in 720p) demonstrate the feasibility and effectiveness of our proposal. To represent the objects in the 1080p testing sequences, the BD-rate increases of FCB, AOT, and CBS over the HEVC anchor are 1.57%, 3.27%, and 5.93%, respectively; for the 720p conference videos, they are 4.57%, 17.23%, and 26.93% (note that the average bitrate of the anchor is only 1009 kb/s).
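The FCB variant is the simplest of the three to picture: reuse the existing quadtree leaves and attach a one-bit object flag to each. A hypothetical sketch follows (structure and names assumed for illustration, not HEVC reference-software code):

```python
import numpy as np

class CTUNode:
    """Quadtree node mirroring an HEVC coding-block split."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.children = []   # empty list => leaf coding block
        self.obj_flag = 0    # 1 if the block overlaps an object

def flag_objects(node: CTUNode, mask: np.ndarray):
    """FCB-style flagging: mark each leaf block that overlaps the
    foreground mask; the one-bit flags would then be entropy-coded
    along with the rest of the block syntax."""
    if node.children:
        for child in node.children:
            flag_objects(child, mask)
    else:
        block = mask[node.y:node.y + node.size,
                     node.x:node.x + node.size]
        node.obj_flag = int(block.any())

# Demo: a 64x64 CTU split once into four 32x32 coding blocks.
root = CTUNode(0, 0, 64)
root.children = [CTUNode(x, y, 32) for y in (0, 32) for x in (0, 32)]
mask = np.zeros((64, 64), dtype=bool)
mask[8:24, 8:24] = True      # small foreground object, top-left
flag_objects(root, mask)     # only the top-left child gets flag 1
```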
*Li, Jia; Tian Y; Huang T. Visual saliency with statistical priors. International Journal of Computer Vision. 2014;107(3):239-253. SCI citations: 48.
Visual saliency is a useful cue to locate the conspicuous image content. To estimate saliency, many approaches have been proposed to detect the unique or rare visual stimuli. However, such bottom-up solutions are often insufficient since the prior knowledge, which often indicates a biased selectivity on the input stimuli, is not taken into account. To solve this problem, this paper presents a novel approach to estimate image saliency by learning the prior knowledge. In our approach, the influences of the visual stimuli and the prior knowledge are jointly incorporated into a Bayesian framework. In this framework, the bottom-up saliency is calculated to pop-out the visual subsets that are probably salient, while the prior knowledge is used to recover the wrongly suppressed targets and inhibit the improperly popped-out distractors. Compared with existing approaches, the prior knowledge used in our approach, including the foreground prior and the correlation prior, is statistically learned from 9.6 million images in an unsupervised manner. Experimental results on two public benchmarks show that such statistical priors are effective to modulate the bottom-up saliency to achieve impressive improvements when compared with 10 state-of-the-art methods.
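The Bayesian combination described above takes the familiar form below (notation assumed here, not taken from the paper):

```latex
% Posterior saliency at a location with visual evidence v:
\[
P(\mathrm{sal}\mid v)
  = \frac{P(v\mid\mathrm{sal})\,P(\mathrm{sal})}{P(v)}
  \;\propto\;
  \underbrace{P(v\mid\mathrm{sal})}_{\text{bottom-up saliency}}
  \cdot
  \underbrace{P(\mathrm{sal})}_{\text{learned statistical prior}}
\]
```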
Lin, Jie (PhD student); *Duan L-Y; Huang Y; Luo S; Huang T; Gao W. Rate-adaptive compact Fisher codes for mobile visual search. IEEE Signal Processing Letters. 2014;21(2):195-198. SCI citations: 34.
Extraction and transmission of compact descriptors are of great importance for next-generation mobile visual search applications. Existing visual descriptor techniques mainly compress visual features into compact codes of fixed bit rate, which is not adaptive to the bandwidth fluctuation in wireless environment. In this letter, we propose a Rate-adaptive Compact Fisher Codes (RCFC) to produce a bit rate scalable image signature. In particular, RCFC supports fast matching of descriptors based on Hamming distance; meanwhile, low memory footprint is offered. Extensive evaluation over benchmark databases shows that RCFC significantly outperforms the state-of-the-art and provides a promising descriptor scalability in terms of bit rates versus desired search performance.
Huang, Tiejun. Surveillance Video: The Biggest Big Data. Computing Now. 2014;7(2).
