Research Outputs by Type: Journal Articles

2016
Huang WC; *YT; YW; T. Fixed-point Gaussian Mixture Model for Analysis-Friendly Surveillance Video Coding. Computer Vision and Image Understanding. 2016;142(1):65-79.
2015
Yao YZ; *QH; LQ; SZ; XL; XS; H. Strategy for Aesthetic Photographing Recommendation via Collaborative Composition Model. IET Computer Vision. 2015;9(5):691-698.
*Li, Jia; Duan L-Y; CX; HT; TY. Finding the Secret of Image Saliency in the Frequency Domain. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015;37(12):2428-2440. Abstract. SCI citations: 55.
There are two sides to every story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to popping-out targets and suppressing distractors. Toward this end, this paper tentatively explores the secret of image saliency in the frequency domain. From the results obtained in several qualitative and quantitative experiments, we find that the secret of visual saliency may mainly hide in the phases of intermediate frequencies. To explain this finding, we reinterpret the concept of discrete Fourier transform from the perspective of template-based contrast computation and thus develop several principles for designing the saliency detector in the frequency domain. Following these principles, we propose a novel approach to design the saliency detector under the assistance of prior knowledge obtained through both unsupervised and supervised learning processes. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
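To make the finding concrete, below is a minimal illustrative sketch of frequency-domain saliency that keeps only the phase information inside an intermediate frequency band, reconstructs the image, and smooths the result. The band limits, smoothing sigma, and the helper name phase_band_saliency are assumptions for illustration; this is not the learned detector proposed in the paper.

```python
# Minimal sketch: phase-only reconstruction restricted to intermediate
# frequencies, in the spirit of the finding above (band limits and sigma
# are illustrative assumptions, not the paper's learned detector).
import numpy as np
from scipy.ndimage import gaussian_filter

def phase_band_saliency(gray, low=0.05, high=0.45, sigma=3.0):
    """gray: 2-D float array in [0, 1]; returns a normalized saliency map."""
    F = np.fft.fft2(gray)
    phase = np.angle(F)

    # Normalized radial frequency of every FFT bin.
    fy = np.fft.fftfreq(gray.shape[0])
    fx = np.fft.fftfreq(gray.shape[1])
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    # Keep unit magnitude only inside the intermediate band.
    band = (radius >= low) & (radius <= high)
    spectrum = np.where(band, np.exp(1j * phase), 0.0)

    # Phase-only reconstruction, squared and smoothed, as the saliency map.
    sal = np.abs(np.fft.ifft2(spectrum)) ** 2
    sal = gaussian_filter(sal, sigma)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```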
Huang; Chen JL; SF; *YT; T-in X. Image Saliency Estimation via Random Walk Guided by Informativeness and Latent Signal Correlations. Signal Processing: Image Communication. 2015;38:3-14.
Li, Bing; *Duan L-Y; LC-W; HT; GW. Depth-Preserving Warping for Stereo Image Retargeting. IEEE Transactions on Image Processing. 2015;24(9):2811-2826. Abstract. SCI citations: 38.
The popularity of stereo images and various display devices poses the need of stereo image retargeting techniques. Existing warping-based retargeting methods can well preserve the shape of salient objects in a retargeted stereo image pair. Nevertheless, these methods often incur depth distortion, since they attempt to preserve depth by maintaining the disparity of a set of sparse correspondences, rather than directly controlling the warping. In this paper, by considering how to directly control the warping functions, we propose a warping-based stereo image retargeting approach that can simultaneously preserve the shape of salient objects and the depth of 3D scenes. We first characterize the depth distortion in terms of warping functions to investigate the impact of a warping function on depth distortion. Based on the depth distortion model, we then exploit binocular visual characteristics of stereo images to derive region-based depth-preserving constraints which directly control the warping functions so as to faithfully preserve the depth of 3D scenes. Third, with the region-based depth-preserving constraints, we present a novel warping-based stereo image retargeting framework. Since the depth-preserving constraints are derived regardless of shape preservation, we relax the depth-preserving constraints to fulfill a tradeoff between shape preservation and depth preservation. Finally, we propose a quad-based implementation of the proposed framework. The results demonstrate the efficacy of our method in both depth and shape preservation for stereo image retargeting.
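For context on why disparity errors translate into depth distortion, the equations below restate standard stereo geometry and the correspondence-level disparity constraint that earlier warping methods enforce; the paper's region-based depth-preserving constraints act directly on the warping functions and are not reproduced here.

```latex
% Standard pinhole stereo geometry: with focal length f and baseline B,
% perceived depth Z is inversely proportional to disparity d, so any
% distortion of disparities by the warp distorts the perceived depth.
\[
  d(\mathbf{x}) = x_L - x_R, \qquad Z(\mathbf{x}) = \frac{f\,B}{d(\mathbf{x})}.
\]
% Correspondence-level constraint used by earlier methods: the warps
% W_L and W_R of matched points should keep the original disparity.
\[
  W_L(x_L) - W_R(x_R) \approx x_L - x_R .
\]
```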
*Duan, Ling-Yu; Lin J; WZ; HT; GW. Weighted Component Hashing of Binary Aggregated Descriptors for Fast Visual Search. IEEE Transactions on Multimedia. 2015;17(6):828-842. Abstract. SCI citations: 31.
Towards low bit rate mobile visual search, recent works have proposed to aggregate the local features and compress the aggregated descriptor (such as Fisher vector, the vector of locally aggregated descriptors) for low latency query delivery as well as moderate search complexity. Even though Hamming distance can be computed very fast, the computational cost of exhaustive linear search over the binary descriptors grows linearly with either the length of a binary descriptor or the number of database images. In this paper, we propose a novel weighted component hashing (WeCoHash) algorithm for long binary aggregated descriptors to significantly improve search efficiency over a large scale image database. Accordingly, the proposed WeCoHash has attempted to address two essential issues in Hashing algorithms: "what to hash" and "how to search." "What to hash" is tackled by a hybrid approach, which utilizes both image-specific component (i.e., visual word) redundancy and bit dependency within each component of a binary aggregated descriptor to produce discriminative hash values for bucketing. "How to search" is tackled by an adaptive relevance weighting based on the statistics of hash values. Extensive comparison results have shown that WeCoHash is at least 20 times faster than linear search and 10 times faster than locality-sensitive hashing (LSH) when maintaining comparable search accuracy. In particular, the WeCoHash solution has been adopted by the emerging MPEG compact descriptor for visual search (CDVS) standard to significantly speed up the exhaustive search of the binary aggregated descriptors.
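As a rough illustration of component-wise hashing for long binary aggregated descriptors, the sketch below buckets the bits of each non-empty visual-word component and scores candidates by weighted bucket collisions. The component length, bucket keys, and voting weights are simplifying assumptions; WeCoHash itself uses learned bit selection and statistics-based adaptive relevance weighting.

```python
# Sketch of component-wise hashing over a binary aggregated descriptor
# (e.g., a binarized VLAD/Fisher vector split per visual word). The
# bucket construction and weights are simplified assumptions.
from collections import defaultdict
import numpy as np

BITS_PER_WORD = 32          # component length (assumption)

def component_keys(desc_bits):
    """desc_bits: 1-D 0/1 array; yields (word_id, bucket_key) per component."""
    for w in range(len(desc_bits) // BITS_PER_WORD):
        comp = desc_bits[w * BITS_PER_WORD:(w + 1) * BITS_PER_WORD]
        if comp.any():                      # skip empty (unvisited) words
            yield w, comp.tobytes()         # exact bits as the bucket key

def build_index(database):
    index = defaultdict(list)               # (word, key) -> image ids
    for img_id, bits in enumerate(database):
        for w, key in component_keys(bits):
            index[(w, key)].append(img_id)
    return index

def search(index, query_bits, weights):
    scores = defaultdict(float)
    for w, key in component_keys(query_bits):
        for img_id in index.get((w, key), ()):
            scores[img_id] += weights[w]    # weighted voting per component
    return sorted(scores, key=scores.get, reverse=True)
```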
Chen, Jie; *Duan L-Y; GF; CJ; KAHTC. A Low Complexity Interest Point Detector. IEEE Signal Processing Letters. 2015;22(2):172-176. Abstract. SCI citations: 7.
Interest point detection is a fundamental approach to feature extraction in computer vision tasks. To handle the scale invariance, interest points usually work on the scale-space representation of an image. In this letter, we propose a novel block-wise scale-space representation to significantly reduce the computational complexity of an interest point detector. Laplacian of Gaussian (LoG) filtering is applied to implement the block-wise scale-space representation. Extensive comparison experiments have shown that the block-wise scale-space representation enables the efficient and effective implementation of an interest point detector in terms of memory and time complexity reduction, as well as promising performance in visual search.
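The sketch below illustrates the general idea of applying scale-normalized LoG filtering block by block and keeping the strongest responses per block. The block size, scales, and top-k selection are assumptions for illustration, not the detector's actual parameters or non-maximum suppression scheme.

```python
# Sketch of block-wise scale-space interest point detection with LoG
# filtering; block size, scales, and top-k per block are assumptions.
import numpy as np
from scipy.ndimage import gaussian_laplace

def block_log_keypoints(gray, block=64, sigmas=(1.6, 2.2, 3.1), k=5):
    pts = []
    H, W = gray.shape
    for y0 in range(0, H, block):
        for x0 in range(0, W, block):
            tile = gray[y0:y0 + block, x0:x0 + block]
            # Scale-normalized LoG responses computed for this block only.
            resp = np.stack([(s ** 2) * np.abs(gaussian_laplace(tile, s))
                             for s in sigmas])
            best = resp.max(axis=0)                  # strongest scale per pixel
            flat = np.argsort(best, axis=None)[::-1][:k]
            ys, xs = np.unravel_index(flat, best.shape)
            pts.extend((y0 + y, x0 + x) for y, x in zip(ys, xs))
    return pts
```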
Huang YT; MQ; T. TASC: A Transformation-Aware Soft Cascading Approach for Multimodal Video Copy Detection. ACM Transactions on Information Systems. 2015;33(2):Article 7, 34 pages.
Huang YT; JL; SY; T. Learning Complementary Saliency Priors for Foreground Object Segmentation in Complex Scenes. International Journal of Computer Vision. 2015;111(2):153-170. SCI citations: 32.
Peixi Peng (PhD student); *Yonghong Tian; Yaowei Wang; Jia Li; Tiejun Huang. Robust Multiple Cameras Pedestrian Detection with Multi-view Bayesian Network. Pattern Recognition. 2015;48(5):1760-1772.
Huang HL; BM; *LQ; JP; CZ; Q. Set-Label Modeling and Deep Metric Learning on Person Re-Identification. Neurocomputing. 2015;151:1283-1292.
2014
Huang T. Surveillance Video: The Biggest Big Data. Computing Now [Internet]. 2014;7(2). Access Link.
Lin Jie (PhD student); *Duan Ling-Yu; Huang Yaping; Luo Siwei; Huang Tiejun; Gao Wen. Rate-adaptive Compact Fisher Codes for Mobile Visual Search. IEEE Signal Processing Letters [Internet]. 2014;21(2):195-198. Access Link. Abstract. SCI citations: 34.
Extraction and transmission of compact descriptors are of great importance for next-generation mobile visual search applications. Existing visual descriptor techniques mainly compress visual features into compact codes of fixed bit rate, which is not adaptive to the bandwidth fluctuation in wireless environment. In this letter, we propose a Rate-adaptive Compact Fisher Codes (RCFC) to produce a bit rate scalable image signature. In particular, RCFC supports fast matching of descriptors based on Hamming distance; meanwhile, low memory footprint is offered. Extensive evaluation over benchmark databases shows that RCFC significantly outperforms the state-of-the-art and provides a promising descriptor scalability in terms of bit rates versus desired search performance.
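A minimal sketch of the rate-adaptive idea follows: sign-binarize a Fisher vector, keep its components in a fixed importance order so the signature can simply be truncated to the available bit budget, and match with Hamming distance over the common prefix. The ordering and the budget value are assumptions for illustration; RCFC itself learns which components to transmit first.

```python
# Sketch of a bit-rate scalable binary signature matched with Hamming
# distance; the component ordering and budget are illustrative assumptions.
import numpy as np

def binarize(fisher_vector):
    return (fisher_vector > 0).astype(np.uint8)          # 1 bit per dimension

def truncate(bits, bit_budget):
    return bits[:bit_budget]                              # rate adaptation

def hamming(a, b):
    n = min(len(a), len(b))                               # compare common prefix
    return int(np.count_nonzero(a[:n] != b[:n]))

# Usage: the mobile client sends only as many bits as the uplink allows.
query_sig = truncate(binarize(np.random.randn(8192)), bit_budget=2048)
```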
Duan, Ling-Yu; *Ji R; CZ; HT; GW. Towards Mobile Document Image Retrieval for Digital Library. IEEE Transactions on Multimedia. 2014;16(2):346-359. Abstract. SCI citations: 16.
With the proliferation of mobile devices, recent years have witnessed an emerging potential to integrate mobile visual search techniques into digital library. Such a mobile application scenario in digital library has posed significant and unique challenges in document image search. The mobile photograph makes it tough to extract discriminative features from the landmark regions of documents, like line drawings, as well as text layouts. In addition, both search scalability and query delivery latency remain challenging issues in mobile document search. The former relies on an effective yet memory-light indexing structure to accomplish fast online search, while the latter imposes a bit-budget constraint on query images over the wireless link. In this paper, we propose a novel mobile document image retrieval framework, consisting of a robust Local Inner-distance Shape Context (LISC) descriptor of line drawings, a Hamming distance KD-Tree for scalable and memory-light document indexing, as well as a JBIG2-based query compression scheme, together with a Retinex-based enhancement and an Otsu-based binarization, to reduce the latency of delivering the query while maintaining query quality in terms of search performance. We have extensively validated the key techniques in this framework by quantitative comparison to alternative approaches.
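The sketch below illustrates only the query-preparation side of such a pipeline under simple assumptions: a single-scale Retinex-style enhancement (log-domain illumination removal) followed by Otsu binarization with OpenCV. JBIG2 compression is left to an external encoder, and the LISC descriptor and the Hamming-distance KD-tree are not reproduced here.

```python
# Sketch of query-side enhancement and binarization; the Retinex sigma is
# an assumption, and the bi-level output would be handed to a JBIG2 encoder.
import cv2
import numpy as np

def prepare_query(gray):
    """gray: uint8 document photo."""
    # Single-scale Retinex: remove slowly varying illumination in log domain.
    img = gray.astype(np.float32) + 1.0
    illum = cv2.GaussianBlur(img, (0, 0), sigmaX=25)
    reflect = np.log(img) - np.log(illum)
    reflect = cv2.normalize(reflect, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Otsu picks the global threshold that separates ink from paper.
    _, binary = cv2.threshold(reflect, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary    # pass this bi-level image to a JBIG2 encoder
```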
Mou Luntian (PhD student); *Huang Tiejun; Tian Yonghong; Jiang Menglin; Gao Wen. Content-based copy detection through multimodal feature representation and temporal pyramid matching. ACM Transactions on Multimedia Computing, Communications, and Applications [Internet]. 2014;10(1). Access Link. Abstract. SCI citations: 15.
Content-based copy detection (CBCD) is drawing increasing attention as an alternative technology to watermarking for video identification and copyright protection. In this article, we present a comprehensive method to detect copies that are subjected to complicated transformations. A multimodal feature representation scheme is designed to exploit the complementarity of audio features, global and local visual features so that optimal overall robustness to a wide range of complicated modifications can be achieved. Meanwhile, a temporal pyramid matching algorithm is proposed to assemble frame-level similarity search results into sequence-level matching results through similarity evaluation over multiple temporal granularities. Additionally, inverted indexing and locality sensitive hashing (LSH) are also adopted to speed up similarity search. Experimental results over benchmarking datasets of TRECVID 2010 and 2009 demonstrate that the proposed method outperforms other methods for most transformations in terms of copy detection accuracy. The evaluation results also suggest that our method can achieve competitive copy localization precision.
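The sketch below illustrates the flavor of matching frame-level hits over multiple temporal granularities: bin the hits at several resolutions and combine the per-level scores. The level count, bin weighting, and score combination are assumptions for illustration; the paper's temporal pyramid matching additionally enforces temporal consistency between query and reference sequences.

```python
# Sketch of multi-granularity temporal aggregation of frame-level hits;
# levels, weights, and the combination rule are illustrative assumptions.
import numpy as np

def pyramid_score(hits, duration, levels=3):
    """hits: list of (query_time, score) for one candidate reference video."""
    total = 0.0
    for level in range(levels):
        bins = 2 ** level
        binned = np.zeros(bins)
        for t, s in hits:
            b = min(int(t / duration * bins), bins - 1)
            binned[b] = max(binned[b], s)        # best hit per temporal bin
        # Finer levels localize matches better, so weight them higher.
        total += (2.0 ** level) * binned.sum() / bins
    return total
```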
Huang Tiejun. The AVS2 Standard and Future Prospects. 电视技术 (Video Engineering) [Internet]. 2014;(22):7-10. Access Link. Abstract. PKU.
This paper briefly introduces the three parts of the AVS2 national standard "Information Technology - High Efficiency Multimedia Coding": systems, video, and audio. It compares the coding efficiency of AVS2 with AVS1, AVC/H.264, and HEVC/H.265, and explains the coding gains that AVS2 obtains from its frame structure, block structure, intra prediction, inter prediction, transforms, entropy coding, and in-loop filtering. It then introduces the concept, approach, and advantages of scene video coding, a distinctive feature of AVS2, and concludes with an outlook on the AVS3 cloud media coding standard.
Huang Tiejun; Zheng Jin; Li Bo; Fu Huiyuan; Ma Huadong; Xue Xiangyang; Jiang Yugang; Yu Junqing. Multimedia Technology Research 2013: Visual Perception and Processing for Intelligent Video Surveillance. 中国图象图形学报 (Journal of Image and Graphics) [Internet]. 2014;(11):1539-1562. Access Link. Abstract. PKU.
Objective: With the maturation of video surveillance technology and the widespread deployment of surveillance equipment, video surveillance applications have become ubiquitous and the volume of surveillance video has grown explosively, making it an important class of data in the big data era. However, the unstructured nature of video makes surveillance video relatively difficult to process and analyze. Faced with the massive surveillance video captured by large numbers of cameras, effectively transmitting, storing, analyzing, and recognizing these data according to their content and characteristics has become an urgent need. Method: Focusing on large-scale visual perception and intelligent processing for intelligent video surveillance, this paper systematically reviews the state of the technology in 2013 along four main research directions, namely surveillance video coding, object detection and tracking, surveillance video enhancement, and video motion and abnormal behavior recognition, and discusses future trends. Results: China's newly established national standard AVS2, in terms of coding efficiency for surveillance video...
Zhang, Xianguo; *Tian Y; HT; DS; GW. Optimizing the hierarchical prediction and coding in HEVC for surveillance and conference videos with background modeling. IEEE Transactions on Image Processing [Internet]. 2014;23(10):4511-4526. Access Link. Abstract. SCI citations: 51.
For real-time and low-delay video surveillance and teleconferencing applications, the new video coding standard HEVC achieves much higher coding efficiency than H.264/AVC. However, the hierarchical prediction structure in the HEVC low-delay encoder still does not fully utilize the special characteristics of surveillance and conference videos, which are usually captured by stationary cameras. In this case, the background picture (G-picture), which is modeled from the original input frames, can be used to further improve HEVC low-delay coding efficiency while reducing complexity. Therefore, we propose an optimization method for the hierarchical prediction and coding in HEVC for these videos with background modeling. First, several experimental and theoretical analyses are conducted on how to utilize the G-picture to optimize the hierarchical prediction structure and hierarchical quantization. Following these results, we propose to encode the G-picture as the long-term reference frame to improve background prediction, and then present a G-picture-based bit-allocation algorithm to increase coding efficiency. Meanwhile, according to the proportions of background and foreground pixels in coding units (CUs), an adaptive speed-up algorithm is developed to classify each CU into different categories and then adopt different speed-up strategies to reduce encoding complexity. To evaluate the performance, extensive experiments are performed on the HEVC test model. Results show that our method saves 39.09% of bits and reduces encoding complexity by 43.63% on average for surveillance videos, and 5.27% and 43.68%, respectively, for conference videos.
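The sketch below illustrates, under simple assumptions, the two ingredients named in the abstract: building a background picture from frames of a stationary camera and classifying each coding unit by its proportion of background pixels so the encoder can choose a cheaper strategy for background regions. The temporal-median model and the thresholds are illustrative, not the paper's algorithm.

```python
# Sketch of background-picture modeling and CU classification; the median
# model, tolerance, and 90%/10% thresholds are illustrative assumptions.
import numpy as np

def model_g_picture(frames):
    """frames: list of grayscale arrays from a stationary camera."""
    return np.median(np.stack(frames), axis=0)      # temporal median background

def classify_cu(frame, g_picture, y, x, size=64, tol=8):
    cu = frame[y:y + size, x:x + size].astype(np.int16)
    bg = g_picture[y:y + size, x:x + size].astype(np.int16)
    bg_ratio = np.mean(np.abs(cu - bg) <= tol)
    if bg_ratio > 0.9:
        return "background"    # e.g. use a cheaper mode/motion search
    if bg_ratio < 0.1:
        return "foreground"    # full search over all prediction modes
    return "hybrid"
```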
