Research Output by Year: 2014

2014
Huang, Tiejun. Surveillance Video: The Biggest Big Data. Computing Now [Internet]. 2014;7(2).
Lin, Jie (PhD student); *Duan, Ling-Yu; Huang, Yaping; Luo, Siwei; Huang, Tiejun; Gao, Wen. Rate-adaptive compact Fisher codes for mobile visual search. IEEE Signal Processing Letters [Internet]. 2014;21(2):195-198. Abstract. SCI citations: 34.
Extraction and transmission of compact descriptors are of great importance for next-generation mobile visual search applications. Existing visual descriptor techniques mainly compress visual features into compact codes of fixed bit rate, which does not adapt to bandwidth fluctuation in wireless environments. In this letter, we propose Rate-adaptive Compact Fisher Codes (RCFC) to produce a bit-rate-scalable image signature. In particular, RCFC supports fast matching of descriptors based on Hamming distance while offering a low memory footprint. Extensive evaluation over benchmark databases shows that RCFC significantly outperforms the state of the art and provides promising descriptor scalability in terms of bit rate versus desired search performance.
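The Hamming-distance matching highlighted in the abstract can be sketched in a few lines. This is an illustrative toy, not the RCFC reference code; the function names and the truncation-based rate adaptation are assumptions for exposition only.

```python
# Toy sketch: Hamming-distance matching of binary image signatures.
# NOT the RCFC implementation; a real RCFC code is derived from Fisher vectors.

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def truncate_code(code: int, full_bits: int, keep_bits: int) -> int:
    """Hypothetical rate adaptation: keep only the most significant bits."""
    return code >> (full_bits - keep_bits)

def nearest(query: int, database: list[int]) -> int:
    """Index of the database code closest to the query in Hamming distance."""
    return min(range(len(database)),
               key=lambda i: hamming_distance(query, database[i]))
```

Hamming distance over packed bits is what makes such codes cheap to compare at scale: one XOR plus a popcount per candidate.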
Duan, Ling-Yu; *Ji R; CZ; HT; GW. Towards Mobile Document Image Retrieval for Digital Library. IEEE Transactions on Multimedia. 2014;16(2):346-359. Abstract. SCI citations: 16.
With the proliferation of mobile devices, recent years have witnessed an emerging potential to integrate mobile visual search techniques into the digital library. Such a mobile application scenario poses significant and unique challenges in document image search. The mobile photograph makes it difficult to extract discriminative features from the landmark regions of documents, such as line drawings and text layouts. In addition, both search scalability and query delivery latency remain challenging issues in mobile document search. The former relies on an effective yet memory-light indexing structure to accomplish fast online search, while the latter imposes a bit-budget constraint on query images over the wireless link. In this paper, we propose a novel mobile document image retrieval framework consisting of a robust Local Inner-distance Shape Context (LISC) descriptor for line drawings, a Hamming-distance KD-Tree for scalable and memory-light document indexing, and a JBIG2-based query compression scheme, together with Retinex-based enhancement and OTSU-based binarization, to reduce query delivery latency while maintaining query quality in terms of search performance. We have extensively validated the key techniques in this framework by quantitative comparison to alternative approaches.
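As an aside on the preprocessing step mentioned above, OTSU binarization picks the gray-level threshold that maximizes between-class variance. A minimal sketch follows (hypothetical helper names, grayscale values in 0-255; not the paper's implementation):

```python
# Minimal OTSU thresholding sketch over a flat list of 0-255 pixel values.

def otsu_threshold(pixels):
    """Return the threshold maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = sum(hist[:t])           # weight of the dark class
        w1 = total - w0              # weight of the bright class
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(i * hist[i] for i in range(t)) / w0
        mu1 = sum(i * hist[i] for i in range(t, 256)) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance (scaled)
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels, t):
    """Map pixels to a 0/1 bi-level image, as JBIG2 compression expects."""
    return [1 if p >= t else 0 for p in pixels]
```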
Chen, Jie; *Duan L-Y; GF; CJ; KAC; HT. A low complexity interest point detector. IEEE Signal Processing Letters [Internet]. 2014;22(2):172-176. Abstract. SCI citations: 7.
Interest point detection is a fundamental approach to feature extraction in computer vision tasks. To achieve scale invariance, interest point detectors usually work on the scale-space representation of an image. In this letter, we propose a novel block-wise scale-space representation to significantly reduce the computational complexity of an interest point detector. Laplacian of Gaussian (LoG) filtering is applied to implement the block-wise scale-space representation. Extensive comparison experiments have shown that the block-wise scale-space representation enables an efficient and effective implementation of an interest point detector in terms of memory and time complexity reduction, as well as promising performance in visual search.
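The block-wise idea can be illustrated by tiling an image so that each tile is filtered independently, bounding the working memory. A toy sketch (the LoG filtering and detector logic are elided; names are hypothetical):

```python
# Toy sketch of block-wise processing: tile the image so scale-space
# filtering never needs the full pyramid in memory at once.

def split_into_blocks(image, block):
    """Yield (row, col, sub-image) tiles of size <= block x block.

    `image` is a list of equal-length rows; tiles at the right/bottom
    edges may be smaller than `block`.
    """
    h, w = len(image), len(image[0])
    for r in range(0, h, block):
        for c in range(0, w, block):
            yield r, c, [row[c:c + block] for row in image[r:r + block]]
```

In a real detector, each tile (plus a small halo of border pixels) would be LoG-filtered and searched for scale-space extrema before moving to the next tile.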
Mou, Luntian (PhD student); *Huang, Tiejun; Tian, Yonghong; Jiang, Menglin; Gao, Wen. Content-based copy detection through multimodal feature representation and temporal pyramid matching. ACM Transactions on Multimedia Computing, Communications, and Applications [Internet]. 2014;10(1). Abstract. SCI citations: 15.
Content-based copy detection (CBCD) is drawing increasing attention as an alternative technology to watermarking for video identification and copyright protection. In this article, we present a comprehensive method to detect copies that have been subjected to complicated transformations. A multimodal feature representation scheme is designed to exploit the complementarity of audio features and global and local visual features, so that optimal overall robustness to a wide range of complicated modifications can be achieved. Meanwhile, a temporal pyramid matching algorithm is proposed to assemble frame-level similarity search results into sequence-level matching results through similarity evaluation over multiple temporal granularities. Additionally, inverted indexing and locality-sensitive hashing (LSH) are adopted to speed up similarity search. Experimental results over the benchmark datasets of TRECVID 2010 and 2009 demonstrate that the proposed method outperforms other methods for most transformations in terms of copy detection accuracy. The evaluation results also suggest that our method achieves competitive copy localization precision.
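The temporal-pyramid idea, pooling frame-level similarities at several temporal granularities before fusing them, can be sketched as below. The number of levels, the pooling operator (max), and the equal level weighting are illustrative assumptions, not the paper's exact algorithm.

```python
# Toy temporal pyramid matching: pool per-frame similarities over
# progressively finer segmentations and average the level scores.

def temporal_pyramid_score(frame_sims, levels=3):
    """frame_sims: per-frame similarities in [0, 1] between query and reference."""
    n = len(frame_sims)
    total = 0.0
    for level in range(levels):
        segments = 2 ** level                  # 1, 2, 4, ... segments
        seg_len = max(1, n // segments)
        seg_scores = [
            max(frame_sims[i:i + seg_len] or [0.0])  # max-pool each segment
            for i in range(0, n, seg_len)
        ]
        total += sum(seg_scores) / len(seg_scores)   # mean over segments
    return total / levels
```

Coarse levels tolerate temporal misalignment between query and reference, while fine levels reward sequences whose matches are spread consistently in time.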
Huang, Tiejun. The AVS2 Standard and Future Prospects. Video Engineering (电视技术) [Internet]. 2014;(22):7-10. Abstract. PKU.
This paper briefly introduces the three parts (systems, video, and audio) of the AVS2 national standard, Information Technology: High-Efficiency Multimedia Coding; analyzes the coding efficiency of AVS2 against AVS1, AVC/H.264, and HEVC/H.265; and explains in detail the coding gains that the frame structure, block structure, intra prediction, inter prediction, transform, entropy coding, in-loop filtering, and other tools bring to AVS2. It also introduces the concept, approach, and advantages of scene video coding, a distinctive feature of AVS2. Finally, it looks ahead to the AVS3 cloud media coding standard.
Huang, Tiejun; Zheng, Jin; Li, Bo; Fu, Huiyuan; Ma, Huadong; Xue, Xiangyang; Jiang, Yugang; Yu, Junqing. Multimedia Technology Research 2013: Visual Perception and Processing for Intelligent Video Surveillance. Journal of Image and Graphics (中国图象图形学报) [Internet]. 2014;(11):1539-1562. Abstract. PKU.
Objective: With the maturation of video surveillance technology and the spread of surveillance equipment, video surveillance applications have become increasingly widespread, and the volume of surveillance video data is growing explosively, making it an important data object of the big-data era. However, the unstructured nature of video data makes processing and analyzing surveillance video relatively difficult. Faced with the massive surveillance video captured by large numbers of cameras, effectively transmitting, storing, analyzing, and recognizing these data according to their content and characteristics has become a pressing need. Method: Addressing large-scale visual perception and intelligent processing in intelligent video surveillance, this paper systematically reviews the state of the art in 2013 across four main research directions, namely surveillance video coding, object detection and tracking, surveillance video enhancement, and recognition of video motion and abnormal behavior, and looks ahead to future trends. Result: China's newly established national standard AVS2, in coding surveillance video...
Zhang, Xianguo; *Tian Y; HT; DS; GW. Optimizing the hierarchical prediction and coding in HEVC for surveillance and conference videos with background modeling. IEEE Transactions on Image Processing [Internet]. 2014;23(10):4511-4526. Abstract. SCI citations: 51.
For real-time and low-delay video surveillance and teleconferencing applications, the new video coding standard HEVC achieves much higher coding efficiency than H.264/AVC. However, we argue that the hierarchical prediction structure in the HEVC low-delay encoder does not fully utilize the special characteristics of surveillance and conference videos, which are usually captured by stationary cameras. In this case, the background picture (G-picture), which is modeled from the original input frames, can be used to further improve HEVC low-delay coding efficiency while reducing complexity. Therefore, we propose an optimization method for hierarchical prediction and coding in HEVC for these videos with background modeling. First, several experimental and theoretical analyses are conducted on how to utilize the G-picture to optimize the hierarchical prediction structure and hierarchical quantization. Following these results, we propose to encode the G-picture as the long-term reference frame to improve background prediction, and then present a G-picture-based bit-allocation algorithm to increase coding efficiency. Meanwhile, according to the proportions of background and foreground pixels in coding units (CUs), an adaptive speed-up algorithm is developed to classify each CU into different categories and then adopt different speed-up strategies to reduce encoding complexity. To evaluate the performance, extensive experiments are performed on the HEVC test model. Results show our method saves 39.09% of bits and reduces encoding complexity by 43.63% on average on surveillance videos, whereas the corresponding figures on conference videos are 5.27% and 43.68%.
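The CU classification step described above can be illustrated with a toy sketch: label each CU by the fraction of its pixels that match the modeled background. The thresholds and the three-way labels here are assumptions for exposition, not the paper's values.

```python
# Toy CU classification against a modeled background picture (G-picture).
# diff_thresh and bg_ratio are hypothetical illustration values.

def classify_cu(cu_pixels, bg_pixels, diff_thresh=8, bg_ratio=0.9):
    """Label a CU as 'background', 'foreground', or 'hybrid' by the
    proportion of its pixels that are close to the background model."""
    matches = sum(
        1 for p, b in zip(cu_pixels, bg_pixels) if abs(p - b) <= diff_thresh
    )
    ratio = matches / len(cu_pixels)
    if ratio >= bg_ratio:
        return "background"
    if ratio <= 1.0 - bg_ratio:
        return "foreground"
    return "hybrid"
```

An encoder would then pick a speed-up strategy per label, e.g. skipping expensive mode decisions for pure-background CUs.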
Ji, Rongrong; *Duan L-Y; CJ; HT; GW. Mining compact bag-of-patterns for low bit rate mobile visual search. IEEE Transactions on Image Processing [Internet]. 2014;23(7):3099-3113. Abstract. SCI citations: 52.
Visual patterns, i.e., high-order combinations of visual words, contribute to a discriminative abstraction of the high-dimensional bag-of-words image representation. However, existing visual patterns are built upon the 2D photographic co-occurrences of visual words, which is ill-posed compared with their real-world 3D co-occurrences, since words from different objects or different depths might be incorrectly bound into an identical pattern. On the other hand, designing compact descriptors from the mined patterns remains an open problem. To address both issues, in this paper we propose a novel compact bag-of-patterns (CBoP) descriptor with an application to low bit rate mobile landmark search. First, to overcome the ill-posed 2D photographic configuration, we build a 3D point cloud from the reference images of each landmark, so that more accurate pattern candidates can be extracted from the 3D co-occurrences of visual words. A novel gravity distance metric is then proposed to mine discriminative visual patterns. Second, we achieve compact image description by introducing the CBoP descriptor. CBoP is obtained by sparse coding over the mined visual patterns, which maximally reconstructs the original bag-of-words histogram with a minimum coding length. We developed a low bit rate mobile landmark search prototype, in which the CBoP descriptor is directly extracted and sent from the mobile end to reduce query delivery latency. The CBoP performance is quantified on several large-scale benchmarks with comparisons to state-of-the-art compact descriptors, topic features, and hashing descriptors. We report accuracy comparable to the million-scale bag-of-words histogram over million-scale visual words, at a much higher descriptor compression rate (approximately 100 bits) than the state-of-the-art bag-of-words compression scheme.
*Gao, Wen; Huang T; RC; DW; CX. IEEE standards for advanced audio and video coding in emerging applications. Computer [Internet]. 2014;47(5):81-83. Abstract.
The IEEE audio- and video-coding standards family includes updated tools that can be configured to serve new applications, such as surveillance, Internet, and intelligent systems video.
*Li, Jia; Tian Y; HT. Visual saliency with statistical priors. International Journal of Computer Vision [Internet]. 2014;107(3):239-253. Abstract. SCI citations: 48.
Visual saliency is a useful cue for locating conspicuous image content. To estimate saliency, many approaches have been proposed to detect unique or rare visual stimuli. However, such bottom-up solutions are often insufficient, since prior knowledge, which often indicates a biased selectivity on the input stimuli, is not taken into account. To solve this problem, this paper presents a novel approach to estimate image saliency by learning the prior knowledge. In our approach, the influences of the visual stimuli and the prior knowledge are jointly incorporated into a Bayesian framework. In this framework, the bottom-up saliency is calculated to pop out the visual subsets that are probably salient, while the prior knowledge is used to recover wrongly suppressed targets and inhibit improperly popped-out distractors. Compared with existing approaches, the prior knowledge used in our approach, including the foreground prior and the correlation prior, is statistically learned from 9.6 million images in an unsupervised manner. Experimental results on two public benchmarks show that such statistical priors effectively modulate the bottom-up saliency, achieving impressive improvements over 10 state-of-the-art methods.
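The Bayesian modulation described above, with prior knowledge reweighting bottom-up saliency, can be sketched as a pointwise product. The product form and the peak normalization are illustrative assumptions, not the paper's exact model.

```python
# Toy fusion of bottom-up saliency with a learned prior: pointwise product,
# then normalization so the most salient location scores 1.0.

def combine_saliency(bottom_up, prior):
    """Fuse per-pixel bottom-up saliency with a statistical prior map."""
    fused = [s * p for s, p in zip(bottom_up, prior)]
    peak = max(fused)
    return [f / peak for f in fused] if peak > 0 else fused
```

A low prior at a location suppresses a spuriously popped-out distractor there, while a high prior can rescue a target the bottom-up stage wrongly suppressed.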
Huang, Tiejun; Dong S; *TY. Representing Visual Objects in HEVC Coding Loop. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2014;4(1):5-16. Abstract. SCI citations: 5.
Different from previous video coding standards, which employ fixed-size coding blocks (macroblocks), the latest High Efficiency Video Coding (HEVC) standard introduces a quadtree structure to represent variable-size coding blocks in the coding loop. The main objective of this study is to investigate a novel way to reuse these variable-size blocks to represent the foreground objects in the picture. Towards this end, this paper proposes three methods: flagging the blocks lying in the object regions (flagging compression blocks, FCB); adding an object tree in each Coding Tree Unit to describe the objects' shape in it (additional object tree, AOT); and confining the block splitting procedure to fit the object shape (confining by shape, CBS). Among them, FCB and CBS add a flag bit in the syntax description of each block to indicate whether it lies in the object region, while AOT adds a separate quadtree to represent the objects. For all these methods, the additional bits are then fed into the HEVC entropy coding module for compression. As such, the representation of visual objects in the pictures can be implemented in the HEVC coding loop by reusing the variable-size blocks and entropy coding, without additional coding tools. Experiments on six manually segmented HEVC test sequences (three in 1080p and three in 720p) demonstrate the feasibility and effectiveness of our proposal. To represent the objects in the 1080p test sequences, the BD-rate increases of FCB, AOT, and CBS over the HEVC anchor are 1.57%, 3.27%, and 5.93%, respectively; for the 720p conference videos, they are 4.57%, 17.23%, and 26.93% (note that the average bitrate of the anchor is only 1009 kb/s).
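The FCB method's one-flag-per-block idea can be illustrated with a toy sketch (the quadtree splitting itself is elided; names are hypothetical): each leaf block gets a 1 if it overlaps the object mask, and the resulting flags would then go to the entropy coder.

```python
# Toy FCB-style flagging: one bit per leaf coding block, set when the block
# overlaps the segmented object mask.

def flag_blocks(mask, leaf_blocks):
    """mask: 2-D 0/1 object mask; leaf_blocks: list of (row, col, size).

    Returns one flag bit per block: 1 if any mask pixel inside it is set.
    """
    flags = []
    for r, c, size in leaf_blocks:
        inside = any(
            mask[rr][cc]
            for rr in range(r, r + size)
            for cc in range(c, c + size)
        )
        flags.append(1 if inside else 0)
    return flags
```

Because large background regions collapse into few quadtree leaves, the flag stream stays short exactly where the object boundary is simple.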
Duan, Ling-Yu; Lin J; CJ; HT; GW. Compact descriptors for visual search. IEEE Multimedia [Internet]. 2014;21(3):30-40. Abstract.
To ensure application interoperability in visual object search, the MPEG Working Group has made great efforts to standardize visual search technologies. Moreover, extraction and transmission of compact descriptors are valuable for next-generation mobile visual search applications. This article reviews the significant progress of MPEG Compact Descriptors for Visual Search (CDVS) in standardizing technologies that will enable efficient and interoperable design of visual search applications. In addition, the article presents the location-search- and recognition-oriented data collection and benchmark under the MPEG CDVS evaluation framework.
Zhang, Xianguo (PhD student); *Huang, Tiejun; Tian, Yonghong; Gao, Wen. Background-modeling-based adaptive prediction for surveillance video coding. IEEE Transactions on Image Processing [Internet]. 2014;23(2):769-784. Abstract. SCI citations: 72.
The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards, which were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., the relatively static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are first classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely the background reference prediction (BRP), which uses the background modeled from the original input frames as the long-term reference, and the background difference prediction (BDP), which predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency by using the higher-quality background as the reference, whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that BMAP achieves at least twice the compression ratio of the AVC (MPEG-4 Advanced Video Coding) high profile on surveillance videos, with only a slight increase in encoding complexity. Moreover, for foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.
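The two predictions can be contrasted in a toy 1-D sketch (flattened pixel lists, hypothetical names; not the BMAP implementation): BRP takes the residual directly against the modeled background, while BDP forms residuals in the background-difference domain.

```python
# Toy contrast of the two inter predictions over flattened pixel lists.

def brp_residual(block, background):
    """Background reference prediction: residual against the background model."""
    return [p - b for p, b in zip(block, background)]

def bdp_residual(block, background, reference):
    """Background difference prediction: subtract the background from both the
    current block and its reference, then predict one difference from the other."""
    block_diff = [p - b for p, b in zip(block, background)]
    ref_diff = [r - b for r, b in zip(reference, background)]
    return [d - rd for d, rd in zip(block_diff, ref_diff)]
```

For a pure-background block, BRP's residual is near zero; for a hybrid block whose foreground also appears in the reference frame, BDP cancels the shared foreground and background alike, leaving a small residual to encode.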
Zhang, Xianguo; GW; TY; HT; MS. IEEE 1857 Standard Empowering Smart Video Surveillance Systems. IEEE Intelligent Systems. 2014;29(1). Abstract.
IEEE 1857, Standard for Advanced Audio and Video Coding, was released as IEEE 1857-2013 in June 2013. Although the standard consists of several different groups, the most significant feature of IEEE 1857-2013 is its surveillance groups, which not only achieve at least twice the coding efficiency of H.264/AVC HP on surveillance videos, but also make it arguably the most recognition-friendly video coding standard to date. This article presents an overview of the IEEE 1857 surveillance groups, highlighting the background-model-based coding technology and the recognition-friendly functionalities. We believe that IEEE 1857-2013 will bring new opportunities and momentum to the research communities and industries working on smart video surveillance systems.
Gao, Wen; HT; TY. IEEE 1857: Boosting Video Applications in CPSS. IEEE Intelligent Systems. 2014;29(2).