Research Achievements by Year: 2013

2013
Li Jia (postdoc); *Tian Yonghong; Duan Lingyu; Huang Tiejun. Estimating Visual Saliency Through Single Image Optimization. IEEE Signal Processing Letters. 2013;20(9):845-848. Abstract
This letter presents a novel approach for visual saliency estimation through single image optimization. Instead of directly mapping visual features to saliency values with a unified model, we treat regional saliency values as the optimization objective on each single image. By using a quadratic programming framework, our approach can adaptively optimize the regional saliency values on each specific image to simultaneously meet multiple saliency hypotheses on visual rarity, center-bias and mutual correlation. Experimental results show that our approach can outperform 14 state-of-the-art approaches on a public image benchmark.
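To make the formulation concrete, the sketch below poses regional saliency estimation as a box-constrained quadratic program in which a unary term (rarity weighted by center bias) is balanced against a pairwise correlation term over a region-similarity graph. This is a minimal illustration under assumed inputs (`rarity`, `center_bias`, `W`, `lam` are hypothetical stand-ins), not the paper's exact formulation.

```python
# Minimal sketch: box-constrained quadratic program over regional saliency values.
# All names (rarity, center_bias, W, lam) are illustrative assumptions, not the paper's notation.
import numpy as np
from scipy.optimize import minimize

def optimize_region_saliency(rarity, center_bias, W, lam=0.5):
    """Solve  min_s ||s - rarity*center_bias||^2 + lam * s^T L s,  s.t. 0 <= s <= 1,
    where L is the graph Laplacian of the symmetric region-similarity matrix W."""
    u = rarity * center_bias                   # unary target per region
    L = np.diag(W.sum(axis=1)) - W             # Laplacian encodes mutual correlation

    def objective(s):
        return np.sum((s - u) ** 2) + lam * s @ L @ s

    def gradient(s):
        return 2.0 * (s - u) + 2.0 * lam * (L @ s)

    n = len(u)
    res = minimize(objective, x0=np.full(n, 0.5), jac=gradient,
                   bounds=[(0.0, 1.0)] * n, method="L-BFGS-B")
    return res.x

# Toy usage: 4 regions with a random symmetric similarity matrix.
rng = np.random.default_rng(0)
W = rng.random((4, 4)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
print(optimize_region_saliency(rng.random(4), rng.random(4), W))
```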
姜延; 张海波; 黄铁军. Emotional Semantic Analysis of Fabric Images Based on Color and Texture Features. 天津工业大学学报 [Internet]. 2013;(04):26-32. Abstract
Building on our earlier research on the emotional description of garment fabric images and the three-dimensional emotion-factor space model for fabric images established there, we analyze the correspondence between the low-level color and texture features of fabric image samples (saturation, hue warmth, contrast, gray-level image, gray-level matrix, mean hue, etc.) and the three factors. The first factor can be characterized by a 7-dimensional feature (a 6-dimensional saturation-warmth fuzzy histogram plus 1-dimensional contrast); the second factor by a 257-dimensional feature (a 256-dimensional gray-level feature plus 1-dimensional color contrast); and the third factor by a 4-dimensional feature (3 gray-level-matrix parameters plus 1-dimensional mean hue value). This lays the foundation for emotion-based recognition and retrieval of fabric images.
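A feature of the saturation-warmth fuzzy histogram kind can be sketched as follows; the warm/cool membership functions and the three saturation bands below are illustrative assumptions rather than the paper's exact definitions.

```python
# Illustrative sketch of a 6-D saturation-warmth fuzzy histogram; the memberships
# and band layout are assumptions for illustration, not the paper's definitions.
import numpy as np
import colorsys

def fuzzy_sat_warmth_histogram(rgb_image):
    """rgb_image: HxWx3 float array in [0, 1].
    Returns a 6-D feature: 3 saturation bands x {warm, cool}."""
    hist = np.zeros(6)
    for r, g, b in rgb_image.reshape(-1, 3):
        h, s, _ = colorsys.rgb_to_hsv(r, g, b)
        d_red = min(h, 1.0 - h)            # hue distance to red, in [0, 0.5]
        warm = 1.0 - 2.0 * d_red           # fuzzy "warm" membership
        cool = 1.0 - warm                  # fuzzy "cool" membership
        # Triangular memberships for low / medium / high saturation.
        sat_low = max(0.0, 1.0 - 2.0 * s)
        sat_mid = max(0.0, 1.0 - 2.0 * abs(s - 0.5))
        sat_high = max(0.0, 2.0 * s - 1.0)
        for i, sat_m in enumerate((sat_low, sat_mid, sat_high)):
            hist[2 * i] += sat_m * warm
            hist[2 * i + 1] += sat_m * cool
    return hist / max(hist.sum(), 1e-8)    # normalize across image sizes
```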
高文; 黄铁军; 张贤国. The IEEE 1857 Standard for Efficient Compression and Recognition of Surveillance Video. 电子产品世界 [Internet]. 2013;(07):22-26+29. Abstract
The AVS digital video coding standard, developed primarily by Chinese technical experts, was promulgated as the IEEE 1857 standard by the standards committee of the Institute of Electrical and Electronics Engineers (IEEE) on June 4, 2013. A distinctive part of the standard is AVS-S2, the surveillance profile, whose compression performance is twice that of H.264 (also known as MPEG-4 AVC), the standard currently prevailing in the video surveillance industry, and which also supports automatic extraction and representation of regions of interest at the bitstream level. This article describes the development process of AVS-S2, its key techniques, and a comparison of its compression efficiency with other standards.
王国中; 黄铁军; 高文. Development History and Application Prospects of the AVS Digital Audio-Video Coding Standard. 上海大学学报(自然科学版) [Internet]. 2013;(03):221-224. Abstract
AVS, China's digital audio and video coding standard, is a representative case of the country's independent-innovation strategy. On the basis of a careful analysis of domestic and international intellectual property in this field, the AVS national standard, the broadcasting industry standard, and the IEEE international standard were formulated in succession, strongly supporting the transformation of China's digital audio-visual industry from large to strong. A patent pool of more than one hundred self-owned patents has been established, reversing the long-standing passive situation in which domestic companies in this field were held back by the high royalty fees of foreign standards. More than twenty chip companies have been driven to develop compliant chips, forming a complete industry chain that is self-directed yet fully open. More than twenty provinces and municipalities in China, as well as several other countries, have adopted the AVS standard, with over a thousand television channels broadcast using it. China Central Television is currently deploying satellite broadcasting of high-definition stereoscopic (3D) programs using this standard.
田永鸿; 许腾 (Master's student); 黄铁军. A Survey of Pedestrian Detection Techniques for Vehicle-Mounted Vision Systems. 中国图象图形学报 [Internet]. 2013;(04):359-367. Abstract
As an important research direction in computer vision and intelligent vehicles, pedestrian detection for vehicle-mounted vision systems has received wide attention in recent years. This paper surveys research since 2005 on the two most important stages of this technology: region-of-interest segmentation and object recognition. Typical region-of-interest segmentation methods are first classified by the kind of information used for segmentation and their advantages and disadvantages are compared; progress in feature extraction, classifier construction, and search frameworks for pedestrian recognition is then summarized; finally, an outlook on future development is given.
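As one concrete example of the feature-plus-classifier recognition stage surveyed here, the widely used HOG with linear SVM pedestrian detector can be invoked through OpenCV. This is a generic baseline from the surveyed literature, not the method of any particular paper; the image path and confidence threshold are placeholders.

```python
# Baseline HOG + linear-SVM pedestrian detector via OpenCV, a classic method from this literature.
# "street.jpg" is a placeholder path; parameters are typical defaults, not tuned values.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")
# Sliding-window search over an image pyramid; returns pedestrian bounding boxes and scores.
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8), padding=(8, 8), scale=1.05)

for (x, y, w, h), score in zip(boxes, weights):
    if float(score) > 0.5:               # simple confidence filter (assumed threshold)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("street_detections.jpg", image)
```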
刘瑞璞; 张海波; 黄铁军. Emotional Semantic Analysis of Men's Suit Images Based on Color Features. 东华大学学报(自然科学版) [Internet]. 2013;(02):185-190+195. Abstract
Building on our earlier study of the emotional description of men's suits and the two-dimensional emotion-factor space model established there, we analyze the color features of men's suit images (hue warmth, color brightness, and contrast) and find that the first emotion factor can be well explained by a 10-dimensional brightness-warmth fuzzy histogram, while the second factor can be explained by combining a 7-dimensional saturation-warmth fuzzy histogram with image contrast. These results lay the foundation for subsequent emotion-based recognition and retrieval of men's suit images.
Duan, Ling-Yu; *Ji R; CJ; YH; HT; GW. Learning from mobile contexts to minimize the mobile location search latency. Signal Processing: Image Communication [Internet]. 2013;28(4):368-385. Abstract
We propose to learn an extremely compact visual descriptor from mobile contexts for low-bit-rate mobile location search. Our scheme combines location-related side information from the mobile device to adaptively supervise the design of the compact visual descriptor in a flexible manner, which is well suited to searching locations or landmarks over a bandwidth-constrained wireless link. Along with the proposed compact descriptor learning, we also introduce PKUBench, a large-scale, context-aware mobile visual search benchmark dataset, which serves as the first comprehensive benchmark for quantitatively evaluating how cheaply available mobile contexts can help mobile visual search systems. Our contextual-learning-based compact descriptor has been shown to outperform existing works in terms of compression rate and retrieval effectiveness.
Tian, Yonghong; *Huang T; JM; GW. Video copy-detection and localization with a scalable cascading framework. IEEE Multimedia [Internet]. 2013;20(3):72-86. Abstract
For video copy detection, no single audio-visual feature, or single detector based on several features, works well for all transformations. This article proposes a novel video copy-detection and localization approach with scalable cascading of complementary detectors and multiscale sequence matching. In this cascade framework, a soft-threshold learning algorithm is used to estimate the optimal decision thresholds for the detectors, and a multiscale sequence matching method is employed to precisely locate copies using a 2D Hough transform and multi-granularity similarity evaluation. Excellent performance on the TRECVID-CBCD 2011 benchmark dataset shows the effectiveness and efficiency of the proposed approach.
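The localization idea, frame-level matches voting for a consistent temporal alignment, can be sketched as follows. This is a simplified one-dimensional illustration of Hough-style voting, not the paper's exact 2D transform, and the function and variable names are assumptions.

```python
# Simplified Hough-style voting for copy localization: each matched frame pair
# (query_time, reference_time) votes for a temporal offset; the accumulator peak
# gives the alignment, and the matches consistent with it give the copied segment.
from collections import Counter

def locate_copy(frame_matches, tolerance=1):
    """frame_matches: list of (query_time, reference_time) pairs from frame-level matching."""
    votes = Counter(rt - qt for qt, rt in frame_matches)    # accumulate offsets
    best_offset, _ = votes.most_common(1)[0]                 # peak of the accumulator
    inliers = [(qt, rt) for qt, rt in frame_matches
               if abs((rt - qt) - best_offset) <= tolerance]
    q_times = [qt for qt, _ in inliers]
    return best_offset, (min(q_times), max(q_times))         # offset and copied query span

# Toy usage: a copy of query frames 10..14 starts at reference frame 110, plus two noisy matches.
matches = [(10, 110), (11, 111), (12, 112), (13, 113), (14, 114), (3, 57), (20, 9)]
print(locate_copy(matches))   # -> (100, (10, 14))
```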
Duan, Ling-Yu; Chen J; JR; HT; GW. Learning compact visual descriptors for low bit rate mobile landmark search. AI Magazine [Internet]. 2013;34(2):67-85. Abstract
Along with the ever-growing computational power of mobile devices, mobile visual search has undergone an evolution in techniques and applications. A significant trend is low-bit-rate visual search, in which compact visual descriptors are extracted directly on the mobile device and delivered as queries, rather than raw images, to reduce query transmission latency. In this article, we introduce our work on low-bit-rate mobile landmark search, in which a compact yet discriminative landmark image descriptor is extracted by using location context such as GPS, crowd-sourced hotspot WLAN, and cell tower locations. The compactness originates from the bag-of-words image representation, with offline learning from geotagged photos from online photo-sharing websites including Flickr and Panoramio. The learning process involves segmenting the landmark photo collection into discrete geographical regions using a Gaussian mixture model and then boosting a ranking-sensitive vocabulary within each region, with "entropy"-based feedback on the compactness of the descriptor to refine both phases iteratively. In online search, when entering a geographical region, the codebook in the mobile device is adapted downstream to generate extremely compact descriptors with promising discriminative ability. We have deployed landmark search apps on both HTC and iPhone mobile phones, accessing a million-scale image database covering typical areas such as Beijing, New York, and Barcelona. Our descriptor outperforms alternative compact descriptors (Chen et al. 2009; Chen et al. 2010; Chandrasekhar et al. 2009a; Chandrasekhar et al. 2009b) by significant margins. Beyond landmark search, this article also summarizes the MPEG standardization progress of compact descriptors for visual search (CDVS) (Yuri et al. 2010; Yuri et al. 2011) toward application interoperability.
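The first offline step, partitioning the geotagged photo collection into discrete geographical regions with a Gaussian mixture model, can be sketched as below. The coordinates, component count, and use of scikit-learn are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the offline geo-partitioning step: fit a Gaussian mixture model over
# photo GPS coordinates and assign each photo to a geographical region.
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_photo_regions(latlon, n_regions=8, seed=0):
    """latlon: (N, 2) array of (latitude, longitude) for geotagged photos.
    Returns the fitted model and a region label per photo."""
    gmm = GaussianMixture(n_components=n_regions, covariance_type="full", random_state=seed)
    labels = gmm.fit_predict(latlon)
    return gmm, labels

# Toy usage: two synthetic landmark clusters.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([39.91, 116.40], 0.01, (50, 2)),   # around Beijing
                 rng.normal([41.39, 2.17], 0.01, (50, 2))])    # around Barcelona
gmm, labels = segment_photo_regions(pts, n_regions=2)
print(np.bincount(labels))   # roughly 50 photos per region
```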
黄铁军. AVS+: A Video Coding Standard for HD and 3D Television. 电视技术 [Internet]. 2013;(02):11-14. Abstract
This article describes the background and process of developing the AVS+ standard (GY/T 257-2012, "Advanced Audio and Video Coding for Radio and Television, Part 1: Video"), with emphasis on the coding tools and new features added in AVS+. Performance comparisons among AVS+, AVS, and AVC/H.264 HP (High Profile) show that AVS+ performs on par with AVC HP. With the support and promotion of several ministries, AVS+ will be applied in China's high-definition and stereoscopic (3D) television broadcasting.
*Zhang, Xianguo; Huang T; TY; GM; MS; GW. Fast and Efficient Transcoding Based on Low-Complexity Background Modeling and Adaptive Block Classification. IEEE Transactions on Multimedia. 2013;15(8):1769-1785. Abstract
There is an urgent need for fast and efficient transcoding methods that substantially reduce the storage of surveillance videos and allow conference videos to be transmitted synchronously over different bandwidths. To this end, the special characteristics of these videos, e.g., the relatively static background, should be exploited for transcoding. We therefore propose a fast and efficient transcoding method (FET) based on background modeling and block classification. To improve transcoding efficiency, FET adds a background picture, modeled from the originally decoded frames at low complexity, into the stream as an intra-coded G-picture, and then uses the reconstructed G-picture as a long-term reference frame to transcode the following frames; our theoretical analyses show that the G-picture can significantly improve transcoding performance. To reduce complexity, FET uses an adaptive threshold-updating model for block classification and then adopts different transcoding strategies for the different categories. This is motivated by the following statistics: after dividing blocks into foreground, background, and hybrid categories, the categories exhibit different distributions of prediction modes, motion vectors, and reference frames. Extensive experiments on transcoding high-bit-rate H.264/AVC streams to low-bit-rate ones are carried out to evaluate FET. Compared with traditional full-decoding-and-full-encoding methods, FET saves more than 35% of the transcoding bit-rate with a speed-up ratio of more than 10 on surveillance videos. On conference videos, which must be transcoded in a more timely manner, FET achieves a speed-up ratio of more than 20 with a 0.2 dB gain.
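The two ingredients described above, a low-complexity background picture and block-level classification against it, can be sketched as follows. The running-mean model, block size, and thresholds are illustrative assumptions, not the paper's actual values or training procedure.

```python
# Sketch: background picture from temporal averaging of decoded frames, and
# classification of 16x16 blocks into background / foreground / hybrid by the
# fraction of pixels that stay close to that background. Thresholds are assumed.
import numpy as np

def model_background(decoded_frames):
    """decoded_frames: (T, H, W) grayscale frames. A running mean as a cheap background model."""
    return np.mean(np.asarray(decoded_frames, dtype=np.float32), axis=0)

def classify_blocks(frame, background, block=16, pix_thr=20, bg_ratio=0.9, fg_ratio=0.1):
    """Label each block so a transcoder could pick a cheap or careful strategy per category."""
    h, w = frame.shape
    labels = {}
    near_bg = np.abs(frame.astype(np.float32) - background) < pix_thr   # per-pixel "background-like"
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ratio = near_bg[y:y + block, x:x + block].mean()
            if ratio >= bg_ratio:
                labels[(y, x)] = "background"   # e.g. reuse modes/motion cheaply
            elif ratio <= fg_ratio:
                labels[(y, x)] = "foreground"   # e.g. re-estimate motion carefully
            else:
                labels[(y, x)] = "hybrid"
    return labels
```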
Huang YT; *YW; ZH; T. Selective eigenbackground for background modeling and subtraction in crowded scenes. IEEE Transactions on Circuits and Systems for Video Technology [Internet]. 2013;23(11):1849-1864. Abstract
Background subtraction is a fundamental preprocessing step in many surveillance video analysis tasks. In spite of significant efforts, however, background subtraction in crowded scenes remains challenging, especially when a large number of foreground objects move slowly or simply keep still. To address this problem, this paper proposes a selective eigenbackground method for background modeling and subtraction in crowded scenes. The contributions of our method are threefold. First, instead of training eigenbackgrounds on the original video frames, which may contain foreground objects, a virtual frame construction algorithm assembles clean background pixels from different original frames to build virtual frames as the training and update samples; this significantly improves the purity of the trained eigenbackgrounds. Second, for a crowded scene with diverse environmental conditions (e.g., illuminations), it is difficult to handle all variations with a single eigenbackground model, even with online update strategies; thus, given several models trained offline, we use the peak signal-to-noise ratio to adaptively choose the optimal one to initialize the online eigenbackground model. Third, to tackle the problem that not all pixels obtain optimal results when reconstruction is performed for the whole frame at once, our method selects the best eigenbackground for each pixel to obtain an improved reconstructed background image. Extensive experiments on the TRECVID-SED dataset and the Road video dataset show that our method outperforms several state-of-the-art methods remarkably.
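The core eigenbackground step, projecting a frame onto a learned background subspace and thresholding the reconstruction error, can be sketched as below. The PCA dimensionality, threshold, and use of scikit-learn are assumptions, and the paper's virtual-frame construction and per-pixel model selection are omitted for brevity.

```python
# Minimal eigenbackground sketch: learn a background subspace with PCA, reconstruct
# each new frame from that subspace, and mark pixels with large reconstruction error
# as foreground. Dimensionality and threshold are assumed values.
import numpy as np
from sklearn.decomposition import PCA

def train_eigenbackground(bg_frames, n_components=5):
    """bg_frames: (T, H, W) array of (ideally clean) background frames."""
    t, h, w = bg_frames.shape
    pca = PCA(n_components=n_components)
    pca.fit(bg_frames.reshape(t, h * w).astype(np.float32))
    return pca, (h, w)

def subtract_background(frame, pca, shape, threshold=30.0):
    """Returns a boolean foreground mask for one grayscale frame."""
    flat = frame.reshape(1, -1).astype(np.float32)
    recon = pca.inverse_transform(pca.transform(flat))   # projection onto the background subspace
    error = np.abs(flat - recon).reshape(shape)
    return error > threshold
```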