Publications

2012
陈明立; 张畅芯; 杨少娟; 毛利华; 田永鸿; 黄铁军; 吴玺宏; 高文; 李量. Stereoscopic unmasking based on binocular disparity (基于双眼视差的立体视觉去掩蔽效应). 心理科学进展. 2012;(09):1355-1363.
Stereoscopic vision based on binocular disparity does not change the signal-to-noise ratio between a target and its masking stimuli, but it allows different stimuli to be perceived at different depth positions, thereby reducing the masking imposed on the target signal. Building on a summary of previous work on binocular unmasking, this review emphasizes the important role that visual attention along the depth dimension plays in this perceived-spatial-separation unmasking process, and introduces typical applications of binocular unmasking in modern technology. Finally, drawing on advances in research on perceived-spatial-separation unmasking in audition, the review argues that perceived-spatial-separation unmasking is a basic function by which the brain processes complex stimulus scenes.
高文; 田永鸿; 黄铁军. Social multimedia computing (社会多媒体计算). 中国计算机学会通讯. 2012;8(4):8-13.
段凌宇; 黄铁军; 高文. Research and standardization progress on mobile visual search technology (移动视觉搜索技术研究与标准化进展). 信息通信技术. 2012;(06):51-58.
Mobile visual search is becoming one of the influential foundational technologies of the future mobile world. This article introduces the technical challenges facing mobile visual search and discusses key technologies including compact visual descriptors, the visual retrieval pipeline, and interoperability of retrieval systems. It surveys the progress of international standardization work on mobile visual search centered on compact visual descriptors, and points out the importance of building large-scale visual object datasets.
Tian Y; Huang T; Gao W. Multimodal Video Copy Detection using Multi-Detectors Fusion. IEEE COMSOC MMTC E-Letter. 2012;7(5):3-6.
Jingjing Yang (PhD student); Yonghong Tian; Lingyu Duan; Tiejun Huang; Wen Gao. Group-sensitive Multiple Kernel Learning for Object Recognition. IEEE Transactions on Image Processing. 2012;21(5):2838-2852.
In this paper, a group-sensitive multiple kernel learning (GS-MKL) method is proposed for object recognition to accommodate the intra-class diversity and the inter-class correlation. By introducing the group between the object category and individual images as an intermediate representation, GS-MKL attempts to learn group-sensitive multi-kernel combinations together with the associated classifier. For each object category, the image corpus from the same category is partitioned into groups. Images with similar appearance are partitioned into the same group, which corresponds to a sub-category of the object category. Accordingly, intra-class diversity can be represented by the set of groups from the same category but with diverse appearances; inter-class correlation can be represented by the correlation between the groups from different categories. GS-MKL provides a tractable solution to adapt the multi-kernel combination to the local data distribution and to seek a trade-off between capturing the diversity and keeping the invariance for each object category. Different from the simple hybrid grouping strategy that solves sample grouping and GS-MKL training independently, two sample grouping strategies are proposed to integrate sample grouping and GS-MKL training. The first is the looping hybrid grouping method, where a global kernel clustering method and GS-MKL interact with each other by sharing the group-sensitive multi-kernel combination. The second is the dynamic divisive grouping method, where a hierarchical kernel-based grouping process interacts with GS-MKL. Experimental results show that the performance of GS-MKL does not vary significantly with different grouping strategies, but the looping hybrid grouping method produces slightly better results. On four challenging datasets, our proposed method has achieved encouraging performance comparable to the state-of-the-art and outperformed several existing MKL methods.
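The group-sensitive kernel combination described above can be given a concrete feel with a short toy sketch. This is not the authors' implementation: the synthetic data, RBF base kernels, k-means grouping, and alignment-based per-group weights below are all illustrative assumptions, and GS-MKL itself learns the weights jointly with the classifier rather than in a separate step as done here.

```python
# Toy sketch of a group-sensitive multi-kernel combination (not the authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 10)), rng.normal(1.5, 1.0, (60, 10))])
y = np.array([1] * 60 + [-1] * 60)

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

gammas = [0.01, 0.1, 1.0]                              # three base kernels
base = [rbf(X, X, g) for g in gammas]

n_groups = 4
groups = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)

# Per-group kernel weights via kernel-target alignment on the group's samples.
beta = np.zeros((n_groups, len(base)))
for g in range(n_groups):
    idx = np.where(groups == g)[0]
    yy = np.outer(y[idx], y[idx])
    for m, K in enumerate(base):
        Kg = K[np.ix_(idx, idx)]
        beta[g, m] = max((Kg * yy).sum() / (np.linalg.norm(Kg) * np.linalg.norm(yy)), 0.0)
    beta[g] /= beta[g].sum() + 1e-12

# Combined kernel: the weight applied to K_m(i, j) depends on the groups of i and j,
# which is what makes the combination "group-sensitive" rather than global.
K_comb = np.zeros_like(base[0])
for m, K in enumerate(base):
    w = 0.5 * (beta[groups, m][:, None] + beta[groups, m][None, :])
    K_comb += w * K

clf = SVC(kernel="precomputed").fit(K_comb, y)
print("training accuracy:", clf.score(K_comb, y))
```

The only point of the sketch is that the weight applied to each base kernel depends on which groups the two samples fall into, rather than being a single global mixture.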
张海波; 黄铁军; 万飞飞. A Web-based emotion testing system for garment fabric images (基于Web的服装面料图像情感测试系统). 针织工业. 2012;(08):55-58.
To design garments with emotional appeal, garment designers need to pay attention to the emotions that fabrics evoke in people. Since emotion is highly subjective, a Web-based emotion testing system for garment fabrics was developed in order to quantitatively evaluate the emotional impact of fabrics. The paper first analyzes the functional requirements of the system and the emotion-testing theory it adopts, then implements it with PHP and MySQL, achieving good results in practical use. The emotion-test data collected by the system lays a foundation for subsequent research on emotional semantic recognition and understanding of garment fabrics.
黄铁军; 陈宜明; 段凌宇. A latent-topic-based distributed visual retrieval model (基于潜在主题的分布式视觉检索模型). 计算机工程. 2012;(24):146-151.
To apply distributed retrieval methods based on document clustering directly to visual retrieval, a latent-topic-based distributed visual retrieval model is proposed. The model framework is presented, including a method for partitioning the dataset by image visual words and a method for selecting image subsets, so as to optimize distributed image retrieval performance. Experimental results show that, without reducing retrieval accuracy, the model can preferentially select a small number of image subsets for retrieval and improve query throughput.
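The following sketch gives a rough sense of how latent topics can drive shard selection in distributed retrieval. It is only a toy under assumed choices (LDA topics over bag-of-visual-word counts, assignment of each image to its dominant-topic shard, cosine ranking on raw counts as a stand-in for the real ranking stage); the paper's partitioning and subset-selection methods are not reproduced here.

```python
# Toy sketch: topic-based shard selection for distributed visual retrieval.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_images, vocab_size, n_topics = 500, 200, 8
counts = rng.poisson(1.0, size=(n_images, vocab_size))      # BoVW histograms

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topic = lda.fit_transform(counts)                        # image-topic mixtures
shard_of = doc_topic.argmax(axis=1)                          # each image joins one shard
shards = {t: np.where(shard_of == t)[0] for t in range(n_topics)}

def search(query_counts, top_shards=2, top_k=10):
    """Route the query to the few most relevant shards and rank only those images."""
    q_topics = lda.transform(query_counts.reshape(1, -1))[0]
    chosen = np.argsort(q_topics)[::-1][:top_shards]
    candidates = np.concatenate([shards[t] for t in chosen])
    q = query_counts / (np.linalg.norm(query_counts) + 1e-12)
    db = counts[candidates] / (np.linalg.norm(counts[candidates], axis=1, keepdims=True) + 1e-12)
    order = np.argsort(db @ q)[::-1]
    return candidates[order][:top_k]

print(search(counts[0]))
```

Because only a couple of shards are touched per query, throughput improves, which is the effect the abstract reports.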
2011
Li Jia (PhD student); Tian Yonghong; Huang Tiejun; Gao Wen. Multi-Task Rank Learning for Visual Saliency Estimation. IEEE Transactions on Circuits and Systems for Video Technology. 2011;21(5):623-636. SCI citations: 22.
Visual saliency plays an important role in various video applications such as video retargeting and intelligent video advertising. However, existing visual saliency estimation approaches often construct a unified model for all scenes, thus leading to poor performance for the scenes with diversified contents. To solve this problem, we propose a multi-task rank learning approach which can be used to infer multiple saliency models that apply to different scene clusters. In our approach, the problem of visual saliency estimation is formulated in a pair-wise rank learning framework, in which the visual features can be effectively integrated to distinguish salient targets from distractors. A multi-task learning algorithm is then presented to infer multiple visual saliency models simultaneously. By an appropriate sharing of information across models, the generalization ability of each model can be greatly improved. Extensive experiments on a public eye-fixation dataset show that our multi-task rank learning approach outperforms 12 state-of-the-art methods remarkably in visual saliency estimation.
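The two ingredients named in the abstract, pairwise rank learning and multi-task sharing across scene clusters, can be illustrated with a small synthetic sketch. The logistic pairwise loss, the decomposition of each cluster's weights into a shared part plus a cluster-specific part, and all data below are illustrative assumptions, not the paper's exact formulation.

```python
# Synthetic sketch: pairwise rank learning with multi-task sharing across scene clusters.
import numpy as np

rng = np.random.default_rng(0)
n_clusters, dim, n_pairs = 3, 16, 200

# For each scene cluster, feature differences d = x_salient - x_distractor and the
# "correct" ordering label (+1 if the salient point should indeed score higher).
true_w = [rng.normal(size=dim) for _ in range(n_clusters)]
diffs = [rng.normal(size=(n_pairs, dim)) for _ in range(n_clusters)]
labels = [np.sign(D @ w) for D, w in zip(diffs, true_w)]

w_shared = np.zeros(dim)                    # information shared across all clusters
v = np.zeros((n_clusters, dim))             # cluster-specific corrections
lr, lam = 0.1, 0.01

for _ in range(300):
    grad_shared = lam * w_shared
    for c in range(n_clusters):
        margin = labels[c] * (diffs[c] @ (w_shared + v[c]))
        # Gradient of the logistic pairwise loss log(1 + exp(-margin)).
        g = (-(labels[c] / (1.0 + np.exp(margin)))[:, None] * diffs[c]).mean(axis=0)
        v[c] -= lr * (g + lam * v[c])
        grad_shared += g / n_clusters
    w_shared -= lr * grad_shared

for c in range(n_clusters):
    acc = (np.sign(diffs[c] @ (w_shared + v[c])) == labels[c]).mean()
    print(f"cluster {c}: pairwise ordering accuracy = {acc:.2f}")
```

The shared component is what lets information flow across scene clusters, which is the mechanism the abstract credits for the improved generalization of each cluster-specific model.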
Huang Tiejun; Tian Yonghong; Li Jia; Yu Haonan. Salient region detection and segmentation for general object recognition and image understanding. Science China Information Sciences. 2011;54(12):2461-2470. SCI citations: 17.
General object recognition and image understanding is recognized as a dramatic goal for computer vision and multimedia retrieval. In spite of the great efforts devoted over the last two decades, it still remains an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant features matching on a large image set in an unsupervised manner. Towards this end, it can be formulated as a four-layer bottom-up model, i.e., salient region detection, object segmentation, automatic object discovering and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency simultaneously with the bottom-up and top-down factors, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which then can be utilized to segment the salient objects precisely from a complex scene. For the third layer, we have also implemented an unsupervised approach to automatically discover general objects from a large image set by pairwise matching with local invariant features. Afterwards, visual dictionary construction can be implemented by using many state-of-the-art algorithms and tools available nowadays.
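The third layer, unsupervised discovery of recurring objects by pairwise matching of local invariant features, is the most self-contained piece and can be sketched as follows. ORB features, the ratio test, and the file names are illustrative assumptions; the paper's detector, descriptor, and grouping procedure are not reproduced.

```python
# Small sketch: deciding whether two images share a recurring object via pairwise
# matching of local invariant features.
import cv2

def shared_object_score(img_a, img_b, ratio=0.75):
    """Count mutually consistent local-feature matches between two grayscale images."""
    orb = cv2.ORB_create(nfeatures=1000)
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)

if __name__ == "__main__":
    # Hypothetical file names; any two grayscale images will do.
    a = cv2.imread("img_a.jpg", cv2.IMREAD_GRAYSCALE)
    b = cv2.imread("img_b.jpg", cv2.IMREAD_GRAYSCALE)
    if a is not None and b is not None:
        print("matches:", shared_object_score(a, b))
```

Image pairs whose score exceeds a threshold would be linked, and the connected components of the resulting graph would correspond to the recurring objects that such a model discovers.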
