摘要:
General object recognition and image understanding is recognized as a dramatic goal for computer vision and multimedia retrieval. In spite of the great efforts devoted in the last two decades, it still remains an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant features matching on a large image set in an unsupervised manner. Towards this end, it can be formulated as a four-layer bottomup model, i.e., salient region detection, object segmentation, automatic object discovering and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency simultaneously with the bottom-up and top-down factors, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which then can be utilized to segment the salient objects precisely from a complex scene. For the third layer, we have also implemented an unsupervised approach to automatically discover general objects from large image set by pairwise matching with local invariant features. Afterwards, visual dictionary construction can be implemented by using many state-of-the-art algorithms and tools available nowadays.