The major focus of this work is the application of indefinite kernels in multimedia processing, illustrated on the problem of content-based digital image analysis and retrieval. The term "indefinite" here refers to kernel functions associated with non-metric distance measures, which in many applications are known to better capture the perceptual similarity that defines relations among higher-level semantic concepts. This paper describes a kernel extension of the distance-based discriminant analysis method whose formulation remains convex irrespective of the definiteness of the underlying kernel. The presented method deploys indefinite kernels, rendered as unrestricted linear combinations of hyperkernels, to approach the problem of visual object categorization. The benefits of the proposed technique are demonstrated empirically on a real-world image data set, showing an improvement in categorization accuracy.
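Why a non-metric dissimilarity induces an indefinite kernel can be seen in a few lines. The following hedged Python sketch (toy data, not the paper's experiments) uses the cubed Euclidean distance, which violates the triangle inequality, and turns it into a kernel by the standard double-centering construction; the resulting matrix has negative eigenvalues.

```python
# Minimal sketch (toy data, not the paper's experiments): a non-metric
# dissimilarity induces an indefinite kernel. Cubing the Euclidean
# distance breaks the triangle inequality; double-centering the
# resulting dissimilarity matrix yields a kernel matrix that is not
# positive semi-definite.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))                  # 20 toy feature vectors

# Non-metric dissimilarity matrix: D_ij = ||x_i - x_j||^3.
D = np.array([[np.linalg.norm(a - b) ** 3 for b in X] for a in X])

# Distance-substitution kernel via double centering: K = -1/2 J D J.
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
K = -0.5 * J @ D @ J

eigvals = np.linalg.eigvalsh(K)
print("smallest eigenvalue:", eigvals[0])     # negative: K is indefinite
```

Any learning method whose objective stays convex regardless of the sign of these eigenvalues, as claimed above, can consume such a kernel directly.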
The problem of semantic video structuring is vital for the automated management of large video collections. The goal is to automatically extract the inner structure of a video collection from the raw data, so that a whole new range of applications for browsing and searching video collections can be derived from this high-level segmentation. To reach this goal, we exploit techniques that consider the full spectrum of video content; it is fundamental to properly integrate technologies from the fields of computer vision, audio analysis, natural language processing and machine learning. In this paper, a multimodal feature vector providing a rich description of the audio, visual and text modalities is first constructed. Boosted Random Fields are then used to learn two types of relationships: between features and labels, and between labels associated with the various modalities, for improved consistency of the results. The parameters of this enhanced model are found iteratively, using two successive stages of boosting. We experimented on the TRECVID corpus and show results that validate the approach over existing studies.
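As a rough sketch of the two-stage training idea, and only a simplified stand-in for the Boosted Random Fields model of the paper, one can first boost per-modality classifiers from features to labels, then boost a second model over their outputs so that labels inferred from different modalities correct each other. The data and names below are illustrative.

```python
# Simplified, hedged sketch of two-stage boosting over multimodal
# features. This is a stand-in for Boosted Random Fields, not the
# paper's actual model; all data and variable names are illustrative.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(1)
n = 200
X_audio = rng.normal(size=(n, 16))    # stand-ins for the multimodal
X_visual = rng.normal(size=(n, 32))   # feature vectors of the paper
X_text = rng.normal(size=(n, 8))
y = (X_visual[:, 0] + X_audio[:, 0] > 0).astype(int)  # toy concept label

# Stage 1: feature -> label relationships, one boosted model per modality.
scores = []
for X in (X_audio, X_visual, X_text):
    clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
    scores.append(clf.predict_proba(X)[:, 1])

# Stage 2: label -> label relationships, boosting over stage-1 outputs
# to enforce cross-modal consistency of the predictions.
Z = np.column_stack(scores)
stage2 = AdaBoostClassifier(n_estimators=50).fit(Z, y)
print("training accuracy:", stage2.score(Z, y))
```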
The description of visual documents is a fundamental aspect of any efficient information management system, but the process of manually annotating large collections of documents is tedious and far from perfect. The need for a generic and extensible annotation model therefore arises. In this paper, we present DEVA, an open, generic and expressive multimedia annotation framework. DEVA is an extension of the Dublin Core specification. The model can represent the semantic content of any visual document. It is described in the ontology language DAML+OIL and can easily be extended with external specialized ontologies, adapting the vocabulary to the given application domain.
In parallel, we present the Magritte annotation tool, an early prototype that validates the DEVA features. Magritte allows users to manually annotate image collections. It is designed with a modular and extensible architecture, which enables the user to dynamically adapt the user interface to specialized ontologies merged into DEVA.
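A hedged sketch of what a DEVA-style annotation could look like in practice: Dublin Core metadata extended with a specialized vocabulary for the depicted content. The deva namespace URI and property names below are hypothetical illustrations, not the actual DEVA vocabulary, and RDF/XML stands in here for a DAML+OIL serialization (DAML+OIL is itself layered on RDF).

```python
# Hedged sketch of a DEVA-style annotation: Dublin Core fields for the
# document plus a domain-specific extension for the depicted content.
# The "deva" namespace and property names are hypothetical, not the
# actual DEVA vocabulary.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

DEVA = Namespace("http://example.org/deva#")        # hypothetical URI
img = URIRef("http://example.org/images/42.jpg")

g = Graph()
g.bind("dc", DC)
g.bind("deva", DEVA)

# Core Dublin Core metadata...
g.add((img, DC.title, Literal("Harbour at dawn")))
g.add((img, DC.creator, Literal("Anonymous annotator")))

# ...extended with a specialized vocabulary for the visual content.
g.add((img, RDF.type, DEVA.VisualDocument))
g.add((img, DEVA.depicts, DEVA.Boat))

print(g.serialize(format="xml"))   # RDF/XML serialization
```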
Content-based image and, more generally, multimedia retrieval calls for a semantic understanding of the content of an image. However, the discrepancy between any automated segmentation technique and an operator-based segmentation is now well established. There is therefore a strong need for a technique that exploits the best of analytical segmentation techniques while remaining flexible enough to allow for a useful decomposition of the image. In this paper, we describe an interactive image segmentation framework and applications for which this framework is essential. Our technique is general enough to allow a generic segmentation technique to be embedded and optimally exploited. The principle is to efficiently obtain a fine-to-coarse image decomposition and represent this structure with an XML description scheme. When browsing through this pyramid, the user retrieves and marks semantically useful structures within the document. The result may then be used in different contexts such as image indexing and annotation, object-based multimedia query formulation, object tracking in video, etc. We first study the conditions for a base segmentation technique to fit within our framework. Then, we show that our framework provides a solution to typical problems encountered in the context of multimedia information indexing and retrieval. We detail an example Java-based implementation which may be used as a stand-alone tool or as a module within other applications.
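To make the XML description scheme concrete, here is a minimal sketch of serializing a two-level fine-to-coarse pyramid and recording a user's mark on one region. Element and attribute names are illustrative, not the paper's actual schema.

```python
# Minimal sketch of serializing a fine-to-coarse segmentation pyramid
# as XML. Element and attribute names are illustrative, not the
# framework's actual description scheme.
import xml.etree.ElementTree as ET

# Toy pyramid: each level maps region ids to their coarser-level parent.
levels = [
    {"r1": "R1", "r2": "R1", "r3": "R2", "r4": "R2"},  # fine level
    {"R1": "root", "R2": "root"},                       # coarse level
]

root = ET.Element("segmentation", image="example.jpg")
for depth, level in enumerate(levels):
    lvl = ET.SubElement(root, "level", depth=str(depth))
    for region, parent in level.items():
        ET.SubElement(lvl, "region", id=region, parent=parent)

# A user's mark made while browsing the pyramid could be stored as an
# attribute, e.g. flagging region R1 as semantically meaningful:
root.find(".//region[@id='R1']").set("marked", "true")

print(ET.tostring(root, encoding="unicode"))
```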
KEYWORDS: Multimedia, Video, Databases, Library classification systems, Visualization, Medical imaging, Information visualization, Content based image retrieval, Data storage, Video processing
Annotating image collections is crucial for many multimedia applications. Not only does this provide an alternative access path to visual information, it is also a critical step in evaluating content-based image retrieval systems. Annotation is a tedious task, so there is a real need for tools that lighten the work of annotators. Such a tool should be flexible and offer customization so as to make the annotator as comfortable as possible, and it should automate as many tasks as possible. In this paper, we present a still image annotation tool that has been developed with the aim of being flexible and adaptive. The principle is to create a set of dynamic web pages that act as an interface to an SQL database. The keyword set is fixed, and every image receives from concurrent annotators a set of keywords along with time stamps and annotator IDs. Each annotator can go back and forth within the collection and his previous annotations, helped by a number of search services and customization options. An administrative section allows the supervisor to control the parameters of the annotation, including the keyword set, given via an XML structure. The architecture of the tool is kept flexible so as to accommodate further options as development continues.
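As an illustration of the storage side, the following sketch sets up the kind of schema such a tool might use: a fixed keyword table plus an annotation table keyed by image, keyword and annotator, with a timestamp. Table and column names are guesses, not the tool's actual schema.

```python
# Hedged sketch of a plausible schema for the annotation database: each
# annotation row ties an image to a keyword from the fixed set, with a
# timestamp and the annotator's id so concurrent annotators can be
# distinguished. Names are guesses, not the tool's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (
    keyword_id INTEGER PRIMARY KEY,
    label      TEXT UNIQUE NOT NULL      -- fixed, admin-controlled set
);
CREATE TABLE annotations (
    image_id     TEXT NOT NULL,
    keyword_id   INTEGER NOT NULL REFERENCES keywords(keyword_id),
    annotator_id TEXT NOT NULL,
    annotated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (image_id, keyword_id, annotator_id)
);
""")

conn.execute("INSERT INTO keywords (label) VALUES ('boat')")
conn.execute(
    "INSERT INTO annotations (image_id, keyword_id, annotator_id) "
    "VALUES ('img_0042', 1, 'annotator_A')"
)
for row in conn.execute("SELECT * FROM annotations"):
    print(row)
```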
While in the area of relational databases interoperability is ensured by common communication protocols (e.g. ODBC/JDBC using SQL), Content-Based Image Retrieval Systems (CBIRSs) and other multimedia retrieval systems lack both a common query language and a common communication protocol. Besides its obvious short-term convenience, interoperability of systems is crucial for the exchange and analysis of user data. In this paper, we present and describe an extensible XML-based query markup language called MRML (Multimedia Retrieval Markup Language). MRML is primarily designed to ensure interoperability between different content-based multimedia retrieval systems, while allowing researchers to preserve their freedom in extending their systems as needed. MRML encapsulates multimedia queries in a way that enables multimedia (MM) query languages, MM content descriptions, MM query engines, and MM user interfaces to grow independently from each other, reaching a maximum of interoperability while ensuring a maximum of freedom for the developer. To benefit from this, only a few simple design principles have to be respected when extending MRML for one's private needs. The design of extensions within the MRML framework is described in detail in the paper. MRML has been implemented and tested for the CBIRS Viper, using the user interface Snake Charmer. Both are part of the GNU project and can be downloaded at our site.
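For a concrete feel of what an MRML-wrapped query might look like, the sketch below assembles a query-by-example request with relevance feedback as XML. The element names follow the general shape of MRML but should be treated as illustrative; the actual MRML specification defines the authoritative schema.

```python
# Hedged sketch of wrapping a query-by-example request in MRML-style
# XML. Element names follow the general shape of MRML but are
# illustrative; consult the MRML specification for the real schema.
import xml.etree.ElementTree as ET

mrml = ET.Element("mrml", {"session-id": "s1"})
query = ET.SubElement(mrml, "query-step", {
    "result-size": "20",            # how many images to return
    "collection-id": "demo",        # which collection to search
})
examples = ET.SubElement(query, "user-relevance-element-list")
ET.SubElement(examples, "user-relevance-element", {
    "image-location": "http://example.org/images/42.jpg",
    "user-relevance": "1",          # positive relevance feedback
})

print(ET.tostring(mrml, encoding="unicode"))
```

Because the query is plain XML, any engine or interface that speaks the protocol can produce or consume it, which is exactly the decoupling of components described above.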
The recent literature has shown that the principal difficulty in multimedia retrieval is bridging the semantic gap between the user's wishes and his ability to formulate queries. This insight has spawned two main directions of research: Query By Example (QBE) with relevance feedback (i.e. learning to improve the result of a previously formulated query), and research in query formulation techniques, such as browsing or query by sketch. Browsing techniques try to help the user find his target image, or an image sufficiently close to the desired result that it can be used in a subsequent QBE query. From the feature-space viewpoint, each browsing system tries to let the user move consciously through feature space and eventually reach the target image. How to provide this functionality to the user is presently an open question; in fact, even an objective performance evaluation and comparison of these browsing paradigms is difficult. We distinguish between deterministic browsers, which try to optimize the user's ability to learn how the system behaves, and stochastic browsers, which are based on more sophisticated Monte-Carlo algorithms and thus sacrifice reproducibility for better performance. Presently, these two browsing paradigms are practically incomparable except by large-scale user studies, which makes it infeasible for research groups to evaluate incremental improvements of browsing schemes. Moreover, the automated benchmarks in the current literature simulate a user by a model derived directly from the distance measures used within the tested systems; such a circular reference cannot provide a serious alternative to real user tests. In this paper, we present an automatic benchmark which uses user-annotated collections to simulate the semantic gap, thus providing a means for automatic evaluation and comparison of the different browsing paradigms. We use a very precise annotation of a few words, together with a thesaurus, to provide sufficiently smooth behavior of the annotation-based user model. We discuss the design and evaluation of this annotation as well as the implementation of the benchmark as an MRML-compliant (Multimedia Retrieval Markup Language) script with pluggable modules which allow testing of new interaction schemes.
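The core of the annotation-based user model can be sketched in a few lines: the simulated user compares candidate images to the target through their thesaurus-expanded keyword annotations rather than through the system's own distance measure, which avoids the circularity noted above. The toy annotations and thesaurus below are illustrative.

```python
# Minimal sketch of an annotation-driven simulated user: candidates are
# judged by keyword overlap with the target, with a small thesaurus
# smoothing the annotations. Data here is illustrative only.
annotations = {
    "img1": {"boat", "sea"},
    "img2": {"ship", "harbour"},
    "img3": {"forest", "mountain"},
}
thesaurus = {"boat": {"ship"}, "sea": {"ocean", "harbour"}}

def expand(words):
    """Smooth an annotation by adding its thesaurus neighbours."""
    out = set(words)
    for w in words:
        out |= thesaurus.get(w, set())
    return out

def simulated_user_choice(candidates, target):
    """Pick the shown image whose annotation best overlaps the target's."""
    t = expand(annotations[target])
    return max(candidates,
               key=lambda c: len(expand(annotations[c]) & t))

# One browsing step: the system shows img2 and img3; the model-user,
# aiming for img1, clicks img2 (ship/harbour overlap via the thesaurus).
print(simulated_user_choice(["img2", "img3"], "img1"))
```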