To organize, browse, and retrieve digital video more efficiently, it is important to extract video structure information at both the scene and shot levels. This paper presents an effective approach to video scene segmentation based on probabilistic model merging. In our proposed method, we regard the shots in a video sequence as hidden state variables and use probabilistic clustering to obtain the best clustering performance. The experimental results show that our method produces reasonable clustering results based on visual content. A project named HomeVideo is introduced to show the application of the proposed method to the management of personal video material.
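The abstract gives no implementation detail, so the following is only a minimal illustrative sketch: a greedy bottom-up merging of temporally adjacent shots, with a simple feature-distance criterion standing in for the paper's probabilistic merging test. The function name, the Euclidean distance, and the threshold are all assumptions.

```python
import numpy as np

def merge_shots_into_scenes(shot_features, threshold=0.5):
    """Greedy bottom-up merging of temporally adjacent shots.

    shot_features: (n_shots, d) array, one feature vector per shot
    (e.g. an averaged color histogram). The pair of adjacent scene
    segments with the smallest mean-feature distance is merged until
    no pair falls below `threshold` (a stand-in for a probabilistic
    merge criterion). Returns a list of (first_shot, last_shot) pairs.
    """
    scenes = [[i] for i in range(len(shot_features))]
    while len(scenes) > 1:
        best, best_d = None, threshold
        for k in range(len(scenes) - 1):
            a = shot_features[scenes[k]].mean(axis=0)
            b = shot_features[scenes[k + 1]].mean(axis=0)
            d = np.linalg.norm(a - b)
            if d < best_d:
                best, best_d = k, d
        if best is None:
            break
        scenes[best] += scenes.pop(best + 1)
    return [(s[0], s[-1]) for s in scenes]
```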
KEYWORDS: Image retrieval, Image segmentation, Image processing, Feature extraction, Composites, Digital imaging, Human vision and color perception, Systems modeling, Visual process modeling, Image storage
This paper presents an image retrieval method based on region shape similarity. In our approach, we first segment images into primitive regions and then combine some of the primitive regions to generate meaningful composite shapes, which are used as semantic units of the images during the similarity assessment process. We employ three global shape features and a set of normalized Fourier descriptors to characterize each meaningful shape. All of these features are invariant under similarity transformations. Finally, we measure the similarity between two images by finding the most similar pair of shapes in the two images. Our approach has demonstrated good performance in our retrieval experiments on clipart images.
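Normalized Fourier descriptors have a standard construction; a minimal sketch (assuming a closed, uniformly sampled boundary, and using magnitude-only coefficients for rotation and start-point invariance) might look like this:

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=16):
    """Normalized Fourier descriptors of a closed 2-D contour.

    contour: (N, 2) array of boundary points, ordered along the shape.
    Each point becomes a complex number; after the DFT, we drop the DC
    term (translation invariance), keep only magnitudes (rotation and
    start-point invariance), and divide by the first magnitude
    (scale invariance).
    """
    z = contour[:, 0] + 1j * contour[:, 1]
    Z = np.fft.fft(z)
    Z = Z[1:n_coeffs + 1]            # drop the DC term
    mags = np.abs(Z)
    return mags / (mags[0] + 1e-12)  # scale normalization
```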
As an effective solution to the content-based image retrieval problem, relevance feedback has attracted much research effort over the past few years. In this paper, we propose a new relevance feedback approach with progressive learning capability. It is based on a Bayesian classifier and treats positive and negative feedback examples with different strategies. It can utilize previous users' feedback information to help the current query. Experimental results show that our algorithm achieves high accuracy and effectiveness on real-world image collections.
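The paper's Bayesian classifier is not specified in the abstract; the sketch below only illustrates the asymmetric treatment of feedback, where positive examples are pooled into one Gaussian model of the target concept while each negative example exerts only a local penalty. All names and weightings are hypothetical.

```python
import numpy as np

def feedback_scores(db_features, positives, negatives, neg_weight=0.5):
    """Score database images after one round of relevance feedback.

    Positives are pooled into a single diagonal-covariance Gaussian
    whose log-likelihood rewards similar images; each negative acts
    only locally, penalizing images that fall close to it. The
    asymmetry reflects the idea that positives share one concept
    while negatives usually do not.
    """
    mu = positives.mean(axis=0)
    var = positives.var(axis=0) + 1e-6
    log_like = -0.5 * (((db_features - mu) ** 2) / var).sum(axis=1)
    penalty = np.zeros(len(db_features))
    for n in negatives:
        d2 = ((db_features - n) ** 2).sum(axis=1)
        penalty += np.exp(-d2)       # strong only near the negative
    return log_like - neg_weight * penalty
```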
Image annotation is used in traditional image database systems. However, without the help of human beings, it is very difficult to extract the semantic content of an image automatically. On the other hand, annotating images in large databases one by one manually is tedious work. In this paper, we present a web-based semi-automatic annotation and image retrieval scheme that integrates image search and image annotation seamlessly and effectively. In this scheme, we use both low-level features and high-level semantics to measure similarity between images in an image database. A relevance feedback process at both levels is used to refine the similarity assessment. The annotation process is activated when the user provides feedback on the retrieved images. With the help of the proposed similarity metrics and the relevance feedback approach at these two levels, the system can find the images relevant to the user's keyword or image query more efficiently. Experimental results show that our scheme is effective and efficient and can be used in large image databases for image annotation and retrieval.
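As a toy illustration of similarity measured at both levels, one could blend a low-level feature distance with keyword overlap; the blending weight and names below are assumptions, not the paper's metric:

```python
import numpy as np

def combined_similarity(feat_q, feat_i, kw_q, kw_i, alpha=0.5):
    """Two-level similarity: low-level feature similarity blended
    with keyword (semantic) overlap. `alpha` weights the semantic
    level and could be raised as annotations accumulate."""
    low = 1.0 / (1.0 + np.linalg.norm(feat_q - feat_i))
    kw_q, kw_i = set(kw_q), set(kw_i)
    high = len(kw_q & kw_i) / max(len(kw_q | kw_i), 1)
    return alpha * high + (1 - alpha) * low
```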
KEYWORDS: Internet, Video, Transform theory, Data conversion, Computing systems, Data modeling, Control systems, Personal digital assistants, Prototyping
The explosive growth of the Internet has come with increasing diversity and heterogeneity in terms of client device capability, network bandwidth, and user preferences. To date, most Web content has been designed with desktop computers in mind, and often contains rich media such as images, audio, and video. In many cases, this content is not suitable for devices like netTVs, handheld computers, personal digital assistants, and smart phones with relatively limited display capability, storage, processing power, and network access. Thus, Internet access is still constrained on these devices and there is a need to develop alternative approaches for information delivery. In this paper, we propose a framework for adaptive content delivery in heterogeneous environments. The goal is to improve content accessibility and perceived quality of service for information access under changing network and viewer conditions. The framework includes content adaptation algorithms, client capability and network bandwidth discovery methods, and a Decision Engine for determining when and how to adapt content. We describe this framework, initial system implementations based upon this framework, and the issues associated with the deployment of such systems based on different architectures.
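A Decision Engine of this kind could, in the simplest case, be a policy that maps discovered client capabilities and measured bandwidth to a list of adaptation actions. The sketch below is purely hypothetical; the action names and thresholds are invented for illustration:

```python
def decide_adaptation(client, bandwidth_kbps):
    """Pick content adaptations from client capability and measured
    bandwidth. `client` is a dict such as
    {"screen_w": 320, "supports_video": False}. Returns a list of
    transformations for the content adaptation stage to apply."""
    actions = []
    if client["screen_w"] < 640:
        actions.append("resize_images")
    if not client.get("supports_video", True):
        actions.append("replace_video_with_keyframes")
    if bandwidth_kbps < 128:
        actions.append("recompress_images_low_quality")
        actions.append("drop_audio")
    return actions
```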
An image retrieval system based on an information embedding scheme is proposed. Using relevance feedback, the system gradually embeds correlations between images from a high-level semantic perspective. The system starts with low-level image features and acquires knowledge from users to correlate different images in the database. Through the selection of positive and negative examples based on a given query, the semantic relationships between images are captured and embedded into the system by splitting/merging image clusters and updating the correlation matrix. Image retrieval is then based on the resulting image clusters and the correlation matrix obtained through relevance feedback.
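The abstract describes updating a correlation matrix from positive and negative examples; a minimal sketch of such an update rule (the learning rate and the push-toward-1/0 form are assumptions, and the cluster splitting/merging step is omitted) could be:

```python
import numpy as np

def update_correlation(C, positive_ids, negative_ids, lr=0.1):
    """One feedback round on a symmetric image-correlation matrix C.

    Pairs of positive examples are assumed to share semantics, so
    their mutual correlations are pushed toward 1; positive-negative
    pairs are pushed toward 0. C is modified in place and returned.
    """
    for i in positive_ids:
        for j in positive_ids:
            if i != j:
                C[i, j] += lr * (1.0 - C[i, j])
    for i in positive_ids:
        for j in negative_ids:
            C[i, j] -= lr * C[i, j]
            C[j, i] = C[i, j]
    return C
```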
Grouping images into (semantically) meaningful categories using low-level visual features is still a challenging and important problem in content-based image retrieval. Based on these groupings, effective indices can be built for an image database. In this paper, we cast the image classification problem in a Bayesian framework. Specifically, we consider city vs. landscape classification and, further, classification of landscape into sunset, forest, and mountain classes. We demonstrate how high-level concepts can be inferred from specific low-level image features, under the constraint that the test images do belong to one of the delineated classes. We further demonstrate that a small codebook (whose optimal size is selected using the MDL principle) extracted from a vector quantizer can be used to estimate the class-conditional densities needed for the Bayesian methodology. Classification based on color histograms, color coherence vectors, edge direction histograms, and edge direction coherence vectors as features shows promising results. On a database of 2,716 city and landscape images, our system achieves an accuracy of 95.3 percent for city vs. landscape classification. On a subset of 528 landscape images, it achieves an accuracy of 94.9 percent for sunset vs. forest and mountain classification, and 93.6 percent for forest vs. mountain classification. Our final goal is to combine multiple two-class classifiers into a single hierarchical classifier.
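One simple reading of the codebook-based density estimate: assign each training feature vector to its nearest codeword and use smoothed per-class codeword frequencies as class-conditional probabilities. The sketch below assumes a pre-trained codebook and a uniform class prior; it is an illustration, not the paper's exact estimator.

```python
import numpy as np

def train_vq_bayes(features_by_class, codebook):
    """Estimate P(codeword | class) from training feature vectors.

    Each feature vector is assigned to its nearest codebook entry;
    per-class codeword frequencies (with Laplace smoothing) serve as
    the class-conditional densities.
    """
    def assign(X):
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)

    probs = {}
    for c, X in features_by_class.items():
        counts = np.bincount(assign(X), minlength=len(codebook)) + 1.0
        probs[c] = counts / counts.sum()
    return probs, assign

def classify(X, probs, assign):
    """MAP class under a uniform prior: sum of log P(codeword | class)."""
    idx = assign(X)   # one codeword index per feature vector
    scores = {c: np.log(p[idx]).sum() for c, p in probs.items()}
    return max(scores, key=scores.get)
```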
Partitioning video sequences into individual shots is one of the fundamental processes in video content parsing and content-based video retrieval. A variety of algorithms and systems have been developed to perform this task, but most of them exhibit weaknesses when applied to detecting gradual transitions such as dissolves, wipes, fade-ins, and fade-outs. In this paper, we present an integrated scheme for detecting both abrupt camera breaks and gradual scene changes using DCT coefficients and motion data encoded in the MPEG compressed stream. The core of the proposed approach is a tree-like classifier, in which three algorithms are organized to separately handle the varied situations found in real-world video sequences.
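A classic baseline for this family of detectors is twin-comparison thresholding on frame differences computed from DC coefficients; the sketch below (with invented thresholds, and without the paper's tree-like classifier or motion cues) shows the general shape of the computation:

```python
import numpy as np

def detect_cuts(dc_frames, high=2.0, low=0.7):
    """Twin-comparison-style detection on DC-coefficient frames.

    dc_frames: (n_frames, h, w) array of DC terms extracted from the
    compressed stream. A difference above `high` * (mean difference)
    flags an abrupt cut; a sustained run of differences above
    `low` * mean flags a candidate gradual transition.
    """
    diffs = np.abs(np.diff(dc_frames, axis=0)).mean(axis=(1, 2))
    mean_d = diffs.mean() + 1e-12
    cuts, graduals, run_start = [], [], None
    for t, d in enumerate(diffs):
        if d > high * mean_d:
            cuts.append(t + 1)
            run_start = None
        elif d > low * mean_d:
            if run_start is None:
                run_start = t + 1
        else:
            if run_start is not None and t + 1 - run_start >= 5:
                graduals.append((run_start, t))
            run_start = None
    return cuts, graduals
```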
The large amount of video data makes browsing and annotating it by simply fast-forwarding and rewinding a tedious and difficult job. Recent work in video parsing provides a foundation for building interactive, content-based video browsing systems. In this paper, a generalized top-down hierarchical clustering process, which applies partition clustering recursively at each level of the hierarchy, is studied and used to build hierarchical views of video shots. With these clustering processes, when a list of video programs or clips is provided, a browsing system can use key-frame and/or shot features to cluster shots into classes, each of which consists of shots of similar content. After such clustering, each class of shots can be represented by an icon, which can then be displayed at the higher levels of a hierarchical browser. As a result, users can get a rough idea of the content of video shots without even moving down to a lower level of the hierarchy.
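A minimal version of such a top-down process is recursive k-means over key-frame features; the branching factor, stopping rules, and plain k-means below are assumptions made for illustration:

```python
import numpy as np

def topdown_cluster(features, ids=None, k=4, min_size=8,
                    depth=0, max_depth=3):
    """Recursive partition clustering (k-means at each level).

    Returns a nested dict: each node holds the shot ids in that class
    and, if split further, a list of child nodes. Shots with similar
    key-frame features land in the same class; each node could be
    represented by an icon in the hierarchical browser.
    """
    if ids is None:
        ids = np.arange(len(features))
    node = {"ids": ids.tolist(), "children": []}
    if depth >= max_depth or len(ids) < min_size:
        return node
    X = features[ids]
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(20):                      # plain k-means iterations
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    for j in range(k):
        sub = ids[labels == j]
        if len(sub):
            node["children"].append(
                topdown_cluster(features, sub, k, min_size,
                                depth + 1, max_depth))
    return node
```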
This paper presents an experimental evaluation of different image content representations, all of which are based on the use of color histograms to support indexing and searching schemes. We investigate the use of different color resolutions, restriction to dominant colors, and matching based on both global and local histograms. We also examine how suitable numerical index keys may be designed to support retrieval, and we assess the use of Self-Organizing Maps to guide structuring a database of images. All of our results are based on experimental studies, and our conclusions should lead to useful guidelines for developing image indexing and retrieval systems based on visual content.
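As a concrete example of global versus local histogram matching, the sketch below compares two RGB images block-by-block with histogram intersection (the same intersection applied to whole-image histograms gives the global variant); the grid size and bin counts are arbitrary choices, not the paper's settings:

```python
import numpy as np

def hist_intersection(h1, h2):
    """Normalized histogram intersection: 1.0 means identical."""
    return np.minimum(h1, h2).sum() / (h2.sum() + 1e-12)

def local_histogram_score(img1, img2, grid=4, bins=8):
    """Compare two RGB images block-by-block on a grid x grid
    partition, averaging per-block intersections. Local matching
    keeps coarse spatial layout that a single global histogram
    discards."""
    def block_hists(img):
        h, w, _ = img.shape
        hists = []
        for i in range(grid):
            for j in range(grid):
                blk = img[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
                hist, _ = np.histogramdd(blk.reshape(-1, 3),
                                         bins=(bins,) * 3,
                                         range=((0, 256),) * 3)
                hists.append(hist.ravel() / blk.size)
        return hists
    return np.mean([hist_intersection(a, b)
                    for a, b in zip(block_hists(img1), block_hists(img2))])
```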
As digital images move into the mainstream of information systems, managing and manipulating them as images becomes an important issue to resolve before we can take full advantage of their information content. To achieve content-based image indexing and retrieval, there are active research efforts in developing techniques that utilize visual features. On the other hand, without an effective indexing scheme, any visual-content-based image retrieval approach loses its effectiveness as the number of features increases. This paper presents our initial work on developing an efficient indexing scheme using an artificial neural network, which focuses on eliminating unlikely candidates rather than pinpointing the targets directly. Experimental results on retrieving images with this scheme from a prototype visual database system are given.
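The paper's network is not described in the abstract; as a purely hypothetical stand-in, the sketch below uses a single learned linear projection to cheaply discard unlikely candidates, leaving only a small fraction for the full similarity computation:

```python
import numpy as np

class PruningIndex:
    """Hypothetical candidate-elimination index.

    A single learned linear projection (standing in for the paper's
    neural network) maps feature vectors to scalars; database items
    whose projections land far from the query's are rejected, and
    only the survivors go through the full similarity computation.
    """

    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.normal(size=dim)      # would be trained in practice

    def survivors(self, query, db, keep=0.2):
        q = query @ self.w
        score = -np.abs(db @ self.w - q)   # close projections score high
        k = max(1, int(keep * len(db)))
        return np.argsort(score)[-k:]      # indices for the full search
```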
Browsing is important for multimedia content retrieval, editing, authoring, and communications. Yet we still lack browsing tools that are user-friendly and content-based, at least for video materials. In this paper, we present a set of video browsing tools that utilize the video content information resulting from a parsing process. Video parsing algorithms are briefly discussed, and a detailed description of both sequential and time-space browsing tools is presented.
Computer-assisted content-based indexing is a critical enabling technology and currently a bottleneck in productive use of video resources. This paper presents the Video Classification Project, an effort toward automating content-based video indexing and retrieval, at the Institute of Systems Science of the National University of Singapore. We discuss in detail three goals of the project: image processing tools for video parsing, feature extraction and retrieval; a knowledge-based approach to representing video content; and stratified tools which allow greater flexibility in browsing a video resource, either before or after performing specific retrieval operations.
Color can be used as a very important cue for image recognition. In industrial and commercial settings, color is widely used as a trademark or identifying feature of objects such as packaged goods and advertising signs. In image database systems, one may retrieve an image of interest by specifying prominent colors and their locations in the image (image retrieval by content). These facts enable us to detect or identify a target object using colors. However, this task depends mainly on how effectively we can identify a color and detect regions of the given color under possibly non-uniform illumination conditions such as shade, highlight, and strong contrast. In this paper, we present an effective method to detect regions matching given colors, along with features of the region surfaces. We adopt the HVC color coordinates in the method because of their ability to completely separate the luminance and chromatic components of colors. Three basis functions, serving as low-pass, high-pass, and band-pass filters respectively, are introduced.
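HVC conversion details are beyond the abstract, so the sketch below substitutes HSV as a rough analogue: hue carries the chromatic identity while value absorbs shading, so a circular hue-distance test gated by minimum saturation tolerates shade and highlight to a first approximation. All tolerances are invented:

```python
import numpy as np

def rgb_to_hue_sat(rgb):
    """Vectorized RGB -> (hue in [0,1), saturation) conversion."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(-1), rgb.min(-1)
    d = mx - mn
    hue = np.zeros_like(mx)
    m = d > 0
    rm = m & (mx == r)
    gm = m & (mx == g) & ~rm
    bm = m & ~rm & ~gm
    hue[rm] = ((g - b)[rm] / d[rm]) % 6
    hue[gm] = (b - r)[gm] / d[gm] + 2
    hue[bm] = (r - g)[bm] / d[bm] + 4
    sat = np.where(mx > 0, d / np.maximum(mx, 1e-12), 0.0)
    return hue / 6.0, sat

def color_match_mask(img_rgb, target_rgb, hue_tol=0.05, min_sat=0.15):
    """Binary mask of pixels whose hue matches the target color;
    achromatic (low-saturation) targets match nothing here."""
    img = img_rgb.astype(float) / 255.0
    tgt = np.asarray(target_rgb, dtype=float) / 255.0
    h, s = rgb_to_hue_sat(img)
    th, ts = rgb_to_hue_sat(tgt[None, :])
    dh = np.abs(h - th[0])
    dh = np.minimum(dh, 1.0 - dh)        # circular hue distance
    return (dh < hue_tol) & (s > min_sat) & (ts[0] > min_sat)
```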
KEYWORDS: Video, Cameras, Video compression, Video processing, Image segmentation, Zoom lenses, Motion analysis, Image compression, Analog electronics, Semantic video
Parsing video content is an important first step in the video indexing process. This paper presents algorithms to automate the video parsing task, including video partitioning and the classification of video clips according to camera operations, using compressed video data. We have studied and implemented two algorithms for partitioning video data compressed according to the MPEG standard: the first is based on the discrete cosine transform coefficients of video frames, and the other on the correlation of motion vectors. Algorithms to detect camera operations using motion vectors are also presented.
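For the motion-vector branch, one simple formulation is the normalized correlation between successive motion-vector fields, which collapses at shot boundaries where motion compensation fails; the threshold below is an assumption:

```python
import numpy as np

def mv_boundaries(motion_fields, thresh=0.3):
    """Flag shot boundaries from motion-vector fields.

    motion_fields: (n_frames, h, w, 2) array of per-macroblock motion
    vectors from the MPEG stream. The normalized correlation between
    successive fields drops sharply at a shot boundary, where the
    vectors become incoherent.
    """
    boundaries = []
    for t in range(1, len(motion_fields)):
        a = motion_fields[t - 1].ravel()
        b = motion_fields[t].ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        corr = float(a @ b) / denom
        if corr < thresh:
            boundaries.append(t)
    return boundaries
```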