In this paper, we present MediaNet, a knowledge representation framework that uses multimedia content to represent semantic and perceptual information. The main components of MediaNet are conceptual entities, which correspond to real-world objects, and relationships among concepts. MediaNet allows the concepts and relationships to be defined or exemplified by multimedia content such as images, video, audio, graphics, and text. MediaNet models the traditional relationship types such as generalization and aggregation, but adds functionality by also modeling perceptual relationships based on feature similarity. For example, MediaNet allows a concept such as car to be defined as a type of transportation vehicle, while being further defined and illustrated through example images, videos, and sounds of cars. In constructing the MediaNet framework, we have built on the basic principles of semiotics and semantic networks, in addition to utilizing the audio-visual content description framework being developed as part of the MPEG-7 multimedia content description standard. By integrating both conceptual and perceptual representations of knowledge, MediaNet has the potential to impact a broad range of applications that deal with multimedia content at the semantic and perceptual levels. In particular, we have found that MediaNet can improve the performance of multimedia retrieval applications through query expansion, refinement, and translation across multiple content modalities. In this paper, we report on experiments that use MediaNet in searching for images. We construct the MediaNet knowledge base using both WordNet and an image network built from multiple example images and extracted color and texture descriptors. Initial experimental results demonstrate improved retrieval effectiveness using MediaNet in a content-based retrieval system.
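As a rough illustration of the framework described above, the sketch below shows one possible in-memory representation of MediaNet-style concepts, media examples, and relationships, together with a one-hop query expansion. The class names, fields, and the expansion rule are assumptions made for illustration, not the paper's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class MediaExample:
    """A piece of media (image, audio, video, text) that exemplifies a concept."""
    modality: str              # e.g. "image", "audio"
    uri: str                   # location of the content
    features: dict = field(default_factory=dict)  # e.g. {"color_hist": [...]}

@dataclass
class Concept:
    name: str
    examples: list = field(default_factory=list)   # MediaExample instances

@dataclass
class Relationship:
    kind: str                  # "generalization", "aggregation", "perceptual-similarity"
    source: str
    target: str
    weight: float = 1.0        # e.g. feature-space similarity for perceptual links

class MediaNet:
    def __init__(self):
        self.concepts = {}
        self.relations = []

    def add_concept(self, concept):
        self.concepts[concept.name] = concept

    def relate(self, kind, source, target, weight=1.0):
        self.relations.append(Relationship(kind, source, target, weight))

    def expand_query(self, name, kinds=("generalization", "perceptual-similarity")):
        """One-hop query expansion: return concepts reachable from `name`
        through the given relation kinds."""
        return {r.target for r in self.relations
                if r.kind in kinds and r.source == name}

# Usage: "car" is a kind of "vehicle" and is perceptually close to "truck".
net = MediaNet()
for n in ("car", "vehicle", "truck"):
    net.add_concept(Concept(n))
net.concepts["car"].examples.append(MediaExample("image", "car_01.jpg"))
net.relate("generalization", "car", "vehicle")
net.relate("perceptual-similarity", "car", "truck", weight=0.8)
print(net.expand_query("car"))   # contains 'vehicle' and 'truck' (set order varies)
```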
Image classification into meaningful classes is essentially a supervised pattern recognition problem. These classes include indoor, outdoor, landscape, urban, faces, etc. The recognition problem necessitates a large set of labeled examples for training the classifier. Any stratagem that reduces the burden of labeling is therefore very important to the deployment of such classifiers in practical applications. In this paper we show that the labeled training set can be augmented by an unlabeled set of examples in order to boost the performance of the classifier. In general, the set of unlabeled examples is not guaranteed to improve the classifier performance. We show that if the actual examples to be labeled are automatically selected through an unsupervised clustering step, the performance is more likely to improve with the unlabeled set. In this paper, we first present a modified EM algorithm, which combines labeled and unlabeled sets for training. We then apply this algorithm to image classification. Using mutually exclusive classes, we show that the clustering step is crucial to the improvement in classifier performance.
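The abstract does not spell out the modified EM procedure, so the sketch below shows a generic semi-supervised EM loop for a mixture with one diagonal-covariance Gaussian per class: labeled examples keep fixed one-hot responsibilities, while unlabeled examples are re-weighted in every E-step. The Gaussian model, the function name, and all parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def semi_supervised_em(Xl, yl, Xu, n_classes, n_iter=20, eps=1e-6):
    """EM over labeled (Xl, yl) and unlabeled (Xu) data for a mixture of
    diagonal-covariance Gaussians, one component per class (a simplified sketch)."""
    d = Xl.shape[1]
    X = np.vstack([Xl, Xu])
    # responsibilities: fixed one-hot rows for labeled data, soft rows for unlabeled
    R = np.zeros((len(X), n_classes))
    R[np.arange(len(Xl)), yl] = 1.0
    R[len(Xl):] = 1.0 / n_classes            # uniform initialization for unlabeled

    for _ in range(n_iter):
        # M-step: class priors, means, diagonal variances from all responsibilities
        Nk = R.sum(axis=0) + eps
        priors = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        var = np.zeros((n_classes, d))
        for k in range(n_classes):
            diff = X - means[k]
            var[k] = (R[:, k][:, None] * diff**2).sum(axis=0) / Nk[k] + eps

        # E-step: update posteriors of the unlabeled block only
        log_p = np.zeros((len(Xu), n_classes))
        for k in range(n_classes):
            diff = Xu - means[k]
            log_p[:, k] = (np.log(priors[k])
                           - 0.5 * np.sum(np.log(2 * np.pi * var[k]))
                           - 0.5 * np.sum(diff**2 / var[k], axis=1))
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        R[len(Xl):] = post / post.sum(axis=1, keepdims=True)

    return priors, means, var
```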
To facilitate easy access to rich multimedia information over the Internet, we develop a knowledge-based classification system that supports automatic indexing and filtering based on semantic concepts for the dissemination of on-line real-time media. Automatic segmentation, annotation, and summarization of media for fast information browsing and updating are achieved at the same time. In the proposed system, a real-time scene-change detection proxy performs an initial video structuring process by splitting a video clip into scenes. Motion and visual features are extracted in real time for every detected scene by using online feature extraction proxies. Higher-level semantics are then derived through the joint use of low-level features and inference rules in the knowledge base. Inference rules are derived through a supervised learning process based on representative samples. On-line media filtering based on semantic concepts becomes possible by using the proposed video inference engine. Video streams are either blocked or sent to certain channels depending on whether or not the video stream matches the user's profile. The proposed system is extensively evaluated by applying the engine to videos of basketball games.
Techniques for content-based image or video retrieval are not mature enough to recognize visual semantics completely. Retrieval based on color, size, texture, and shape is within the state of the art. Our experiments on human factors in visual information query and retrieval show that visual information retrieval based on the semantic understanding of visual objects and content is more demanding than retrieval based on visual appearance. Therefore, it is necessary to use captions or text annotations of photos or videos for content access to visual data. In this paper, human factors in text and image searching are carefully investigated. Based on the resulting human factors, a framework for integrated querying of visual information and textual concepts is presented. The framework includes ontology-based semantic query expansion through query term rewriting and database navigation within a conceptual hierarchy in a multimodal querying environment. To allow similarity-based concept retrieval, a new conceptual similarity distance measure between two conceptual entities in a given conceptual space is proposed. The dissimilarity metric is the minimum weighted path length in the corresponding conceptual tree.
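As a concrete reading of the proposed dissimilarity metric, the sketch below computes the minimum weighted path length between two nodes of a conceptual tree. The tree encoding (parent pointers with per-edge weights) and the toy hierarchy are assumptions for illustration only.

```python
def concept_distance(parent, weight, a, b):
    """Dissimilarity between two concepts as the minimum weighted path length
    in a conceptual tree. parent[x] is x's parent (None for the root) and
    weight[x] is the cost of the edge from x to its parent."""
    def ancestors(x):
        path, cost = [x], [0.0]
        while parent[x] is not None:
            cost.append(cost[-1] + weight[x])
            x = parent[x]
            path.append(x)
        return dict(zip(path, cost))

    da, db = ancestors(a), ancestors(b)
    # minimum over all common ancestors of (cost up from a) + (cost up from b)
    return min(da[n] + db[n] for n in da if n in db)

# Toy conceptual hierarchy: entity -> vehicle -> {car, truck}; entity -> animal
parent = {"entity": None, "vehicle": "entity", "animal": "entity",
          "car": "vehicle", "truck": "vehicle"}
weight = {"entity": 0.0, "vehicle": 1.0, "animal": 1.0, "car": 0.5, "truck": 0.5}
print(concept_distance(parent, weight, "car", "truck"))   # 1.0
print(concept_distance(parent, weight, "car", "animal"))  # 2.5
```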
In this paper, we describe a multimedia annotation tool that allows users to interactively create MPEG-7 descriptions. The MPEG-7 standard provides description structures for multimedia content in the form of Description Schemes (DSs) and Descriptors with the goal of enabling interoperable searching and filtering. The MPEG-7 Visual Annotation Tool allows the user to manually link together MPEG-7 DSs and Descriptors as needed and to enter description data into the fields of the description structures. The tool takes as input an MPEG-7 schema definition file and an MPEG-7 package description file. The MPEG-7 schema definition file defines the structure of the MPEG-7 description components using the MPEG-7 Description Definition Language (DDL). The package description file organizes the MPEG-7 description components in order to ease navigation in the MPEG-7 Visual Annotation Tool. The tool provides utilities for drag-and-drop copying and reuse of description elements and allows the descriptions to be output to files in XML. The initial implementation centers around manual entry of description data; however, in future work we plan to explore the integration of automatic and semi-automatic feature extraction methods with the goal of providing a complete system for MPEG-7 multimedia content annotation and query instruction.
The ongoing MPEG-7 standardization activity aims at creating a standard for describing multimedia content in order to facilitate the interpretation of the associated information content. Attempting to address a broad range of applications, MPEG-7 has defined a flexible framework consisting of Descriptors, Description Schemes, and a Description Definition Language. Descriptors and Description Schemes describe the features, structure, and semantics of multimedia objects. They are written in the Description Definition Language (DDL). In the most recent revision, the DDL adopts XML (Extensible Markup Language) Schema with MPEG-7 extensions. The DDL has constructs that support inclusion, inheritance, reference, enumeration, choice, sequence, and abstract types of Description Schemes and Descriptors. In order to enable multimedia systems to use MPEG-7, a number of important problems in storing, retrieving, and searching MPEG-7 documents need to be solved. This paper reports initial findings on issues and solutions for storing and accessing MPEG-7 documents. In particular, we discuss the benefits of using a virtual document management framework based on an XML Access Server (XAS) in order to bridge MPEG-7 multimedia applications and database systems. The need arises partly because MPEG-7 descriptions need customized storage schemas, indexing, and search engines. We also discuss issues arising in managing dependencies and cross-description-scheme search.
We present a texture descriptor for multimedia content description in MPEG-7. The current MPEG-7 candidate for the texture descriptor has been designed to be suitable for the human visual system (HVS). In this paper, the texture is described using perceptual channels that are bands in spatial frequency. Further, the MPEG-7 texture description method employs the Radon transform, which is well suited to HVS behavior. The texture descriptor is generated by taking the average energy and the energy deviation of the HVS channels. To verify the performance of the texture descriptor, experiments with the MPEG-7 database are performed.
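To make the descriptor construction above concrete, the sketch below forms a feature vector from the average energy and energy deviation of a set of per-channel responses. How the channels themselves are obtained (the Radon/frequency-band partitioning) is not reproduced; the random inputs simply stand in for real channel outputs.

```python
import numpy as np

def texture_descriptor(channel_responses):
    """Build a descriptor from the average energy and the energy deviation of
    each perceptual channel (a sketch of the HVS-channel idea)."""
    desc = []
    for resp in channel_responses:            # resp: 2-D array of filter outputs
        energy = resp.astype(float) ** 2
        desc.append(energy.mean())            # average energy of the channel
        desc.append(energy.std())             # energy deviation of the channel
    return np.array(desc)

# Toy usage with random "channel" responses standing in for real filter outputs
rng = np.random.default_rng(0)
channels = [rng.normal(size=(64, 64)) for _ in range(30)]
d = texture_descriptor(channels)
print(d.shape)   # (60,) -> two statistics per channel
```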
In the last decade several methods for low-level indexing of visual features appeared. Most often these were evaluated with respect to their discrimination power using measures such as precision and recall. Accordingly, the targeted application was indexing of visual data within databases. During the standardization process of MPEG-7, the view on indexing of visual data changed, taking communication aspects into account as well, where coding efficiency is important. Even if the descriptors used for indexing are small compared to the size of images, it is recognized that several descriptors can be linked to an image, characterizing different features and regions. Besides the importance of a small memory footprint for the transmission of the descriptor and the memory footprint in a database, the search and filtering can eventually be sped up by reducing the dimensionality of the descriptor, provided the matching metric can be adjusted. Based on a polygon shape descriptor presented for MPEG-7, this paper compares the discrimination power versus the memory consumption of the descriptor. Different methods based on quantization are presented and their effect on the retrieval performance is measured. Finally, an optimized computation of the descriptor is presented.
We propose a multimedia indexing/retrieval system that provides content-based retrieval based on MPEG-7 descriptions. In this paper, the database is built so that the MPEG-7 descriptor and description scheme are hidden in the video sequence using a watermarking technique. For a query by image example, similar images are extracted from the database by referring to the hidden MPEG-7 metadata. To verify the effectiveness and efficiency of the proposed database, experiments with the MPEG-7 texture descriptor were performed. Experimental results showed that the proposed database provides fast and effective content-based retrieval.
In this paper we propose a conceptual model for describing the network conditions under which multimedia data is being transmitted, as well as the modality of the device that receives the data. We present an architecture for adapting multimedia data to such wireless network conditions and device capabilities, under constraints imposed by user preferences and multimedia content, to ensure effective, meaningful, and acceptable delivery of video data to mobile users. The adaptability is achieved through careful application of a combination of off-line and on-line reductions to the video streams. In doing so, we make use of the concept of descriptor schemes for describing the content of video data and its amenability to different kinds of reductions [1].
The architecture consists of an MPEG-7 server in the fixed network, and an MPEG-7 player on the mobile host. In addition, an encoder/decoder layer is used to perform physical frame bit reductions on transcoded video frames. The server also maintains index structures and a programmable two-dimensional matrix of reductions. An offline packager module is included with the server, which packages all the MPEG-7 data with the video when the video is first registered with the database.
Given a video stream encoded with a particular bit rate and compression scheme (C1), transcoding the video stream refers to the process of re-encoding the input compressed stream at a different bit rate and possibly with a different spatial resolution, temporal resolution, and compression scheme (C2). In this paper, the special case of a transcoder that has a 4:2:2-format video stream as its input and produces a 4:2:0 output stream at a lower bit rate is considered. Furthermore, it is assumed that both C1 and C2, like nearly all standard video compression algorithms, use 8 x 8 block-DCT-based compression. In this context, it will be shown how standard implementations of transcoders, as proposed by Ghanbari and others, may be simplified to reduce the transcoder implementation complexity. In particular, the problem of simplifying the process of down-conversion from 4:2:2 to 4:2:0, so that this process can take place directly in the DCT domain, will be addressed. The MPEG-2 video standard treats interlaced and progressive video differently for compression purposes; the proposed scheme is shown to be applicable to both cases. Simulation results show that this simplification does not result in performance degradation when compared with standard, high-complexity implementations.
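To illustrate the flavor of DCT-domain down-conversion (not the paper's exact filter), the sketch below folds a simple pairwise-averaging vertical down-sampling of two vertically stacked 8x8 chroma blocks into a single precomputed matrix applied directly to DCT coefficients. The averaging filter and the progressive-frame assumption are simplifications; the final assertion checks that the DCT-domain result matches the pixel-domain route.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix D such that X = D @ x @ D.T."""
    k = np.arange(n)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D *= np.sqrt(2.0 / n)
    D[0] *= 1.0 / np.sqrt(2.0)
    return D

D = dct_matrix(8)

# Spatial vertical down-sampling of a 16x8 stack by averaging row pairs
S = np.zeros((8, 16))
for i in range(8):
    S[i, 2 * i] = S[i, 2 * i + 1] = 0.5

# Precomputed DCT-domain operator: Y = M @ vstack(C1, C2)
M = D @ S @ np.block([[D.T, np.zeros((8, 8))], [np.zeros((8, 8)), D.T]])

def downconvert_chroma_dct(C1, C2):
    """Given the DCT blocks of two vertically adjacent 8x8 chroma blocks (4:2:2),
    return the DCT block of the vertically down-sampled 8x8 block (4:2:0),
    computed entirely in the DCT domain."""
    return M @ np.vstack([C1, C2])

# Sanity check against the pixel-domain route
x1, x2 = np.random.rand(8, 8), np.random.rand(8, 8)
C1, C2 = D @ x1 @ D.T, D @ x2 @ D.T
y = S @ np.vstack([x1, x2])                      # pixel-domain down-sampling
assert np.allclose(downconvert_chroma_dct(C1, C2), D @ y @ D.T)
```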
While in the area of relational databases interoperability is ensured by common communication protocols (e.g., ODBC/JDBC using SQL), Content-Based Image Retrieval Systems (CBIRS) and other multimedia retrieval systems lack both a common query language and a common communication protocol. Besides its obvious short-term convenience, interoperability of systems is crucial for the exchange and analysis of user data. In this paper, we present and describe an extensible XML-based query markup language, called MRML (Multimedia Retrieval Markup Language). MRML is primarily designed to ensure interoperability between different content-based multimedia retrieval systems. Further, MRML allows researchers to preserve their freedom in extending their systems as needed. MRML encapsulates multimedia queries in a way that enables multimedia (MM) query languages, MM content descriptions, MM query engines, and MM user interfaces to grow independently from each other, reaching a maximum of interoperability while ensuring a maximum of freedom for the developer. To benefit from this, only a few simple design principles have to be respected when extending MRML for one's private needs. The design of extensions within the MRML framework is described in detail in the paper. MRML has been implemented and tested for the CBIRS Viper, using the user interface Snake Charmer. Both are part of the GNU project and can be downloaded at our site.
Given a high-resolution compressed video stream, this paper looks at the problem of extracting video from a small window at an arbitrary but static location, in compressed form, without having to either decode the high-resolution video stream or encode the window stream. Since state-of-the-art video compression algorithms use motion compensation in one form or another, the compressed video stream cannot be parsed to obtain the window stream without decoding the high-resolution stream to some extent and subsequently re-encoding. On the other hand, storing all possible windows as separate streams on the server is clearly infeasible, at least as of today, as it would lead to an explosion of the storage space requirement. In this paper, it will be shown that with a comparatively small increase in the server's storage requirement, it is possible to obtain compressed window streams from the high-resolution stream with minimal parsing, no decoding or re-encoding, and little quality loss. Furthermore, ways to trade off the space required on the server against the compression efficiency of the window stream will be shown. Although the main emphasis of this paper is on showing why and how this windowing problem is analytically feasible, results obtained from windowing into a standard-definition, MPEG-2 compressed video stream using one of the suggested implementations will be shown to prove feasibility.
With the rapid development of computer networks, many new applications and kinds of multimedia data have appeared, and the gap between the singularity of data sources and the variety of networks, terminal devices, and users grows ever wider. In this paper, we propose a novel algorithm, the Adaptive Multimedia Transport Model (AMTM), and implement an adaptive supporting platform for multimedia delivery that can dynamically transcode multimedia data to accommodate this variation without data redundancy, making it more convenient to develop new applications with automatic adaptability.
This paper describes relational graph matching with model-based segmentation for human detection. The matching result is used both to decide on human presence in the image and for posture recognition. We extend our previous work on rigid object detection in still images and video frames by modeling parts with superellipses and by using multi-dimensional Bayes classification to determine the non-rigid body parts, under the assumption that the unary and binary (relational) features belonging to the corresponding parts are Gaussian distributed. The major contribution of the proposed method is to automatically create semantic segments from combinations of low-level edge- or region-based segments using model-based segmentation. The generality of the reference model part attributes allows detection of humans with different postures, while the conditional rule generation decreases the rate of false alarms.
In this paper, a 3D semantic object motion tracking method based on Kalman filtering is proposed. First, we use a specially designed Color Image Segmentation Editor (CISE) to devise shapes that more accurately describe the object to be tracked. CISE is an integration of edge and region detection, based on edge-linking, split-and-merge, and energy minimization for active contour detection. An ROI is further segmented into single-motion blobs by considering the constancy of the motion parameters in each blob. Over short time intervals, each blob can be tracked separately and, over longer times, the blobs can be allowed to fragment and coalesce into new blobs as the motion evolves. The tracking of each blob is based on a Kalman filter derived from the linearization of a constraint equation satisfied by the pinhole camera model. The Kalman filter allows the tracker to project the uncertainties associated with a blob center (or with the coordinates of any other features) into the next frame. This projected uncertainty region can then be searched for the pixels belonging to the blob. Future work includes investigation of the effects of illumination changes and simultaneous tracking of multiple targets.
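The paper's filter is derived by linearizing a pinhole-camera constraint, which is not reproduced here; the generic constant-velocity sketch below only illustrates the predict/update cycle and how the predicted measurement covariance defines the search region for the blob in the next frame. All model matrices and noise values are placeholders.

```python
import numpy as np

class BlobTracker:
    """Constant-velocity Kalman filter for a blob center (x, y); a generic sketch."""
    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])          # state: position + velocity
        self.P = np.eye(4) * 10.0                       # state uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # constant-velocity model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # we observe position only
        self.Q = np.eye(4) * q                          # process noise
        self.R = np.eye(2) * r                          # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Projected measurement uncertainty defines the search region in the next frame.
        return self.x[:2], self.H @ self.P @ self.H.T

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Usage: predict the search region, then update with the blob center found there.
trk = BlobTracker(100.0, 50.0)
center, cov = trk.predict()
trk.update([102.0, 51.0])
```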
Indexing, retrieval, and delivery of the visual and spatio-temporal properties of video objects require efficient data models, and sound operations on those models are mandatory. However, most object-based video data models address only a single aspect of those properties. In this paper, we present an efficient video object representation method that captures the visual, spatial, and temporal properties of objects in a video in the form of a unified abstract data type. The proposed data type is a polygon mesh, named the video object mesh, which is defined in a spatio-temporal domain. Based on application needs, the contour of an object is modeled with a polygonal contour. With the contour and color information of the object, content-based triangulation is performed, and the video object in a frame is modeled with a two-dimensional polygon mesh. At each vertex in the mesh, color information is embedded for further use. Using motion analysis, the corresponding vertex in the adjacent frame is identified and connected to the vertex being analyzed. These processes continue until the video object disappears. The result is a three-dimensional polygon mesh that models both location-variant and location-invariant motion, which cannot be captured by traditional trajectory-based motion models. The proposed model is also useful for camera motion analysis, since the surface shape of a video object mesh carries partial information about camera motion.
Retrieval of videos from large databases using the inherent content as a key is an important and challenging problem with many applications. The large volume of data associated with visual information presents challenges from the perspectives of storage, browsing, indexing, and retrieval. The Moving Pictures Expert Group (MPEG) has addressed the issue of compression by proposing a family of video compression standards, namely MPEG-1, MPEG-2, and MPEG-4. In this paper, we propose a unified scheme for indexing the visual content in the MPEG-1, -2, and -4 domains. A video is first segmented into elemental units called shots. In the case of MPEG-1 and -2 videos containing simple camera operations (without significant object motion), we propose to generate a mosaic that is representative of the visual content of the entire shot, in contrast to existing approaches where videos with both little and large motion employ one of the frames (say the first frame) of the shot as a representative key frame. In the case of MPEG-4 videos, sprites (proposed by the MPEG-4 standard) are used as the mosaic reflecting the background content of the shot. We propose a scheme for indexing the visual content by extracting features from the mosaic/key frame, which are tagged along with the temporal parameters obtained from the shot. The quantification and qualification of the color and texture information in the keyframes are obtained by using a supervised classifier. The shape information is extracted at the local and global levels using the concept of edge histograms. In addition, the shape information available from the binary alpha planes of the foreground video object in MPEG-4 is approximated by a B-spline representation and used as a feature vector. A representation scheme has been developed which generates an XML file that contains the extracted content descriptors in accordance with the Description Definition Language (DDL) of MPEG-7.
Tracking of moving objects is a complex processing task for understanding input images. In this paper, we consider optical flow, which is one of the moving-object tracking approaches. We propose a new method using the Combinatorial Hough Transform (CHT) and voting accumulation in order to find optimal constraint lines. We also use logical operations in order to reduce the operation time. The proposed method can extract the optical flow of the moving object; the motion information is then computed from the extracted optical flow. We have simulated the proposed method using test images containing noise.
In this work we focus on indexed triangle strips, an extended representation of triangle strips that improves the efficiency of the geometrical transformation of vertices, and present a method to construct optimal indexed triangle strips using a Genetic Algorithm (GA) for real-time visualization. The main objective of this work is to optimally construct indexed triangle strips by improving the reuse ratio of data stored in cache memory while simultaneously reducing the total number of indices with the GA. Simulation results verify that the average number of indices and the cache-miss ratio per polygon could be kept small, and consequently the total visualization time required for the optimum solution obtained by this scheme could be remarkably reduced.
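The abstract does not give the GA's objective in detail; the sketch below shows one plausible fitness evaluation that such a GA could minimize, combining the index count with the miss ratio of a FIFO post-transform vertex cache. The cache model, the weights, and the two triangle orderings are assumptions for illustration.

```python
from collections import deque

def cache_misses(indices, cache_size):
    """Count vertex-cache misses for an index sequence, assuming a FIFO
    post-transform vertex cache (the cache model is an assumption of this sketch)."""
    cache, misses = deque(maxlen=cache_size), 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
    return misses

def fitness(indices, cache_size=4, alpha=1.0, beta=10.0):
    """Lower is better: combines the total number of indices with the cache-miss
    ratio, the two quantities the GA jointly minimizes. Weights are illustrative."""
    n = max(len(indices), 1)
    return alpha * len(indices) + beta * cache_misses(indices, cache_size) / n

# Two index orderings of the same triangles: the reuse-friendly one scores better.
order_a = [0, 1, 2, 5, 6, 7, 0, 2, 3, 5, 7, 8]   # poor vertex reuse
order_b = [0, 1, 2, 0, 2, 3, 5, 6, 7, 5, 7, 8]   # adjacent triangles share vertices
print(fitness(order_a), fitness(order_b))        # e.g. 22.0 vs 18.67
```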
In this paper, we present e-Clips, a framework for the evaluation of content-based indexing and retrieval techniques applied to music video clips. The e-Clips framework integrates different video and audio feature extraction tools, whether automatic or manual. Its goal is to compare the relevance of each type of feature for providing a structured index that can be browsed, finding similar videos, retrieving videos that correspond to a query, and pushing music videos to the user according to his preferences. Currently, over 100 distinct music video clips have been indexed. For each video, shot boundaries were detected and key frames were extracted from each shot. Each key frame image was segmented into visual objects. The sound track was analyzed for basic features. Textual data, such as a song title and its performer, was added by hand. The e-Clips framework is based on a client-server architecture that can stream VHS-quality video over a 100 Mb/s intranet. It should help evaluate the relevance of the descriptors generated by content-based indexing tools and suggest appropriate graphical user interfaces for non-specialist end users.
Detecting and extracting commercial breaks from a TV program is important for achieving efficient video storage and transmission. In this work, we approach this problem by utilizing both visual and audio information. Commercial breaks have several special characteristics, such as a restricted temporal length, a high cut frequency, a high level of action, and delimiting black frames and silences, which can be used to separate them from regular TV programs. A feature-based commercial break detection system is thus proposed to fulfill this task. We first perform a coarse-level detection of commercial breaks with purely visual information, since the high activity and the high cut frequency manifest themselves in the statistics of some measurable features. In the second step, we refine the detected break boundaries by integrating audio clues; in particular, there is always a short period of silence between commercial breaks and the TV program. Two audio features, the short-time energy and the short-time average zero-crossing rate, are extracted for the silence detection. In the last step, we return to the visual information domain to achieve frame-wise precision by locating the black frames. Extensive experiments show that by combining both visual and audio information, we can obtain accurate commercial break detection results.
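The two audio features named above are standard; the sketch below computes them per frame and applies a simple threshold rule to flag candidate silences. The frame sizes, thresholds, and the particular combination rule are illustrative, not the paper's settings.

```python
import numpy as np

def short_time_features(signal, frame_len=512, hop=256):
    """Short-time energy and zero-crossing rate per frame,
    the two audio features used for silence detection between commercials."""
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len].astype(float)
        energies.append(np.mean(frame ** 2))
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)
    return np.array(energies), np.array(zcrs)

def is_silence(energies, zcrs, e_thresh=1e-4, z_thresh=0.1):
    """Flag frames whose energy and ZCR are both low
    (thresholds and rule are illustrative and would be tuned on real audio)."""
    return (energies < e_thresh) & (zcrs < z_thresh)

# Toy signal: noise, a silent gap, then noise again; the gap is flagged.
rng = np.random.default_rng(6)
audio = np.r_[rng.normal(0, 0.5, 8000), np.zeros(4000), rng.normal(0, 0.5, 8000)]
e, z = short_time_features(audio)
print(is_silence(e, z).astype(int))
```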
Buffering the index structures is an important problem, because disk I/O dominates the cost of queries. In this paper, we compare existing algorithms for uniform, nonuniform static and nonuniform dynamic access patterns. We experimentally show that the LRU-2 method is better than the other methods. We also propose an efficient implementation of the LRU-2 algorithm. In the second part of the paper, we propose a new buffering algorithm for a distributed system where each machine has its own buffer. We show experimentally that this method performs better than other buffering techniques.
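LRU-2, which the experiments above favor, evicts the page whose second-most-recent reference is oldest. The sketch below is a minimal (not performance-tuned) illustration of that policy; a production version would use a priority queue, and the paper's own efficient implementation is not reproduced here.

```python
import math

class LRU2Buffer:
    """Minimal LRU-2 sketch: evict the page whose second-most-recent access is
    oldest; pages referenced only once count as having an infinitely old one."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.history = {}        # page -> (last access time, previous access time)
        self.pages = set()
        self.clock = 0

    def access(self, page):
        """Return True on a buffer hit, False on a miss (page fetched from disk)."""
        self.clock += 1
        last = self.history.get(page, (None, None))[0]
        self.history[page] = (self.clock, last)
        if page in self.pages:
            return True
        if len(self.pages) >= self.capacity:
            # victim = resident page with the oldest penultimate access time
            def penultimate(p):
                prev_t = self.history[p][1]
                return prev_t if prev_t is not None else -math.inf
            victim = min(self.pages, key=penultimate)
            self.pages.remove(victim)
        self.pages.add(page)
        return False

buf = LRU2Buffer(capacity=2)
hits = [buf.access(p) for p in [1, 2, 1, 3, 1, 2]]
print(hits)   # [False, False, True, False, True, False]
```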
The problem of indexing high-dimensional data has received renewed interest because of its necessity in emerging multimedia databases. The limitations of traditional tree-based indexing and the curse of dimensionality are well known. We have proposed a hierarchical indexing structure based on linear mapping functions, which is neither tree-based nor necessarily balanced. At each level of the hierarchy, a linear mapping function is used to distribute the data among buckets. The feasibility and performance of the indexing structure depend on finding appropriate mapping functions for any given data set. In this paper we present the approach taken in arriving at a few classes of mapping functions. We give a heuristic algorithm to determine the most appropriate mapping function for a given data set. The results of experiments with real-life data are presented, and they indicate that the proposed indexing structure with linear mapping functions is indeed practical.
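To show the basic idea of one level of such an index, the sketch below projects feature vectors onto a linear mapping, partitions the projection range into buckets, and identifies which buckets a range query must visit. The particular mapping (a simple average) and the equal-width bucket edges are assumptions; the paper selects its mapping functions heuristically per data set.

```python
import numpy as np

def build_index(data, w, n_buckets):
    """One level of a linear-mapping index: project each vector onto w,
    then cut the projection range into equal-width buckets."""
    proj = data @ w
    edges = np.linspace(proj.min(), proj.max(), n_buckets + 1)
    ids = np.clip(np.searchsorted(edges, proj, side="right") - 1, 0, n_buckets - 1)
    buckets = {b: np.where(ids == b)[0] for b in range(n_buckets)}
    return buckets, edges

def candidate_buckets(q, w, edges, radius):
    """Buckets a range query [q.w - radius, q.w + radius] (in projection space) must visit."""
    lo = np.searchsorted(edges, q @ w - radius, side="right") - 1
    hi = np.searchsorted(edges, q @ w + radius, side="right") - 1
    n = len(edges) - 1
    return range(max(lo, 0), min(hi, n - 1) + 1)

rng = np.random.default_rng(1)
X = rng.random((1000, 16))                       # high-dimensional feature vectors
w = np.ones(16) / 16                             # a simple linear mapping (average)
buckets, edges = build_index(X, w, n_buckets=8)
print([len(v) for v in buckets.values()])        # bucket occupancies
print(list(candidate_buckets(X[0], w, edges, radius=0.05)))
```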
We propose computing the color palette of each image in isolation, using vector quantization methods. The image features are then the color palette and the histogram of the color quantization of the image with this color palette. As a similarity measure, we propose the weighted sum of the differences between the color palettes and between the corresponding histograms. This approach allows the database to grow without recomputation of the image features and without substantial loss of discriminative power.
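The sketch below is one way to realize the weighted sum described above: a symmetrized nearest-entry distance between the two palettes plus an L1 difference between the usage histograms. The palette-matching rule and the weights are assumptions; the abstract does not specify them.

```python
import numpy as np

def palette_distance(p1, p2):
    """Average distance from each entry of palette p1 to its nearest entry in p2,
    symmetrized; one plausible way to compare two VQ codebooks."""
    d12 = np.mean([np.min(np.linalg.norm(p2 - c, axis=1)) for c in p1])
    d21 = np.mean([np.min(np.linalg.norm(p1 - c, axis=1)) for c in p2])
    return 0.5 * (d12 + d21)

def image_distance(p1, h1, p2, h2, w_palette=1.0, w_hist=1.0):
    """Weighted sum of the palette difference and the histogram difference;
    the weights are illustrative."""
    return w_palette * palette_distance(p1, p2) + w_hist * np.abs(h1 - h2).sum()

# Toy usage: 8-color palettes in RGB plus their normalized usage histograms
rng = np.random.default_rng(2)
p1, p2 = rng.random((8, 3)), rng.random((8, 3))
h1 = rng.random(8); h1 /= h1.sum()
h2 = rng.random(8); h2 /= h2.sum()
print(image_distance(p1, h1, p2, h2))
```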
The recent literature has shown that the principal difficulty in multimedia retrieval is bridging the semantic gap between the user's wishes and his ability to formulate queries. This insight has spawned two main directions of research: Query By Example (QBE) with relevance feedback (i.e., learning to improve the result of a previously formulated query), and research into query formulation techniques such as browsing or query by sketch. Browsing techniques try to help the user find his target image, or an image which is sufficiently close to the desired result that it can be used in a subsequent QBE query. From the feature-space viewpoint, each browsing system tries to let the user move consciously in feature space and eventually reach the target image. How to provide this functionality to the user is presently an open question. In fact, even obtaining an objective performance evaluation and comparison of these browsing paradigms is difficult. We distinguish between deterministic browsers, which try to make it as easy as possible for the user to learn how the system behaves, and stochastic browsers based on more sophisticated Monte Carlo algorithms, thus sacrificing reproducibility for better performance. Presently, these two browsing paradigms are practically incomparable, except by large-scale user studies. This makes it infeasible for research groups to evaluate incremental improvements of browsing schemes. Moreover, automated benchmarks in the current literature simulate a user by a model derived directly from the distance measures used within the tested systems. Such a circular reference cannot provide a serious alternative to real user tests. In this paper, we present an automatic benchmark which uses user-annotated collections for simulating the semantic gap, thus providing a means for automatic evaluation and comparison of the different browsing paradigms. We use a very precise annotation of few words together with a thesaurus to provide sufficiently smooth behavior of the annotation-based user model. We discuss the design and evaluation of this annotation as well as the implementation of the benchmark in an MRML-compliant (Multimedia Retrieval Markup Language) script with pluggable modules which allow testing of new interaction schemes.
The application of images and video has increased significantly in recent years. It is crucial to develop indexing techniques for searching images and video based on their content. Recently, several indexing techniques have been proposed in both the pixel and the compressed domain. Due to their lower computational complexity, compressed-domain indexing techniques are becoming popular. Among the compression techniques, discrete-wavelet-transform-based techniques have become popular because of their excellent energy compaction and multi-resolution capability. The upcoming JPEG2000 image compression standard is also based on a wavelet coder. In this paper, a progressive bit-plane indexing scheme in the JPEG2000 framework is proposed. Here, a 2D significant-bit-map array and a 2D histogram of the significant bits of the wavelet coefficients are used as the image indices. Image retrieval is performed by matching the index of the query image against those of the candidate images from the database. Experimental results show that the proposed scheme provides good indexing performance.
In this paper, we present a novel approach for describing and estimating the similarity of shapes. The target application is content-based indexing and retrieval over large image databases. The shape feature vector is based on the efficient indexing of high curvature points (HCPs), which are detected at different levels of resolution of the wavelet transform modulus maxima decomposition. The scale information, together with other topological information about those high curvature points, is employed in a sophisticated similarity algorithm. The experimental results and comparisons show that the technique efficiently isolates similar shapes from a large database and adequately reflects human similarity perception. The proposed algorithm also proved efficient in matching heavily occluded contours with their originals and with other shape contours in the database containing similar portions.
A novel technique for automatic analysis and classification of cells in peripheral blood images is presented. The purpose of this research is to analyze and classify the morphological shapes of mature red blood cells and white blood cells in peripheral blood images. We first identify red blood cells and white blood cells in a blood image captured by a CCD camera attached to a microscope. Feature extraction is the second step. Finally, blood cells are classified using a back-propagation neural network. There are fifteen different classification clusters, including normal cells, for red blood cells, and five different normal categories in the discrimination of white blood cells; in other words, the system can tell whether a live white cell belongs to one of the five normal classes or not. A novel segmentation method is presented for the extraction of the nucleus and cytoplasm, which inherently possess valuable clues for white blood cell classification. Initially, a seventy-six-dimensional feature vector that includes the UNL Fourier shape descriptor and color is considered in red blood cell classification, while a 38-dimensional feature vector is considered in white blood cell classification. Based on the proposed method, a prototype system has been implemented and evaluated with various classification algorithms such as LVQ-3 (Learning Vector Quantization) and K-NN (K-nearest neighbor). The experimental results show that the proposed method outperforms the alternatives on blood cell classification.
This article presents the results of a study of spatio-temporal images to evaluate their performance for video-to-shot segmentation. Some shot segmentation methods involve spatio-temporal images that are computed by projecting successive video frames onto the X or Y axis. On these projections, transition effects and motion are supposed to have different characteristics. Whereas cuts can be easily recognized, the main problem remains in determining a measure that discriminates motion from gradual transition effects. In this article, the quality of transition detection based on the line similarity of spatio-temporal images is studied. The probability functions of several measures are estimated to determine which one produces the lowest detection error rate. These distributions are computed on four classes of events: intra-shot sequences without motion, sequences with cuts, sequences with fades, and sequences with motion. A line matching is performed, based on correlation estimates between projection lines. To separate these classes, we first estimate the probability density functions of the correlation between consecutive lines for each class. For different line segment sizes, the experimental results show that the classes cannot be clearly separated. To take the evolution of the correlation into account, and because we try to detect particular types of boundaries, we then consider ratios between statistical moments, computed over a subset of correlation values. The results show that the measures used, based on the matching of projection lines, cannot discriminate between motion and fades; only a subset of motions can be differentiated from gradual transitions. Therefore the previous measures should be combined with methods that produce complementary results. Such a method could be a similar measure based on the correlation between spatially shifted segments.
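For reference, the sketch below builds a spatio-temporal image by projecting frames onto the X axis and computes the normalized correlation between consecutive projection lines, the basic measure whose distributions are studied above. The synthetic two-shot sequence is only there to show a cut producing a single low correlation value.

```python
import numpy as np

def spatio_temporal_projection(frames, axis=0):
    """Project each frame onto the X axis (average over rows) to build a
    spatio-temporal image: one projection line per frame."""
    return np.stack([f.mean(axis=axis) for f in frames])   # (n_frames, width)

def line_correlations(st_image):
    """Normalized correlation between consecutive projection lines."""
    corrs = []
    for a, b in zip(st_image[:-1], st_image[1:]):
        a = a - a.mean(); b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        corrs.append(float(a @ b / denom) if denom > 0 else 1.0)
    return np.array(corrs)

# Two synthetic shots: high correlation within a shot, one low value at the cut.
rng = np.random.default_rng(3)
base1, base2 = rng.random((120, 160)), rng.random((120, 160))
shot1 = [base1 + 0.01 * rng.random((120, 160)) for _ in range(5)]
shot2 = [base2 + 0.01 * rng.random((120, 160)) for _ in range(5)]
st = spatio_temporal_projection(shot1 + shot2)
print(np.round(line_correlations(st), 2))   # a single low value marks the cut
```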
An XML-based application was developed, allowing users to access multimedia/radiological data over a network and to visualize them in an integrated way within a standard web browser. Four types of data are considered: radiological images, the corresponding speech and text files produced by the radiologist, and administrative data concerning the study (patient name, radiologist's name, date, etc.). Although these different types of data are typically stored on different file systems, their relationship (e.g., image file X corresponds to speech file Y) is described in a global relational database. The administrative data are referred to in an XML file, while links between the corresponding image, speech, and text files (e.g., links between limited text fragments within the text file, the corresponding fragment in the speech file, and the corresponding subset of images) are described as well. Users are able to access all data through a web browser by submitting a form-based request to the server. By using scripting technology, an HTML document containing all data is produced on the fly, which can be presented within the browser of the user. Our application was tested on a real set of clinical data, and it was shown that the goals defined above are realized.
The emerging Multimedia Content Description Interface standard, MPEG-7, addresses the indexing and retrieval of visual information. In this context, the development of shape description and shape querying tools becomes a fundamental and challenging task. We introduce a method based on non-linear diffusion of contours. The aim is to compute reference points on contours to provide a shape description tool. These reference points are situated at the sharpest changes in the contour direction. Hence, they are ideal choices for the vertices of a polygonal approximation. If a maximum error between the original contour and the polygonal approximation is required, a scale-space procedure can help find new vertices in order to meet this requirement. Basically, this method follows the non-linear diffusion technique of Perona and Malik. Unlike the usual linear diffusion techniques for contours, where the diffusion acts on the contour point coordinates, this method applies the diffusion in the tangent space. In this case the contour is described by its angle variation, and the non-linear diffusion procedure is applied to it. The Perona-Malik model determines how strongly the diffusion acts on the original function and depends on a factor K, estimated automatically. In areas with a spatial concentration of strong changes of the angle, this factor is also adjusted to reduce the noise effect. The proposed method has been extensively tested using the contour database of fish shapes on the SQUID web site. A shape-based retrieval application was also tested using a similarity measure between two polygonal approximations.
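The sketch below applies Perona-Malik-style non-linear diffusion to a contour's tangent-angle function, the core idea described above: large angle changes receive a small conductance and are preserved, while smaller variations are smoothed away. The explicit time stepping, the fixed K (rather than the paper's automatic, locally adjusted estimate), and the toy contour are simplifications.

```python
import numpy as np

def perona_malik_1d(theta, K, n_iter=100, dt=0.2):
    """Non-linear (Perona-Malik) diffusion of a contour's tangent-angle function.
    The contour is assumed closed, hence the periodic (np.roll) boundary handling;
    K would normally be estimated from the data, here it is passed in directly."""
    u = np.asarray(theta, dtype=float).copy()
    for _ in range(n_iter):
        # forward/backward differences along the (periodic) contour
        grad_fwd = np.roll(u, -1) - u
        grad_bwd = u - np.roll(u, 1)
        g_fwd = 1.0 / (1.0 + (grad_fwd / K) ** 2)     # Perona-Malik conductance
        g_bwd = 1.0 / (1.0 + (grad_bwd / K) ** 2)
        u += dt * (g_fwd * grad_fwd - g_bwd * grad_bwd)
    return u

# Toy contour angles: two sharp direction changes survive, small wiggles are smoothed.
theta = np.concatenate([np.zeros(50), np.full(50, np.pi / 2), np.zeros(50)])
theta += 0.05 * np.random.default_rng(4).standard_normal(theta.size)
smoothed = perona_malik_1d(theta, K=0.3)
```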
Stream scheduling used in Video-On-Demand (VOD) systems can greatly enhance system service ability by eliminating service latency. In this paper, we propose a novel stream schedule called PeriodPatch. On the basis of Patching, we introduce a PERIOD rule requiring that the movie time in the main stream be a linear function of system time, so that multicast streams are created regularly and fewer multicast streams are needed for True Video-On-Demand (TVOD) service. Furthermore, the PeriodPatch schedule ensures that the system can provide Near Video-On-Demand (NVOD) service with a predictable and acceptable latency to the client if resources are exhausted. To assess the benefit of our schedule, we perform simulations to compare the performance of PeriodPatch with that of the FIFO and Patching schedules. Results show that PeriodPatch is more efficient than the other schedules, with respect to both the system resources required for TVOD service and the average client waiting time (service latency). In our case, PeriodPatch uses only 38% of the streams of FIFO or 50% of those of Patching to provide the same TVOD service. Results also show that PeriodPatch, unlike Patching and other schedules, is not a buffer-related schedule. Our simulation shows that a suitable client buffer size can achieve better performance.
Video-On-Demand is a new development on the Internet. In order to manage rich multimedia information and a large number of users, we present an Internet Video-On-Demand system with some E-Commerce flavors. This paper presents the system architecture and the technologies required for its implementation. It provides interactive Video-On-Demand services in which the user has complete control over the session presentation. It allows the user to select and receive specific video information by querying the database. To improve the performance of video information retrieval and management, the video information is represented by hierarchical video metadata in XML format. The video metadatabase stores the video information in this hierarchical structure and allows users to search for video shots at different semantic levels in the database. To browse the retrieved video, the user not only has the full VCR capabilities of traditional Video-On-Demand, but can also browse the video hierarchically to view different shots. In order to manage a large number of users over the Internet, a membership database is designed and managed in an E-Commerce environment, which allows users to access the video database based on different access levels.
Various relevance feedback techniques have been applied in content-based image retrieval. However, many are either heuristics-based, computationally too expensive to be implemented in real time, or limited to dealing with only positive examples. We propose a fast and optimal linear relevance feedback scheme that takes both positive and negative examples from the user. This scheme can be regarded as a generalization of discriminant analysis on one hand, and on the other hand it is also a generalization of an existing optimal scheme that takes only positive examples. We first define the biased classification problem for the case where the data samples are labeled as positive or negative according to whether they belong to the target class (the biased class) or not; biased discriminant analysis (BDA) is then proposed as an optimal linear solution for dimensionality reduction. We also propose a biased whitening transformation of the data when Euclidean distance is applied afterwards. Toy problems are designed to show the theoretical advantages of the proposed scheme over traditional discriminant analysis. It is implemented in real-time image retrieval for large databases, and experimental results are presented to show the improvement achieved by the new scheme.
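As a rough reading of the BDA idea above, the sketch below builds scatter matrices of the positive examples around their own centroid and of the negative examples around that same centroid, then solves the resulting generalized eigenproblem to obtain projection directions. The regularization term and the example data are assumptions; the paper's exact formulation and the biased whitening step are not reproduced.

```python
import numpy as np
from scipy.linalg import eigh

def biased_discriminant_analysis(X_pos, X_neg, n_dim, reg=1e-3):
    """Sketch of BDA: find a projection that keeps positive (target-class)
    examples close to their centroid while pushing negative examples away from it.
    The regularization of the positive scatter is an added numerical safeguard."""
    mu = X_pos.mean(axis=0)
    Dp, Dn = X_pos - mu, X_neg - mu
    S_pos = Dp.T @ Dp                       # scatter of positives around their mean
    S_neg = Dn.T @ Dn                       # scatter of negatives around that mean
    S_pos_reg = S_pos + reg * np.trace(S_pos) / len(mu) * np.eye(len(mu))
    # generalized eigenproblem: maximize w^T S_neg w / w^T S_pos w
    evals, evecs = eigh(S_neg, S_pos_reg)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:n_dim]]          # columns = projection directions

# Usage: project features, then rank database items by distance to the positive centroid.
rng = np.random.default_rng(5)
X_pos = rng.normal(0.0, 0.3, size=(20, 10))
X_neg = rng.normal(1.0, 1.0, size=(40, 10))
W = biased_discriminant_analysis(X_pos, X_neg, n_dim=2)
proj_pos, proj_neg = X_pos @ W, X_neg @ W
```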
The query-by-example model has been extensively used to retrieve similar images in content-based image database management. The query is characterized by searching for images with feature vectors similar to those of the example, based upon either a default or a user-defined similarity metric. However, low-level features often encounter a severe performance bottleneck when applied to natural image collections with complicated contents and great perceptual variety. The feature-based similarity matching approach tends to retrieve many irrelevant images. This is not surprising, since images different in semantic meaning but close enough in low-level features can be returned as pertinent results. Such a query process lacks user involvement and therefore results in a gap between features and semantics.
Tarsys is a video archive system that combines the flexible organization of multimedia databases, the efficiency of real-time filesystems, and the scalability of tertiary storage (magnetic tape libraries and optical jukeboxes). Heavy network data transfers between video servers and their clients are common. Tarsys reduces network traffic through the use of a remote manipulation protocol, so that only the required fragments of multimedia data are transferred. Tarsys provides a suitable platform for the automatic extraction of content information from multimedia data. It also provides management of content-based queries and efficient access to the video fragments found by those queries. These facilities for accessing archived video make it well suited to large digital TV archives and scientific databases, where it constitutes a platform for the rapid development of custom video analysis.
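To illustrate the fragment-only transfer idea behind a remote manipulation protocol, the sketch below shows a client requesting just the frame range matched by a query instead of the whole video. The request format and class names are hypothetical and are not Tarsys' actual protocol.

# A minimal sketch of fragment-only retrieval: the client requests only the
# frames matched by a query rather than the whole video. The request format
# is hypothetical, not Tarsys' protocol.
import json

class ArchiveServer:
    def __init__(self, videos):
        self.videos = videos                      # video_id -> list of "frames" (bytes)

    def handle(self, request_json):
        req = json.loads(request_json)
        frames = self.videos[req["video_id"]]
        fragment = frames[req["first_frame"]:req["last_frame"] + 1]
        return b"".join(fragment)                 # only the requested fragment crosses the network

server = ArchiveServer({"news-archive-item": [b"f%03d" % i for i in range(1000)]})
request = json.dumps({"video_id": "news-archive-item", "first_frame": 120, "last_frame": 124})
print(server.handle(request))                     # 5 frames instead of 1000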
Description methodologies for multimedia content are gaining interest. MPEG-7, which is being standardized by the International Organization for Standardization (ISO), is specifically targeted at such content. We developed an effective index structure for multimedia content descriptions such as MPEG-7, based on an indexing method called the position locator index, which finds the location of a given description part within the description. Existing still-image and full-text search engines, together with the feature indexes best suited to them, can be used without modification; because those indexes are optimized for their respective requirements, they provide high retrieval performance. We built an experimental retrieval system that combines these feature indexes with the position locator index and evaluated retrieval performance against a linear-tracing search over the content descriptions. The results were satisfactory: retrieval was 200 times faster than linearly tracing the descriptions, and the index size was kept to a minimum.
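The sketch below shows the general idea of a position-locator-style index: one pass over an XML description records where each element starts, so a query can jump directly to the relevant description part instead of scanning the whole description. The implementation and the sample description are mine, not the paper's index structure.

# A minimal sketch of a position-locator-style index over an XML description:
# record the offset of each start tag so queries can seek directly to it.
import re
from collections import defaultdict

def build_position_index(description):
    """Map element name -> list of character offsets of its start tags."""
    index = defaultdict(list)
    for match in re.finditer(r"<(\w+)[^>]*>", description):
        index[match.group(1)].append(match.start())
    return index

DESCRIPTION = "<Mpeg7><Image><ColorHistogram>...</ColorHistogram></Image>" \
              "<Video><Shot>...</Shot></Video></Mpeg7>"
idx = build_position_index(DESCRIPTION)
for offset in idx["Shot"]:                     # jump directly to every Shot element
    print(offset, DESCRIPTION[offset:offset + 10])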
Image compression is an important area of multimedia research, and neural network methods have attracted increasing attention for use in image coding. Recently, a random neural network model was proposed that, under certain conditions, has a product-form solution in steady state (i.e., the steady-state probability distribution of the network can be expressed as the product of the state probabilities of the individual neurons). Among the various random neural network models, the feed-forward one is particularly practical because its solution exists and is unique. In this paper, we present a new learning method for the feed-forward random neural network that is easier to implement than the RNN learning algorithm presented by Gelenbe. Using the new learning formulas, we designed an image coding method that applies the random neural network within the classical DCT-based coding framework. Experimental results show that the new method gains approximately 1 to 2 dB in PSNR compared with standard neural network coding methods. We conclude that DCT-based image compression using the random neural network is an efficient algorithm for image coding.
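For reference, the steady-state equations of the Gelenbe random neural network, as they are commonly written in the literature (the notation here is mine and the paper's new learning formulas are not reproduced):

% q_i is the probability that neuron i is excited, r_i its firing rate,
% \Lambda_i and \lambda_i the external positive/negative arrival rates, and
% p^{+}_{ji}, p^{-}_{ji} the routing probabilities of excitatory and
% inhibitory signals.
\[
  q_i = \frac{\lambda^{+}_i}{r_i + \lambda^{-}_i}, \qquad
  \lambda^{+}_i = \sum_j q_j\, r_j\, p^{+}_{ji} + \Lambda_i, \qquad
  \lambda^{-}_i = \sum_j q_j\, r_j\, p^{-}_{ji} + \lambda_i .
\]
% Product-form steady state (when all q_i < 1):
\[
  P(\mathbf{k}) = \prod_i (1 - q_i)\, q_i^{\,k_i}.
\]
% In a feed-forward topology the q_i can be computed layer by layer, which is
% why the solution exists and is unique.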
Still-image coding techniques such as JPEG have traditionally been applied to individual image planes, and coding fidelity is the usual measure of the performance of such intra-plane coding methods. In many imaging applications, however, it is increasingly necessary to deal with multi-spectral images, such as color images. In this paper, a novel approach to multi-spectral image compression is proposed that uses transformations among planes to further compress the spectral planes. Moreover, a mechanism for incorporating the human visual system into the transformation is provided to exploit psychovisual redundancy. The new technique, which is designed to be compatible with the JPEG standard, is demonstrated by extracting the correlation among planes based on the human visual system. Taking the power of the scheme into account, a high degree of compactness is achieved in the resulting data representation and compression.
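As a generic illustration of inter-plane decorrelation (not the paper's transformation), the sketch below maps the RGB planes of a color image onto one luminance-like plane and two difference planes, so per-plane JPEG-style coding spends fewer bits on the correlated planes. The matrix is the standard ITU-R BT.601 YCbCr transform used here only as an example.

# A minimal sketch of inter-plane decorrelation: RGB -> one luminance-like
# plane plus two low-variance difference planes (BT.601 matrix as an example).
import numpy as np

M = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.331,  0.500],
              [ 0.500, -0.419, -0.081]])

def decorrelate_planes(rgb):
    """rgb: (H, W, 3) float array -> (H, W, 3) array of decorrelated planes."""
    return np.tensordot(rgb, M.T, axes=1)

base = np.random.rand(8, 8)                                   # shared structure
rgb = np.stack([base + 0.05 * np.random.rand(8, 8) for _ in range(3)], axis=-1)
ycc = decorrelate_planes(rgb)
print(rgb.var(axis=(0, 1)))   # three similar, relatively large plane variances
print(ycc.var(axis=(0, 1)))   # most energy in the first plane; difference planes nearly flat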
The evolution of television towards the digital domain is opening new opportunities but also posing new challenges to both users and system managers. Audiovisual television archives will be an essential component of digital television operators' systems, as archived information needs to be available to a wide range of users. This paper presents the work carried out at INESC Porto within the VIDION project and the experiments on merging television, computer, and telecommunications concepts and technologies, using software agents and CORBA to help solve problems of information and system configuration and management in a TV archive. The paper focuses on the definition of the problem, the proposed architecture, and the current state of the work.
Increasing congestion on roads and highways, and the problems associated with conventional traffic monitoring systems, have generated interest in new traffic surveillance systems, such as video image processing, which are expected to be more effective and more economical than conventional surveillance systems. In this paper, we describe the design of a traffic surveillance system called the Multimedia Traffic Monitoring System. The system is based on a client/server model with the following main modules: 1) the video image capture module (VICM), 2) the video image processing module (VIPM), and 3) the database module (DBM). The VICM captures the live feed from a digital camera. Depending on the mode of operation, the VICM either 1) sends the video images directly to the VIPM (on the same processing node) or 2) compresses the video images and sends them to the VIPM and/or the DBM on separate processing node(s). The main contribution of this paper is the design of a traffic monitoring system that uses image processing (the VIPM) to estimate traffic flow. In the current implementation, the VIPM estimates the number of vehicles per kilometer using nine-image sequences captured at a rate of 4 frames per second. The VIPM algorithm generates a virtual grid and superimposes it on a part of the traffic scene; motion and vehicle detection operators are applied within each cell of the grid, and the vehicle count is derived from the nine images of a sequence. The system was tested against a manual count of more than 40 image sequences (a total of more than 365 traffic images) covering various traffic situations. The results show that the system is able to determine the traffic flow with a precision of 1.5 vehicles per kilometer.
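The sketch below illustrates the virtual-grid idea: superimpose a grid on part of the scene, mark cells where frame differencing exceeds a threshold, and treat runs of adjacent active cells along a lane as vehicles. The grid size, threshold, and counting rule are illustrative choices, not the paper's parameters.

# A minimal sketch of grid-based motion detection and vehicle counting.
# Grid size and threshold are illustrative, not the paper's parameters.
import numpy as np

def active_cells(prev_frame, curr_frame, grid=(6, 8), thresh=12.0):
    """Return a boolean grid marking cells with significant inter-frame change."""
    diff = np.abs(curr_frame.astype(float) - prev_frame.astype(float))
    h, w = diff.shape
    gh, gw = grid
    cells = diff[: h - h % gh, : w - w % gw].reshape(gh, h // gh, gw, w // gw)
    return cells.mean(axis=(1, 3)) > thresh

def count_vehicles(cell_grid):
    """Count runs of consecutive active cells along each row (lane)."""
    count = 0
    for row in cell_grid:
        prev = False
        for active in row:
            if active and not prev:
                count += 1
            prev = active
    return count

prev = np.zeros((120, 160), dtype=np.uint8)
curr = prev.copy()
curr[40:60, 30:70] = 200          # a synthetic moving "vehicle"
print(count_vehicles(active_cells(prev, curr)))   # 1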
In the near future, broadband networks will become available to large groups of people, and the bandwidth available to these users will be far greater than it is now. This availability of bandwidth will give rise to a number of new applications, and application developers will need a framework that enables them to exploit the possibilities of these new networks. In this article we present a document type that allows the addition of (meta-)information to data streams and the synchronization of different data streams. It is called SXML (Streaming XML) and is based on the eXtensible Markup Language (XML). The SXML grammar is defined in a document type definition (SXML-DTD). The content of an SXML document can be processed in real time or retrieved from disk. XML is used here in a completely new manner and in a very different environment in order to describe the structure of the stream easily. Finally, a preliminary implementation has been developed and is being tested.
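The sketch below shows, in spirit, what processing stream markup incrementally might look like: meta-information and synchronization marks interleaved with stream data, handled as each element arrives. The element names are invented; the actual SXML grammar is defined by the paper's DTD.

# A minimal sketch of incremental processing of stream markup; element names
# are hypothetical, not the SXML-DTD.
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("end",))
chunks = [                                     # data arriving piece by piece
    "<stream><meta name='title' value='demo'/>",
    "<sync t='0.0'/><data track='video'>...frame 0...</data>",
    "<sync t='0.04'/><data track='audio'>...samples...</data>",
    "</stream>",
]
for chunk in chunks:
    parser.feed(chunk)
    for event, elem in parser.read_events():   # react to elements as soon as they close
        if elem.tag == "sync":
            print("synchronization point at t =", elem.get("t"))
        elif elem.tag == "meta":
            print("meta:", elem.get("name"), "=", elem.get("value"))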
A common form of query encountered in Content-Based Image Retrieval (CBIR) systems such as QBIC and Virage is the query by example image (QBE). We encountered this problem in our implementation of a prototype CBIR system called CHITRA. This system supports a four-layer image data model and enables the definition of high-level concepts such as SUNSET, so users can pose queries of the form "retrieve all images that have SUNSET and MOUNTAINS". Here we address the problem of processing queries of the form "retrieve all images similar to I1, I2, ..., In based on color", which we refer to as SF-QBME queries. Essentially the same problem is encountered in processing high-level concept queries such as the SUNSET and MOUNTAINS query above, and it has received some recent research attention. Processing SF-QBME queries involves dealing with multiple points in a single feature space. We first provide the motivation for such queries in the context of similarity-based retrieval. We then define the exact low-level semantics of such queries and provide the corresponding processing strategies. Experimental performance results demonstrating the capability of SF-QBME queries are also presented.
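To make the multiple-example setting concrete, the sketch below scores each database image by its distance to the nearest example point in a single (color) feature space. The nearest-example aggregation is just one common choice of low-level semantics; the paper defines its own exact semantics and processing strategies.

# A minimal sketch of a multi-example query in a single feature space,
# using nearest-example distance as an illustrative aggregation rule.
import numpy as np

def multi_example_rank(example_feats, database, top_k=5):
    """example_feats: (m, d) array of the m example images' color features.
    database: dict image_id -> d-dimensional feature vector."""
    E = np.asarray(example_feats)
    scored = []
    for image_id, feat in database.items():
        d_min = np.min(np.linalg.norm(E - np.asarray(feat), axis=1))  # nearest example
        scored.append((image_id, d_min))
    return sorted(scored, key=lambda s: s[1])[:top_k]

rng = np.random.default_rng(1)
db = {f"img{i:03d}": rng.random(16) for i in range(200)}
examples = np.stack([db["img005"], db["img050"], db["img120"]])
print(multi_example_rank(examples, db, top_k=3))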