The paper presents the Argos evaluation campaign of video content analysis tools, supported by the French Techno-Vision program. This project aims at developing the resources for benchmarking content analysis methods and algorithms. The paper describes the evaluated tasks, the way the content set was produced, the metrics and tools developed for the evaluations, and the results obtained at the end of the first phase.
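As a concrete illustration of what boundary-level evaluation metrics for such a campaign can look like, here is a minimal sketch of tolerance-based precision/recall scoring of detected shot boundaries against a reference set; the function name, tolerance parameter, and matching rule are illustrative assumptions, not the actual Argos metric definitions.

```python
def evaluate_boundaries(detected, reference, tolerance=5):
    """Match detected shot boundaries (frame indices) to reference
    boundaries within a frame tolerance, then report precision,
    recall and F1. Illustrative only; the Argos metrics may differ."""
    matched = set()
    tp = 0
    for d in detected:
        for i, r in enumerate(reference):
            if i not in matched and abs(d - r) <= tolerance:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```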
This paper proposes a model for human motion analysis in video. Its main characteristic is that it adapts automatically to the current resolution, the actual quality of the picture, or the level of precision required by a given application, thanks to its possible decomposition into several hierarchical levels. The model is region-based to address some analysis processing needs. The top level of the model is defined with only 5 ribbons, which can be cut into sub-ribbons according to a given (or expected) level of detail. The matching process between the model and the current picture consists in comparing the extracted subject shape with a graphical rendering of the model built from computed parameters. The comparison is performed using a chamfer matching algorithm. In our developments, we intend to build a platform for interaction between a dancer and tools synthesizing abstract motion pictures and music, under the conditions of a real-time dialogue between a human and a computer. Consequently, we use this model from the perspective of motion description rather than motion recognition: no a priori gestures are expected to be recognized, since no a priori application is specifically targeted. The resulting description is made following a Description Scheme compliant with the movement notation known as "Labanotation".
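To make the matching step concrete, here is a minimal sketch of a chamfer distance between a binary edge map of the extracted subject shape and a binary edge rendering of the model, assuming NumPy and SciPy; the function name is an illustrative assumption, and the paper's hierarchical ribbon decomposition is not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(subject_edges, model_edges):
    """Chamfer distance between a binary edge map of the subject
    and a binary edge rendering of the model; lower is better.
    Sketch only, not the paper's exact matching procedure."""
    # Distance transform: distance from each pixel to the nearest
    # subject edge pixel (the EDT measures distance to zero-valued
    # pixels, so the edge map is inverted first).
    dist = distance_transform_edt(~subject_edges.astype(bool))
    # Average that distance over the rendered model edge pixels.
    ys, xs = np.nonzero(model_edges)
    return dist[ys, xs].mean() if len(ys) else np.inf
```

Minimizing such a score over the model's rendering parameters is one plausible way to fit the model to the extracted shape.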
This article presents a Description Scheme (DS) to describe audio-visual documents from the video editing point of view. This DS is based on techniques used in the video editing domain. Its main objective is to provide a complete, modular and extensible description of the structure of video documents based on the editing process. This VideoEditing DS is generic in the sense that it may be used in a large number of applications, such as video document indexing and analysis, description of Edit Decision Lists, and elaboration of editing patterns. It is based on accurate and complete definitions of the shots and transition effects required by video document analysis applications. The VideoEditing DS allows three levels of description: analytic, synthetic and semantic. In the DS, the higher (resp. lower) the element of description, the more analytic (resp. synthetic) the information. This DS allows describing the editing work done on editing boards, using the more detailed descriptors of the Shot and Transition DSs. These elements are provided to define editing patterns that allow several possible reconstructions of a movie depending on, for example, the target audience. Part of a video description made with this DS may be produced automatically by video-to-shot segmentation algorithms (analytic DSs) or by editing software, at the same time the editing work is done.
This DS answers the needs related to the exchange of editing work descriptions between editing software tools. At the same time, it provides an analytic description of editing work that is complementary to existing standards for Edit Decision Lists such as SMPTE or AAF.
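As an illustration of how the Shot/Transition nesting of such a DS could be carried in code, here is a minimal sketch using Python dataclasses; all class and field names are assumptions made for illustration, not the normative DS syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Shot:
    """Analytic shot description: boundaries in the edited stream."""
    start_frame: int
    end_frame: int
    source_media: Optional[str] = None   # synthetic level: source reel/file
    annotation: Optional[str] = None     # semantic level: free-text meaning

@dataclass
class Transition:
    """Transition effect between two shots (cut, fade, dissolve, ...)."""
    kind: str            # e.g. "cut", "fade", "dissolve"
    start_frame: int
    duration: int = 0    # 0 for an instantaneous cut

@dataclass
class VideoEditingDescription:
    """Top-level container mirroring a VideoEditing-style DS."""
    shots: List[Shot] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)
```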
In this paper, we present e-Clips, a framework for the evaluation of content-based indexing and retrieval techniques applied to music video clips. The e-Clips framework integrates different video and audio feature extraction tools, whether automatic or manual. Its goal is to compare the relevance of each type of feature for providing a structured index that can be browsed, finding similar videos, retrieving videos that correspond to a query, and pushing music videos to the user according to his preferences. Currently, over 100 distinct music video clips have been indexed. For each video, shot boundaries were detected and key frames were extracted from each shot. Each key frame image was segmented into visual objects. The soundtrack was analyzed for basic features. Textual data, such as the song title and its performer, were added by hand. The e-Clips framework is based on a client-server architecture that can stream VHS-quality video over a 100 Mb/s intranet. It should help evaluate the relevance of the descriptors generated by content-based indexing tools and suggest appropriate graphical user interfaces for non-specialist end users.
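As a hint of what the automatic part of such indexing involves, here is a minimal sketch of a naive histogram-based cut detector over grayscale frames; the threshold, bin count, and function name are illustrative assumptions, since the abstract does not specify e-Clips' actual detectors.

```python
import numpy as np

def detect_cuts(frames, threshold=0.35, bins=64):
    """Naive cut detector: flag a shot boundary when the L1 distance
    between normalized gray-level histograms of consecutive frames
    exceeds a threshold. Illustrative sketch only."""
    cuts = []
    prev_hist = None
    for i, frame in enumerate(frames):   # frames: 2-D grayscale arrays
        hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
        hist = hist / hist.sum()
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(i)
        prev_hist = hist
    return cuts
```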
This article presents the results of a study of spatio-temporal images, evaluating their performance for video-to-shot segmentation. Some shot segmentation methods rely on spatio-temporal images computed by projecting successive video frames onto the X or Y axis. On these projections, transition effects and motion are supposed to have different characteristics. Whereas cuts can be easily recognized, the main problem remains in finding a measure that discriminates motion from gradual transition effects. In this article, the quality of transition detection based on line similarity in spatio-temporal images is studied. The probability functions of several measures are estimated to determine which one produces the lowest detection error rate. These distributions are computed on four classes of events: intra-shot sequences without motion, sequences with cuts, sequences with fades, and sequences with motion. A line matching is performed, based on correlation estimates between projection lines. To separate these classes, we first estimate the probability density functions of the correlation between consecutive lines for each class. For different line segment sizes, the experimental results show that the classes cannot be clearly separated. To take into account the evolution of the correlation, and because we try to detect particular types of boundaries, we then consider ratios between statistical moments, computed over a subset of correlation values. The results show that the measures used, based on the matching of projection lines, cannot discriminate between motion and fades: only a subset of motions can be differentiated from gradual transitions. Therefore these measures should be combined with methods that produce complementary results, such as a similar measure based on correlation between spatially-shifted segments.
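To make the studied quantities concrete, here is a minimal sketch, assuming grayscale frames as NumPy arrays, of building an X-axis spatio-temporal image and computing the normalized correlation between consecutive projection lines; function names are illustrative assumptions.

```python
import numpy as np

def xt_projection(frames):
    """Spatio-temporal image: each frame (2-D grayscale array) is
    projected onto the X axis by averaging over rows, yielding one
    line per frame; stacking the lines gives a time-by-width image."""
    return np.stack([f.mean(axis=0) for f in frames])

def line_correlations(st_image):
    """Normalized correlation between consecutive projection lines.
    Values near 1 suggest an intra-shot pair and low values a cut;
    gradual transitions and motion both produce intermediate values,
    which is exactly the ambiguity the study reports."""
    corrs = []
    for a, b in zip(st_image[:-1], st_image[1:]):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        corrs.append((a * b).sum() / denom if denom else 0.0)
    return np.array(corrs)
```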
The audiovisual library of the future will be based on computerized access to digitized documents. In this communication, we address the user interface issues that arise from this new situation. One cannot simply take a user interface designed for the piece-by-piece production of an audiovisual presentation and make it a tool for accessing full-length movies in an electronic library. One cannot take a digital sound editing tool and propose it as a means to listen to a musical recording. In our opinion, when computers are used as mediations to existing contents, document-representation-based user interfaces are needed. With such user interfaces, a structured visual representation of the document contents is presented to the user, who can then manipulate it to control perception and analysis of these contents. In order to build such manipulable visual representations of audiovisual documents, one needs to automatically extract structural information from the document contents. In this communication, we describe possible visual interfaces for various temporal media, and we propose methods for the economically feasible large-scale processing of documents. The work presented is sponsored by the Bibliothèque nationale de France: it is part of the program aiming at developing, for image and sound documents, an experimental counterpart to this library's digitized text reading workstation.