IBM Multimedia Analysis and Retrieval
IBM T. J. Watson Research Center
User Guide
http://www.alphaworks.ibm.com/tech/imars
IBM MULTIMEDIA ANALYSIS AND RETRIEVAL SYSTEM
User and Installation Guide
© IBM T. J. Watson Research Center
19 Skyline Drive • Hawthorne, NY 10532 USA
Phone 914.784.7320
Contact:
John R. Smith (jrsmith@watson.ibm.com)
Table of Contents
1. Installation Guide
   Optional Software Installation
2. Collection Indexing
3. Collection Searching and Tagging
Introduction
IBM Multimedia Analysis and Retrieval System (IMARS) is an automated content indexing and multimodal search system for digital image and video collections.
IBM® Multimedia Analysis and Retrieval System addresses the problem of indexing, classifying, and performing queries on large volumes of images and videos. It can visually analyze images and videos, categorize them based on appearance and associated metadata, and make them searchable even without having any associated text metadata.
The IBM Multimedia Analysis and Retrieval System automatically indexes unlabeled image and video repositories with a set of classifiers, using machine learning approaches. The IBM Multimedia Analysis and Retrieval System provides a number of important multimedia browsing and searching functions in its dynamic search interface, which are based on content (such as color, texture, shape, and edges), content clusters, models (such as scenes, objects, and events), and text (such as speech, closed captions, textual metadata, and user-associated tags). Users can query image and video repositories using concept models, visual feature descriptors, extracted metadata or any combination of them, according to the needs of a particular query. Furthermore, the user can tag individual video shots or meaningful groups of shots created from the fused multi-modal index.
Existing solutions require users to describe visual content manually. Manual tagging is time-consuming and often subjective, leading to incomplete and inconsistent annotations of images and videos and thus preventing efficient cataloging of the growing deluge of multimedia data. The IBM Multimedia Analysis and Retrieval System technology is unique in its approach to analyzing and fusing visual information with text, speech transcript and various metadata in order to automatically annotate multimedia data. This tool is being developed at IBM Research.
The IBM Multimedia Analysis and Retrieval System consists of two components: the Indexing Tool and the Search Tool. When you click on IMARS.exe, the Indexing Tool is invoked. Once the indexing is complete, the Search Tool can be accessed through a web browser. You cannot run the Search Tool without having run the Indexing Tool at least once.
The Indexing Tool runs on Windows XP.
- The user needs a Web browser (e.g., Internet Explorer or Mozilla Firefox) installed to be able to access the Search Tool.
- HTTP Web Server – This installation is required if the machine does not already have an HTTP web server installed. Afterwards, the user will need to modify the web server configuration file to support the IBM system features.
By default, the IBM Multimedia Analysis and Retrieval System supports still images in JPEG, GIF, PNG, TIFF and PNM formats, as well as MPEG1 and MPEG2 videos. For all other video formats, please download and copy the FFmpeg.exe executable (free software) to the installation folder – see the “Optional Software Installation” section of this guide.
Chapter 1 |
Installation Guide
Download Apache Server Setup from http://httpd.apache.org/ if it is not already installed on the PC.
Install Apache into a directory of your choice (hereafter referred to as [APACHEROOT]); on Windows PCs, it is usually C:/Program Files/Apache Software Foundation.
Configure the httpd.conf file. If you already have the Apache HTTP server installed and are not sure where the conf directory is, try searching for the httpd.conf file. For example, the httpd.conf file for Apache 2.2 is in the [APACHEROOT]/Apache2.2/conf/ directory.
Catalog Repository
The Catalog Repository is the location where all subsequent processing results of the various collections will be stored. In this step you need to determine the location of the Indexing Tool’s Catalog Repository folder, CATALOGROOT. The system will use CATALOGROOT in the IMARS alias setup of the httpd.conf file. For example, if the output directory CATALOGROOT is set to C:\WWW and you want the current collection to be named Travels, specify C:\WWW in this step of the setup and make sure that the Catalog Repository folder in the Indexing Tool is C:\WWW and the Catalog Name is Travels. Once indexed, the collection will be accessible via a browser at the following link: http://localhost/imars/Travels/
The following paragraph explains in detail how to perform such operations.
Step 1: Copy and paste the following segment to the end of the httpd.conf file:
(NOTE: remove trailing and preceding white spaces that may occur if you copy and paste this segment from the .pdf file)
# Add IMARS aliases
AliasMatch ^/IMARS/([^/]*)/(.*)$ "CATALOGROOT/$1/docs/$2"
AliasMatch ^/imars/([^/]*)/(.*)$ "CATALOGROOT/$1/docs/$2"
ScriptAliasMatch ^/cgi-imars-bin/([^/]*)/(.*)$ "CATALOGROOT/$1/cgi-bin/$2"
ScriptAliasMatch ^/cgi-IMARS-bin/([^/]*)/(.*)$ "CATALOGROOT/$1/cgi-bin/$2"
Alias /IMARS "CATALOGROOT"
Alias /imars "CATALOGROOT"
<Directory "CATALOGROOT">
Options Indexes MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
<Directory "CATALOGROOT/*/docs">
Options Indexes MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
# End IMARS aliases
Step 2: Replace CATALOGROOT with the physical location of your Main Catalog Folder, e.g. C:\WWW. As in the example below, use forward slashes (C:/WWW) inside httpd.conf.
Step 3: Save the changes to the httpd.conf file
EXAMPLE: If CATALOGROOT is C:\WWW, the end of the httpd.conf file looks like:
# Add IMARS aliases
AliasMatch ^/IMARS/([^/]*)/(.*)$ "C:/WWW/$1/docs/$2"
AliasMatch ^/imars/([^/]*)/(.*)$ "C:/WWW/$1/docs/$2"
ScriptAliasMatch ^/cgi-imars-bin/([^/]*)/(.*)$ "C:/WWW/$1/cgi-bin/$2"
ScriptAliasMatch ^/cgi-IMARS-bin/([^/]*)/(.*)$ "C:/WWW/$1/cgi-bin/$2"
Alias /IMARS "C:/WWW"
Alias /imars "C:/WWW"
<Directory "C:/WWW">
Options Indexes MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
<Directory "C:/WWW/*/docs">
Options Indexes MultiViews
AllowOverride All
Order allow,deny
Allow from all
</Directory>
# End IMARS aliases
Step 4: RESTART the Apache HTTP server for the changes to take effect.
§ Download the imars.zip file and unzip it to a directory of your choice (e.g. C:\Program Files\Imars\) – hereafter referred to as <DIR>.
§ Download the classifiers.zip file. Unzip classifiers.zip to <DIR>/bin/
§ (OPTIONAL STEP) Download factory.zip. Unzip the files to <DIR>/bin/ (say yes to overwrite).
§ NOTE: at this point, the <DIR>/bin/classifiers/ folder should contain a hierarchy of subfolders, the topmost of which contain the string “(facet)” in their names (e.g. “Setting (facet)” or “People (facet)”).
§ The IBM Multimedia Analysis and Retrieval System supports still images and MPEG1 and MPEG2 videos natively. The Indexing Tool will process other video formats only if FFmpeg is setup. FFmpeg is an open-source project that supports reading and writing of many different video formats. If the FFmpeg binary is installed in the IBM Multimedia Analysis and Retrieval installation folder, it will be used to decode non-native video formats.
§ Step 1: Download FFmpeg.exe. The official site for FFmpeg is http://ffmpeg.mplayerhq.hu/ . The source code can be downloaded from this site, but not a prebuilt binary. The user can either build FFmpeg.exe from the downloaded source code, or search for sites that provide an FFmpeg.exe binary distribution, e.g. http://arrozcru.no-ip.org/ffmpeg_wiki/tiki-index.php?page=Links.
§ Step 2: Select the FFmpeg.exe executable and copy it to the installation folder <DIR>/bin (e.g. C:/Program Files/IMARS/bin).
The next section describes the use of the Indexing Tool.
Chapter 2 |
Indexing Tool
The indexing component of the IBM Multimedia Analysis and Retrieval System extracts multimodal metadata from the image and video collection
The Indexing Tool is an integral part of the IBM® Multimedia Analysis and Retrieval System. It automatically indexes the collection of images and videos, extracts metadata and semantic information from the content, and uses all extracted information to organize the data in a more accessible way.
To start the Indexing Tool, simply double-click IMARS.exe. The Indexing Tool will load the variables specified in prefdflt.txt in <DIR>/bin. The user needs to (i) specify the origin of the image and video collection to be indexed (Search Directory), the Catalog Repository, and the Catalog Name, (ii) customize the Indexing Tool options and tabs, and (iii) press the START button to start the collection indexing. Each option in the IMARS.exe tool corresponds to a variable setting (in brackets) – see the Manual Configuration section for more information on manual configuration.
1) Search Directory – Specifies the full path to the collection folder to be indexed. The Indexing Tool will not create any new files in this directory but will only read from this directory. The user can browse the hard drive(s) (using the browse button) to find the location of the collection (SEARCHDIR).
2) Search all subdirectories – (If selected) allows the user to search subdirectories within the Search Directory. IMARS will search subdirectories up to 12 levels deep (SUBDIRS).
3) Process prior catalog (skip file search) – (If selected) IMARS will not search the file system for images but will only re-index a previously processed catalog. NOTE: processing is incremental, but indexing is not. Enable, in the incremental run, all indexing and extraction options that you want to see in the final index (NOCRAWL).
4) Batch process (skip previews) – (If selected) no preview images are displayed while processing. It saves processing time in case of large image sets (BATCH).
5) Catalog Name – unique repository name inside a given Catalog Repository. If the current Catalog Name matches an existing one, IMARS asks the user whether to overwrite the existing catalog. If the user does not specify a Catalog Name (i.e. leaves it blank), the tool will also display a warning (CATALOGNAME).
6) Catalog Repository – C:\WWW by default (variable CATALOGROOT). Once indexed, all collections will be accessible via a browser at the following link: http://localhost/imars/. See the Manual Configuration section on how to change the values manually. The Catalog Repository setting is connected with your Apache server configuration – if you change the value of the CATALOGROOT variable, make sure that the IMARS segment of the httpd.conf file reflects the SAME change (CATALOGROOT in the Apache HTTP Web Server).
7) Catalog Directory – location where the IBM Multimedia Analysis and Retrieval System will store and index data collected from the Search Directory. The Catalog Directory is built as <Catalog Repository>/<Catalog Name> (CATALOGDIR). The Catalog Directory should not be the same as the Search Directory, and must not be read-only or write-protected.
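The Search Directory crawl described in options 1–4 amounts to a depth-limited scan for supported file types. A minimal sketch (illustrative only – the extension list and depth accounting are assumptions, not the actual IMARS code):

```python
# Supported extensions are an assumption based on the formats listed earlier.
MEDIA_EXTS = (".jpg", ".jpeg", ".gif", ".png", ".tif", ".mpg", ".mpeg")

def select_files(relative_paths, max_depth=12, exts=MEDIA_EXTS):
    """Filter candidate files (paths relative to the Search Directory,
    using '/' separators) by extension and directory depth.
    Depth 0 means the file sits directly in the Search Directory."""
    selected = []
    for path in relative_paths:
        depth = path.count("/")  # levels below the Search Directory
        if depth <= max_depth and path.lower().endswith(exts):
            selected.append(path)
    return selected
```

With max_depth=12 this mirrors the "Search all subdirectories" behaviour; with max_depth=0 only the top level is scanned.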
IMARS settings
IMARS default variables are set in the configuration file <DIR>\bin\prefdflt.txt, and they are loaded on the first run of IMARS.exe. Once the tool completes the indexing, or if the user chooses to exit, the user's settings in the Indexing Tool are saved to <DIR>\bin\preflast.txt. All subsequent Indexing Tool runs will load/write variable values from/to the <DIR>\bin\preflast.txt file.
Manual Configuration:
The current IMARS state is saved in the configuration file <DIR>\bin\preflast.txt. The user can manually change the values of the variables, as shown in the following example:
Step 1: Close the IMARS.exe
Step 2: Change the default variables in preflast.txt
From:
SEARCHDIR=C:\Photos\
CATALOGDIR=C:\Www\Travels
CATALOGROOT=C:\Www\
CATALOGNAME=Travels
To:
SEARCHDIR=C:\Photos
CATALOGDIR=C:\myUser\myFolder\Holiday
CATALOGROOT=C:\myUser\myFolder
CATALOGNAME=Holiday
Step 3: Save preflast.txt
Step 4: If you change CATALOGROOT in preflast.txt, make sure to change the CATALOGROOT value in the IMARS segment of httpd.conf in the Apache setup (see the Apache HTTP Server setup section) to the same value, i.e. CATALOGROOT is now C:\myUser\myFolder. Save httpd.conf and restart the Apache server.
Step 5: Run IMARS.exe – the changes will be reflected.
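The preference files use a simple KEY=VALUE format, so the manual edit above could also be scripted. A hedged sketch (the helper names are hypothetical; IMARS itself ships no such script):

```python
def load_prefs(text):
    """Parse KEY=VALUE preference lines (the format used by preflast.txt)."""
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, _, value = line.partition("=")
            prefs[key.strip()] = value.strip()
    return prefs

def dump_prefs(prefs):
    """Serialize the preference dictionary back to KEY=VALUE lines."""
    return "\n".join(f"{key}={value}" for key, value in prefs.items())
```

For example, loading preflast.txt, setting prefs["CATALOGNAME"] = "Holiday", and writing dump_prefs(prefs) back performs Step 2 programmatically.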
Each collection needs to have a unique collection name; the tool remembers and displays the last Catalog Repository and Catalog Name. Note that if the user does not want to overwrite the previous collection results, he needs to CHANGE the Catalog Name before running the tool.
8) Parameter tuning tabs – The Indexing Tool has multiple tabs so the user can fine-tune the parameters used for collection indexing and for metadata and semantic information extraction. Each tab is explained in detail in the following sections.
The Options Tab allows the user to specify the data formats that will be processed, size of the dataset, and some advanced image processing methods.
Ingestion Limit – sets an upper limit on the number of indexed images, up to 20000 (max) of the images in the collection. Images are selected in alphabetical order (MAX).
Target Formats – This release of the IBM Multimedia Analysis and Retrieval System will read the JPEG, GIF, TIFF, PNG and PNM image formats by default, as well as the MPEG1 and MPEG2 video formats. The user can choose which formats will be processed using the corresponding check boxes. The IBM Indexing Tool can index videos other than MPEG1 and MPEG2 – see the Optional Software Installation section. The “Other videos” check box requires the installation of the external FFmpeg software. If FFmpeg is installed, the system will also be able to process a wide variety of other video formats, such as WMV, AVI, and FLV.
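The Ingestion Limit behaviour described above – keep at most N files, chosen in alphabetical order – can be sketched as:

```python
def apply_ingestion_limit(filenames, limit=20000):
    """Keep at most `limit` files, chosen in alphabetical order
    (mirrors the Ingestion Limit option; a sketch, not the tool's code)."""
    return sorted(filenames)[:limit]
```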
Image Processing Options
The IBM Indexing Tool has two image processing options. These options affect the appearance of the collection, not the source images or key-frames.
Cropping – allows the user to apply universal cropping of image borders. It crops all sides by x pixels, where x is determined by the sliding scale. This option is useful if images and videos in the collection have damaged edges, black borders (e.g. converted videos where the top and bottom are padded with black stripes), or headers and footers (e.g. news channels) that are not relevant for content extraction (CROP, CROPSIZE).
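Universal border cropping can be pictured as trimming x pixels from every side of a row-major pixel array. An illustrative sketch (not the tool's actual code):

```python
def crop_borders(pixels, x):
    """Remove x pixels from each side of an image given as a list of rows."""
    if x <= 0:
        return pixels
    # Drop the first/last x rows, then the first/last x entries of each row.
    return [row[x:len(row) - x] for row in pixels[x:len(pixels) - x]]
```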
Color Processing – If images are too dark, or the results are not satisfactory, the user can pick any of the image pre-processing options to improve the appearance of images and videos.
- The “Auto” option lets the system decide if range normalization is needed for an image histogram.
- The “Brighten” option performs a brightness adjustment of the image histogram, and can clarify the content if the image collection is too dark (BRIGHTEN).
- “Equalize” performs contrast adjustment using the image's histogram. This method usually increases the local contrast of images, especially when the usable data of the image is represented by close contrast values (EQUALIZE).
- “None” is the default. Color processing can significantly change the characteristics of the content. For a typical user collection of digital photos and personal videos, no color processing is usually needed.
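The “Equalize” option is a standard histogram-equalization step. A simplified grayscale sketch of the technique (IMARS's exact processing is not published, so this is the textbook form only):

```python
def equalize(pixels, levels=256):
    """Histogram equalization for a flat list of grey values in 0..levels-1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Build the cumulative distribution function of the histogram.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # flat image: nothing to spread
        return list(pixels)
    # Remap each value so the output histogram is approximately uniform.
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]
```

Values clustered in a narrow band are spread across the full range, which is why close contrast values benefit most.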
The Video Tab allows the user to specify video-specific parameters for key-frame extraction.
Video Limit – upper limit on the number of indexed videos. If set to None (extreme left), the tool will index up to 20000 (max) videos in the collection (VIDEOLIMIT).
Frame Extraction Limit – limits the number of frames extracted per video. The default is All. The user might want to use this option if the collection has many videos and only a couple of frames per video are needed for a quick overview (VIDEOFRAMELIMIT).
Frame Skip Factor – sets the number of frames to be skipped after each processed frame (a higher number corresponds to faster processing). It influences the behavior of the selected Temporal Sampling method (see below): if the Frame Skip Factor is set to n, Temporal Sampling is applied to every n-th frame in the video. The default is 256 frames, as this gives a good balance between key-frame sampling and processing speed (VIDEOSKIPFACTOR).
Frame Extraction – allows the user to specify whether multiple key-frames or a single, most representative key-frame should represent an input video in the Search Tool. The default is multiple, as the level of detail is greater (VIDEOFRAMEEXTRACTION).
Frame Types – determines the type of frames to keep after MPEG decoding. Currently only the I-frames are used, since they are not compressed in the temporal domain and therefore provide a more accurate representation (VIDEOFRAMETYPE).
Temporal Sampling – allows the user to choose the method for selecting frames to be processed (VIDEOSAMPLING). Fixed frame sampling simply extracts every n-th frame in the video. This approach is faster than key-frame sampling, but results in a larger number of frames, many of which may look very similar. Key-frame sampling compares successive frames and extracts a key-frame when the visual difference between them is above a threshold; successive frames are spaced according to the Frame Skip Factor (VIDEOSKIPFACTOR). The default is key-frame sampling. This processing is more complex, but the results are much better, as it selects a representative key-frame for each distinct shot in the video stream.
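The two Temporal Sampling methods can be sketched on a toy sequence where each frame is reduced to a single numeric "signature" and visual difference is the absolute difference (a large simplification of real frame comparison):

```python
def sample_keyframes(frames, skip=1, threshold=10, fixed=False):
    """Pick frames from a sequence of numeric frame signatures.

    fixed=True : take every skip-th frame (Fixed frame sampling).
    fixed=False: among every skip-th frame, keep those differing from the
                 last kept frame by more than `threshold` (one common
                 variant of Key-frame sampling)."""
    sampled = frames[::skip]
    if fixed:
        return sampled
    kept = [sampled[0]] if sampled else []
    for frame in sampled[1:]:
        if abs(frame - kept[-1]) > threshold:
            kept.append(frame)
    return kept
```

On a sequence with three distinct "shots", key-frame sampling keeps one representative per shot, while fixed sampling keeps every n-th frame regardless of similarity.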
The Metadata Tab allows the user to specify what metadata should be extracted from images.
Duplicates – images with the same MD5 hash. They can have different names or come from different sources, but have the same content. This option gives the user the capability to identify duplicates (IDENTIFYDUPLICATES) and to remove duplicate files (REMOVEDUPLICATES) in the collection.
Near-duplicates – images and key-frames that have almost the same appearance, e.g. pictures of the same scene from slightly different angles, or two key-frames from a still video. The Indexing Tool can identify near-duplicates (IDENTIFYNEARDUPLICATES) and label them using the existing set of classifiers (LABELNEARDUPLICATES) – more on labeling under the Clusters tab.
File metadata – allows the user to save the pathname of the folder where the original images (videos) are stored (“Folder names” option). The “Dates” option lets the system extract the date when each image was created. The Indexing Tool extracts metadata at the key-frame level of a video; therefore, “Date” will reflect the time and date the key-frame was extracted (METADATAFILENAMES, METADATADATES).
Camera Metadata – extracts EXIF camera information from the searched images, if such information exists. NOTE: various image processing operations (e.g. cropping, web publishing, red-eye removal) tend to corrupt the original EXIF information. IMARS will display whatever is in the camera info field of the source image (METADATACAMERA, METADATAEXIF).
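Exact-duplicate detection by MD5 hash, as described under Duplicates above, can be sketched as follows (here file contents are passed in as bytes instead of being read from disk):

```python
import hashlib

def find_duplicates(files):
    """Group contents by MD5 digest; groups with more than one entry
    are exact duplicates. `files` maps a name to its raw bytes."""
    by_hash = {}
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]
```

Files with different names but identical bytes land in the same group, which matches the guide's definition of a duplicate.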
The Classifier Tab lets the user determine if the system needs to do an automatic semantic classification of content and if so, the user can select which classifiers should be applied.
Apply Classifiers – indexes the image or video collection with a list of semantic classifiers. If the “Yes” radio button is selected, the tool will automatically classify the multimedia set to allow browsing and searching of the collection with the selected classifiers. The classification output can be browsed by clicking on the “Classifiers” button in the IBM Search Tool (EXTRACTCLASSIFIERS). Classification model evaluation can be sped up by setting the speed-up and drop-off threshold values, as defined below:
Speedup (vs. accuracy) – if the speedup is 1x (default), accuracy is the highest possible, as the system uses the full classifier model in the evaluation step. If the speedup is set to a higher value (20x is the highest), the system speeds up the evaluation of the classifier model on the dataset by sampling the model. This can reduce accuracy, since the full model is not used, in exchange for increased evaluation speed (CLASSSPEEDUP).
Drop-off threshold – if the drop-off threshold is set to None (far left), no drop-off threshold is used in the classification. Any other value indicates that all intermediate scores in the classification model evaluation that are below the threshold should be discarded. 0.0 is the default threshold (PROGRESSIVETHRESHOLD).
Classifiers Taxonomy – if classification is selected, the Indexing Tool will evaluate the user’s collection against all concepts selected from the default concept taxonomy. The user can choose a subset of the classifiers to be applied by using the check boxes in front of the corresponding concepts. NOTE: concept selection is propagated to the entire sub-tree of selected nodes. The automatic classification output can be browsed by clicking on the “Taxonomy” option in the IBM Search Tool.
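The drop-off threshold idea – discard intermediate scores below a cutoff before combining them – can be sketched as below. The averaging step is an assumption for illustration; the real classifier's score combination is not documented here:

```python
def progressive_evaluate(partial_scores, threshold=None):
    """Combine a classifier's intermediate scores, discarding those below
    the drop-off threshold (None keeps everything). A sketch of the
    PROGRESSIVETHRESHOLD idea only, with averaging assumed."""
    if threshold is None:
        kept = list(partial_scores)
    else:
        kept = [score for score in partial_scores if score >= threshold]
    return sum(kept) / len(kept) if kept else 0.0
```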
More on Classifiers
Fully-automatic approaches based on statistical modeling of low-level visual content features have been applied for detecting semantic concepts such as sky, party, etc. Statistical modeling requires large amounts of annotated examples for training. Since this scenario is not applicable to unlabeled content, we adopt an approach of automatic semantic tagging. We re-use existing semantic models, trained on various multimedia content, and automatically associate confidence scores to unseen data with the cross-domain concept models. To enable cross-domain usability, we chose the general semantic models from our lexicon, preserving consistent definitions of concepts across different multimedia and video domains (albums, blogs, web video).
The Clusters Tab allows the user to specify what kind of data clustering, if any, they want to see in the Search Tool.
Extract Clusters – the IBM Multimedia Analysis and Retrieval System allows users to cluster their image collection by visual similarity. The user must select the “Yes” radio button to extract clusters, as “No” is the default (EXTRACTCLUSTERS).
Number of clusters – the default number of clusters is determined based on the data size, the visual feature, and the data point distribution in the visual space. If the collection is heterogeneous in content, more clusters will be created (NUMCLUSTERS).
Clusters Taxonomy – if clustering is selected, the Indexing Tool will by default extract clusters based on the visual descriptors listed in the clusters taxonomy.
More on Taxonomy Selection
The user’s selection of classifiers, clusters, and features is remembered across IMARS runs.
The Features Tab – By default, the tool extracts a range of visual features, which can be used later for visual content-based similarity search. The user can use this tab to specify a subset of the features to be extracted.
Extract Features – the IBM Multimedia Analysis and Retrieval System allows users to index their image collection with a list of visual features capturing visual characteristics of the content, such as color, texture, and shape. The “Yes” radio button must be selected to extract features.
Feature Taxonomy – if feature extraction is selected, the IBM Multimedia Analysis and Retrieval System will extract all the features listed in the features taxonomy by default. The user can choose a subset of features to be extracted by using the corresponding check boxes.
On Low-level Visual Descriptors
The system extracts different visual descriptors at various granularities for each representative key-frame of the video shots. The relative importance of one feature modality vs. another may change from one concept/topic to another. Although the visual descriptors used are very similar to the MPEG-7 visual descriptors, they differ in that they have been primarily optimized for retrieval and concept modeling purposes. We performed extensive experiments to select the best feature type and granularity for content search and modeling. Here are some of the low-level visual descriptors (features) the IBM Indexing Tool extracts from the collection:
- Color Histogram – color represented as a 166-dimensional histogram in HSV color space.
- Color Correlogram – color and structure represented as a 166-dimensional single-banded auto-correlogram in HSV space using 8 radii depths.
- Color Moments – the first 3 color moments in Lab color space as a normalized 225-dimensional vector.
- Wavelet Texture – the normalized variances in 12 Haar wavelet sub-bands.
- Edge Histogram – edge histograms with 8 edge direction bins and 8 edge magnitude bins, based on a Sobel filter.
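As a flavour of how such descriptors are computed, here is a toy RGB colour histogram with a handful of bins per channel. This is a simplified stand-in: the real system uses a 166-bin histogram in HSV space, not this layout.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Tiny RGB colour histogram sketch.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    Returns a normalized histogram with bins_per_channel**3 bins."""
    n_bins = bins_per_channel ** 3
    hist = [0.0] * n_bins
    step = 256 // bins_per_channel  # width of each quantization bin
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]
```

Such normalized histograms can then be compared (e.g. by Euclidean distance) for content-based similarity search.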
9) Start – once the user has selected all the above options appropriately, he can hit the START button. This begins the process of indexing the collection. Progress is displayed in the progress bar (see point 13) at the bottom of the screen. The system will first extract features, then extract clusters and concepts as needed, followed by preparing the indices and web pages to be shown during the search phase.
10) Exit – Once indexing is completed and the user does not plan to index any more images or video, he can select the EXIT button to close the IBM Multimedia Analysis and Retrieval System indexing tool.
11) Default – Clicking on this button will initialize the IBM Multimedia Analysis and Retrieval System with default values for all configuration parameters and tab options.
12) About – this button displays an information screen with the tool version and copyright.
13) Progress bar – the Progress bar shows the status of the indexing progress.
When the indexing is completed
When the Indexing Tool completes the collection processing, the user will get a Run Time Screen notification in a separate window. To view the indexing results in a browser, click on the right-most image in the preview section, or open the following link directly in the browser: http://localhost/imars/. See Part 3 of this manual for a description of the Search Tool.
Chapter 3 |
The Search Tool
The Search Component of the IBM Multimedia Analysis and Retrieval System allows for a multimodal search and browsing of the indexed image and video collection
Accessibility of the extracted metadata helps to improve multimedia data management. The Search Tool of the IBM® Multimedia Analysis and Retrieval System provides different views and ways to access the extracted information through content-based search, text-based search, visualization of the concept taxonomy through MediaNet, and visualization of the indexed data using multiple tabs and tag-space or mosaic images. Through this multi-tiered data access, users are able to search and view the collection from different perspectives.
The IBM Multimedia Analysis and Retrieval System Search Tool has a web browser interface that allows you to browse your collection using multimodal metadata and many different views through tabs. Once image or video collections are indexed with the Indexing Tool (IMARS.exe has completed its execution), they can be accessed through a web browser at the following location: http://localhost/imars/.
The user can browse different views of the indexed collection. Images and key-frames are organized in buckets using extracted metadata, grouped by visual similarity into clusters, and sorted by semantic label confidence. Users can access all views using the search tool tabs, as described below.
1. Tabs
The Home Tab of the Search interface displays a random set of images or video key-frames from the indexed collection. This collection overview shows a group of images, where each image is characterized as follows:
- thumbnail
- index of the image in the database
- [Similar] link for visual similarity search (see the ‘Content-based Search’ Section)
- [Related] link for metadata search – retrieves all images/key-frames that have similar metadata information as the image of interest
- [Add] link for likelihood option in the multi-modal search (see the ‘Content-based Search’ Section)
- [Tag] link to tag the data (see the ‘Tagging’ Section)
- the metadata label that describes the key-frame or image with highest confidence.
If the user selects a thumbnail, it leads to a more detailed view of the image, as described in the ‘Detailed Image View’ Section.
If the user selected the YES radio button in the Classifiers tab of the Indexing Tool, the Classifiers Tab points to the classification results. Clicking on the image or the link in the table leads to the list of available classifiers. Users have a choice of three different ways to display the concept view of the collection: Tag Space, Mosaic Images, and Taxonomy.
Tag Space view (below) allows the user to quickly identify how many images in the collection belong to a certain semantic concept, relative to the whole collection. Each concept is represented by its name, and the font size is proportional to the number of images belonging to it. When hovering over a tag within the cloud, a tool-tip appears stating how many pictures are associated with that tag. This feature enables smooth browsing and a simplified view of a collection when there is a high number of elements in the list and mosaic images contain too many small thumbnails to be intuitive.
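The tag-space scaling rule – font size proportional to the number of tagged images – can be sketched as a linear mapping. The point sizes here are assumptions for illustration, not values taken from IMARS:

```python
def tag_font_sizes(tag_counts, min_pt=10, max_pt=36):
    """Map each tag's image count to a font size, linearly between
    min_pt (rarest tag) and max_pt (most frequent tag)."""
    counts = tag_counts.values()
    lo, hi = min(counts), max(counts)
    if lo == hi:  # all tags equally frequent
        return {tag: max_pt for tag in tag_counts}
    return {
        tag: round(min_pt + (count - lo) / (hi - lo) * (max_pt - min_pt))
        for tag, count in tag_counts.items()
    }
```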
Mosaic Images view (below) allows the user to visualize the effectiveness of a specific classifier on the indexed collection. Each mosaic contains thumbnails of the images classified as belonging to the specific concept. The more images belong to a concept, the smaller the size of each thumbnail in its mosaic image representation.
Taxonomy view (below) allows the user to browse classifiers sorted according to the Classifiers Taxonomy in the Indexing Tool. This feature enables a hierarchical view of the collection when there is a high number of correlated elements in the list.
By selecting an individual classifier in any of the available views, the user is taken to a page containing the indexed images in ranked order: the images most likely to belong to the selected concept appear at the top of the sorted collection, while the least likely images appear towards the bottom, as shown in the ‘Beach’ example below. As you can see, the system can automatically select relevant images (i.e. those representing a beach scene) from the collection. Under each thumbnail, a highlighted number reports the confidence score for the likelihood that the image belongs to the concept.
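The ranked presentation described above amounts to sorting image ids by classifier confidence, highest first:

```python
def rank_by_confidence(scores):
    """Sort image ids by classifier confidence score, most likely first.
    `scores` maps an image id to its confidence in [0, 1]."""
    return sorted(scores, key=scores.get, reverse=True)
```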
Note on the Classifiers
Automatic semantic classification will not produce perfect results; there will be some misclassified examples. The classification models are not fine-tuned to the data domain and collection specifics, i.e. the same models apply to professional videos, personal photos taken in different conditions, and web videos. Generic models, such as Outdoors, Sky, and Greenery, are less sensitive to the data collection and thus produce more robust detection.
The Clusters Tab shows the whole dataset clustered using one or more visual features (listed on the entry page), based on user selection in the Clusters tab of the Indexing Tool (see the Clusters Tab Section of the Indexing Tool). Images are grouped into buckets based on their visual similarity within the cluster and dissimilarity with respect to other clusters in the visual descriptor domain (i.e. color, texture, edge, shape). The mosaic image of a specific cluster category shows the aggregated collection view where grouping is done with respect to visual similarity of the collection content.
Note
The functionality of the Metadata, Clusters, and Classifiers tabs is essentially the same. The first page gives an overview of the available metadata, clusters, and classifiers for the collection. By choosing a specific view within the tab, the user is led to a collection overview that can be tag-based (Tag Space), image-based (Mosaic Images), or more general (Taxonomy).
Duplicates are exact copies of the same image. The system allows the user to find such repetitions and optionally delete them.
The Near-Duplicates Tab shows groupings of images or key-frames in the collection that are visually similar but not exact copies.
The Metadata Tab has the same functionality as the other tabs (see Note). Metadata information about digital images and videos plays a crucial role in the management of multimedia repositories. It enables cataloging and maintaining large collections, and facilitates the search and discovery of relevant information. Moreover, describing a digital image with defined metadata schemes allows multiple systems with different platforms and interfaces to access and process image metadata. Depending on the selection in the Metadata tab of the Indexing Tool, the Metadata folder can contain information on the folders the collection belongs to, or the time and date when images were taken. The Metadata tag space of years is shown in the image below:
The Tags tab is somewhat different from the Metadata Tab, as all tags are summarized together and there is only one view of the data. The tag space is automatically populated as users tag images or groups of images (see more under system functionalities in the next section). The size of each tag is proportional to the number of images in the collection that carry that tag.
The Random tab produces a display (similar to that of the Home tab) of images randomly selected from the collection.
By clicking the Help tab, the user is presented with an HTML version of this Guide.
2. Detailed Image View
The user can access a detailed view of an image by clicking on its thumbnail in any of the non-summarizing collection views in the Search Tool. The detailed view presents:
- Image Info: the Name and Shot ID of the image
- Metadata
- Auto-tagged concepts with corresponding confidences, and images tagged through the interface
- Feature Search box: gives the user access to more advanced searching in different visual feature spaces. The user can run a content-based search by clicking on any of the features listed.
Interactive search in the IBM Multimedia Analysis and Retrieval System consists of searching by visual features, metadata, tags, and concepts. Although these techniques are very powerful, we also want to enable the user to enrich the content with subjective interpretations. The system offers multi-modal search capabilities (text-based, visual-appearance-based, and classifier-based).
Content-based search, or ‘query by example’, is a query technique in which the user provides the system with an example image to base its search upon. The system retrieves the images closest to the example in a low-level visual descriptor space and returns a result list sorted by ascending distance from the query example. Content-based search in the IBM Multimedia Analysis and Retrieval System can be invoked in three ways.
In the first scenario, the user selects the [Similar] button under the example query. In the second scenario, the user can either type LIKE@<Shot ID> in the search bar, where <Shot ID> is the index of the desired example in the database, or click the [Add] button below the example query, and press search. In both scenarios, the user can select the specific visual space to conduct the similarity search in the feature box (“color histogram global” by default).
In the third scenario, the user can invoke a specific similarity search from the image view page by selecting the visual space to query in. If the user selects the default feature space, the result page will be the same; otherwise the results will differ. For example, when the search is done in the wavelet texture feature space, the relevant result set is ranked differently.
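The query-by-example retrieval described above can be sketched as a distance computation followed by an ascending sort. The 4-bin histograms and shot IDs below are hypothetical; IMARS uses its own low-level descriptors (e.g. a global color histogram by default).

```python
import math

# Sketch of 'query by example': rank images by ascending distance
# from the query in a visual descriptor space. Histograms are
# hypothetical 4-bin examples.
query = [0.4, 0.3, 0.2, 0.1]
collection = {
    "shot_12": [0.4, 0.3, 0.2, 0.1],    # identical to the query
    "shot_07": [0.1, 0.2, 0.3, 0.4],
    "shot_03": [0.35, 0.3, 0.25, 0.1],
}

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Ascending distance: the closest matches come first.
results = sorted(collection, key=lambda k: l2(query, collection[k]))
```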
The Search Bar allows you to access and search substrings of all labels connected to a key-frame (this extends to metadata and classifiers). All browsing functionality can be accessed through the Search bar using the correct prefix: FOLDERS@ is the prefix for folder metadata, CLASSIFIERS@ is the prefix for auto-tagged semantic models, and TAGS@ is the prefix for user-assigned tags. The Search bar has auto-complete functionality to enable easier browsing through possible matches, as illustrated in the screenshot below:
The search bar also enables content-based search: typing Like@<Shot_ID> in the search bar, where <Shot_ID> is the index of the desired example in the database, and pressing search will run a query-by-example search over the collection. In the same manner, the search bar enables multi-modal search using Boolean expressions. In the example below, we filter a content-based search with the classifier ‘Blue_Sky’; the final search expression Like@<Shot_ID> AND CLASSIFIERS@Blue_Sky gives a more precise result set:
Regular expressions in the Search bar enable the user to combine different modalities of extracted information to model better the information need and to ultimately find the relevant shot within the collection.
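The two Search bar behaviors described above, prefix-based auto-complete and Boolean combination of modalities, can be sketched as follows. The prefixes (FOLDERS@, CLASSIFIERS@, TAGS@) come from this guide; the in-memory label index, shot IDs, and classifier results are all invented for illustration.

```python
# Hypothetical label index built at ingestion time.
labels = ["CLASSIFIERS@Beach", "CLASSIFIERS@Blue_Sky",
          "TAGS@vacation", "FOLDERS@2006_photos"]

def autocomplete(partial):
    """Return every indexed label starting with what the user has typed."""
    return [l for l in labels if l.lower().startswith(partial.lower())]

matches = autocomplete("CLASSIFIERS@B")

# Boolean AND across modalities: keep the similarity-ranked order,
# but retain only the shots that the 'Blue_Sky' classifier accepts.
similarity_ranked = ["shot_03", "shot_12", "shot_07", "shot_09"]
blue_sky_shots = {"shot_12", "shot_09"}
result = [s for s in similarity_ranked if s in blue_sky_shots]
```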
Our system can improve the visualization of query results by grouping them using existing metadata and clusters. The groups are computed dynamically, operating on the result set currently displayed on the screen, and the following steps are taken:
- Select the grouping category (i.e. metadata or visual cluster) from the dropdown menu above the Search bar
- Collect group labels for every key-frame in the result set that matches the selected category
- Group images/shots in the result list by common label.
- Create visual containers for all images/shots labeled with the group label.
Groups are ordered based on their aggregate scores, and items within each group are ordered based on their original query match scores. The user can easily filter for meaningful results in the retrieved list by grouping the top matches using visual clusters.
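The grouping steps above can be sketched as a bucketing pass over the displayed result set. The shot IDs and the label lookup (here, a hypothetical year-metadata category) are invented for illustration.

```python
from collections import defaultdict

# Sketch of grouping a result set: collect the group label for each
# key-frame in the current result list, then bucket items by label.
# The result set and label lookup are hypothetical.
result_set = ["shot_01", "shot_02", "shot_03", "shot_04"]
label_of = {"shot_01": "2006", "shot_02": "2007",
            "shot_03": "2006", "shot_04": "2007"}  # e.g. year metadata

groups = defaultdict(list)
for shot in result_set:
    groups[label_of[shot]].append(shot)  # one visual container per label
```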
Tags are freely chosen labels that help to improve a search engine's effectiveness because content is categorized using a familiar, accessible, and shared vocabulary. The IBM Search Tool offers several ways to retrieve items that match a topic of interest. The tagging feature enables the user to assign subjective tags to multimedia content. Specific key-frames can be tagged from the collection view using the [Tag] button under the thumbnails. The system also allows group tagging, i.e. the simultaneous assignment of the same label to a group of key-frames. This lets the user define specific events across different content sources that were not specified in the system. In the single-item view, we use different confidence values to distinguish such “group tags” from tags that were assigned to a single shot. Once a reasonable number of tags has been assigned to the multimedia collection, the collection can be viewed as a tag cloud using the Tags tab in the search system. Assigning the same tag to different groups of items can describe an event that one can search for later.
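The tag cloud sizing mentioned earlier (each tag drawn in proportion to how many images carry it) can be sketched as a frequency count plus a linear scale. The tag assignments and the 10pt–30pt font range below are hypothetical choices, not the actual IMARS rendering parameters.

```python
from collections import Counter

# Sketch of building a tag cloud: a tag's display size is proportional
# to the number of images tagged with it. Tag data is hypothetical.
tagged = ["beach", "beach", "beach", "sunset", "sunset", "party"]
counts = Counter(tagged)

max_count = max(counts.values())
# Scale font sizes linearly between 10pt and 30pt.
sizes = {tag: 10 + 20 * n / max_count for tag, n in counts.items()}
```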
1. What is the IBM Multimedia Analysis and Retrieval System?
The IBM Multimedia Analysis and Retrieval System (IMARS) is an automated content indexing and multimodal search system for digital image and video collections.
2. Does IBM Multimedia Analysis and Retrieval System work on videos?
Yes. The IBM Multimedia Analysis and Retrieval System natively supports still images and MPEG-1 and MPEG-2 videos. For all other video formats, please download and copy the FFmpeg.exe executable (free software) to the installation folder (by default C:\Program Files\Imars) – see the "Optional Software Installation" section of the guide.
3. What does IBM Multimedia Analysis and Retrieval System consist of?
IBM Multimedia Analysis and Retrieval System consists of two components: Indexing and Search. When you click on Imars.exe, it brings up the indexing and ingestion tool. Once the indexing and ingestion is complete, the search tool can be accessed at http://localhost/imars/ via any browser. You cannot use the search tool without running the indexing and ingestion tool at least once.
4. Why is there a limitation on the number of images and videos that the system can process in a single batch?
This is not a software limitation, but merely a trial license limitation for this release of IBM Multimedia Analysis and Retrieval System. The tool can, however, be run on an unlimited number of image and video collections (limited only by the available disk storage), and the collection size limit should be generous enough to allow most non-commercial user applications.
5. If I check the Search subdirectory box in the IBM Multimedia Analysis and Retrieval System collector, what is the maximum depth of subdirectories that will be visited?
Maximum depth for crawling subdirectories is 12.
6. Why does IBM Multimedia Analysis and Retrieval System work only on the Windows® XP platform?
The indexing tool of the IBM Multimedia Analysis and Retrieval System has been extensively tested on the Windows® XP platform. It will run on any 32-bit Windows® operating system, but we recommend Windows® XP. However, the IBM Multimedia Analysis and Retrieval System Search Tool can be accessed from any Web browser on any platform (Windows, Apple, Linux®, etc.).
7. Do I need to install a Web server?
Yes. In order to use the IBM Multimedia Analysis and Retrieval System Search Tool, you do need to install the Apache HTTP server; please refer to the "Apache HTTP Server Setup" section of the IMARSGuide for alphaWorks.pdf.
8. Can I append another image collection to the one ingested in IBM Multimedia Analysis and Retrieval System?
Not at this time. This release allows you to run IBM Multimedia Analysis and Retrieval System on separate directories only. You can put two different photo collections in the same folder and run IBM Multimedia Analysis and Retrieval System on them (make sure to check the include subfolders option).
9. My instance of the IBM Multimedia Analysis and Retrieval System exits without warning; what should I do?
Please download the latest version of IBM Multimedia Analysis and Retrieval System from IBM alphaWorks website. If you cannot find answers in this guide, please send a note to
10. Why do I need to download and unzip classifiers.zip and/or factory.zip files?
The Indexing tool evaluates the semantic classifiers located in <DIR>/bin/classifiers against the user's multimedia collection. If these files are missing, the user will not be able to explore the full capability of the IBM Multimedia Analysis and Retrieval System, which relies on automatic semantic classification.
11. I have installed IMARS system with default CATALOGROOT location C:\WWW. Now I want my http://localhost/imars to point to a different location. How can I change this?
By default, the CATALOGROOT variable is C:\WWW. Refer to the Indexing Tool Interface, Manual Setup section for more information on how to change the CATALOGROOT variable in the Indexing Tool setup, and refer to the Apache HTTP Web Server Setup section to learn how to change CATALOGROOT in httpd.conf to the new value.
12. I played with manual setup in preflast.txt and now the tool does not run correctly. How do I reset the tool to load default variables?
If you have run IMARS.exe, it saves the current state in <DIR>\bin\preflast.txt and uses it for subsequent runs. Do the following:
a) Exit Imars.exe
b) Erase <DIR>\bin\preflast.txt so that the default options in <DIR>\bin\prefdflt.txt can take effect the next time Imars.exe is run.