XML maker/flattener documentation
This is the documentation for the XML Maker and Flattener software. The first part (manual) provides instructions and examples for running the applications. The other parts give a deeper description of the software.
sh bin/xmlflattener -mapping <mapping-file.xml> -xmlDocument <your PSI1.0 XML document> -o <output file>
sh bin/xmlflattener-gui
sh bin/xmlmaker -mapping <mapping-file.xml> -o <output xmlDocument> -dictionaries <dictionaries> -flatfiles <flat files>
sh bin/xmlmaker-gui
bin/xmlflattener.bat -mapping <mapping-file.xml> -xmlDocument <your PSI1.0 XML document> -o <output file>
bin/xmlflattener-gui.bat
bin/xmlmaker.bat -mapping <mapping-file.xml> -o <output xmlDocument> -dictionaries <dictionaries> -flatfiles <flat files>
bin/xmlmaker-gui.bat
XML Maker and XML Flattener are two applications that convert tab-delimited files to XML documents, and XML documents to tab-delimited files, according to an XML schema.
Both applications can be used either with or without a graphical interface. The graphical interface is used to load an XML schema and to create a mapping between flat (tab-delimited) files and an XML document. Once a mapping has been created, it can be reused directly on the command line.
To create a flattener mapping file, an XML schema should first be loaded in the GUI. A graphical tree representation of this schema is then created. On this tree, first choose the 'main node', i.e. the node that contains all the information that will be displayed on a single line of the output tab-delimited file. Then select the elements and attributes that will be exported. The application automatically calculates the number of columns needed according to the number of sub-elements found.
To create a maker mapping file, an XML schema should first be loaded in the GUI, followed by one or more flat files. A graphical tree representation of this schema is created. On this tree, first associate a node with a flat file: an element corresponding to this node will be created in the output XML document for each line of the flat file. The fields of the file can then be associated with the nodes of the schema.
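The rule "one element per line of the flat file" can be illustrated with a minimal Python sketch. This is not the application's code: the element names (interactorList, interactor, shortLabel, organism), the field names, and the presence of a header row in the flat file are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tab-delimited flat file with a header row (an assumption).
FLAT_FILE = "name\ttaxid\nproteinA\t9606\nproteinB\t4932\n"


def rows_to_xml(flat_text, root_name, row_element, field_to_node):
    """Create one `row_element` under the root for each data line of the flat file."""
    root = ET.Element(root_name)
    reader = csv.DictReader(io.StringIO(flat_text), delimiter="\t")
    for row in reader:
        element = ET.SubElement(root, row_element)
        # Each associated field becomes a child node holding the field value.
        for field, node_name in field_to_node.items():
            child = ET.SubElement(element, node_name)
            child.text = row[field]
    return root


# Hypothetical mapping: flat-file field -> schema node.
mapping = {"name": "shortLabel", "taxid": "organism"}
root = rows_to_xml(FLAT_FILE, "interactorList", "interactor", mapping)
xml_text = ET.tostring(root, encoding="unicode")
```

Here the dictionary `mapping` plays the role of the associations made in the GUI between fields and schema nodes.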
Other types of associations are possible:
Both applications were developed on, and require, a Java 1.4 (or newer) environment (http://www.java.com/en/download/index.jsp).
The applications often require extra memory. You can specify how much memory Java reserves with the -Xms (initial heap size) and -Xmx (maximum heap size) options, for instance: java -Xms256M -Xmx512M
Some files are available in the data directory. These files relate to the Proteomics Standards Initiative (http://psidev.sourceforge.net/), for which this software was created.
sh bin/xmlflattener -mapping data/flattener-mapping-psi10.xml -xmlDocument <your PSI1.0 XML document> -o <output file>
sh bin/xmlflattener -mapping data/flattener-mapping-psi25.xml -xmlDocument <your PSI2.5 XML document> -o <output file>
sh bin/xmlmaker -mapping data/maker-mapping-psi10.xml -flatfiles <your flat file> -xmlDocument <your PSI2.5 XML document> -o <output XML document>
sh bin/xmlmaker -mapping data/maker-mapping-psi25.xml -flatfiles <your flat file> -xmlDocument <your PSI2.5 XML document> -o <output XML document>
All sources of this software are available in the src/main/java directory.
You can build the software using Maven (http://maven.apache.org):
mvn clean install appassembler:assemble assembly:assembly
Classes are compiled into the target/classes directory, and a jar file is added to the target directory. A compressed (zip) package containing automatically generated scripts and libraries is also added to the target directory.
This software has been developed for the HUPO Proteomics Standards Initiative (http://psidev.sourceforge.net/). It can be downloaded from SourceForge: http://psidev.cvs.sourceforge.net/psidev/psi/mi/tools/
Two tutorials are available:
The window is divided into 3 parts:
This panel displays the flat files that have been opened. It is possible to open more than one file by creating a new tab and then opening a file.
The dictionary panel is used to load a file that associates values in the flat file with a new set of values. The dictionary can be used to replace values in the flat file with the corresponding new values, as defined in the dictionary file.
A dictionary file contains on each line a first word (the key) followed by a list of other words (the replacement values). Each word is separated from the others by a separator that can be specified while loading the file. A dictionary can be loaded from a flat file, a tab-delimited file...
An example, using | as the separator:
Delition|deletion analysis|MI:0033
Mutation|mutation analysis|MI:0074
The dictionary tool can be used, for instance, to replace values with identifiers; in the case of PSI, species names can be replaced with their taxId. This dictionary would be loaded from a file in which each line contains a name and an identifier.
When associating a node with a dictionary, the user can choose to replace the values from the associated field in the flat file with the values in the second or in the third column of the dictionary. When a dictionary does not find a value, it behaves as if the field were empty.
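The lookup described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the application's implementation; the function names load_dictionary and replace_value are hypothetical, and columns are counted from 1 so that column 1 is the key.

```python
def load_dictionary(lines, separator="|"):
    """Map each key (first word of a line) to the full list of words on that line."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(separator)
        table[parts[0]] = parts
    return table


def replace_value(table, value, column):
    """Return the word found in `column` (1-based) for `value`.

    An unknown value, or a column missing from the line, yields an empty
    string, mirroring "behaves as if the field were empty".
    """
    parts = table.get(value)
    if parts is None or column < 1 or column > len(parts):
        return ""
    return parts[column - 1]


table = load_dictionary([
    "Delition|deletion analysis|MI:0033",
    "Mutation|mutation analysis|MI:0074",
])
second = replace_value(table, "Delition", 2)   # value from the second column
third = replace_value(table, "Mutation", 3)    # value from the third column
missing = replace_value(table, "Unknown", 2)   # key not in the dictionary
```

With the two example lines, `second` is "deletion analysis", `third` is "MI:0074", and `missing` is the empty string.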
The main frame displays a tree that represents the loaded XML schema.
Two different icons are used to represent an attribute or an element. Text colors in node names give some indication of the association status of the nodes:
The node names also provide some information. They take the form name (type, max: maxOccurs), where name is the name of the element or attribute, type is the XML type, and maxOccurs is the maximum number of occurrences of this element allowed by the schema (only for elements).
When a choice is possible (for instance between an element description and an element reference), it is displayed as (choice1|choice2|choice3...). When clicked, this type of node opens a window in which an element can be selected.
The "flattener" application was developed to organize a subset of the elements of an XML document in a flat file. The flattener can compute the number of columns needed to represent the information in the XML document. For example, if, according to the XML schema, an element named list can contain an unbounded number of elements called child, the "flattener" first checks every list for the maximum number of child elements (and references to this type of element). Each line of the output flat file then contains the corresponding number of fields, some of which may be empty. For instance, for a node describing an interaction, if every interaction in the XML document involves two proteins except one that involves three, each line in the flat file will have the number of fields necessary to describe three interactors.
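The column calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the flattener's code: the element names (interactionList, interaction, participant) are hypothetical, and only one repeated child per main node is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one interaction has two participants, the other three.
DOC = """<interactionList>
  <interaction><participant>P1</participant><participant>P2</participant></interaction>
  <interaction><participant>P3</participant><participant>P4</participant><participant>P5</participant></interaction>
</interactionList>"""


def flatten(xml_text, main_node, child_name):
    """Return one row per main node, padded to the largest child count found."""
    root = ET.fromstring(xml_text)
    rows = [
        [child.text or "" for child in main.findall(child_name)]
        for main in root.findall(main_node)
    ]
    # First pass over every main node: the widest one fixes the column count.
    width = max((len(row) for row in rows), default=0)
    # Shorter rows receive empty fields so that every line has the same width.
    return [row + [""] * (width - len(row)) for row in rows]


rows = flatten(DOC, "interaction", "participant")
lines = ["\t".join(row) for row in rows]
```

Both rows end up with three fields; the first one carries a trailing empty field.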
When, according to the schema, a choice is possible, it is displayed as (choice1|choice2|choice3...). When clicked, all possible choices are expanded, so that each of them can be exported to the flat file (useful when the same choice is not made for every element in the XML document).
When the flattener encounters an element of type "refType", it behaves as if it had encountered the element the "refType" is referring to. Thus when an element is selected, the flat file will contain all those elements and all those that are referenced.
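The reference handling described above can be sketched as follows. The element and attribute names (interactor, interactorRef, id, ref) are assumptions chosen for the example, and the lookup by id is one plausible way to follow a reference, not necessarily how the flattener does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the interaction holds one reference and one inline element.
DOC = """<entry>
  <interactorList>
    <interactor id="i1"><name>proteinA</name></interactor>
  </interactorList>
  <interaction>
    <interactorRef ref="i1"/>
    <interactor id="i2"><name>proteinB</name></interactor>
  </interaction>
</entry>"""


def resolve(parent, element_name, ref_name, index):
    """Return the elements under `parent`, replacing each reference by its target."""
    resolved = []
    for child in parent:
        if child.tag == element_name:
            resolved.append(child)
        elif child.tag == ref_name:
            target = index.get(child.get("ref"))
            if target is not None:  # dangling references are skipped
                resolved.append(target)
    return resolved


root = ET.fromstring(DOC)
# Index every interactor of the document by its id attribute.
index = {element.get("id"): element for element in root.iter("interactor")}
interaction = root.find("interaction")
names = [
    element.find("name").text
    for element in resolve(interaction, "interactor", "interactorRef", index)
]
```

The interaction then yields both the referenced element and the inline one, in document order.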
A lot of documentation about regular expressions can be found on the web. I give here only some basic rules and examples of regular expressions that can be used to define the separators.
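As an illustration, here are a few separator patterns tried out with Python's re module. The helper split_fields is hypothetical, and the application itself may use a different regular expression engine; these basic constructs (escaped characters, character classes, the * quantifier) are common to most engines, including Java's.

```python
import re


def split_fields(line, separator_pattern):
    """Split a line into fields using a regular expression as separator."""
    return re.split(separator_pattern, line)


# A tab character.
tab_fields = split_fields("a\tb\tc", r"\t")
# The pipe is a special character in regular expressions and must be escaped.
pipe_fields = split_fields("a|b|c", r"\|")
# A semicolon followed by any amount of whitespace.
semicolon_fields = split_fields("a; b;c", r";\s*")
# Either a comma or a semicolon (character class).
mixed_fields = split_fields("a,b;c", r"[,;]")
```

Each of the four calls returns the list ["a", "b", "c"].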
This software has been created at the University of Roma "Tor Vergata" by Arnaud Ceol and the Mint Group. For any information you can contact me at arnaud@cbm.bio.uniroma2.it.
PSI: the Proteomics Standards Initiative