XML Maker tutorial: how to create your PSI file

The purpose of this tutorial is to lead you step by step through the creation of a PSI XML file that describes protein interactions according to the PSI format.

  1. Open a schema

    The first thing to do is to load the PSI schema. This file is called MIF.xsd and is available on the PSI web page. You can also find it in the directory called data. Once the schema is loaded, the root node is displayed in the main frame. It is named entrySet and is colored red, indicating that something is missing (no surprise, since we have not done any mapping yet).

  2. Set default values

    Set your prefix

    By pressing the button set your prefix, you can enter a prefix value that identifies your institute (e.g. EBI, MINT...). This value is used later if you want the application to automatically generate some values, usually ids.
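
    As an illustration of how such a prefix could feed generated ids, here is a minimal Python sketch. The helper make_id_generator and the PREFIX-N format are assumptions made for this example; the tutorial does not say which format XML Maker actually uses.

```python
from itertools import count


def make_id_generator(prefix):
    """Return a function that produces a new id on each call.

    The "PREFIX-N" format is hypothetical; the real application may
    build its ids differently.
    """
    counter = count(1)

    def next_id():
        return f"{prefix}-{next(counter)}"

    return next_id


next_id = make_id_generator("EBI")
ids = [next_id() for _ in range(3)]
print(ids)
```

    Each generator keeps its own counter, so two prefixes never share a numbering sequence.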

    Associate first fields to default values:

    We click for the first time on the node entrySet. This node is now expanded and shows two attributes (level and version) and an element named entry.

    After the name of this element we can read max: unbounded; it means that there is no limit on the number of entry elements contained in an entrySet. In our case, we need just one, which will contain all our data.

    We can already give a default value to the attributes level and version. After selecting the node level by clicking on it, we set the radio button association to default value and press the button Associate. A window opens where we can enter the value for this field (in our case it should be 1). We can do the same for the attribute version (with the value 1).

    Now we are ready to proceed.
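
    To picture what these default values amount to, the following Python sketch builds the corresponding XML skeleton with the standard library. It is only an illustration: XML Maker generates the file for you, and the namespace declarations that a real PSI file carries are left out here for brevity.

```python
import xml.etree.ElementTree as ET

# Root node with the two attributes that received a default value of 1.
entry_set = ET.Element("entrySet", {"level": "1", "version": "1"})

# max: unbounded allows any number of entry elements; we need only one.
entry = ET.SubElement(entry_set, "entry")

skeleton = ET.tostring(entry_set, encoding="unicode")
print(skeleton)
```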

  3. Open a file

    We can now open the flat file that contains the description of the interactions.

  4. Associate the file to a node

    Now that the flat file describing the interactions is loaded, we have to tell the application which node the file has to be mapped to. The flat file describes a list of interactions (one interaction per line), so we can map it to the element interactionList (entrySet, entry, interactionList). To do so, we select the node interactionList, select the radio button association to flat file and click on Associate.

    Once the file is associated with a node, the tab that displays this file takes the name of that node (interactionList).
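
    The idea behind this association, one interaction element per line of the flat file, can be sketched in Python as follows. The sample lines are invented for the example, and the loop is a simplified picture of the principle rather than the application's actual code.

```python
import xml.etree.ElementTree as ET

# Invented sample content: each line of the flat file describes one interaction.
flat_file_lines = [
    "proteinA|proteinB|two hybrid",
    "proteinA|proteinC|two hybrid",
]

entry_set = ET.Element("entrySet", {"level": "1", "version": "1"})
entry = ET.SubElement(entry_set, "entry")
interaction_list = ET.SubElement(entry, "interactionList")

# Associating the file with interactionList means that every line
# yields one child element of that node.
pending = []
for line in flat_file_lines:
    interaction = ET.SubElement(interaction_list, "interaction")
    # The fields of the line are mapped to sub-elements in a later step.
    pending.append((interaction, line))

print(len(interaction_list), "interaction elements created")
```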

  5. Choose the separator (for the flat file)

    By default, no separator is selected, so each whole line of the flat file is displayed in a single cell of the list. We can click on choose the separator and type the one we want to use. The separator is a regular expression (see the documentation for more about regular expressions). For example, if the separator is |, we write \| because | has a special meaning in regular expressions; if the separator is a semicolon, we write ; and if it can be a semicolon, a comma or a colon, we write ;|,|:. The line is then split and each field is displayed in its own cell.
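
    The three separators above can be tried out with Python's re module. This is a sketch for experimenting with the patterns; the application's own regular expression dialect may differ from Python's in some details, and split_line is a name chosen for this example.

```python
import re


def split_line(line, separator):
    """Split one flat-file line on a separator given as a regular expression."""
    return re.split(separator, line)


# A literal | must be escaped, because | alone means "or" in a regular expression.
print(split_line("A|B|C", r"\|"))        # ['A', 'B', 'C']

# A semicolon has no special meaning and is written as is.
print(split_line("A;B;C", r";"))         # ['A', 'B', 'C']

# Alternation: split on a semicolon, a comma or a colon.
print(split_line("A;B,C:D", r";|,|:"))   # ['A', 'B', 'C', 'D']
```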

  6. Do the mapping

    It is now time to do the mapping between the flat file and the schema. I will just give some examples of what can be done.

  7. Other files

    We can repeat these steps with other flat files. If we chose to map normalized interactions, we are supposed to map another flat file to the interactorList. In that case we click on add tab; a new tab is created and can be selected by clicking on it (it does not have a name yet, since we have not associated it with a node, whereas the first one is called interactionList, like the node it is associated with). Once the tab is selected, we can choose a file, associate it with the node interactorList, and continue with the choice of a separator and the mapping, as we did for the interactions.

    The same process can be repeated, for example for a list of experiments.
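
    The sketch below illustrates the principle of normalized interactions in Python, under the assumption that "normalized" means each interactor is described once in its own file and referenced by id from the interactions file. The file contents, the id|name layout and both function names are invented for the example.

```python
def parse_interactors(lines, separator="|"):
    """Read 'id|name' lines into a dictionary mapping id to name."""
    interactors = {}
    for line in lines:
        interactor_id, name = line.split(separator)
        interactors[interactor_id] = name
    return interactors


def resolve_interactions(lines, interactors, separator="|"):
    """Replace the ids on each interaction line with the interactor names."""
    resolved = []
    for line in lines:
        ids = line.split(separator)
        unknown = [i for i in ids if i not in interactors]
        if unknown:
            raise KeyError(f"unknown interactor id(s): {unknown}")
        resolved.append(tuple(interactors[i] for i in ids))
    return resolved


# Invented sample data: one file per tab.
interactor_lines = ["i1|proteinA", "i2|proteinB", "i3|proteinC"]
interaction_lines = ["i1|i2", "i1|i3"]

interactors = parse_interactors(interactor_lines)
print(resolve_interactions(interaction_lines, interactors))
```

    An id that appears in the interactions file but not in the interactors file raises an error, which is the kind of inconsistency worth catching before the XML is generated.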

  8. Checks

    Once all the mappings are done, there should no longer be any red elements, which means that the minimum mapping is complete. At any moment we can click on check, and a new window will display all the errors and warnings found. We can also select a node and click on about the node to get some information about it, or select a node and click on preview: a window will display the XML code that would be generated for this node from the lines of the flat file currently displayed.
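
    To make the meaning of the red elements concrete, here is a small Python sketch that lists the required nodes still lacking an association. The Node class and the missing function are a hypothetical model written for this tutorial, not the data structures used by XML Maker.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a schema node shown in the tree."""
    name: str
    required: bool = True
    mapped: bool = False
    children: list = field(default_factory=list)


def missing(node, path=""):
    """Return the paths of required leaf nodes that have no association yet."""
    here = f"{path}/{node.name}"
    if not node.required:
        return []  # optional subtrees never block the generation
    if not node.children:
        return [] if node.mapped else [here]
    found = []
    for child in node.children:
        found.extend(missing(child, here))
    return found


tree = Node("entrySet", children=[
    Node("level", mapped=True),
    Node("version"),
    Node("entry", children=[Node("interactionList")]),
])
print(missing(tree))
```

    In this model a node would be drawn in red as long as its path, or the path of one of its descendants, appears in the returned list.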

  9. Print

    We are almost done: it only remains to click on print and choose the name of the file to print, and the name of the file where error and warning messages will be written.

Contacts

This software was created at the University of Roma "Tor Vergata" by Arnaud Ceol with the help of the Mint Group. For any information you can contact me at arnaud@cbm.bio.uniroma2.it.

PSI: the Proteomics Standards Initiative