XML and Bioinformatics

Data Management in Bioinformatics

The idiosyncrasies of bioinformatics data:
1. Data are complex to model
      -There are many different types of data
2. New types of data emerge regularly
      -Data analysis generates new data that also have to be modeled and integrated.
3. Raw data must be archived
      -The terabytes of bioinformatics data consist of a very large number of individual objects.
4. Data are updated very frequently, accessed intensively, and exchanged very often by researchers
5. All kinds of users (biologists, programmers, database managers, ...) need to issue complex queries
6. Technical issues of bioinformatics data
      -The volume of data grows exponentially
      -Data are disseminated in a myriad of different databases that are duplicated in several repositories
      -These databases have heterogeneous formats

The advantages of XML in Bioinformatics

1. XML is highly flexible
      -It is simple to modify a DTD. XML and DTD files are human-readable, so they can easily be edited by people with only a few computer skills.
2. XML is Internet-oriented and has very rich capabilities for linking data
      -This can be used for interconnecting databases
3. XML provides an open framework for defining standard specifications.
      -This is an important point because bioinformatics clearly lacks standardization.
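
The points above can be illustrated with a minimal sketch in Python. The record below is purely hypothetical: the element names (gene, name, organism, xref), the database names, and the accession values are assumptions made for illustration, not an existing bioinformatics standard. It shows a small DTD that is readable and easy to extend, and xref elements that link the record to entries in other databases.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: every element, attribute, and value here is illustrative.
# The DTD is embedded as an internal subset so the example is self-contained.
GENE_XML = """<!DOCTYPE gene [
  <!ELEMENT gene (name, organism, xref*)>
  <!ATTLIST gene id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT organism (#PCDATA)>
  <!ELEMENT xref EMPTY>
  <!ATTLIST xref db  CDATA #REQUIRED
                 acc CDATA #REQUIRED>
]>
<gene id="g1">
  <name>exampleGene</name>
  <organism>Escherichia coli</organism>
  <xref db="ExampleDB" acc="X0001"/>
  <xref db="OtherDB" acc="Y0002"/>
</gene>"""


def read_gene(xml_text):
    """Parse the record and return its fields as a plain dict."""
    # ElementTree parses the document but does not validate it against the DTD.
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "organism": root.findtext("organism"),
        # Each xref is a link to an entry in another (hypothetical) database.
        "xrefs": [(x.get("db"), x.get("acc")) for x in root.findall("xref")],
    }


record = read_gene(GENE_XML)
```

Adding a new kind of information, for example a description element, would only require one more ELEMENT line in the DTD and a change to the content model of gene. Note that the standard-library parser used here does not check the document against the DTD; a validating parser would be needed for that.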

The disadvantages of XML in Bioinformatics

1. The expressiveness of the XML data model would probably not be sufficient for molecular biology
      -XML has no inheritance mechanism
      -XML itself has no built-in support for numerical values, tables and matrices; with a DTD, all element content is treated as text
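
The lack of numeric and matrix support can be seen in a short Python sketch. The matrix layout below (a matrix element with rows and cols attributes and whitespace-separated row elements) is an assumed encoding chosen for illustration, not a standard one. The parser returns only strings, so the application has to convert the values and check the shape itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of a 2 x 3 matrix; the layout is an assumption.
MATRIX_XML = """<matrix rows="2" cols="3">
  <row>1 0 -1</row>
  <row>0.5 2 3</row>
</matrix>"""


def read_matrix(xml_text):
    """Convert the text content of each row into a list of floats."""
    root = ET.fromstring(xml_text)
    # The parser yields plain strings; numeric conversion is done by hand.
    rows = [[float(tok) for tok in row.text.split()]
            for row in root.findall("row")]
    # The declared dimensions are also just strings and must be checked manually.
    n_rows, n_cols = int(root.get("rows")), int(root.get("cols"))
    if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
        raise ValueError("matrix shape does not match declared dimensions")
    return rows


matrix = read_matrix(MATRIX_XML)
```

Every application that exchanges such data has to agree on the same textual convention, which is the kind of modeling burden this disadvantage refers to.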

Some relevant links:

XML for Molecular Biology
XML-biology links
Bioinformatics XML resources
FAQ about XML in Bioinformatics
USING XML FOR BIOINFORMATICS: AN INTEGRATED, VISUAL, INTERACTIVE RESEARCH ENVIRONMENT
Reference on XML bioinformatics paper
A sample project description and use of XML
XML Formats for Bioinformatics