Workflow and Web Processes in Bioinformatics

John A. Miller, Krys J. Kochut, Zhiming Wang and Amrita Basu

LSDIS Lab, Computer Science Department
415 GSRC
University of Georgia
Athens, GA  30602
jam@cs.uga.edu

Abstract

Over the last two decades there has been substantial development
of two related technologies for managing bioinformatics processes
that access multiple data sources, extract and transform the
data, store them in databases, facilitate analysis and even
support human interaction.  Traditionally, this was done with
Perl pipelines.  The newer technologies of workflow and Web
service processes offer a higher-level way to do this.

Still, the impact of these technologies on actual practice is
less than one might expect.  We examine this situation by
considering the evolution of workflow and Web service
technologies, particularly as applied to bioinformatics.
Workflow technology emerged in the early 1990s and made
substantial progress through the decade.  Today, there are
many fully functional Workflow Management Systems available,
many of them open source.  In the current decade, much of
the research effort has switched over to Web service processes.
These engines are less mature than workflow engines, but have
the advantage that they are based on open Web standards (e.g.,
SOAP, WSDL and BPEL).  Although Web service standards continue
to expand, the technology currently lags workflow technology
in terms of usability and particularly human interaction with
the process.  Perhaps these two technologies will become more
similar over time.  Already, several workflow engines support
the invocation of Web services.  Very recently, jBPM has added
support for BPEL, an OASIS standard for Web process orchestration.

These technologies will be examined in more detail by looking
at three case studies or projects: GeneFlow, ProPreO and ApiFlow.
These projects are ideal in the sense that each uses multiple
engines.  They also include a good mix of workflow and Web
service process engines.  These are all bioinformatics projects
carried out at the University of Georgia.  The projects used
(at least to some extent) the following process engines:

1. GeneFlow engines: METEOR's WebWork and OrbWork
2. ProPreO engines: Taverna, METEOR-S and jBPM
3. ApiFlow engines: Taverna and ActiveBPEL

Two of the engines were fully developed by us (WebWork and OrbWork),
two by a third party (Taverna and ActiveBPEL) and one is a
hybrid (METEOR-S).

Acknowledgements:
We would like to thank the following students for creating figures
for the slides.  The slide from the ActiveBPEL designer was produced
by Pablo Mendes.  The corresponding slide from Taverna was
produced by Rui Wang.  Finally, a slide in the appendix illustrating
the use of Taverna in the Complex Carbohydrates Research Center (CCRC)
was produced by Satya Sahoo.  Note that due to space limitations,
the appendix was not part of the actual poster presentation.  We
would also like to thank our faculty colleagues, Eileen Kraemer,
Jessica Kissinger, Amit Sheth and Will York.