Workflow and Web Processes in Bioinformatics

John A. Miller, Krys J. Kochut, Zhiming Wang and Amrita Basu
LSDIS Lab, Computer Science Department
415 GSRC
University of Georgia
Athens, GA 30602
jam@cs.uga.edu

Abstract

Over the last two decades there has been substantial development of two related technologies for managing bioinformatics processes that access multiple data sources, extract and transform the data, store them in databases, facilitate analysis and even support human interaction. Traditionally, such processes were implemented as Perl pipelines. The newer technologies of workflow and Web service processes offer a higher-level way to do this. Still, the impact of these technologies on actual practice is less than one might expect. We examine this situation by considering the evolution of workflow and Web service technologies, particularly as applied to bioinformatics.

Workflow technology emerged in the early 1990s and made substantial progress through the decade. Today, there are many fully functional Workflow Management Systems available, many of them open source. In the current decade, much of the research effort has switched over to Web service processes. The engines are less mature than workflow engines, but have the advantage that they are based on open Web standards (e.g., SOAP, WSDL and BPEL). Although Web service standards continue to expand, the technology currently lags workflow technology in terms of usability and particularly human interaction with the process. Perhaps these two technologies will become more similar over time. Already, several workflow engines support the invocation of Web services. Very recently, jBPM has added support for BPEL, an OASIS standard for Web process orchestration.

These technologies will be examined in more detail by looking at three case studies or projects: GeneFlow, ProPreO and ApiFlow. These projects are ideal in the sense that each uses multiple engines.
They also include a good mix of workflow and Web service process engines. All three are bioinformatics projects carried out at the University of Georgia. The projects used (at least to some extent) the following process engines:

1. GeneFlow engines: METEOR's WebWork and OrbWork
2. ProPreO engines: Taverna, METEOR-S and jBPM
3. ApiFlow engines: Taverna and ActiveBPEL

Two of the engines were fully developed by us (WebWork and OrbWork), two by third parties (Taverna and ActiveBPEL) and one is a hybrid (METEOR-S).

Acknowledgements: We would like to thank the following students for creating figures for the slides. The slide from the ActiveBPEL designer was produced by Pablo Mendes. The corresponding slide from Taverna was produced by Rui Wang. Finally, a slide in the appendix illustrating the use of Taverna in the Complex Carbohydrates Research Center (CCRC) was produced by Satya Sahoo. Note that due to space limitations, the appendix was not part of the actual poster presentation. We would also like to thank our faculty colleagues, Eileen Kraemer, Jessica Kissinger, Amit Sheth and Will York.