Overview of My Research

For the past several years, my research has focused on topics related to:

  • the Semantic Web, including ontologies and ontology engineering, as well as ontology querying,
  • specification and execution of workflow processes and web services,
  • text analytics, including entity and relationship extraction, topic modeling, and text categorization, often aided by knowledge represented in ontologies, and
  • bioinformatics.

Semantic Web and Ontologies

I am interested in ontologies, ontology languages, and ontology engineering. Together with my students, I have been working on a system for easy construction of domain ontologies from available data sources, such as ontologies in the Linked Open Data cloud, and other data sources, including XML documents, spreadsheets, and relational databases.

I am also interested in ontology querying, more specifically in optimizing SPARQL query processing and enhancing the expressive power of the language. In particular, I am interested in distributed SPARQL processing and in studying the impact of RDF data set partitioning on query processing performance. In the past, I worked on adding regular expression-constrained path queries to SPARQL and, together with my Ph.D. student Maciej Janik, created SPARQLer, an extension of SPARQL that includes path queries. I have also designed and implemented a successful proof-of-concept SPARQLer query processor. We have also created BRAHMS, a highly efficient main memory-based triple store for testing and benchmarking Semantic Web systems.
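To illustrate the general idea of a regular expression-constrained path query, here is a minimal Python sketch. It is not SPARQLer's syntax or algorithm: the toy triples, the find_paths helper, and the convention of matching a regular expression against the space-separated predicate labels along a path are all assumptions made for this example.

```python
import re

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksFor", "acme"),
    ("alice", "worksFor", "initech"),
    ("bob", "memberOf", "club"),
]


def find_paths(triples, start, pattern, max_len=4):
    """Return simple paths from `start` whose predicate sequence matches `pattern`.

    `pattern` is an ordinary Python regular expression that must match the
    whole space-separated sequence of predicate labels along the path.
    """
    regex = re.compile(pattern)

    # Index outgoing edges by subject for fast traversal.
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))

    results = []

    def walk(node, visited, preds, nodes):
        # Record the path if its predicate sequence satisfies the constraint.
        if preds and regex.fullmatch(" ".join(preds)):
            results.append(list(nodes))
        # Bound the search depth so the enumeration always terminates.
        if len(preds) == max_len:
            return
        for p, o in out_edges.get(node, []):
            if o not in visited:  # simple paths only: never revisit a node
                walk(o, visited | {o}, preds + [p], nodes + [o])

    walk(start, {start}, [], [start])
    return results


# Any number of "knows" edges followed by one "worksFor" edge.
paths = find_paths(TRIPLES, "alice", r"(knows )*worksFor")
```

Exhaustive depth-first enumeration is used here only for clarity. A real query processor would more likely compile the expression into an automaton and prune the search as it traverses the graph.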

Some of my current projects include graphical ontology querying as well as ontology verification and evaluation, especially for ontologies that are regularly re-populated from external data sources.

In addition, I am very interested in leveraging the power of ontologies and Semantic Web techniques in other areas, such as text mining, workflow specification, text processing, and bioinformatics, as described below.

Some of my older collaborative projects in the area of the Semantic Web are described on our old Large Scale Distributed Information Systems (LSDIS) website.

Workflow Systems and Web Services

I have been very interested in process automation, including workflow specification, workflow management, and Web Service compositions. Recently, I have been working with my students on creating executable, end-to-end specifications of process constraints, aided by an ontology specification of the constraints vocabulary. Together with my students, I have created a Process Constraints Language (PCL), which is applicable to both compositions of Web Services and workflow processes.

In the mid-2000s, I worked on the specification and verification of complex conversation protocols for Web Service compositions. Our specification language was based on Colored Petri Nets. This work was done with my Ph.D. student, Xiaochuan Yi (currently at AT&T Research).

Workflow Management Systems (WFMS) have been the main focus of my research since late 1994. I have studied various aspects of modeling, designing, and implementing (a) scalable workflow enactment systems, (b) enactment systems capable of running dynamic and adaptive workflows, and (c) security in workflow processes. In the recent past, I designed and developed OrbWork, a workflow system that could be used as a vehicle for developing and testing various aspects of my research. OrbWork was part of the Meteor Workflow Management System, on whose design and development I have been collaborating with Drs. Amit Sheth and John Miller. The OrbWork system was used to implement a variety of workflow applications in areas ranging from healthcare to multi-level security workflows in defense applications. One of the later versions of OrbWork supported multi-level (multi-domain) security, incorporating state-of-the-art encryption standards. Further information is available on the project's web page.

Text Analytics

I have been interested in ontology-aided text processing. In particular, together with my students, Maciej Janik and Mehdi Allahyari, I have created an ontology-based text categorization method that requires no classifier training (and, consequently, no training set), in contrast to many traditional text categorization methods. Furthermore, in our recent work in this area, we have created a method that allows dynamic specification of user-defined classification categories.
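The general idea of categorizing text with an ontology instead of a trained classifier can be illustrated with a small Python sketch. It is only a simplified illustration under stated assumptions, not the method described above: the toy PARENT hierarchy, the exact-token entity matching, and the counting-based scoring are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical toy ontology: each entity or concept maps to its parent concept.
PARENT = {
    "BRAF": "protein kinase",
    "EGFR": "protein kinase",
    "protein kinase": "biology",
    "glycan": "biology",
    "SPARQL": "query language",
    "RDF": "data model",
    "query language": "semantic web",
    "data model": "semantic web",
}


def ancestors(entity):
    """Yield every concept above `entity` in the toy hierarchy."""
    while entity in PARENT:
        entity = PARENT[entity]
        yield entity


def categorize(text, categories):
    """Pick the category with the most recognized entities beneath it.

    `categories` is supplied by the caller at query time, so no training
    step or training set is involved.
    """
    tokens = text.replace(",", " ").replace(".", " ").split()
    scores = Counter()
    for token in tokens:
        if token in PARENT:  # assumption: entities are matched by exact token
            for concept in ancestors(token):
                if concept in categories:
                    scores[concept] += 1
    return scores.most_common(1)[0][0] if scores else None


label = categorize(
    "SPARQL queries over RDF data mention BRAF once.",
    {"biology", "semantic web"},
)
```

Because the categories are passed in as an argument, a caller can change them from one request to the next without retraining anything, which is the property this sketch is meant to show.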

Recently, together with one of my Ph.D. students, Mehdi Allahyari, I have been focusing on the development of ontology-based topic modeling and topic labeling methods.

Furthermore, I have been working on ontology-aided text mining, especially on entity and relationship extraction in the area of bioinformatics, as described below.

I have also been interested in ontology-based text summarization and have directed theses on this topic.

Bioinformatics

Bioinformatics has been one of my main interests since the early 1990s. Currently, I am working on the development of the Protein Kinase Ontology, ProKinO, a collaborative project with Dr. Natarajan Kannan from Biochemistry and Molecular Biology at UGA. Protein kinases are an important, diverse family of enzymes that are genomically altered in many human cancers. ProKinO represents a wealth of knowledge and data related to protein kinases and allows for formulating and executing integrative, hypothesis-driven queries. We have written a software system that automatically repopulates ProKinO at regular intervals with relevant data from a variety of sources, including UniProt, KinBase, COSMIC, and Reactome. Furthermore, together with my Ph.D. student, Shima Dastgheib, I have developed a graphical query builder suitable for biologists who are not familiar with the SPARQL query language and the ProKinO schema.

Currently, I am working on identifying the impacts of somatic mutations in protein kinases. It is an important project, since mutations in kinases have been experimentally linked to many different types of cancer. We have been text mining all full-text articles downloadable from the PubMed Central FTP site (nearly one million articles). All of the automatically identified impacts are later curated by researchers in order to eliminate any false positives. The final mutation impact results collected and curated in this text mining project will be incorporated into the ProKinO ontology.

I have also been working on a variety of projects related to the biology of glycans, and more specifically to glycomics and the computational aspects of processing data and knowledge about glycans. More information is available on the glycomics Web site.

In the 1990s, I worked on a project to create an information system for the genome mapping of the fungus Aspergillus nidulans, a joint project with Dr. Jonathan Arnold of the Genetics Department at the University of Georgia. I designed and implemented a software system called Fungal Genome Data Base (FGDB), capable of constructing and editing physical maps. The system, capable of generating various maps, editing genome data (for chromosomes, clones, probes, sequences, genes, etc.), and connecting to other genome and protein databases (for example, GenBank), has been successfully used to create and fine-tune the physical maps of Aspergillus nidulans and Nectria haematococca. Subsequently, it was used to create the physical maps of Neurospora crassa and Pneumocystis carinii.