Query Processing in Graph Databases

by

Supriya Ramireddy

(Under the Direction of John A. Miller)

Abstract

Graph data are central to state-of-the-art applications in a variety of domains, including Linked Data and social media. This drives the need for graph databases that can store and manage graph data effectively. Relational query processing has become efficient through decades of research in data management and processing, in which the translation of SQL into relational algebra operations plays a key role. Building on relational algebra, many graph algebras have been defined for query processing and optimization in graph databases. We propose a graph algebra that operates on graph databases for processing queries. We have implemented this graph algebra as part of ScalaTion and compared it with Neo4j and MySQL with respect to query processing time. Various queries are tested on datasets ranging from a few vertices to a large number of vertices. As the database grows larger, graph databases perform well compared to relational databases. Increasing the number of joins in a query decreases the performance of relational databases, whereas equivalent queries in graph databases exhibit comparatively good performance. Among the graph databases compared in this study, ScalaTion shows better performance.

Index words: Graph Databases; Graph Query Language; Graph Algebra; Query Processing; Pattern Matching; Query Optimization.