Analytics Databases: A Comparative Study

by

Yang Fan

(Under the Direction of John A. Miller)

Abstract

With the emergence of the Big Data era, high-performance analytics databases are in high demand in areas such as business intelligence and predictive analytics. Column-oriented databases were created as a type of NoSQL (Not only SQL) database to meet this demand. ScalaTion is an open-source, Scala-based tool for simulation, optimization, and analytics, and it includes an implementation of a column-oriented, in-memory database designed for high-performance analytics. The database provides an easy way to transform a table into a matrix, which may then be used as input to the machine-learning models that are also available in ScalaTion. Fifteen experiments are conducted to evaluate the performance of five databases: ScalaTion, MySQL, SQLite, SparkSQL, and ClickHouse. The performance of ScalaTion is for the most part on par with that of open-source column-oriented databases and at times is significantly better.

Index words: Analytics database, Column-oriented database, Big Data analytics pipeline