Nicholas Sterling

I am a PhD student in the Department of Computer Science at the University of Georgia.
My research interests include big data analytics and machine learning, time-series forecasting, and knowledge representation and reasoning.

Research

My main research focus is big data analytics, in particular knowledge representation and reasoning as well as time-series forecasting with statistical and machine learning models.

Time Series Classification and Forecasting

Time series classification is a multinomial labeling task on time series data with uses in traffic forecasting, econometrics, and other domains. Time series forecasting, on the other hand, is the task of producing target value estimates for future time points from past observations. Accurate time series forecasting (i.e., predicting the future) is arguably the holy grail of predictive data analytics. Forecasting from time series data is invaluable in a range of domains: economists are interested in forecasting economic cycles (e.g., recessions) using econometric time series data; stock traders want to use stock market time series data to find arbitrage opportunities on the trading floor; and infectious disease experts mitigate and manage pandemics with time series forecasting. Statisticians have applied well-understood statistical theory to create a plethora of probabilistic models for time series forecasting, including the AutoRegressive (AR) and Moving Average (MA) models, as well as the combined ARMA model and its variants: AR Integrated MA (ARIMA) and Seasonal ARIMA (SARIMA). The AutoRegressive Conditional Heteroscedasticity (ARCH) family of models was developed to model the time-varying variance common in econometric data and includes the Generalized ARCH (GARCH), the Exponential GARCH (EGARCH), the Integrated GARCH (IGARCH), and the Fractionally Integrated GARCH (FIGARCH) models. While the statistics community has developed a number of time series models with strong theoretical foundations and a long history of supporting literature, the machine learning (ML) community has developed its own models which, while frequently more opaque than their traditional counterparts, nonetheless achieve impressive results in a variety of domains. A thorough survey of ML models applied to time series forecasting can be found here. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network designed for sequential data such as time series. 
Other ML models that were not originally designed for the task may nonetheless be applied successfully to time series forecasting; for instance, in my research I have investigated the application of random forest, gradient boosting, and Convolutional Neural Network (CNN) models to time series forecasting. A "non-technical" survey of convolutional neural networks can be found here, and one of the first examples of applying CNNs to time series data may be found here.
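As a rough illustration of the ideas above, the following Python sketch shows two things: how a series can be turned into lagged (window, next value) pairs, which is one common way to feed a time series to general-purpose models such as random forests, and how the simplest autoregressive model, AR(1), can be fit by ordinary least squares and used to forecast. The function names (make_lagged, fit_ar1, forecast_ar1) and the toy series are illustrative assumptions, not code from my research.

```python
from typing import List, Tuple


def make_lagged(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (window of the previous n_lags values, next value) pairs."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of (c, phi) in y_t = c + phi * y_{t-1} + noise."""
    X, y = make_lagged(series, 1)
    x = [row[0] for row in X]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value: float, c: float, phi: float, steps: int) -> List[float]:
    """Iterate the fitted recurrence forward to produce multi-step forecasts."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = c + phi * current
        forecasts.append(current)
    return forecasts


if __name__ == "__main__":
    # Hypothetical noise-free toy series generated by y_t = 1 + 0.5 * y_{t-1}.
    toy = [0.0]
    for _ in range(10):
        toy.append(1.0 + 0.5 * toy[-1])
    c, phi = fit_ar1(toy)
    print(c, phi, forecast_ar1(toy[-1], c, phi, steps=3))
```

The same lagged pairs returned by make_lagged could be passed to any regression model in place of the least-squares fit; real data would of course include noise, and higher-order or seasonal models would need a proper statistics library.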

Knowledge Representation and Reasoning

Knowledge Representation and Reasoning (KRR) is the subset of artificial intelligence that deals with, firstly, representing human knowledge in a format that machines can interpret and, secondly, algorithmically reasoning from that representation toward various outcomes: producing novel insights, recommending action plans, predicting trends and relationships, etc. One difference between statistical machine learning (ML) methods and KRR is interpretability: whereas traditional ML models are usually "black box" models, in that their decision-making process is at best opaque, KRR models are more scrutable to human beings, and thus their results are more likely to advance human understanding in a holistic way rather than only generating impressive results for a narrow problem domain. Recently, interest in combining the fields of KRR and ML more thoroughly has gained traction among researchers; one of the most promising and interesting developments in this vein is the Hinge-Loss Markov Random Field (HLMRF) model and Probabilistic Soft Logic (PSL), a declarative language for defining HLMRFs for a variety of ML problems. I am currently researching how to apply HLMRFs to time series classification. In addition, I am researching the application of Knowledge Graphs (KGs) to time series forecasting. A KG is a KRR model that organizes relational data into a graph structure, relating nodes to one another via edges that represent their relationships. A very interesting survey of knowledge graph research can be found here, and the details of HLMRFs and PSL can be found here.
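To make these two ideas slightly more concrete, here is a minimal Python sketch under stated assumptions. It stores a tiny knowledge graph as (subject, relation, object) triples and computes the hinge-loss "distance to satisfaction" of a soft logical rule using the Lukasiewicz relaxation that PSL is built on. All entity names, relation names, and truth values are hypothetical, and this is not the PSL library's API; it only illustrates the underlying arithmetic.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge graph: nodes are sensors, roads, and cities;
# each edge is labeled with the relationship it represents.
TRIPLES: List[Triple] = [
    ("sensor_1", "located_on", "road_A"),
    ("sensor_2", "located_on", "road_A"),
    ("road_A", "passes_through", "city_X"),
]


def outgoing(triples: List[Triple], node: str) -> List[Tuple[str, str]]:
    """Return the (relation, object) pairs for edges leaving the given node."""
    return [(rel, obj) for subj, rel, obj in triples if subj == node]


def lukasiewicz_and(a: float, b: float) -> float:
    """Soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: List[float], head: float) -> float:
    """Hinge-loss penalty for the soft rule (body_1 AND body_2 AND ...) -> head.

    The penalty is zero when the head is at least as true as the body,
    and grows linearly with the size of the violation otherwise.
    """
    body_truth = 1.0
    for value in body:
        body_truth = lukasiewicz_and(body_truth, value)
    return max(0.0, body_truth - head)


if __name__ == "__main__":
    print(outgoing(TRIPLES, "sensor_1"))
    # Hypothetical rule: Congested(s1) AND Adjacent(s1, s2) -> Congested(s2)
    print(distance_to_satisfaction(body=[0.9, 0.8], head=0.4))
```

In an HLMRF, weighted sums of penalties like this one define the model's energy, and inference searches for the truth values that minimize it.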

Vehicle Traffic Modeling

In my research I have used vehicle traffic forecasting as a case study for applying KRR and ML to time series classification and forecasting, though improving vehicle traffic forecasting models is a worthy goal in and of itself. Traffic congestion is not just a nuisance but also one of the main sources of lost productivity in the urban landscape; improved traffic forecasting models can not only ease the psychological burden of traffic congestion on urban residents and visitors but also improve overall productivity. In addition to alleviating traffic congestion, improved vehicle traffic models will benefit first responders and EMS workers, as well as emergency management agencies, by providing more reliable and efficient route planning during emergency and disaster scenarios.

Publications

  • Peng, Hao, et al. "Knowledge and Situation-Aware Vehicle Traffic Forecasting." 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019.
  • Peng, Hao; Klepp, Nicholas; Miller, John. "Traffic Flow Forecasting under Typical and Atypical Conditions." International Journal of Data Science and Analytics. [Submitted for Publication]
  • Sterling N., Miller J.A. (2020) Traffic Incident Detection from Massive Multivariate Time-Series Data. In: Yang Y., Yu L., Zhang LJ. (eds) Cognitive Computing – ICCC 2020. ICCC 2020. Lecture Notes in Computer Science, vol 12408. Springer, Cham. https://doi.org/10.1007/978-3-030-59585-2_10

Teaching

  • CSCI 1301 - Introduction to Computing and Programming
  • CSCI 2610 - Discrete Mathematics for Computer Science
  • CSCI 2670 - Introduction to the Theory of Computing

Graduate Assistant

  • CSCI 1730 - Systems Programming
  • CSCI 2720 - Data Structures

Student

  • CSCI 6070 - Introduction to Game Programming
  • CSCI 6360 - Data Science II
  • CSCI 6380 - Data Mining
  • CSCI 6730 - Operating Systems
  • CSCI 8050 - Knowledge-Based Systems
  • CSCI 8360 - Data Science Practicum
  • CSCI 8370 - Advanced Database Systems

Contact

Please feel free to reach out to me at nickbk_at_uga_dot_edu