Introduction

Research in this topical area will focus on techniques for interpreting predictions in large-scale deep learning with multi-source integrated data sets. The goals of this research theme are to allow dynamic and unstructured covariate information to be incorporated into a tree-based method; combine marked temporal point processes with LSTM to model dynamic event data without assuming any parametric form; use unsupervised learning to build a deep network from a series of shallow networks, each with a simpler and more interpretable objective function, to cope with the stochastic nature of real-life events; address the high dimensionality of data and action spaces; explore topological and group compression approaches; and investigate design and interpretability issues in deep reinforcement learning. Additionally, new loss functions will be developed that can be incorporated into any deep network to solve million-scale problems. These loss functions will automatically and efficiently cluster easy and hard samples to optimize deep learning models and will model the distributions of deep features extracted from hard samples more accurately. Finally, the theme will investigate a framework for using transaction data to improve prediction and decision making in medicine and business, with the particular goal of improving the identification of opioid use disorder and the timing of interventions.


Goals:

  1. Statistical Learning – Random Forests (RF) for Recurrent Event Analytics
    • Faculty Lead: Alan Vasquez
    • Objectives
      • Create Random Forests for Recurrent Event Analytics, a method that integrates the RF algorithm with classical statistical methods so that dynamic feature information can be incorporated into a tree-based method.
      • Create a Gradient Boosting method for Recurrent Event Analytics that integrates boosted trees with classical statistical methods so that dynamic feature information can be incorporated.
      • Perform a comparison study of the two methodologies above and identify future research directions.
  2. Statistical Learning – Marked Temporal Point Process Enhancements via Long Short-Term Memory Networks
    • Faculty Lead: Chase Rainwater
    • Objectives
      • Develop methodology integrating the marked temporal point process (MTPP) with long short-term memory networks (LSTM)
      • Develop an unsupervised, dynamic degradation labeling strategy for remaining life modeling
      • Evaluate the approach on real-world discrete data sets
  3. Deep Learning – Novel Approaches
    • Faculty Lead: Md Karim
    • Objectives
      • Extract explanatory features from deep networks
      • Address high dimensionality issues in Deep Reinforcement Learning (DRL) using algebraic and topological methods
      • Design a novel reward model and address interpretability issues in DRL
  4. Deep Learning – Efficiency and Specification
    • Faculty Lead: Khoa Luu
    • Objectives
      • Create novel deep learning networks executable with reduced computational resources and assess their performance
      • Address algorithmic analysis and challenges in low-cost deep learning
      • Explore low-cost deep learning applications in natural images and medical images
  5. Harnessing Transaction Data through Feature Engineering
    • Faculty Lead: Shengfan Zhang
    • Objectives
      • Design advanced feature engineering techniques for high-dimensional temporal data
      • Create an improved prediction and decision-making framework incorporating feature engineering with health transaction data
      • Employ and validate the new framework for prediction and decision making with business transaction data

Advancing the State of Knowledge

A major challenge in building secure and widely adopted deep learning systems is that they sometimes make misclassifications that are unexplainable or unpredictable. In addition to confusing examples of very different classes, they are vulnerable to adversarial examples. These systems are often trained as large feed-forward black boxes using error backpropagation, so we have no way of interpreting the meaning of their features or understanding the causes of misclassifications, a situation that attackers can exploit. Research in this theme will apply statistical learning techniques alongside more advanced deep learning techniques to address three major challenges:

  1. Violation of fundamental statistics principles
  2. Model specification and interpretation
  3. Computing in big data environments

We will investigate these challenges surrounding high-dimensional, dynamic, and unstructured data sets and explore solutions in the domains of genomics, transaction scenarios in eCommerce, and supply chain logistics.