About

With support from a $1.5 million, three-year Transdisciplinary Research in Principles of Data Science (TRIPODS) grant from the National Science Foundation, a multidisciplinary team of researchers at Johns Hopkins’ Mathematical Institute for Data Science (MINDS) has created the TRIPODS Institute for the Foundations of Graph and Deep Learning at Johns Hopkins University to boost data-driven discovery.

The new institute will bring together mathematicians, statisticians, theoretical computer scientists, and electrical and biomedical engineers to develop the foundations for the next generation of data-analysis methods, which will integrate model-based and data-driven approaches. Data science fellows will be trained at the institute, where they will be jointly supervised by faculty members who have complementary expertise in model-based and data-driven approaches.

The mission of this new TRIPODS institute comprises both research and education. The team will develop a multidisciplinary research agenda around the foundations of model-based and data-driven approaches to data science, with a focus on the foundations of deep neural and generative models, as well as integrated models that derive strength from both types of models. In addition, the institute will become a regional destination for collaborative work by organizing and staging semester-long focused research themes and workshops, an annual symposium, and a research-intensive summer school and workshop on the foundations of data science.

TRIPODS researchers will also develop a unified curriculum for a new minor and master’s program, which will be offered jointly by the departments of Computer Science and Applied Mathematics and Statistics.


People


Raman Arora
Assistant Professor
Department of Computer Science

Amitabh Basu
Assistant Professor
Department of Applied Mathematics and Statistics

Vladimir Braverman
Assistant Professor
Department of Computer Science

Donald Geman
Professor
Department of Applied Mathematics and Statistics

Mauro Maggioni
Bloomberg Distinguished Professor
Department of Mathematics
Department of Applied Mathematics and Statistics

Enrique Mallada
Assistant Professor
Department of Electrical and Computer Engineering

Carey Priebe
Professor
Department of Applied Mathematics and Statistics

Jeremias Sulam
Assistant Professor
Department of Biomedical Engineering

Soledad Villar
Assistant Professor
Department of Applied Mathematics and Statistics

René Vidal
Herschel Seder Professor
Department of Biomedical Engineering

Research

TRIPODS will develop a multidisciplinary research agenda on the foundations of model-based and data-driven approaches to data science, with a focus on the foundations of deep neural models (e.g., feed-forward networks, recurrent networks, generative adversarial networks) and generative models (e.g., attributed graphs, dynamical systems) of complex, structured data (e.g., images, shapes, networks), as well as integrated models that benefit from the strengths of both types of models.

Theme I: Foundations of Deep Learning

Recently, deep neural networks (DNNs) have led to dramatic improvements in the performance of pattern recognition systems. For instance, DNNs have revolutionized computer vision, enabling the development of powerful new technologies for face and object recognition in images and videos. However, the mathematical understanding of DNNs remains shallow. This TRIPODS research theme will focus on developing a mathematical framework based on principles from statistics, optimization, and learning theory for understanding generalization, optimization, and approximation properties of DNNs.

Theme II: Foundations of Graph Learning

In many modern applications, ranging from network analysis to social networks to information extraction from large data sets in high dimensions, large graphs and processes on them (random walks, epidemics, etc.) play a fundamental role. These graphs are often noisy, being derived from measurements or partial observations, and they often evolve in time, with the numbers of vertices and edges changing stochastically. Ever-richer statistical models and machine learning algorithms are needed to model graphs and their dynamics. The study of graphs has grown substantially over the last decade across multiple disciplines, attracting interest from communities including statistical signal processing, statistics, computer science, and computational mathematics. There are strong interconnections among these related research areas and problems. We organize our discussion by distinguishing questions about analysis on graphs from questions about the analysis of graphs.


Education and Training

  • The institute will train Data Science Fellows in the foundations of data science; the fellows will be jointly supervised by faculty with complementary expertise in model-based and data-driven approaches.
  • The institute will organize a series of collaborative events, including a Seminar Series, a hackathon, and an Annual Symposium on the foundations of data science. 
  • The institute will also fund an Annual Summer Research School and Workshop on the foundations of data science, where a team of 3 to 4 faculty, 2 to 3 graduate students, and 2 to 3 undergraduates works on its dream research topic for a period of 8 weeks.
  • The institute will also create a new Master in Data Science, which will be jointly offered by the departments of Computer Science and Applied Mathematics and Statistics.

Publications

  1. Ambar Pal, Connor Lane, René Vidal, Benjamin Haeffele. On the Regularization Properties of Structured Dropout. IEEE Conference on Computer Vision and Pattern Recognition, 2020. [pdf]
  2. Poorya Mianjy and Raman Arora, “On Convergence and Generalization of Dropout Training,” In Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2020. [pdf]
  3. Raman Arora, Peter Bartlett, Poorya Mianjy, Nathan Srebro. “Dropout: Explicit Forms and Capacity Control,” 2020. [pdf]
  4. Amitabh Basu, Tu Nguyen and Ao Sun, “Admissibility of solution estimators in stochastic optimization”, to appear in SIAM Journal on Mathematics of Data Science, 2020.
  5. Anirbit Mukherjee, Ramchandran Muthukumar, “A study of neural training with non-gradient and noise assisted gradient methods”, 2020. [pdf]
  6. Anirbit Mukherjee, Ramchandran Muthukumar, “Guarantees on adversarial robustness of training depth-2 neural networks with a stochastic algorithm,” 2020. [pdf]
  7. Jason Miller, Sui Tang, Ming Zhong, Mauro Maggioni, “Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems,” https://arxiv.org/pdf/2010.03729.pdf.
  8. Zhongyang Li, Fei Lu, Mauro Maggioni, Sui Tang, Cheng Zhang, “On the identifiability of interaction functions in systems of interacting particles,” to appear in Stochastic Processes and their Applications, https://arxiv.org/pdf/1912.11965.pdf.
  9. Hancheng Min and Enrique Mallada. “Dynamics Concentration of Large-Scale Tightly-Connected Networks.” IEEE 58th Conference on Decision and Control (CDC), pp. 758-763, 2019. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9029796
  10. Vittorio Loprinzo, Laurent Younes and Donald Geman, “A neural network generative model for random dot product graphs,” in preparation, 2020.
  11. Joshua Agterberg, Minh Tang, and Carey Priebe, “On Two Forms of Nonidentifiability in Latent Position Random Graphs,” submitted, 2020.
  12. Joshua Agterberg, Minh Tang, and Carey Priebe, “Consistent Nonparametric Hypothesis Testing for Low Rank Random Graphs with Negative and Repeated Eigenvalues,” in preparation, 2020.
  13. Eli Sherman, David Arbour, and Ilya Shpitser. “General Identification of Dynamic Treatment Regimes Under Interference.” Conference on Artificial Intelligence and Statistics (AISTATS), in PMLR 108:3917-3927, 2020 [pdf]
  14. Eli Sherman, David Arbour, and Ilya Shpitser. “Policy Interventions Under Interference.” NeurIPS Workshop on Machine Learning and Causal Inference for Improved Decision Making, 2019.
  15. David Arbour, Eli Sherman, Avi Feller, and Alex Franks. “Multitask Gaussian Processes for Causal Inference with Panel Data.” Under review, 2020.
  16. Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman. “Multi-speaker Emotion Conversion via Latent Variable Regularization in Chained Encoder-Decoder-Predictor Network,” InterSpeech 2020
  17. Ravi Shankar, Jacob Sager, Archana Venkataraman. “Unsupervised Emotion Conversion via Cycle-GAN and Pair Discriminator,” InterSpeech 2020
  18. Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu. “Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate,” submitted.