This course provides an introduction to geometric and topological methods in data science. Our starting point is the manifold hypothesis: that high-dimensional data lie on or near a much lower-dimensional smooth manifold. We introduce tools for studying the geometric and topological properties of this manifold in order to reveal relevant features and organization of the data. Topics include metric space structures, curvature, geodesics, diffusion maps, eigenmaps, geometric model spaces, gradient descent, data embeddings and projections, and topological data analysis (TDA) in the form of persistent homology and its associated "barcodes." We will see applications of these methods to a variety of data types.
Deep neural networks have gained immense popularity within the last decade due to their success in many important machine learning tasks, such as image recognition, speech recognition, and natural language processing. This course provides a principled and hands-on approach to deep learning with neural networks. By the end of the course, students will have mastered the principles and practices underlying neural networks, including modern methods of deep learning, and will have applied deep learning methods to real-world problems in image recognition, natural language processing, and biomedical applications. The course is based on homework, a final exam, and a final project (either group or individual, depending on enrollment). The project includes both a written and an oral (i.e., presentation) component. Students' grades will be based on their homework scores and the quality of the written and oral components of their projects. The course assumes basic prior knowledge of linear algebra and probability.
An overview of advances over the past decade in machine learning and automatic data-mining approaches for addressing the broad scope of modern data-analysis challenges, including deep learning, kernel methods, dictionary learning, and bag-of-words/bag-of-features models. This year, the seminar focuses on a wide range of biomedical data-analysis tasks, such as single-cell RNA sequencing, single-cell signaling and proteomic analysis, health-care assessment, and medical diagnosis and treatment recommendations. The seminar is based on student presentations and discussions of recent prominent publications from leading journals and conferences in the field.
This course focuses on machine-learning methods well suited to analyzing high-dimensional, high-throughput, noisy data, including manifold learning, graph signal processing, nonlinear dimensionality reduction, clustering, and information theory. Although the class covers some biomedical applications, these methods can be applied in a wide range of fields.