Research
I study learning from structured data, motivated by applications in healthcare and social science. Specifically, my research focuses on high-dimensional data that are supported on graphs or networks, lie on low-dimensional manifolds, or admit sparse representations, and explores how knowledge of those structures can be exploited to solve challenging inference problems more accurately and quickly.
Mathematically, I am interested in developing nonconvex optimization algorithms that exploit data geometry and come with provable performance guarantees. To this end, I draw on signal processing, machine learning, optimization, information theory, network science, spectral graph theory, and (high-dimensional) statistics.
My interests in data science (statistical signal processing and machine learning) include:
- high-dimensional data supported on graph structures (networks)
- high-dimensional data with low-dimensional underlying structure (subspaces, manifolds)
- machine learning applied to clinical time series data
- science of science and data science for social good
- nonconvex optimization; multitask learning; optimal transport
Publications
In author lists, * denotes equal contribution and † indicates alphabetical ordering.
Data Theory and Methods
Graph-structured data
Graph regularization for trend filtering (applied to transportation data), matrix factorization (applied to remote sensing), and federated multi-task learning (applied to census data).
H Lee, AL Bertozzi, J Kovačević and Y Chi, Privacy-Preserving Federated Multi-task Linear Regression: A One-Shot Linear Mixing Approach Inspired by Graph Regularization, ICASSP, 2022.
J Qin, H Lee, J Chi, L Drumetz, J Chanussot, Y Lou and AL Bertozzi, Blind Hyperspectral Unmixing Based on Graph Total Variation Regularization, IEEE Transactions on Geoscience and Remote Sensing, 2021. Related work presented at WHISPERS.
R Varma*, H Lee*, J Kovačević and Y Chi, Vector-Valued Graph Trend Filtering With Non-Convex Penalties, IEEE Transactions on Signal and Information Processing over Networks, 2020. Partial results presented at ICASSP.
Data supported on low-dimensional structure
A Lee†, H Lee†, JA Perea†, N Schonsheck† and M Weinstein†, \(O(k)\)-Equivariant Dimensionality Reduction on Stiefel Manifolds, Preprint, 2023.
H Lee, Multitask Principal Components Analysis (PCA), Chapter 4.5 in Better Inference with Graph Regularization (CMU PhD Thesis), 2021.
Optimal transport
- A Blob Method for Dynamic Optimal Transport with State and Control Constraints. Ongoing work with Katy Craig and Karthik Elamvazhuthi.
Healthcare Applications
Speech and audio
Deep learning for non-semantic speech and audio signals including coughing and sneezing.
H Lee and A Saeed, Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices, Preprint, 2022.
H Lee, A Saeed and AL Bertozzi, Active Learning of Non-semantic Speech Tasks with Pretrained Models, ICASSP, 2023.
Sleep health
Pediatric sleep scoring.
H Lee and A Saeed, Automatic Sleep Scoring from Large-scale Multi-channel Pediatric EEG, Learning from Time Series for Health (NeurIPS Workshop), 2022.
H Lee, B Li, S DeForte, ML Splaingard, Y Huang, Y Chi and SL Linwood, A Large Collection of Real-world Pediatric Sleep Studies, Nature Scientific Data, 2022.
Cardiology
Risk models from long-term ECG signals.
- H Lee, Building improved risk stratification models for patients post non ST-segment elevation acute coronary syndrome using ambulatory ECG data, MIT Master’s Thesis, 2017.