I am a Ph.D. candidate in Computer Engineering at Virginia Tech, advised by Prof. Lingjia Liu, and a visiting researcher at Argonne National Laboratory within the Mathematics and Computer Science (MCS) division, where I work closely with Robert Underwood, Sheng Di, and Franck Cappello. My research lies at the intersection of information theory, machine learning, statistical modeling, and large-scale computing systems, with a focus on developing mathematically grounded methods for representing, compressing, and learning from high-dimensional data at scale.
During my Ph.D., I developed the first information-theoretic frameworks for characterizing the fundamental limits of scientific lossy compression under realistic high-performance computing (HPC) constraints. This work bridges a long-standing gap between classical rate–distortion theory and the practical design of production scientific compressors used in exascale computing systems. Building on these foundations, my current research extends rate–distortion principles to foundation models, including large language models (LLMs) and large-scale vision models, through tokenized representation learning frameworks that study the tradeoffs between data, compression, model capacity, and compute in modern machine learning systems.
Data Compression
Rate–distortion bounds for codec design. Compression theory under HPC constraints. Error-bounded lossy compressors for scientific floating-point data.
AI Research
Information-theoretic foundations of LLM scaling laws. Predicting compute-optimal data/model allocation. Token-level rate–distortion for training efficiency.
High-Performance Computing
Exascale I/O optimization. Parallel data reduction. Theory-guided parameter selection for HPC compression pipelines.
Quantitative Research
Probabilistic modeling. Statistical inference for high-dimensional systems. Closed-form analysis of stochastic processes, scaling behavior, and random fields.
Developed mathematical and statistical frameworks for analyzing compression, representation, and scaling in scientific and AI systems.
Developed deep learning frameworks for large-scale cardiac MRI segmentation and EEG analysis that achieved state-of-the-art performance across heterogeneous clinical datasets.