About

I am a Ph.D. candidate in Computer Engineering at Virginia Tech, advised by Prof. Lingjia Liu, and a visiting researcher at Argonne National Laboratory within the Mathematics and Computer Science (MCS) division, where I work closely with Robert Underwood, Sheng Di, and Franck Cappello. My research lies at the intersection of information theory, machine learning, statistical modeling, and large-scale computing systems, with a focus on developing mathematically grounded methods for representing, compressing, and learning from high-dimensional data at scale.

During my Ph.D., I developed the first information-theoretic frameworks for characterizing the fundamental limits of scientific lossy compression under realistic high-performance computing (HPC) constraints. This work bridges a long-standing gap between classical rate-distortion theory and the practical design of production scientific compressors used in exascale computing systems. Building on these foundations, my current research extends rate-distortion principles to foundation models, including large language models (LLMs) and large-scale vision models, through tokenized representation learning frameworks that study the tradeoffs among data, compression, model capacity, and compute in modern machine learning systems.

Data Compression · AI for Science · Power-Law Modeling · Random Field Theory · High-dimensional Stochastic Modeling · Probability Theory

What I Specialize In

Data Compression

Rate-distortion bounds for codec design. Compression theory under HPC constraints. Error-bounded lossy compressors for scientific floating-point data.

Rate-Distortion · Lossy Compression · Codec Design · Scientific Data

AI Research

Information-theoretic foundations of LLM scaling laws. Predicting compute-optimal data/model allocation. Token-level rate-distortion for training efficiency.

LLM Scaling Laws · Foundation Models · Compute Efficiency · Representation Learning

High-Performance Computing

Exascale I/O optimization. Parallel data reduction. Theory-guided parameter selection for HPC compression pipelines.

Exascale I/O · Parallel Computing · Checkpoint Storage · Data Reduction

Quantitative Research

Probabilistic modeling. Statistical inference for high-dimensional systems. Closed-form analysis of stochastic processes, scaling behavior, and random fields.

Stochastic Processes · Convex Optimization · Non-Stationarity · Covariance Estimation

Publications
2026
Bridging Information Theory and Practice for Scientific Lossy Compression
S. Sinha, S. Di, V. Rao, R. Underwood, D. Lenz, Z. Jian, Z. Yang, K. Zhao, L. Liu, F. Cappello
ACM HPDC 2026 · Best Paper Track
2026
S. Sinha, V. Rao, R. Underwood, D. Lenz, S. Di, F. Cappello, L. Liu
arXiv preprint · Under Review
2026
R. Gaurav, S. Sinha, C. Lin, T. C. Stewart, L. Liu, Y. Yi
IEEE Transactions on CASAI
2023
S. Sinha, A. Soysal
IEEE MILCOM
2021
J. Zheng, M. Liang, S. Sinha, L. Ge, W. Yu, A. Ekstrom, F. Hsieh
IEEE JBHI
2020
S. Sinha, T. Denney, Y. Zhou, J. Zheng
IEEE ICML-Applications
Experience
2026 — Present
Visiting Student Researcher (AI for Science)
Argonne National Laboratory · Mathematics & Computer Science Division · Lemont, IL
Developed information-theoretic frameworks for compute-optimal scaling in foundation models, analyzing tradeoffs between training data, model capacity, and compute in large-scale AI systems.
2025
Visiting Student Researcher (Data Compression & HPC)
Argonne National Laboratory · Mathematics & Computer Science Division · Lemont, IL
Developed the first theoretical compression limits for scientific data under realistic HPC constraints, bridging information theory with large-scale AI and scientific computing systems.
2021 — Present
Graduate Research Assistant, Computer Engineering
Virginia Tech · BRICCS Lab · Advisor: Prof. Lingjia Liu · Alexandria, VA
Dissertation: Learning, Compression and Scaling Laws: Fundamental Limits under Finite-Resource and Structural Constraints.
Developed mathematical and statistical frameworks for analyzing compression, representation, and scaling in scientific and AI systems.
2019 — 2021
Graduate Research Assistant, Computer Science
Auburn University · Statistical Learning Lab · Advisor: Dr. Jingyi Zheng · Auburn, AL
Dissertation: Development of Deep Learning Models for Biomedical Applications.
Developed deep learning frameworks for large-scale cardiac MRI segmentation and EEG analysis that achieved state-of-the-art performance across heterogeneous clinical datasets.
2016 — 2018
Co-Founder
Pulse Prognostics · Startup
Co-founded a health engineering startup focused on commercializing hardware-based diagnostic technologies, contributing to product development, technical infrastructure, and early-stage business operations. The venture received recognition through the VG Startup Summit and India’s national startup ecosystem initiatives.
Education
2021 — Present
Ph.D. Candidate, Computer Engineering
Virginia Tech · Bradley Department of ECE · Alexandria, VA
2019 — 2021
M.S., Computer Science
Auburn University · Auburn, AL
2014 — 2018
B.Tech., Electronics & Communication Engineering
Uttar Pradesh Technical University · India

Technical Skills
Programming & ML
Python, NumPy, SciPy, Scikit-learn, PyTorch, TensorFlow, Git
Information Theory & Mathematics
Rate-Distortion Theory, Finite Blocklength Theory, Stochastic Processes, Convex Optimization, Spectral Analysis, Bayesian Model Selection
Data Compression
Lossy Compression, Error-Bounded Compression, Codec Design, Quantization
Statistical Modeling
Gaussian Random Fields, Spatial Statistics, Covariance Estimation, Gaussian Mixture Models, Hypothesis Testing, AIC/BIC
HPC
Parallel Computing, Exascale I/O Optimization, Cluster Computing, Scientific Datasets
AI / ML Research
LLM Scaling Laws, Token-Level Rate-Distortion Theory, Foundation Models, Compute-Optimal Training, GANs, Adversarial ML