Sujata Sinha | PhD Candidate — Data Compression, AI Systems, HPC, Quantitative Research

About

I am a Ph.D. candidate in Computer Engineering at Virginia Tech, advised by Prof. Lingjia Liu, and a visiting researcher at Argonne National Laboratory within the Mathematics and Computer Science (MCS) division, where I work closely with Robert Underwood, Sheng Di, and Franck Cappello. My research lies at the intersection of information theory, machine learning, statistical modeling, and large-scale computing systems, with a focus on developing mathematically grounded methods for representing, compressing, and learning from high-dimensional data at scale.

During my Ph.D., I developed the first information-theoretic frameworks for characterizing the fundamental limits of scientific lossy compression under realistic high-performance computing (HPC) constraints. This work bridges a long-standing gap between classical rate-distortion theory and the practical design of production scientific compressors used in exascale computing systems. Building on these foundations, my current research extends rate-distortion principles to foundation models, including large language models (LLMs) and large-scale vision models, through tokenized representation learning frameworks that study the tradeoffs between data, compression, model capacity, and compute in modern machine learning systems.

Data Compression · AI for Science · Power-Law Modeling · Random Field Theory · High-dimensional Stochastic Modeling · Probability Theory

What I Specialize In

Data Compression

Rate-distortion bounds for codec design. Compression theory under HPC constraints. Error-bounded lossy compressors for scientific floating-point data.

Rate-Distortion · Lossy Compression · Codec Design · Scientific Data

AI Research

Information-theoretic foundations of LLM scaling laws. Predicting compute-optimal data/model allocation. Token-level rate-distortion for training efficiency.

LLM Scaling Laws · Foundation Models · Compute Efficiency · Representation Learning

High-Performance Computing

Exascale I/O optimization. Parallel data reduction. Theory-guided parameter selection for HPC compression pipelines.

Exascale I/O · Parallel Computing · Checkpoint Storage · Data Reduction

Quantitative Research

Probabilistic modeling. Statistical inference for high-dimensional systems. Closed-form analysis of stochastic processes, scaling behavior, and random fields.

Stochastic Processes · Convex Optimization · Non-Stationarity · Covariance Estimation

Publications

2026

Bridging Information Theory and Practice for Scientific Lossy Compression

S. Sinha, S. Di, V. Rao, R. Underwood, D. Lenz, Z. Jian, Z. Yang, K. Zhao, L. Liu, F. Cappello

ACM HPDC 2026 · Best Paper Track

2026

Rate-Distortion Bounds for Heterogeneous Random Fields on Finite Lattices

S. Sinha, V. Rao, R. Underwood, D. Lenz, S. Di, F. Cappello, L. Liu

arXiv preprint · Under Review

2026

Multivariate Legendre-SNN on Loihi-2 for Time Series Classification and 5G Jamming Detection

R. Gaurav, S. Sinha, C. Lin, T. C. Stewart, L. Liu, Y. Yi

IEEE Transactions on CASAI

2024

Adversarial Attacks and Defenses for Wireless Signal Classifiers Using CDI-aware GANs

S. Sinha, A. Soysal

IEEE ICC

2023

Channel Aware Adversarial Attacks Are Not Robust

S. Sinha, A. Soysal

IEEE MILCOM

2021

Time-Frequency Analysis of Scalp EEG With Hilbert-Huang Transform and Deep Learning

J. Zheng, M. Liang, S. Sinha, L. Ge, W. Yu, A. Ekstrom, F. Hsieh

IEEE JBHI

2020

Automated Semantic Segmentation of Cardiac Magnetic Resonance Images with Deep Learning

S. Sinha, T. Denney, Y. Zhou, J. Zheng

IEEE ICML-Applications

Experience

2026 — Present

Visiting Student Researcher (AI for Science)

Argonne National Laboratory · Mathematics & Computer Science Division · Lemont, IL

Developed information-theoretic frameworks for compute-optimal scaling in foundation models, analyzing tradeoffs between training data, model capacity, and compute in large-scale AI systems.

2025

Visiting Student Researcher (Data Compression & HPC)

Argonne National Laboratory · Mathematics & Computer Science Division · Lemont, IL

Developed the first theoretical compression limits for scientific data under realistic HPC constraints, bridging information theory with large-scale AI and scientific computing systems.

2021 — Present

Graduate Research Assistant, Computer Engineering

Virginia Tech · BRICCS Lab · Advisor: Prof. Lingjia Liu · Alexandria, VA

Dissertation: Learning, Compression and Scaling Laws: Fundamental Limits under Finite-Resource and Structural Constraints.
Developed mathematical and statistical frameworks for analyzing compression, representation, and scaling in scientific and AI systems.

2019 — 2021

Graduate Research Assistant, Computer Science

Auburn University · Statistical Learning Lab · Advisor: Dr. Jingyi Zheng · Auburn, AL

Dissertation: Development of Deep Learning Models for Biomedical Applications.
Developed deep learning frameworks for large-scale cardiac MRI segmentation and EEG analysis that achieved state-of-the-art performance across heterogeneous clinical datasets.

2016 — 2018

Co-Founder

Pulse Prognostics · Startup

Co-founded a health engineering startup focused on commercializing hardware-based diagnostic technologies, contributing to product development, technical infrastructure, and early-stage business operations. The venture received recognition through the VG Startup Summit and India’s national startup ecosystem initiatives.

Education

2021 — Present

Ph.D. Candidate, Computer Engineering

Virginia Tech · Bradley Department of ECE · Alexandria, VA

2019 — 2021

M.S., Computer Science

Auburn University · Auburn, AL

2014 — 2018

B.Tech., Electronics & Communication Engineering

Uttar Pradesh Technical University · India

Technical Skills

Programming & ML

Python, NumPy, SciPy, Scikit-learn, PyTorch, TensorFlow, Git

Information Theory & Mathematics

Rate-Distortion Theory, Finite Blocklength Theory, Stochastic Processes, Convex Optimization, Spectral Analysis, Bayesian Model Selection

Data Compression

Lossy Compression, Error-Bounded Compression, Codec Design, Quantization

Statistical Modeling

Gaussian Random Fields, Spatial Statistics, Covariance Estimation, Gaussian Mixture Models, Hypothesis Testing, AIC/BIC

HPC

Parallel Computing, Exascale I/O Optimization, Cluster Computing, Scientific Datasets

AI / ML Research

LLM Scaling Laws, Token-Level Rate-Distortion Theory, Foundation Models, Compute-Optimal Training, GANs, Adversarial ML