About

I am an AI researcher at Meta, where I currently work on on-device LLMs and previously led the development of text encoder foundation models for recommendation and retrieval systems.

My research has explored ways to leverage structure and invariances in data to build more efficient and effective machine learning models: for instance, by exploiting the structure of language with Tree-LSTMs, the symmetries of images with Equivariant Transformers, and the underlying physical constraints of the Earth in ML-driven earthquake detection.

I hold a Ph.D. in Computer Science from Stanford University, where I was advised by Peter Bailis and Gregory Valiant. While at Stanford, I was a member of the Future Data Systems group and the DAWN project. As an MS student, I was also affiliated with the Stanford NLP Group.

Selected Publications

uCAP: An Unsupervised Prompting Method for Vision-Language Models
A. Tuan Nguyen, Kai Sheng Tai, Sirius Chen, Satya Narayan Shukla, Hanchao Yu, Philip Torr, Taipeng Tian, and Ser-Nam Lim
ECCV 2024 (oral)

Spartan: Differentiable Sparsity via Regularized Transportation
Kai Sheng Tai, Taipeng Tian, and Ser-Nam Lim
NeurIPS 2022
[code]

An End-to-End Earthquake Monitoring Method for Joint Earthquake Detection and Association Using Deep Learning
Weiqiang Zhu*, Kai Sheng Tai*, S. Mostafa Mousavi, Peter Bailis, and Gregory C. Beroza
* Equal contribution
Journal of Geophysical Research: Solid Earth, 2022

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training
Kai Sheng Tai, Peter Bailis, and Gregory Valiant
ICML 2021
[code]

Equivariant Transformer Networks
Kai Sheng Tai, Peter Bailis, and Gregory Valiant
ICML 2019
[code]

Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data
Vatsal Sharan*, Kai Sheng Tai*, Peter Bailis, and Gregory Valiant
* Equal contribution
ICML 2019
[code]

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, and Peter Bailis
VLDB 2018
[code]

Sketching Linear Classifiers over Data Streams
Kai Sheng Tai, Vatsal Sharan, Peter Bailis, and Gregory Valiant
SIGMOD 2018
[code] [extended abstract] [long version] [slides]

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
Kai Sheng Tai, Richard Socher, and Christopher D. Manning
ACL 2015
[code] [slides]

Open Source Projects

MobileLLM-Pro (2025): A 1B-parameter LLM developed at Meta for efficient on-device inference.

index-baselines (2017): A library for comparing learned index structures to classical data structures like cuckoo hash tables.

neuralart (2016): An early reimplementation of the paper 'A Neural Algorithm of Artistic Style' by Gatys et al.

torch-ntm (2015): A Neural Turing Machine implementation in Torch (Lua).