I am Xinyi Wang (王心怡), a Postdoctoral Researcher at the Princeton Language and Intelligence Lab, working closely with Danqi Chen. I received my Ph.D. from the University of California, Santa Barbara (UCSB), where I was advised by William Yang Wang. I’ve also interned at the MIT-IBM Watson AI Lab and Microsoft Research. I am honored to have received the J.P. Morgan AI Ph.D. Fellowship and the UCSB Computer Science Outstanding Publication Award. My research focuses on developing a principled understanding of large foundation models from their pretraining data distribution, with the goal of improving their capabilities, addressing their limitations, and optimizing their application across diverse domains. You can download my CV here.
Selected Publications
* indicates equal contribution
-
Hubs or Fringes? Pretraining Data Selection via Web Graph Centrality [paper]
Vedant Badoni, Danqi Chen, Xinyi Wang
To Be Released
-
Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models
Xinyi Wang, Shawn Tan, Shenbo Xu, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen
-
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, William Yang Wang
-
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Xinyi Wang*, Antonis Antoniades*, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang
-
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni
Proceedings of COLM 2024, Philadelphia (poster) [paper][code]
-
Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang
-
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang
Proceedings of NeurIPS 2023, New Orleans (poster) [paper][code]
-
Causal Balancing for Domain Generalization
Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang
Talks
- Talk at PLI lunch, May 2026: [slides]
- Talk at PLI lunch, September 2025: [slides]
- Talk at ICML 2025 MOSS workshop, July 2025: [slides]
- My academic job talk, given at multiple institutions, February-April 2025, and my PhD defense presentation, May 2025: [slides]
- My PhD proposal presentation, March 2024: [slides]
- Talk at Tsinghua University and Peking University, October 2023: [slides]
- Talk at Hong Kong University of Science and Technology, May 2023: [slides]
- My PhD major area exam presentation, March 2023: [slides]
Services
- Reviewer: NeurIPS, AAAI, ICLR, ICML (2026 gold reviewer), COLM, AISTATS, TPAMI, TMLR
- Organizer: ICLR 2025 Open Science for Foundation Models Workshop, ICLR 2026 Latent & Implicit Thinking – Going Beyond CoT Reasoning Workshop