I am Xinyi Wang (王心怡), a Postdoctoral Researcher at the Princeton Language and Intelligence Lab, working closely with Danqi Chen. I received my Ph.D. degree from the University of California, Santa Barbara (UCSB), where I was advised by William Yang Wang. I have also interned at the MIT-IBM Watson AI Lab and Microsoft Research. I am honored to have received the J.P. Morgan AI Ph.D. Fellowship and the UCSB Computer Science Outstanding Publication Award. My research focuses on developing a principled understanding of large foundation models from their pretraining data distribution, with the goal of improving their capabilities, addressing their limitations, and optimizing their application across diverse domains. You can download my CV here.

Selected Publications

* indicates equal contribution

  • Hubs or Fringes? Pretraining Data Selection via Web Graph Centrality [paper]

    Vedant Badoni, Danqi Chen, Xinyi Wang

    To Be Released

  • Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models

    Xinyi Wang, Shawn Tan, Shenbo Xu, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen

    Proceedings of ICML 2026, Seoul (poster) [paper][code]

  • Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement

    Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, William Yang Wang

    Proceedings of ACL 2025, Vienna (poster) [paper][code]

  • Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

    Xinyi Wang*, Antonis Antoniades*, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

    Proceedings of ICLR 2025, Singapore (poster) [paper][code]

  • Guiding Language Model Math Reasoning with Planning Tokens

    Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

    Proceedings of COLM 2024, Philadelphia (poster) [paper][code]

  • Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

    Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

    Proceedings of ICML 2024, Vienna (poster) [paper][code]

  • Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

    Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

    Proceedings of NeurIPS 2023, New Orleans (poster) [paper][code]

  • Causal Balancing for Domain Generalization

    Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang

    Proceedings of ICLR 2023, Rwanda (poster) [paper][code]

Talks

  • Talk at PLI lunch, May 2026: [slides]
  • Talk at PLI lunch, September 2025: [slides]
  • Talk at ICML 2025 MOSS workshop, July 2025: [slides]
  • My academic job talk (given at multiple institutions, February to April 2025) and PhD defense presentation (May 2025): [slides]
  • My PhD proposal presentation, March 2024: [slides]
  • Talk at Tsinghua University and Peking University, October 2023: [slides]
  • Talk at the Hong Kong University of Science and Technology, May 2023: [slides]
  • My PhD major area exam presentation, March 2023: [slides]

Services