I am Xinyi Wang (王心怡), a final-year computer science PhD candidate at the University of California, Santa Barbara (UCSB), advised by Professor William Yang Wang. I have also worked with Yi Yang, Kun Zhang, Alessandro Sordoni, Yikang Shen, and Rameswar Panda. I interned at MSR Montreal in summer 2023 and at the MIT-IBM Watson lab in summer 2024. I am honored to have been awarded a J.P. Morgan AI PhD Fellowship. My research focuses on developing a principled understanding of deep learning models, especially large language models, with the goal of improving their capabilities, addressing their limitations, and optimizing their application across diverse domains. My CV can be downloaded here.

I’m on the job market right now. Please feel free to reach out to me if you think I could be a good fit!

Education

  • University of California, Santa Barbara, Oct 2020 - Present
    • Ph.D. in Computer Science
  • Hong Kong University of Science and Technology, Sep 2016 - Jul 2020
    • B.Sc. in Applied Mathematics and Computer Science

* indicates equal contribution

Preprints

  • Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

    Xinyi Wang*, Antonis Antoniades*, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

    arXiv Preprint [paper]

  • Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement

    Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, William Yang Wang

    arXiv Preprint [paper]

  • Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models

    Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang

    arXiv Preprint [paper]

(Co-)first-authored publications

  • Guiding Language Model Math Reasoning with Planning Tokens

    Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

    Proceedings of COLM 2024, Philadelphia [paper][code]

  • Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

    Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

    Proceedings of ICML 2024, Vienna (poster) [paper][code]

  • Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

    Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

    Proceedings of NeurIPS 2023, New Orleans (poster) [paper][code]

  • Causal Balancing for Domain Generalization

    Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang

    Proceedings of ICLR 2023, Rwanda (poster) [paper][code]

  • Counterfactual Maximum Likelihood Estimation for Training Deep Networks

    Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

    Proceedings of NeurIPS 2021, Virtual (poster) [paper][code]

  • RefBERT: Compressing BERT by Referencing to Pre-computed Representations

    Xinyi Wang*, Haiqin Yang*, Liang Zhao, Yang Mo, Jianping Shen

    Proceedings of IJCNN 2021, Virtual (oral) [paper]

  • Neural Topic Model with Attention for Supervised Learning

    Xinyi Wang, Yi Yang

    Proceedings of AISTATS 2020, Virtual (poster) [paper][code]

Coauthored publications

  • T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

    Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

    Proceedings of NeurIPS 2024, Vancouver [paper]

  • A Survey on Data Selection for Language Models

    Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, William Yang Wang

    TMLR 2024 [paper]

  • Position Paper: Understanding the Role of Social Media Influencers in AI Research Visibility

    Iain Xie Weissburg, Mehir Arora, Xinyi Wang, Liangming Pan, William Yang Wang

    Proceedings of ICML 2024, Vienna [paper]

  • Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

    Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang

    TACL 2024 [paper]

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

    Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

    TMLR 2023 [paper][code]

  • Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

    Liangming Pan, Alon Albalak, Xinyi Wang, William Yang Wang

    Findings of EMNLP 2023, Singapore [paper][code]

  • TheoremQA: A Theorem-driven Question Answering dataset

    Wenhu Chen, Ming Yin, Max Ku, Elaine Wan, Xueguang Ma, Jianyu Xu, Tony Xia, Xinyi Wang, Pan Lu

    Proceedings of EMNLP 2023, Singapore [paper][code]

  • Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

    Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

    Proceedings of EMNLP 2023, Singapore [paper]

  • PECO: Examining Single Sentence Label Leakage in Natural Language Inference Datasets through Progressive Evaluation of Cluster Outliers

    Michael Saxon, Xinyi Wang, Wenda Xu, William Yang Wang

    Proceedings of EACL 2023, Croatia [paper][code]

  • A Dataset for Answering Time-Sensitive Questions

    Wenhu Chen, Xinyi Wang, William Yang Wang

    Proceedings of NeurIPS 2021 Datasets and Benchmarks Track, Virtual (poster) [paper][code]

  • Modeling Disclosive Transparency in NLP Application Descriptions

    Michael Saxon, Sharon Levy, Xinyi Wang, Alon Albalak, William Yang Wang

    Proceedings of EMNLP 2021, Virtual (oral) [paper][code]

Talks

  • My PhD major area exam presentation in March 2023: [slides]
  • Talk at Hong Kong University of Science and Technology in May 2023: [slides]
  • Talk at Tsinghua University on October 19, 2023 and at Peking University on October 23, 2023: [slides]
  • My PhD proposal presentation in March 2024: [slides]

Services

  • Reviewer: NeurIPS Datasets and Benchmarks Track (2021), AAAI (2022, 2023), NeurIPS (2023, 2024), ICLR (2024), ICML (2024), COLM (2024), TPAMI (2024)