Part of research theme Large Language Models for Mathematics and Programming of the Chalmers Artificial Intelligence Research Centre.
No registration is needed for attendance (subject to venue limitations).
Lunch and coffee breaks are free for registered participants and invited speakers.
| | Wed, April 23 (Invited talks) | Thu, April 24 (Invited talks) | Fri, April 25 (Discussions) |
|---|---|---|---|
| before 11:00 | Arrival | Discussions | Discussions |
| 11:00–12:00 | Josef Urban | Emily Riehl | Aarne Ranta |
| 12:15–13:00 | Lunch | Lunch | Lunch |
| 13:15–14:15 | Konstantinos Kogkalidis | Fabian Gloeckle | Discussions |
| 14:15–15:00 | Coffee break | Coffee break | |
| 15:00–16:00 | Nicolas Tabareau | Cyril Cohen | |
| after 16:00 | Discussions | Discussions | |
Josef Urban, Czech Technical University in Prague
I would like to discuss several experiments and systems combining learning and proving over formal mathematics. I will also discuss methods that try to directly synthesize mathematical objects such as OEIS explanations, using various learning-based approaches.
Konstantinos Kogkalidis, Aalto University
Sequential autoregressive models have become a popular backend for automated theorem proving due to their compatibility with pretrained language models. However, this approach reduces the inherently structured nature of theorems and proofs to sequential text, misrepresenting their syntax and semantics. In this work, we introduce a structural alternative tailored for dependent type theory. Our contributions include the first dataset of Agda program-proofs, extracted to capture fine-grained type-level details rather than surface-level representations. Using this dataset, we propose QUILL, a neural architecture designed to model the structural and relational aspects of dependently-typed programs. Despite its small size and limited training data, QUILL achieves strong results, surpassing traditional Transformer-style models by up to an order of magnitude on standard performance metrics.
Nicolas Tabareau, INRIA
Interactive theorem provers (ITPs) are becoming essential for certifying critical software and formalizing mathematics, offering rigorous proof verification. However, the widespread adoption of ITPs is constrained by the substantial human effort required for formalization. Meanwhile, large language models (LLMs) have transformed automation across various domains, demonstrating remarkable capabilities but also a tendency for hallucinations—generating plausible yet incorrect outputs.
The natural next step is to combine these technologies, leveraging LLMs for automation while ensuring correctness through ITPs. Despite growing interest, recent progress in this direction has been limited. The primary challenge lies in the need for a joint evolution of AI techniques and the complex inner workings of ITPs. A step-by-step approach is required, involving modifications to ITP engines and the construction of a robust dataset of formal proofs for machine learning.
In this talk, I will present how we aim to bridge this gap by integrating LLM techniques into ITPs through multiple research directions. I will explain how these challenges require both improvements to, and better alignment between, ITPs (notion of equality, conversion, inference of algebraic structures, tactic automation) and the use of impressive new LLM techniques (fine-tuning, backward translation, …).
Emily Riehl, Johns Hopkins University
As Thurston describes in his famous essay "On proof and progress in mathematics," the answer to the question "What is it that mathematicians accomplish?" is multifaceted. Inspired by Turing's "Computing machinery and intelligence," we propose a series of tests to help identify whether a generative AI system can meaningfully contribute to the process of doing mathematics.
Fabian Gloeckle, Meta AI (FAIR)
In the effort to scale test-time computation for language models on mathematical benchmarks, two prominent paradigms have emerged: large-scale search with reinforcement learning, exemplified by methods like AlphaProof, and long chain-of-thought reasoning with emergent self-verification, as seen in models like o1. For the future of reinforcement learning in formal theorem proving, this opens up a spectrum of hybrid methods. These range from line-level tree search with environment feedback, to multi-turn iterative whole-proof generation with and without integrated informal reasoning, to hierarchical problem decomposition and recombination of partial proofs. I will describe these as inference methods and discuss the challenges of applying reinforcement learning to them.
Cyril Cohen, INRIA
Proof assistants, such as Rocq, Agda, Lean, Isabelle/HOL, etc., are tools to state and check the correctness of mathematical results. For now, these tools are mostly checkers and barely assistants. In this talk, I will sketch ways in which large language models may help in offering more assistance, through explaining, documenting, refactoring, comparing, suggesting, etc. I will review the different objectives and approaches taken by various actors and sketch prerequisites for achieving the remaining directions.
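To make the "state and check" role concrete, here is a minimal illustration (not taken from the talk) of a theorem stated and machine-checked in Lean 4; the proof assistant verifies that the supplied term really proves the statement:

```lean
-- A statement about natural numbers, followed by a proof term
-- that Lean's kernel checks against the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof term were wrong or incomplete, the checker would reject it; the assistance discussed in the talk concerns everything around this core step: explaining, refactoring, and suggesting such proofs.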
Aarne Ranta, University of Gothenburg and Chalmers University of Technology
The project Informath develops conversions between formal and informal languages of mathematics, at the moment targeting Dedukti, Agda, Coq, and Lean on the formal side and English, French, and Swedish on the informal side. The core of Informath is a two-way conversion between Dedukti and MathCore, an abstract syntax of natural language in GF (Grammatical Framework). Dedukti is used as an interlingua for the formal systems, whereas MathCore is an interlingua for natural languages. By extending the abstract syntax, Informath aims to cover larger fragments of actual mathematical language as found in articles and books.