Part of research theme Large Language Models for Mathematics and Programming of the Chalmers Artificial Intelligence Research Centre.
No registration is needed for attendance (subject to venue limitations).
Lunch and coffee breaks are free for registered participants and invited speakers.
| | Wed, April 23 (Invited talks) | Thu, April 24 (Invited talks) | Fri, April 25 (Discussions) |
|---|---|---|---|
| before 11:00 | Arrival | Discussions | Discussions |
| 11:00–12:00 | Josef Urban | Emily Riehl | Aarne Ranta |
| 12:15–13:00 | Lunch | Lunch | Lunch |
| 13:15–14:15 | Konstantinos Kogkalidis | Fabian Gloeckle | Discussions |
| 14:15–15:00 | Coffee break | Coffee break | |
| 15:00–16:00 | Nicolas Tabareau | Cyril Cohen | |
| after 16:00 | Discussions | Discussions | |
Josef Urban, Czech Technical University in Prague
I would like to discuss several experiments and systems combining learning and proving over formal mathematics. I will also discuss methods that try to directly synthesize mathematical objects such as OEIS explanations, using various learning-based approaches.
Konstantinos Kogkalidis, Aalto University
Sequential autoregressive models have become a popular backend for automated theorem proving due to their compatibility with pretrained language models. However, this approach reduces the inherently structured nature of theorems and proofs to sequential text, misrepresenting their syntax and semantics. In this work, we introduce a structural alternative tailored for dependent type theory. Our contributions include the first dataset of Agda program-proofs, extracted to capture fine-grained type-level details rather than surface-level representations. Using this dataset, we propose QUILL, a neural architecture designed to model the structural and relational aspects of dependently-typed programs. Despite its small size and limited training data, QUILL achieves strong results, surpassing traditional Transformer-style models by up to an order of magnitude on standard performance metrics.
Nicolas Tabareau, INRIA
Interactive theorem provers (ITPs) are becoming essential for certifying critical software and formalizing mathematics, offering rigorous proof verification. However, the widespread adoption of ITPs is constrained by the substantial human effort required for formalization. Meanwhile, large language models (LLMs) have transformed automation across various domains, demonstrating remarkable capabilities but also a tendency for hallucinations—generating plausible yet incorrect outputs.
The natural next step is to combine these technologies, leveraging LLMs for automation while ensuring correctness through ITPs. Despite growing interest, recent progress in this direction has been limited. The primary challenge lies in the need for a joint evolution of AI techniques and the complex inner workings of ITPs. A step-by-step approach is required, involving modifications to ITP engines and the construction of a robust dataset of formal proofs for machine learning.
In this talk, I will present how we aim to bridge this gap by integrating LLM techniques into ITPs through multiple research directions. I will explain how these challenges require both improvements to, and better alignment between, ITPs (notion of equality, conversion, inference of algebraic structures, tactic automation) and the use of impressive new LLM techniques (fine-tuning, backward translation, …).
Emily Riehl, Johns Hopkins University
As Thurston describes in his famous essay "On proof and progress in mathematics," the answer to the question "What is it that mathematicians accomplish?" is multifaceted. Inspired by Turing's "Computing machinery and intelligence," we propose a series of tests to help identify whether a generative AI system can meaningfully contribute to the process of doing mathematics.
Fabian Gloeckle, Meta AI (FAIR)
In the effort to scale test-time computation for language models on mathematical benchmarks, two prominent paradigms have emerged: large-scale search with reinforcement learning, exemplified by methods like AlphaProof, and long chain-of-thought reasoning with emergent self-verification, as seen in models like o1. For the future of reinforcement learning in formal theorem proving, this opens up a spectrum of hybrid methods. These range from line-level tree search with environment feedback, to multi-turn iterative whole-proof generation with and without integrated informal reasoning, to hierarchical problem decomposition and recombination of partial proofs. I will describe these as inference methods and discuss the challenges of applying reinforcement learning to them.
Cyril Cohen, INRIA
Proof assistants, such as Rocq, Agda, Lean, Isabelle/HOL, etc., are tools to state and check the correctness of mathematical results. For now, these tools are mostly checkers and barely assistants. In this talk, I will sketch ways in which large language models may help in offering more assistance, through explaining, documenting, refactoring, comparing, suggesting, etc. I will review the different objectives and approaches taken by various actors and sketch prerequisites for achieving the remaining directions.
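To make the "state and check" role concrete, here is a minimal illustration (not taken from the talk) of a theorem stated and machine-checked in Lean 4; the proof assistant verifies that the supplied term really proves the statement:

```lean
-- A statement about natural numbers, followed by a proof term
-- that Lean's kernel checks against the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof term were wrong or incomplete, the checker would reject it; the assistance discussed in the talk concerns everything around this core step: explaining, refactoring, and suggesting such proofs.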
Aarne Ranta, University of Gothenburg and Chalmers University of Technology
The project Informath develops conversions between formal and informal languages of mathematics, at the moment targeting Dedukti, Agda, Coq, and Lean on the formal side and English, French, and Swedish on the informal side. The core of Informath is a two-way conversion between Dedukti and MathCore, an abstract syntax of natural language in GF (Grammatical Framework). Dedukti is used as an interlingua for the formal systems, whereas MathCore is an interlingua for natural languages. By extending the abstract syntax, Informath aims to cover larger fragments of actual mathematical language as found in articles and books.