Reasoning under Cached Address Translation
To develop a formal methodology for reasoning about functional correctness of programs in the presence of software-visible details of underlying memory system such as cached address translation.
Operating system (OS) kernels achieve isolation between user-level processes using multi-level page tables. Page tables encode the address translation from virtual to physical addresses and in most widely deployed architectures, they are hardware-defined data structures that reside in main memory, typically as two- to four-level sparse lookup tables. Traversing a page table can cost up to five memory accesses, therefore, the hardware-implemented Translation Lookaside Buffer (TLB) caches the results of page table lookups.
The OS kernel manages page tables, e.g. by adding, removing, or changing mappings. Since the TLB caches address translation, each of these operations may leave the TLB out of date w.r.t. the page table in memory, and the OS kernel must flush (or invalidate) the TLB before that lack of synchronisation can affect program execution. If this management is done correctly, the TLB has no effect other than speeding up execution. If it is done incorrectly, machine execution will diverge from the semantics usual program logics assume, e.g. wrong memory contents will be read/written, or unexpected memory access faults might occur.
While program logics for reasoning in the presence of address translation exist, reasoning in the presence of a TLB has so far remained hard, and is left as an assumption in all large-scale operating system (OS) kernel verification projects such as seL4 and CertiKOS.
Reasoning about programs under TLB-cached address translation is hard, because
We have approached program verification in the presence of TLB-cached address translation with these two modules:
Sound abstraction of ARMv7-style MMU: We have developed a formal model of an ARMv7-style MMU including TLB and page tables, and have integrated it with the Cambridge ARM model. This model establishes the ground truth of how the hardware behaves in the presence of a TLB, but is hard to reason about. Therefore, we have applied step-wise data refinement in order to arrive at an abstract model that is simple, and easier to reason about.
Our refinement stack has three levels: first we abstract away the TLB's non-determinism, then we remove caching behaviour, eliminating unnecessary state change, and finally we arrive at our abstract TLB model. This abstract model captures the functionality of a TLB by only keeping record of inconsistent virtual addresses. Addresses can be inconsistent within the TLB and with the page tables in memory. We show that this abstract model is well behaved for standard use cases such as user-level programs and OS code under fixed address translation.
TLB-aware program logic: Based on the abstract memory model, we have defined an operational semantics of a simple heap based WHILE language with TLB management primitives. In addition, we have derived Hoare logic rules for the language, including the TLB primitives, and proved soundness w.r.t. the semantics.
The strength of this work apart from the logic itself and its soundness is that it can be used to reason effectively and efficiently about kernel code, that it reduces to simple side-condition checks on kernel code that does not modify page tables, and that the logic reduces to standard Hoare logic for user-level code.
The logic is generic and can easily be adapted to, for instance, the shallow embedding the seL4 specifications use, or the more deeply embedded C semantics of the same project.
Publications and thesis are available at Publications and Talks