At Meta, we’ve been working to incorporate privacy into the different systems of our software stack as part of our Privacy Aware Infrastructure (PAI) initiative. PAI provides efficient, reliable, first-class privacy constructs embedded in Meta’s infrastructure to address complex privacy problems. In this talk, we describe Policy Zones, an information-flow control system deployed across our infrastructure to enforce privacy restrictions on data, such as using data only for allowed purposes, with strong guarantees that limit the purposes for which data is processed.
We describe how we model restrictions on data using a mix of toy examples and a real-world case study. Our approach represents different aspects of data and its processing as annotations, and uses these annotations to apply policy checks across data flows. Building on these privacy-relevant annotations, we show how Policy Zones enforces high-level data restrictions across two paradigms that together cover the typical lifecycle of data: general-purpose programming languages, where data is initially collected, and data warehouse systems, where data is processed in batch.
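To make the idea of annotations plus flow checks concrete, here is a minimal toy sketch in Python. It is not Policy Zones' implementation or API: the `Labeled` wrapper, `combine`, `sink`, the annotation names, and the `ALLOWED_PURPOSES` table are all hypothetical. Values carry a set of annotations, any computation propagates the union of its inputs' annotations, and a check at the sink rejects flows to purposes the annotations do not allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value tagged with the (hypothetical) data annotations it carries."""
    value: object
    labels: frozenset = frozenset()


def combine(f, *inputs: Labeled) -> Labeled:
    """Apply f to the raw values; the result carries the union of all input labels."""
    labels = frozenset().union(*(i.labels for i in inputs))
    return Labeled(f(*(i.value for i in inputs)), labels)


# Hypothetical policy: for each annotation, the purposes it may be used for.
ALLOWED_PURPOSES = {
    "USER_LOCATION": {"SAFETY", "LOCALIZATION"},
    "PAYMENT_INFO": {"BILLING"},
}


class PolicyViolation(Exception):
    pass


def sink(data: Labeled, purpose: str) -> object:
    """Check every annotation before the data reaches a sink with the given purpose."""
    for label in data.labels:
        # Unknown annotations default to "no purposes allowed" (deny by default).
        if purpose not in ALLOWED_PURPOSES.get(label, set()):
            raise PolicyViolation(f"{label} may not be used for {purpose}")
    return data.value
```

For example, `combine(lambda u, c: f"{u} in {c}", Labeled("alice"), Labeled("Paris", frozenset({"USER_LOCATION"})))` yields a value labeled `USER_LOCATION`; sending it to a `"LOCALIZATION"` sink succeeds, while an `"ADS"` sink raises `PolicyViolation`.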
Designing Policy Zones poses several challenges: translating high-level privacy restrictions into code; handling different data granularities to avoid label creep; keeping data annotations homogeneous across heterogeneous data-processing systems; managing reclassification in practice; and scaling the technology to a company the size of Meta.
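Label creep arises when annotations are tracked too coarsely, so that outputs inherit labels from data they never actually read. The toy Python sketch below illustrates this for a batch job; it is a hypothetical illustration, not how Policy Zones models granularity. The table schema, the `lineage` mapping (output column to the input columns it reads), and the annotation names are assumptions.

```python
from typing import Dict, FrozenSet, List

# Hypothetical schema: column name -> annotations carried by that column.
Table = Dict[str, FrozenSet[str]]


def table_level_labels(table: Table) -> FrozenSet[str]:
    """Coarse granularity: anything read from the table inherits every column's annotations."""
    return frozenset().union(*table.values())


def derive_column_labels(table: Table, lineage: Dict[str, List[str]]) -> Table:
    """Fine granularity: each output column inherits only the annotations
    of the input columns it is computed from."""
    return {
        out: frozenset().union(*(table[col] for col in srcs))
        for out, srcs in lineage.items()
    }
```

With a `users` table whose only annotated column is `payment_card`, a per-country user count reads just `country` and `user_id`. Table-level tracking still labels its output `PAYMENT_INFO`, restricting it unnecessarily; column-level tracking leaves the output unannotated.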
Modular programming is a key concept in software development in which a program consists of code modules that are designed and implemented independently. This approach accelerates development and improves the scalability of the final product. Modules, however, are often written by third parties, which raises security concerns such as theft of confidential information, tampering with sensitive data, and execution of malicious code.