• Ambiguity detection in relational data

    Almost all real-world applications involving large amounts of text data also involve problems of ambiguity. At our industrial partner Recorded Future, this manifests itself as ambiguity in the names of entities such as people, places and companies. For example, the name Chris Anderson refers both to the former editor-in-chief of Wired Magazine and to the curator of TED, to name just two. In addition to the information contained in the text itself, many databases contain relational information, such as co-occurrences of names in articles, or friendships in a social network.

    We have developed a system that detects ambiguous identifiers, such as names, in a dataset where the identifiers are connected in a network. We pose this problem as one of graph classification, and have both applied the approach to Recorded Future's data and developed the theory of this field.
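
    As a rough illustration of the kind of structural signal a graph-based detector could draw on, the sketch below counts how many disconnected groups the neighbours of a name fall into once the name itself is ignored. This is an assumption made for illustration, not the method of the system described above; the function name `neighbour_clusters` and the toy co-occurrence edges are hypothetical.

```python
from collections import defaultdict


def neighbour_clusters(edges, name):
    """Count connected groups among the neighbours of `name`,
    using only edges between neighbours (the name itself is ignored)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    neighbours = adj.get(name, set()) - {name}
    seen = set()
    clusters = 0
    for start in neighbours:
        if start in seen:
            continue
        clusters += 1
        # Depth-first search restricted to the neighbour set.
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt in neighbours and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return clusters


# Made-up toy co-occurrence edges, for illustration only.
toy_edges = [
    ("Chris Anderson", "Wired"),
    ("Chris Anderson", "editor-in-chief"),
    ("Wired", "editor-in-chief"),
    ("Chris Anderson", "TED"),
    ("Chris Anderson", "curator"),
    ("TED", "curator"),
]

print(neighbour_clusters(toy_edges, "Chris Anderson"))
```

    A name whose neighbours split into several groups is a candidate for ambiguity; in practice such a count would be at most one feature among many passed to a classifier.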

  • Entity Linking

    A demonstrator showing the workings of some of the ideas from the ambiguity detection and the word sense induction work of our group.

    Please send us an email to obtain a runnable demo.

  • Automatic summarization

    Efforts in our group on automatic multi-document summarization have resulted in the following methods and software for automatically summarizing text documents.

    • Live demo of MULTSUM, or download the Python source code.

      MULTSUM obtains state-of-the-art performance on standard benchmark datasets by simultaneously using more than one measure for comparing the contents of sentences.
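
      The idea of using several sentence comparisons at once can be sketched as follows. This is a minimal illustration, not MULTSUM's actual code: the two measures (Jaccard word overlap and cosine similarity over word counts), their combination by multiplication, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def tokens(sentence):
    # Naive whitespace tokenisation, sufficient for the illustration.
    return sentence.lower().split()


def jaccard(a, b):
    """Word-set overlap between two sentences."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between word-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def combined_similarity(a, b):
    # Assumed aggregation: the product is high only if both measures agree.
    return jaccard(a, b) * cosine(a, b)


def most_central(sentences):
    """Return the sentence most similar, in total, to all the others."""
    def score(i):
        return sum(combined_similarity(sentences[i], s)
                   for j, s in enumerate(sentences) if j != i)
    return sentences[max(range(len(sentences)), key=score)]
```

      Multiplying the measures means a sentence pair scores highly only when every measure considers it similar; a weighted sum would be a more lenient alternative.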

    • Live demo at Findwise:

      This demo shows several summarization systems at the same time, side by side. Development took place in our group in collaboration with Findwise.

  • Data Minimisation

    DataMin. Proof of concept accompanying the conference submission (Antignac 16). The data minimiser takes a Java program and its associated JML specification in order to generate minimisers for the inputs. The code is available here.

  • Accountability-Aware Surveillance Protocol

    This code is a model of ASSP (Accountability-Aware Surveillance Protocol) expressed in (a ProVerif-based variant of) the applied pi calculus. It can be used to verify the authentication and confidentiality properties elicited in the companion paper. The code is available here.

  • Privacy Policies

    PPF in Diaspora*. We have a prototype implementation of the Privacy Policy Framework PPF presented in (Pardo 14) and (Pardo 16) in the open source social network Diaspora*. The code is available here.

    In particular, we have implemented two types of privacy policies that can be expressed in PPF but are not available in any social network.

    • Protecting against implicit disclosure of a location.

      In the following demo we show how it is possible to protect a user's location even when it is disclosed by another user.

    • A user's location can only be disclosed 2 times every 40 seconds.

      This is an example of a new type of privacy policy, which we call "Evolving Privacy Policies". Such policies can change their state depending on the events executed in the social network or on the passage of time. In the following demo we show how to ensure that the location of a user can be disclosed at most 2 times every 40 seconds.
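
      The "2 times every 40 seconds" rule can be read as a policy whose state is the set of recent disclosures. The sketch below is one possible interpretation using a sliding time window; it is not the PPF or Diaspora* implementation, and the class name `EvolvingDisclosurePolicy`, its parameters, and the choice of a sliding (rather than fixed) window are all assumptions.

```python
from collections import deque


class EvolvingDisclosurePolicy:
    """Allow at most `max_disclosures` disclosures in any window of
    `window_seconds` seconds (sliding-window interpretation, assumed)."""

    def __init__(self, max_disclosures=2, window_seconds=40.0):
        self.max_disclosures = max_disclosures
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of permitted disclosures

    def request_disclosure(self, now):
        """Return True and record the event if disclosure is permitted
        at time `now` (seconds); otherwise return False."""
        # The policy state evolves with time: old disclosures expire.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        # The policy state evolves with events: each disclosure is recorded.
        if len(self._times) < self.max_disclosures:
            self._times.append(now)
            return True
        return False


policy = EvolvingDisclosurePolicy()
for t in (0, 10, 20, 40):
    print(t, policy.request_disclosure(t))
```

      Passing the timestamp explicitly keeps the sketch deterministic; a deployed version would more likely read a monotonic clock.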