Almost all real-world applications involving large amounts of text data also involve problems of ambiguity. At our industrial partner Recorded Future, this manifests itself as ambiguity in the names of entities like people, places and companies. For example, the name Chris Anderson refers to both the former editor-in-chief of Wired Magazine and to the curator of TED, to name just two. Apart from the information contained in the text itself, many databases contain relational information, such as co-occurrences of names in articles, or friendships in a social network.
We have developed a system that detects ambiguous identifiers, such as names, in datasets where the identifiers are connected in a network. We pose this as a graph classification problem, and have developed both an application to Recorded Future's data and contributions to the underlying theory.
We have also built a demonstrator that illustrates some of the ideas from our group's work on ambiguity detection and word sense induction.
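To make the graph view of ambiguity concrete, here is a minimal sketch, assuming an undirected co-occurrence graph given as a list of name pairs. It is a hand-written heuristic standing in for a learned graph classifier, not the system described above: it removes the name itself, looks at how its neighbours connect to each other, and flags the name as ambiguous when the neighbourhood splits into at least two non-trivial groups. The functions `ego_components` and `looks_ambiguous`, the threshold `min_size`, and the toy edges are all hypothetical.

```python
from collections import defaultdict

def ego_components(edges, name):
    """Connected components among the neighbours of `name`, with `name` itself removed."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    neighbours = adj[name]
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                # Stay inside the neighbourhood; never walk through `name` itself.
                if nxt in neighbours and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components

def looks_ambiguous(edges, name, min_size=2):
    # Hypothetical rule: two or more sizeable, mutually disconnected contexts
    # suggest that the name refers to more than one entity.
    big = [c for c in ego_components(edges, name) if len(c) >= min_size]
    return len(big) >= 2

# Toy co-occurrence graph (invented for illustration).
edges = [
    ("Chris Anderson", "Wired"), ("Chris Anderson", "Conde Nast"), ("Wired", "Conde Nast"),
    ("Chris Anderson", "TED"), ("Chris Anderson", "TEDx"), ("TED", "TEDx"),
]
print(looks_ambiguous(edges, "Chris Anderson"))  # True: a media context and a TED context
print(looks_ambiguous(edges, "Wired"))           # False: its neighbours are connected
```

A learned classifier would replace the fixed `min_size` rule with features of the neighbourhood graph, but the input it sees is the same kind of ego network.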
Please send an email to mogren@chalmers.se to obtain a runnable demo.
Our group's work on automatic multi-document summarization has resulted in the following methods and software for summarizing text documents.
MULTSUM obtains state-of-the-art performance on standard benchmark datasets by combining several measures of sentence similarity at once, rather than relying on a single way of comparing the contents of sentences.
This demo shows several summarization systems side by side. It was developed in our group in collaboration with Findwise.
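The idea of using several similarity measures at once can be sketched as follows. This is not the MULTSUM implementation: the two measures (bag-of-words cosine and Jaccard overlap), their aggregation by multiplication, and the greedy coverage objective are assumptions chosen for illustration. Multiplying the measures means a pair of sentences only scores high when every measure agrees that they are similar.

```python
import math
from collections import Counter

def cosine_sim(s1, s2):
    # Cosine similarity of bag-of-words count vectors.
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

def jaccard_sim(s1, s2):
    # Overlap of the word sets.
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_sim(s1, s2, measures=(cosine_sim, jaccard_sim)):
    # Assumed aggregation: the product of all measures.
    result = 1.0
    for measure in measures:
        result *= measure(s1, s2)
    return result

def summarize(sentences, k):
    """Greedy extractive summary: repeatedly add the sentence that most increases coverage."""
    def coverage(selected):
        # Each sentence counts as covered by its most similar selected sentence.
        return sum(max((combined_sim(s, t) for t in selected), default=0.0) for s in sentences)

    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best = max((s for s in sentences if s not in chosen),
                   key=lambda s: coverage(chosen + [s]))
        chosen.append(best)
    return chosen

sentences = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
]
print(summarize(sentences, 2))
# ['the cat sat on the mat', 'stock prices fell sharply today']
```

Because the first two sentences are near-duplicates, the coverage objective prefers the unrelated third sentence as the second pick, which is the redundancy-avoiding behaviour an extractive summarizer needs.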
DataMin. A proof of concept accompanying the conference submission (Antignac 16). The data minimiser takes a Java program and its JML specification and generates minimisers for the program's inputs. The code is available here.
This code is a model of ASSP (Accountability-Aware Surveillance Protocol) expressed in a ProVerif-based variant of the applied pi calculus. It can be used to verify the authentication and confidentiality properties stated in the companion paper. The code is available here.
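The property a data minimiser must satisfy can be illustrated with a small hand-written example. DataMin itself works on Java programs with JML specifications and generates minimisers automatically; the Python functions `classify` and `minimise_age` below are hypothetical and only show the guarantee: the program behaves the same on the minimised input as on the original, while far fewer distinct values ever reach it.

```python
def classify(age):
    # Hypothetical program: all it reveals about `age` is this category.
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def minimise_age(age):
    # Hypothetical minimiser: map each input to a canonical representative
    # of the set of inputs on which `classify` gives the same result.
    if age < 18:
        return 0
    if age < 65:
        return 18
    return 65

# The program cannot tell the minimised input from the original...
assert all(classify(minimise_age(a)) == classify(a) for a in range(120))
# ...but only three distinct values are ever disclosed to it.
print(sorted({minimise_age(a) for a in range(120)}))  # [0, 18, 65]
```

In this example the equivalence classes are easy to read off the code; the point of deriving them from a specification instead is that the minimiser can be generated without hand-analysing the program.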
PPF in Diaspora*. We have a prototype implementation of the Privacy Policy Framework PPF presented in (Pardo 14) and (Pardo 16) in the open source social network Diaspora*. The code is available here.
In particular, we have implemented two types of privacy policies that can be expressed in PPF but are not supported by any existing social network.
The following demo shows how a user's location can be protected even when it is disclosed by another user.
This is an example of a new type of privacy policy, which we call "Evolving Privacy Policies": policies that change their state depending on the events executed in the social network or on the passage of time. The following demo shows how we can ensure that a user's location is disclosed at most 2 times every 40 seconds.
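The "at most 2 times every 40 seconds" policy can be sketched as a small state machine whose state is the list of recent disclosures. This is not the PPF or Diaspora* code: the class `EvolvingLocationPolicy`, its method `request_disclosure`, and the choice of a sliding window (rather than fixed 40-second intervals) are assumptions for illustration.

```python
from collections import deque

class EvolvingLocationPolicy:
    """Hypothetical evolving policy: permit at most `max_disclosures` location
    disclosures within any sliding window of `window` seconds."""

    def __init__(self, max_disclosures=2, window=40.0):
        self.max_disclosures = max_disclosures
        self.window = window
        self._times = deque()  # timestamps of permitted disclosures, oldest first

    def request_disclosure(self, now):
        # The policy's state evolves with time: forget disclosures
        # that have fallen out of the window ending at `now`.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        # The state also evolves with events: each permitted disclosure is recorded.
        if len(self._times) < self.max_disclosures:
            self._times.append(now)
            return True
        return False

policy = EvolvingLocationPolicy()
results = [policy.request_disclosure(t) for t in (0, 10, 20, 39, 40, 45)]
print(results)  # [True, True, False, False, True, False]
```

The disclosure at t = 40 is allowed because the one at t = 0 has just left the window; the request at t = 45 is refused because the disclosures at t = 10 and t = 40 are both still inside it.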