Suggested Projects 2019 for DAT300, DIT668

This page will be updated regularly as students pick projects and we add new ones, so visit often.

Background

The project is a core part of the course. You should form a group with other participants in the course and then choose one of the example projects below suggested by faculty, or suggest your own. It is important to be ready to start on your project as soon as the course starts, so please browse the suggestions as early as you can. At the end of August, we will also email all registered students so that they can start choosing their group and project. Send us an email if you plan to take the course, so that you also get that information. Note that projects are assigned on a first come, first served basis.

At project start, we will give you access to a GitHub repository where you can look at (and possibly extend) earlier projects, and we will discuss the hardware and data available for your projects. During the first lecture, we will share a Slack channel from which you can sign up to GitHub, Box, and other course-related resources.

Below we list a number of suggested projects for DAT300. Of course, we are open to hearing your suggestions too, but the projects must relate to using ICT in some interdisciplinary way (such as Energy and ICT). The suggestions below are just seed ideas from faculty (the ideas here are brief(!)). If you like an idea, we will connect you with the faculty to develop it further. It is then up to you to work out the details and the work plan, with the project demonstration at the end of the course. It should be your project and you will need to drive its development.

2019:A Projects related to IoT and Security

The Internet of Things is a paradigm shift in which nodes collect information, process it (locally, at the edge, or in the cloud), and then actuate changes in our physical environment. IoT offers many advantages, but there are many cyber security concerns, as many parts (end nodes) can easily be attacked and used for massive attacks, as we have recently seen:

This project has three paths forward. In all cases, NVIDIA's new powerful IoT nodes, often credited with high performance at the edge, will be used:

2019:A1 Misuse detection at the edge, “Snort pattern matching”, A multi-parallel, heterogeneous framework for accelerating Network Intrusion Detection in IoT networks

Due to the severe computation and energy constraints on platforms at the edge, it is difficult to run traditional code and security mechanisms there. For this reason, detection algorithms and tools such as Snort need to be adjusted to perform well under these constraints. On the other hand, IoT hardware is evolving and provides new features that have not been explored before (e.g. multiple cores, programmable GPUs).

In this project, you will adapt pattern matching code to run on the NVIDIA platform. The goal is to get the code to run and to benchmark it against equivalent code written in OpenCL for Odroids.

This is a challenging project. You should either know OpenCL or CUDA, or be willing to spend a lot of effort learning these programming paradigms. We will not be able to help you debug your CUDA code. We will provide you with code in a GitHub repository and a scientific paper with benchmarks for the Odroid. You need to develop the project from there.
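Multi-pattern matching of the kind used in signature-based intrusion detection is often built on the Aho-Corasick automaton. The sketch below shows that algorithm in plain Python, purely to illustrate the workload that would be moved to the GPU. It is an assumption that the provided code follows a similar approach, and the function names `build_automaton` and `search` as well as the sample patterns are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of byte patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(payload, automaton):
    """Return (start_index, pattern) for every match in the payload."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(payload):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# Hypothetical signatures and payload, for illustration only.
automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = search(b"ushers", automaton)
```

The payload is scanned once regardless of the number of patterns, which is why this family of algorithms is a common starting point for parallel and GPU variants.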

2019:A2 Anomaly Detection at the edge

Given the powerful GPUs on the Jetson boards, the idea of this project is to perform anomaly detection at the edge, especially for time series data from critical infrastructures. The goal is to implement a machine learning algorithm, preferably based on deep learning, and test it using a dataset provided by us (time series of sensor measurements, where attacks are sometimes introduced). You should use the features of the Pascal architecture, but you might be able to program in Matlab or other languages (check the development kits for the options). One of your starting points will be the survey article:

The goal should be to have a benchmark and an understanding of the strengths and weaknesses of such methods for intrusion detection systems.

2019:A3 Reinforcement learning to improve fuzzing and analysis of code of IoT

The software and firmware of IoT devices have vulnerabilities that enable attacks such as Mirai. Fuzzers have gained a reputation for finding vulnerabilities in code. The goal of this project (but it will need to be simplified) is to

This is a very challenging but fun project, as you will be at the edge of research. Students taking this project should be very familiar with programming, and preferably with assembly, fuzzers, and similar tools, or be willing to learn. The project should be implemented on one of the Jetson boards. A key goal is to set the scope of the project early and make sure it is extensible for future work.

2019:B Projects related to Critical Infrastructure and Security

One approach to detecting attacks on industrial control systems is to monitor sensor signals for changes in behavior. PASAD is a sensor-level attack detection method that monitors time series of sensor measurements for disturbances. The method first embeds the time series into a vector space and then tracks the distance between the most recent vectors and a cluster of vectors that defines the normal behavior and is determined during a training phase. The aim of this project is to investigate the possibility of improving the execution of PASAD. There will be two variants for the project.
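The two steps described above (embedding, then distance tracking) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the PASAD implementation: it omits any projection or dimensionality reduction the actual method may use, and the window length, the 1.5 safety margin on the threshold, the synthetic signal, and all function names are assumptions made for the example.

```python
import math


def lag_vectors(series, dim):
    """Embed a 1-D series as overlapping windows (lag vectors) of length dim."""
    return [series[i:i + dim] for i in range(len(series) - dim + 1)]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train(series, dim, margin=1.5):
    """Learn the centre of normal behaviour and an alarm threshold."""
    vectors = lag_vectors(series, dim)
    center = centroid(vectors)
    # Assumed rule: largest training distance times a safety margin.
    threshold = margin * max(squared_distance(v, center) for v in vectors)
    return center, threshold


def detect(series, dim, center, threshold):
    """Return start indices of lag vectors that lie too far from the centre."""
    return [i for i, v in enumerate(lag_vectors(series, dim))
            if squared_distance(v, center) > threshold]


# Synthetic data: a sine wave as normal behaviour, then an injected offset.
normal = [math.sin(0.3 * t) for t in range(200)]
center, threshold = train(normal, dim=10)

observed = [math.sin(0.3 * t) for t in range(200, 260)]
observed[40:] = [x + 3.0 for x in observed[40:]]  # hypothetical disturbance
alarms = detect(observed, 10, center, threshold)
```

Training is done once, while `detect` only needs a distance computation per new window, which is the part a device-side or GPU implementation would try to speed up.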

2019:B1 The algorithm PASAD on the device

Your goal is to implement both the detection and the learning phase of PASAD on NVIDIA's new powerful IoT nodes, often credited with high performance at the edge:

These systems have a GPU of the Pascal architecture. The goal is to see whether features of this architecture can be used to speed up the operations needed for PASAD, and to benchmark the result against a desktop computer. It may be possible to program in Matlab, but you might also have to use C and C libraries. Also see 2019:B3 below.

2019:B2 PASAD in the cloud

In this project, your goal is to design and implement a version of PASAD that can run in the Google Cloud infrastructure (using a free account). You need to provide an API / GUI to control the algorithm, and you should see this as an extension of the detection that could happen locally. For example, how is data submitted to the cloud analysis? How would one connect to it from a local network?

2019:B3 An intrusion detection system for IoT

IoT contains the nodes, the edge, and the cloud. Future intrusion detection systems will need to encompass all these levels. This project is in some ways a merge of 2019:B1 and 2019:B2: the goal is the design and implementation (with clear motivations) of a truly comprehensive cloud-based IDS, with one part on the device and one part in the cloud.

2019:C Projects on Distributed, Parallel, Stream Processing for Data Analysis and Learning in Digitalized/IoT-based systems

The increasing presence of high-volume, high-rate data implies the need for processing infrastructure to analyse them and distill useful information. The following projects are about Parallel, Distributed, Stream Processing for analysis including learning and optimization, associated with the group's research and synergies on the topics.

2019:C1 Streaming prediction for the smart grid

The main uses of consumption data from electricity meters are billing and long-term network planning. However, now that electricity meters have become digital and remotely readable, new uses such as validation and real-time distribution grid monitoring are starting to develop. Missing and unreliable data can cause problems for these new use cases, which is where prediction comes in. Predicted consumption values can be used to validate the measured values or to replace the missing data.

In this project you will work with implementations of known prediction methods such as ARIMA or multiple linear regression, with the Stream Processing Engine Apache Flink. You can investigate how well the method works for predicting consumption of single customers versus predicting the consumption of a group of customers (in the same neighborhood for example).
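As a minimal sketch of the idea of replacing missing readings with predictions, the Python code below fits a one-lag linear regression on a sliding window of recent values and uses it to fill gaps as readings arrive. It is plain Python rather than an Apache Flink job, the one-lag model is a much simpler stand-in for ARIMA or multiple linear regression, and the function names, window size, and sample values are assumptions.

```python
def fit_one_lag(history):
    """Least-squares fit of x[t] = a + b * x[t-1] over the given history."""
    xs, ys = history[:-1], history[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov_xy / var_x if var_x else 0.0
    a = mean_y - b * mean_x
    return a, b


def fill_missing(readings, window=24):
    """Process readings in arrival order; replace None with a prediction."""
    filled = []
    for value in readings:
        if value is None:
            recent = filled[-window:]
            if len(recent) >= 3:
                a, b = fit_one_lag(recent)
                value = a + b * recent[-1]
            elif recent:
                value = recent[-1]  # too little history: repeat last value
        filled.append(value)
    return filled


# Hypothetical hourly readings (kWh) with one missing value.
readings = [1.0, 2.0, 3.0, 4.0, None, 6.0]
filled = fill_missing(readings)
```

The same prediction could also be compared against a reading that did arrive, flagging it for validation when the difference is large.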

The project will be co-supervised by Joris van Rooij, industrial PhD student with Göteborg Energi and Chalmers; your implementation can be tested with real data at Göteborg Energi.

2019:C2 Accelerated parallel machine learning

Interest in Machine Learning (ML) methods has grown in recent years, both in the research community and in industry (big data analysis, autonomous vehicles, etc.). One of the reasons is the availability of vast amounts of data from an increasingly digitalized industry, for instance sensor data in Internet of Things (IoT) networks and Autonomous Vehicles (AV). To cope with the growing demands on machine learning methods and the amount of data to be analyzed, recent research efforts propose parallel machine learning (PML) methods for acceleration. Various PML approaches have been studied and have proven very useful in industrial applications, significantly reducing the time systems need to learn. In the initial stage of this project, the students will get an overview of parallel machine learning approaches. They will then choose several such methods for closer familiarization, implement them, and compare their performance for different applications.
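One common PML pattern is synchronous data-parallel training: the data is split into shards, each worker computes a gradient on its shard, and the averaged gradient updates a shared model. The Python sketch below illustrates that structure on a toy linear model. Each worker uses its whole shard rather than a sampled mini-batch, so this is gradient descent rather than true SGD. In CPython the threads show the structure rather than a real speedup, and the function names, learning rate, and data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(task):
    """Gradient of the mean squared error of y = w * x + b on one shard."""
    w, b, shard = task
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b


def synchronous_descent(data, workers=4, lr=0.05, steps=500):
    shards = [data[i::workers] for i in range(workers)]
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            tasks = [(w, b, shard) for shard in shards]
            # Synchronisation barrier: wait for every worker's gradient.
            grads = list(pool.map(shard_gradient, tasks))
            w -= lr * sum(g[0] for g in grads) / workers
            b -= lr * sum(g[1] for g in grads) / workers
    return w, b


# Toy data generated from y = 3x + 1.
data = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(40)]
w, b = synchronous_descent(data)
```

Asynchronous variants drop the barrier and let workers update the model independently, which is one of the trade-offs such a comparison could examine.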

References
(1) An overview of gradient descent optimization algorithms, http://ruder.io/optimizing-gradient-descent/
(2) An Introduction to Distributed Deep Learning, http://seba1511.net/dist_blog/
(3) Chen, Jianmin, et al. "Revisiting distributed synchronous SGD." arXiv preprint arXiv:1604.00981 (2016)

The project will be co-supervised by Karl Bäckström, PhD student at Chalmers, in the Wallenberg WASP programme.

2019:C3 Online visualization of live queries on vehicular data

Data stream processing is an important tool for coping with large amounts of data arriving in real time. Likewise, data visualization is paramount for gaining insights from such data. Combining data stream processing in Apache Flink with related systems such as the ELK (Elasticsearch and Kibana) stack provides a framework for performing real-time analysis of a data stream, saving the results to a search index, and visualizing them in a dashboard. (1) The main aim of this project is to replay historical GPS data from vehicles and display a live heatmap of the vehicles' positions in a city; (2) more queries can then be implemented on the data (e.g. speed visualization, proximity to certain landmarks).
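The aggregation behind such a heatmap can be sketched as a windowed count of GPS positions per grid cell. The Python code below illustrates only that logic. It is not a Flink job and does not talk to Elasticsearch or Kibana, and the event layout, cell size, window length, and function names are assumptions.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a position to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def heatmap_windows(events, window_s=60, cell_deg=0.01):
    """Yield (window_start, counts per cell) for each tumbling window.

    events: iterable of (timestamp_s, vehicle_id, lat, lon), ordered by time.
    """
    current_start, counts = None, Counter()
    for timestamp, _vehicle, lat, lon in events:
        start = timestamp - timestamp % window_s
        if current_start is None:
            current_start = start
        if start != current_start:
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[grid_cell(lat, lon, cell_deg)] += 1
    if current_start is not None:
        yield current_start, counts


# Hypothetical replayed GPS events.
events = [
    (0, "car-a", 57.705, 11.975),
    (30, "car-b", 57.706, 11.974),
    (70, "car-a", 57.715, 11.975),
]
windows = list(heatmap_windows(events))
```

Each yielded window corresponds to one frame of the heatmap; in the full pipeline these per-cell counts would be written to the search index for the dashboard to read.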

The project will be co-supervised by Romaric Duvignau and Bastian Havers, postdoc and industrial PhD student respectively, with Volvo Cars Corporation and Chalmers.

2019:C4 5G Mobile Video Streaming Simulator

With the transition to the fifth generation of mobile infrastructure taking place right now, it is more necessary than ever to simulate realistic internet traffic from and to mobile users in order to test the new infrastructure being deployed. Traffic in packet networks is mainly characterized by the distributions of packet length and packet inter-arrival time; when both are well understood, they allow us to generate realistic simulated traffic. Considering that video streaming is the largest consumer of mobile bandwidth, we would like to generate huge amounts of realistic video traffic at very high rates (up to 40 Gb/s) to load test the 5G infrastructure. The project involves simulating high-rate (5G) traffic by understanding and modeling 4G/multimedia network traffic.
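As a minimal sketch of generating traffic from those two distributions, the Python code below draws an inter-arrival time and a packet length for each packet and reports the resulting bit rate. The exponential inter-arrival model, the packet sizes, and their weights are placeholders rather than measured values, and the function names are assumptions.

```python
import random


def generate_packets(n, mean_gap_s, lengths, weights, seed=0):
    """Return n (arrival_time_s, length_bytes) pairs."""
    rng = random.Random(seed)
    now = 0.0
    packets = []
    for _ in range(n):
        # Assumed model: exponentially distributed inter-arrival times.
        now += rng.expovariate(1.0 / mean_gap_s)
        # Packet length drawn from a weighted empirical distribution.
        length = rng.choices(lengths, weights=weights)[0]
        packets.append((now, length))
    return packets


def throughput_bps(packets):
    """Average bit rate over the span of the generated trace."""
    duration = packets[-1][0]
    return 8 * sum(length for _, length in packets) / duration


# Placeholder distribution: mostly full-size packets, some small ones.
packets = generate_packets(
    n=1000, mean_gap_s=1e-6, lengths=[64, 576, 1500], weights=[2, 1, 7]
)
rate = throughput_bps(packets)
```

Replacing the placeholder distributions with ones fitted to real 4G video traces is the modelling part of the project; the generation loop itself stays the same.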

The project will be co-supervised by Romaric Duvignau, postdoc at Chalmers, and can involve synergies with Ericsson's packet-core group.

2019:C5 Predicting house prices with regression techniques

Several factors contribute to the value of real estate properties, for instance area, neighbourhood, style, and construction year. The complexity and interplay of such factors make negotiating a fair price tricky. Therefore, it can be useful to employ AI to estimate the value of a house based on the contributing factors. This project is designed to address this problem. By working on it, you will become familiar with fundamental concepts of machine learning in a hands-on fashion. To that end, a data-set of house prices is provided. For every listed house, in addition to its price, about 80 relevant features are provided, e.g. lot area, original construction date, and type of foundation.

Problem: Each house in the data-set is represented by several variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous) describing almost every relevant aspect of a property. You can find more information about the data-set at the following link: http://jse.amstat.org/v19n3/decock.pdf Given the training set, the goal is to implement and test an accurate predictive model that achieves a mean square error as low as possible on the test data-set.

Challenges: There are several challenges you have to address in order to find a proper solution. You have to get acquainted with the data and explore it. You may want to apply feature selection, feature extraction, and/or visualisation techniques. You may accordingly need to transform your data as a preprocessing step. As the end goal is to derive an accurate regression model, you have to explore the available techniques and pick one that gives you a reasonable result. Keep in mind that you should be able to justify your choices.
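Before trying more elaborate regression techniques, it helps to have an evaluation function and a simple baseline to beat. The Python sketch below computes the mean square error and a per-category mean baseline on a toy table. The column names, the prices, and the function names are hypothetical and are not taken from the provided data-set.

```python
from collections import defaultdict


def mean_square_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_group_mean(rows, key, target):
    """Baseline model: mean target per category, global mean as fallback."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[target])
    global_mean = sum(row[target] for row in rows) / len(rows)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return lambda row: means.get(row[key], global_mean)


# Toy rows with hypothetical column names and prices.
train_rows = [
    {"Neighborhood": "A", "SalePrice": 100.0},
    {"Neighborhood": "A", "SalePrice": 120.0},
    {"Neighborhood": "B", "SalePrice": 200.0},
]
test_rows = [
    {"Neighborhood": "A", "SalePrice": 130.0},
    {"Neighborhood": "C", "SalePrice": 150.0},  # category unseen in training
]

model = fit_group_mean(train_rows, "Neighborhood", "SalePrice")
predictions = [model(row) for row in test_rows]
error = mean_square_error([r["SalePrice"] for r in test_rows], predictions)
```

Any regression model you build should achieve a lower test error than a baseline of this kind; if it does not, that usually points to a preprocessing or overfitting problem.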

This project will be co-supervised by Amir Keramatian, PhD student at Chalmers.

2019:X1 Unmanned aerial vehicle (UAV) based sensors for facade investigation

(This project will most likely not be given in 2019 unless students with very specific experience take the course.) Imaging spectrometry can be used for fast, frequent and objective identification of façade types and conditions. Recent development of high-quality but lightweight hyperspectral imaging hardware makes it possible to analyze the components in real time.

This project may be divided into two parts: i) control the UAV based on coordinate information and optimize the flight path, ii) interpret information from the sensor, normal video, or other sensors such as IR, to identify façade material, color, and damage.

Projects from earlier years

Earlier projects are listed for 2017 and for 2018. They might not be given this year, but you can still browse them for inspiration, or approach us to ask whether it is possible to do something similar.