EDA122/DIT061 Fault-Tolerant Computer Systems, 2012 (7,5 hp)

 

Welcome!

Information about the course can be found in the Course PM (see link below).

Most of the material on these pages is password protected.
The username and password will be announced at the first lecture.
If you missed the first lecture, please ask a fellow student for the username and password.

Last updated: 2012-11-01


Messages from the examiner

2012-11-12

Today we extended the final deadline for approval of the laboratory report to midnight on 27 November. The extension is due to the high workload of the teaching assistants, who are not able to give you feedback in time. We apologize for any inconvenience.

2012-11-01

Today we opened a new hand-in for revised lab reports in the Study portal. The previous hand-in was closed at midnight today.
Note: You don't need to resubmit revised reports submitted yesterday or earlier via the previous hand-in.

The results of the exam will become available in Ladok no later than November 13.

2012-10-26

The deadline for the first submission of the laboratory report has now been extended to 1 November at 00:00.

2012-10-24

The solutions for the exam problems are now uploaded and can be found in the old exams.

2012-10-23

We have corrected an error in the solution to problem 3 in the exam given 17 August 2010. A # sign was missing before the repair rate in the GSPN model. Many thanks to Attila N. for informing us of the error.

2012-10-22

Today at 18.15 we uploaded a corrected version of the solutions for the exam given 9 January 2012.

We found yet another error in the solution for the exam given 9 January 2012. The calculation of the MTTF in Problem 1 is wrong, although the answer happens to be the correct one. Many thanks to Carl E for informing us of the error. We will post an updated version of the solution shortly.

We have now also found two minor errors in the exam given 9 January 2012. Problem 2 is missing a statement saying that it is allowed to simplify the solution by disregarding non-covered faults when only one module is working. This assumption makes it possible to use the birth-death model shown in the solution. The second error is that the solution for Problem 3 is unnecessarily complex. The place P4 can be removed. Many thanks to Christoffer K for pointing out these errors to us.

We have discovered an error in the solution to Problem 3 in the exam given 10 January 2011.  The solution to problem 3a is obviously incorrect since an immediate transition mark is missing on the arrow from "p_spare" to "p_active". This transition should be blocked when p_active contains two tokens. Many thanks to Sabina F for informing us of the error. We will post an updated solution later today.

2012-10-18

I have updated the slides for Lecture 16, Lars Holmlund's guest lecture. I received the slides from Lars today.

We have discovered an error in the solution to Problem 2 in the exam given 9 January 2012. We will post the correct solution shortly.
Many thanks to the student who found the error!

2012-10-17

The slides for Lecture 17 are now available.

2012-10-16

I have added reading instructions for "A Large-Scale Study of Failures in High-Performance Computing Systems". I don't expect to make any further changes to the reading instructions. However, I will not "freeze" them until after the last lecture on Thursday.

2012-10-15

There will be an extra lab session this week on Wednesday Oct 17, 17.15 - 21.00. Please make sure that you have all results and plots needed for the lab report. Please attend the lab session if you are missing data or haven't finished the labs.

2012-10-12

I have updated the slides for Lectures 10, 13 and 14. The figure showing the process model for ISO 26262 in Lecture 10 now has a higher resolution in order to improve readability. Despite this, the figure is still hard to read in the handouts with two or six slides per page. Please use the handouts with one slide per page for studying this figure. For Lectures 13 and 14, I have removed the slides that I did not go through during the lecture.

2012-10-11

The slides for Lecture 15 are now available.

2012-10-10

The slides for Lecture 14 are now available. I have decided to move the presentation of fault tolerance in time-triggered systems to Friday, Oct 12 (Lecture 15). Lecture 14 will address error detection and hardware reliability trends. The recommended reading for Lecture 14 is Chapter 6.4 in the course book. I also recommend reading the paper entitled "Designing Reliable Systems from Unreliable Components - The Challenges of Transistor Variability and Degradation". (Reading this paper is optional. There will be NO questions about this paper on the exam.)

Lecture 14 will include a presentation of an experimental study conducted in my research group. The aim of the study was to evaluate the effectiveness of a set of error detection mechanisms included in a jet-engine control system. We presented the results of the study at the SafeComp 2012 conference two weeks ago.

2012-10-07

The slides for lecture 13 are now available.

2012-10-05

The guest lecture by Lars Holmlund has been moved to Monday October 15.

The final version of the slides for lecture 12 is now available.

2012-10-04

Templates for the laboratory report in MS Word and LaTeX are now available under Course material. We apologize for the delay in providing the templates.

A preliminary version of slides for Lecture 12 is now available. I will add more slides about the field failure data study tomorrow.

2012-10-01

I have uploaded a second version of the slides for Lecture 10. This version has two slides per page for improved readability. Some of the slides are difficult to read in the standard format using six slides per page.

2012-09-30

The slides for Lectures 10 and 11 are now available.

2012-09-20

The slides for lecture 8 are now available.

The PM for laboratory class 1 is now available under Course material.

I have uploaded a new version of the slides for lecture 7. The previous version was an old one that I uploaded by mistake. The main difference between the two versions is that the old version contained some slides that I have presented before.

2012-09-15

The slides for Lecture 7 are now available.

I have changed the lecture plan. The lecture "More on N-version programming and Recovery blocks" has been moved back to Oct 5, while the lectures on Safety Assessment have been moved forward to Sept 24 and Oct 1.

You can now register for the laboratory classes in the Student portal. The labs are done in groups of two. You and your lab partner should first register for a group number and then sign up for one lab slot. The available lab slots are Tuesday evening, Wednesday evening and Friday morning. The labs start in study week 4. Negin will go through lab 1 at the exercise session on Friday Sept 21.

2012-09-10

The slides for Lectures 5 and 6 are now available.

I am looking for student representatives for the course evaluation. So far only one student has expressed an interest in serving as a student representative. Please send me an e-mail if you are interested. For more info, click on "Course evaluation" above.

2012-09-08

The slides for Lecture 4 are now available.

2012-09-05

The slides for Lecture 3 are now available.

I uploaded a slightly updated version of the slides for Lecture 2 at 16:20 today.

2012-09-04

The slides for Lecture 2 are now available.

2012-09-01

The first lecture is in lecture hall HC1 at 08.00, Tuesday, September 4.


Teachers

Johan Karlsson, ext. 1670, office 4107, johan(at)chalmers(dot)se (examiner and lecturer)

Negin Fathollah Nejad Asl, ext. 5404, office 4127, negin(at)chalmers(dot)se (teaching assistant)