EDA122 / DIT061 Fault-Tolerant Computer Systems, 2011 (7.5 credits)

DAT270 Dependable Computer Systems, 2011 (7.5 credits)

Course PM


Change history:

2011-10-21: Solutions to the exam are now available in the Old exams folder.

2011-10-13: Extra exercise added on Friday, Oct 14, 15.15 - 17.00.

2011-10-05: Corrected the time for the lectures on Thursday, Oct 6 and Thursday, Oct 13. Both lectures are given 10.00 - 11.45.

2011-09-25: New topics for lectures 10, 13 and 16.

2011-09-20: The topics for lectures 9 and 10 have been swapped.

2011-09-18: Lecture halls changed to HB1 on Monday, Oct 3 and HB4 on Monday, Oct 10. (Note: there is no lecture on Monday, Oct 10, 13-15.)

2011-09-10: Exercise plan changed.  Lecture halls updated.

2011-09-07: Lecture plan updated. Lecture halls changed.

Created: 2011-08-24

Teachers

Lecture and Exercises

Note: there is no lecture on Monday, Oct 10, 13-15 (week 7).


Course description

The course gives an introduction to fault-tolerant and safety-critical computer systems. Fault tolerance is used in a wide range of critical embedded, enterprise and server applications. The course covers four major areas: 1) design principles for centralized and distributed fault-tolerant computer systems, 2) dependability analysis of fault-tolerant systems, 3) techniques and processes for the assessment of safety-critical systems, and 4) standards and terminology. The design principles are illustrated through system examples from areas such as space, aviation, road vehicles and transaction processing.


Course literature

The course book is available at Cremona.  All other course literature will be made available on the course homepage.


Lecture plan (preliminary)

Lecture slides will, if possible, be posted on the course homepage no later than 24 hours before the lecture.
 
Lecture no. | Course week | Date | Time | Room | Content
1 | 1 | Thursday, Sept 1 | 10.00-11.45 | HC3 | Introduction: Basic principles in fault-tolerant computing, hardware redundancy, voting redundancy, basic terminology.
2 | 1 | Friday, Sept 2 | 13.15-15.00 | HB3 | Reliability modeling: Basic concepts in probability theory, reliability block diagrams.
3 | 2 | Monday, Sept 5 | 13.15-15.00 | HB4 | Hardware redundancy: Voting redundancy, standby redundancy, active redundancy.
4 | 2 | Thursday, Sept 8 | 10.00-11.45 | HB4 | Reliability modeling: Markov chain models.
5 | 3 | Monday, Sept 12 | 13.15-15.00 | HC3 | Availability modeling: Markov chains, birth-death processes. Safety modeling.
6 | 3 | Friday, Sept 16 | 13.15-15.00 | HB4 | System examples: HP Non-stop Architecture. Software redundancy: Design diversity, N-version programming, recovery blocks. Case study: Ariane 501 disaster.
7 | 4 | Monday, Sept 19 | 13.15-15.00 | HB4 | Generalized Stochastic Petri Net models. Design diversity in the flight control system for Airbus A330/A340.
8 | 4 | Thursday, Sept 22 | 10.00-11.45 | HB4 | Guest lecture: FT in space applications, Torbjörn Hult, Ruag Space AB.
9 | 5 | Monday, Sept 26 | 13.15-15.00 | HB4 | Safety assessment: Hazard and risk analysis, FMEA, FTA. Technical management: Life-cycle models, IEC 61508.
10 | 5 | Thursday, Sept 29 | 10.00-11.45 | HB4 | Safety assessment: Hazard analysis, risk analysis, allocation of safety integrity levels, hardware reliability prediction, safety case. Technical management: Life-cycle models, ISO 26262.
11 | 5 | Friday, Sept 30 | 13.15-15.00 | HB4 | Guest lecture: Functional safety, certification and standards, Jan Jacobson, SP Technical Research Institute of Sweden.
12 | 6 | Monday, Oct 3 | 13.15-15.00 | HB1 | Guest lecture: Fault-tolerance in JAS-Gripen, Lars Holmlund, Saab Aerosystems.
13 | 6 | Thursday, Oct 6 | 10.00-11.45 | HB4 | Software redundancy: More on N-version programming and recovery blocks.
14 | 6 | Friday, Oct 7 | 13.15-15.00 | HB4 | FT in distributed systems: Consensus, Byzantine failures.
15 | 7 | Thursday, Oct 13 | 10.00-11.45 | HB4 | FT in distributed systems: The Time-Triggered Architecture.
16 | 7 | Friday, Oct 14 | 13.15-15.00 | HB4 | Error detection and recovery techniques.

 


Exercise plan (preliminary)

Exercise no. | Course week | Date | Time | Room | Content | Problems
1 | 2 | Monday, Sept 5 | 15.15-17.00 | HC3 | Reliability modeling: Reliability block diagrams, fault trees. | 2.2, 2.3, 2.6, 2.7
2 | 2 | Friday, Sept 9 | 13.15-15.00 | HB4 | Reliability modeling: Markov chains. | 3.1, 3.2, Variant of 5.6
3 | 3 | Monday, Sept 12 | 15.15-17.00 | HC3 | Availability modeling. | 3.12, 3.11, 5.2
4 | 3 | Thursday, Sept 15 | 10.00-11.45 | HB4 | Introduction to laboratory class 1. | Lab-PM
5 | 4 | Monday, Sept 19 | 15.15-17.00 | HB4 | Probabilistic safety analysis. | 3.8, 3.9
6 | 4 | Friday, Sept 23 | 13.15-15.00 | HB4 | Generalized Stochastic Petri Net models. Introduction to laboratory class 2. | Lab-PM
7 | 5 | Monday, Sept 26 | 15.15-17.00 | HB4 | Dependability modeling. | 5.9, 5.10, Exam problems
8 | 6 | Monday, Oct 3 | 15.15-17.00 | HB1 | Failure rate function, FMEA. | 1.1, Exam problems
9 | 7 | Monday, Oct 10 | 15.15-17.00 | HB4 | Exam problems. | To be decided
10 | 7 | Friday, Oct 14 | 15.15-17.00 | HB4 | Exam problems. | To be decided


Laboratory classes


Examination

Participation in the laboratory classes and approved laboratory reports.

Written exam. Grades: failed, 3, 4, 5.

First exam: Wednesday, October 19, 2011, 14.00 - 18.00, V-building

Second exam:  Monday, January 9, 2012, 14.00 - 18.00, V-building

Third exam: Tuesday, August 21, 2012, 14.00 - 18.00, V-building