Created: 2012-09-01
Change history:
2013-08-19: Date and place for third exam stated.
2012-10-19: Problems to be solved during exercise session 10 updated.
2012-10-16: Lecture plan finalized. Content descriptions for lectures 12, 13, 14, 15, 16 and 17 updated. Problems solved during exercise session 9 updated.
2012-09-15: The lecture "More on N-version programming and Recovery blocks" moved back to Oct 5; the lectures on Safety Assessment moved forward to Sept 24 and Oct 1.
The course gives an introduction to fault-tolerant and safety-critical computer systems. Fault tolerance is used in a wide range of critical embedded, enterprise and server applications. The course covers four major areas: 1) design principles for centralized and distributed fault-tolerant computer systems, 2) dependability analysis of fault-tolerant systems, 3) techniques and processes for assessment of safety-critical systems, and 4) standards and terminology. The design principles are illustrated through system examples from areas such as space, aviation, road vehicles and transaction processing.
The course book is available at Cremona. All other course literature will be made available on the course homepage.
Lecture no. | Course Week | Date | Time | Room | Content |
1 | 1 | Tuesday, Sept 4 | 08.00-09.45 | HC1 | Introduction: Basic concepts in fault-tolerant computing, hardware redundancy, voting redundancy, basic terminology. |
2 | 1 | Thursday, Sept 6 | 10.00-11.45 | HC1 | Hardware redundancy: Voting redundancy, Standby redundancy, Active redundancy. System example: HP Non-stop Architecture. |
3 | 1 | Friday, Sept 7 | 15.15-17.00 | HC1 | Reliability modeling: Basic concepts in reliability theory, reliability block diagrams, fault trees |
4 | 2 | Monday, Sept 10 | 13.15-15.00 | HC1 | Case study: Ariane 501 disaster. Software redundancy: Design diversity, N-version programming, Recovery blocks. |
5 | 2 | Thursday, Sept 13 | 10.00-11.45 | HC1 | Reliability modeling: Markov chain models |
6 | 3 | Monday, Sept 17 | 13.15-15.00 | HC1 | Availability modeling: Markov chain models, Birth-death processes. Safety modeling. |
7 | 3 | Thursday, Sept 20 | 10.00-11.45 | HC1 | Generalized Stochastic Petri Net Models. Design diversity in the flight control system for Airbus A330/A340. |
8 | 4 | Monday, Sept 24 | 13.15-15.00 | HC1 | Safety assessment: Hazard and Risk Analysis, FMEA, FTA. Technical Management: Life-cycle models, IEC 61508 |
9 | 4 | Thursday, Sept 27 | 10.00-11.45 | HC1 | Guest lecture: FT in space applications, Torbjörn Hult, Ruag Space AB |
10 | 5 | Monday, Oct 1 | 13.15-15.00 | HC1 | Safety assessment: Allocation of safety integrity levels, Hardware reliability prediction, Safety case. Technical Management: Life-cycle models, ISO 26262 |
11 | 5 | Thursday, Oct 4 | 10.00-15.00 | HC1 | Guest lecture: Functional safety, certification and standards, Jan Jacobson, SP Technical Research Institute of Sweden |
12 | 5 | Friday, Oct 5 | 15.15-17.00 | HC1 | Software redundancy: Experimental evaluations of N-version programming and Recovery blocks. Study of field failures in high-performance computing systems. |
13 | 6 | Monday, Oct 8 | 13.15-15.00 | HC1 | FT in distributed systems: Consensus, Byzantine failures. Layered fault tolerance |
14 | 6 | Thursday, Oct 11 | 10.00-11.45 | HC1 | Reliability trends for integrated circuits. Error detection. Experimental evaluation of error detection mechanisms in a jet-engine controller. |
15 | 6 | Friday, Oct 12 | 15.15-17.00 | HC1 | FT in distributed systems: The Time-Triggered Architecture |
16 | 7 | Monday, Oct 15 | 13.15-15.00 | HC1 | Guest lecture: Fault-tolerance in JAS-Gripen, Lars Holmlund, Saab Aerosystems. |
17 | 7 | Thursday, Oct 18 | 10.00-11.45 | HC1 | Clock synchronization in time-triggered systems. More on error detection techniques. Course summary. |
Exercise no. | Course Week | Date | Time | Room | Content | Problems |
1 | 2 | Monday, Sept 10 | 15.15-17.00 | HC1 | Reliability modeling: Reliability block diagrams, fault trees. | 2.2, 2.3, 2.6, 2.7 |
2 | 2 | Friday, Sept 14 | 15.15-17.00 | HC1 | Reliability modeling: Markov chains | 3.1, 3.2, Variant of 5.6 |
3 | 3 | Monday, Sept 17 | 15.15-17.00 | HC1 | Availability modeling. | 3.12, 3.11, 5.2 |
4 | 3 | Friday, Sept 21 | 15.15-17.00 | HC1 | Introduction to laboratory class 1 | Lab-PM |
5 | 4 | Monday, Sept 24 | 15.15-17.00 | HC1 | Probabilistic safety analysis. | 3.8, 3.9 |
6 | 4 | Friday, Sept 28 | 15.15-17.00 | HC1 | Generalized Stochastic Petri Net Models. Introduction to laboratory class 2 | Lab-PM |
7 | 5 | Monday, Oct 1 | 15.15-17.00 | HC1 | Dependability modeling | 5.9, 5.10, Exam problems |
8 | 6 | Monday, Oct 8 | 15.15-17.00 | HC1 | Failure rate function, FMEA | FMEA and reliability analysis, 1.1, Exam problem |
9 | 7 | Monday, Oct 15 | 15.15-17.00 | HC1 | Exam problems | Old exams: 2004-08-23 (problem 1), 2010-01-11 (problem 3), 2004-08-23 (problem 1) |
10 | 7 | Friday, Oct 19 | 15.15-17.00 | HC1 | Exam problems | Old exam: 2011-10-19 (problems 1, 2, 3) |
Note! Read the lab report requirements carefully and make sure all of them are fulfilled before you send in your report!
The lab report shall be sent in electronically via the Student Portal.
Expect that it will take at least two days for us to grade your report.
You will be notified via email of the result of the grading.
Hand in your report as soon as possible, but no later than Monday, October 29, 2012.
Participation in the laboratory classes and approved laboratory reports.
Written exam. Grades: failed, 3, 4, 5.
First exam: Tuesday, October 23, 2012, 14.00 - 18.00, HA, HB, HC
Second exam: Monday, January 15, 2013, 14.00 - 18.00, Mechanical engineering building, Hörsalsvägen 5
Third exam: Tuesday, August 22, 2013, 14.00 - 18.00, VV