Testing, Debugging, and Verification TDA567/DIT082, LP2, HT2015

Exercises Week 3

Here are the exercises for Week 3 -- with solutions.

Tracking causes and effects

The following method intends to compute the n-th Fibonacci number, but contains a bug for the first two numbers (Java source).

 1: static int fib(int n) {
 2:   int f = 0; 
 3:   int f0 = 1;
 4:   int f1 = 1;
 5:   while (n > 1) {
 6:     n--;
 7:     f = f0 +f1;
 8:     f0 = f1;
 9:     f1 = f; }
10:   return f; }
  1. For each statement, determine the statements that are data-dependent on it.
    We only need to consider those statements that perform a write:
    1. {10}. Writes to f which is only read at line 9, but by then already overwritten in line 8.
    2. {7}
    3. {7, 8}
    4. {5}
    5. {9, 10}
    6. {7}
    7. {7}
  2. For each statement, determine the statements that are control-dependent on it.
    All statements 6-9 are control-dependent on the while statement in line 5.
  3. Assume we are on line 9 in the computation of fib(2). Successively compute the backward-dependent statements - try using the algorithm from the lecture.
    It depends on {7,5,3,4}
    Does this explain why the defect in this program shows only for the first two Fibonacci numbers?
    Indirectly, one can see that when executing within the loop, if one walks backwards the dependencies (control and data), the initialization of f at line 2 is not among the dependencies walked. This means that the cause of infection is removed since the bad initialization value is not propagated inside the loop. So there will not be a failure whenever the loop is entered, but there will be a failure if it is not entered, that is, when n=0 or n=1.

Input minimisation

In this exercise we take a look at one particular algorithm for minimising a (failing) test input: the ddmin algorithm (published here).

Download the following archive: DDMin.zip

In eclipse, import the project by choosing File -> Import ... -> Existing Project into Workspace.
Select archive file, browse to the downloaded zip and click Finish.

Your task is the following:

  1. Read the javadoc on the DDMin and DDTest class.
  2. Read the source code of the Buggy class: its run method contains a bug (clearly on purpose).
  3. Read the source code of the BuggyFail class and run its main method.
  4. Answer the question the main method. You can turn on the deubg messages to get a better insight.

If you have AspectJ installed, you can instead download this version which uses aspects for logging, to avoid cluttering of the code: DDMinAspects.zip

A short annotation on the debugging output:
ddmin 2 - [a,  , d, e, b, u, g, g, i, n, g,  , t, e, s, t]
We started ddmin with chunk size 2. Thus, we split into two halves. The first test is when we removed the first chunk (half), and then the test when we removed the second chunk:
  test [i, n, g,  , t, e, s, t] PASS
  test [a,  , d, e, b, u, g, g] PASS 
Both passed, so we are not that lucky that we can minimise the failing input a lot in one go. Instead, ddmin now tries to remove a shorter chunk, by dividing the input into 4 parts (doubling the current size).
ddmin 4 - [a,  , d, e, b, u, g, g, i, n, g,  , t, e, s, t]  
We now remove the first chunk, that is 1/4 of the whole input, [a, , d, e]:
  test [b, u, g, g, i, n, g,  , t, e, s, t] FAIL
Now ddmin was succesful, and 'already did its job': it reduced the original input and still found a failing test case. However, the goal is to minimise as much as possible so we go further on this failed input. Having removed one chunk, we also continue on a chunk size of one shorter (4-1=3). We thus get the three chunks [b, u, g, g], [i, n, g, ] and [t, e, s, t]:
ddmin 3 - [b, u, g, g, i, n, g,  , t, e, s, t]
  test [i, n, g,  , t, e, s, t] PASS
  test [b, u, g, g, t, e, s, t] PASS
  test [b, u, g, g, i, n, g,  ] FAIL
Another chunk can be removed succesfully! Again, we continue on that smaller input and continue on chunk size (3-1=2):
ddmin 2 - [b, u, g, g, i, n, g,  ]
  test [i, n, g,  ] PASS
  test [b, u, g, g] PASS
Both succeeded, so we again double the number of chunks, reducing the size of each chunk and making it more possible that the remaining input still fails:
ddmin 4 - [b, u, g, g, i, n, g,  ]
  test [g, g, i, n, g,  ] FAIL
The rest of the algorithm goes on similarly:
ddmin 3 - [g, g, i, n, g,  ]
  test [i, n, g,  ] PASS
  test [g, g, g,  ] FAIL
ddmin 2 - [g, g, g,  ]
  test [g,  ] PASS
  test [g, g] PASS
ddmin 4 - [g, g, g,  ]
  test [g, g,  ] PASS
  test [g, g,  ] PASS
  test [g, g,  ] PASS
  test [g, g, g] FAIL
ddmin 3 - [g, g, g]
  test [g, g] PASS
  test [g, g] PASS
  test [g, g] PASS
At this point the number of chunks is 3, but that is also the length of the array. So increasing the number of chunks is not possible, since we cannot make chunks that small (the smallest ones contain at least one element, otherwise we don't make the input shorter). Therefore the algorithm concludes:
Minimalised version of 'a debugging test' is 'ggg'.
However, due to our 'greedyness', we first dropped the first four elements of the array [a, , d, e, b, u, g, g, i, n, g, , t, e, s, t], thereby removing the possibility of arriving at minimal input [e,e]. This demonstrates that the algorithm gets a local optimal, but not a global optimal.

When was this bug introduced?

In this scenario, we will pretend that we are working on a little software project, implementing a merge sort function. Like in any project, we work with a version control system.

Download the archive: bisect.zip

Unzip this archive on your computer. It contains a git repository.

The merge function has seen many improvements, and has been constantly tested. Everything seems to be working well. However, one day someone reports a bug to your project, observing that the following list is not sorted correctly:

{3, 6, 8, 7, 1, 5, 2}

Since we never tested this input, the corresponding bug might have been introduced in any of the earlier versions of the merge function. We will use our version-controlled system to find out where the bug originated.

  1. Write a unit test for the reported bug. Do not extend SortTest, but instead create a new file (e.g. SortTestNew.java). We will move the test to SortTest once the bug is fixed.
    SortTestNew.java:
    import static org.junit.Assert.*;
    import org.junit.Test;
    public class SortTestNew {
      @Test
      public void testReportedBug() {
        int[] input = {3, 6, 8, 7, 1, 5, 2};
        int[] expected = {1, 2, 3, 5, 6, 7, 8};
        Sort.sort(input);
        assertArrayEquals(expected, input);
      }
    }
    
  2. Create a copy of runTest.sh to run SortTestNew instead, verifying that we indeed have a bug.
    runTestNew.sh:
    #!/bin/bash
    javac -cp /usr/share/java/junit4.jar:. SortTestNew.java
    java -cp /usr/share/java/junit4.jar:. org.junit.runner.JUnitCore SortTestNew
    
  3. Now, use git bisect to find the change that introduced the bug. You can assume that the bug was not present in the very first commit of the repository, i.e. the first commit was good. You might need to learn a little about git first. Try to use git bisect run the automate the search!
    Start the bisection:
    git bisect start
    Mark the last commit as bad:
    git bisect bad HEAD 
    Mark the first commit as good:
    git bisect good 3c4b7609242879c72fa052df15b3d5039a1a0481
    Git bisect has no checked out the commit halfway the known bad and good version. You can either continue to manually mark commits as bad or good (run the test and then run git bisect bad or good depending on the outcome. Or you can use the script:
    git bisect run ./runTestNew.sh
  4. Once you found the change that caused the bug, identifying the mistake and fixing it should be straightforward. Do so.
    The problem is that we no longer treat arrays of odd length specially. We should write:
    int rightLength = array.length / 2;
    if (array.length % 2 == 1)
      rightLength += 1;
    int[] right = new int[rightLength];
    
  5. Add the new unit test to SortTest and remove SortTestNew. You can commit the fix if you want to. Congratulations!

Additional quiz-questions from the exercise session:

1) x = 4;
2) if (y == 2)
3)   x = 7;
4) z = x;
1. On which lines is line 4 data-dependent?
1) x = 4;
2) if (y == 2)
3)   x = 7;
4) else
5)   x = 5;
6) z = x;
2. On which lines is line 6 data-dependent?
1) if (y == 4)
2)   x = 3;
3) x = x + 1;
3. On which lines is line 2 control-dependent?
4. On which lines is line 3 control-dependent?
5. On which lines is line 3 backward-dependent?

6. Given a test on arrays of numbers, that fails if we have two consequetive values. We are given the failing input [1,2,2,3]. What is the first smaller array on which ddmin([1,2,2,3], 2) performs the test that returns fail? (note: we are not asking for the final result of ddmin, although that is also a good exercise).

A case study using the Eclipse debugger (optional)

We look at a larger Eclipse project that contains a bug. Our goal is to find the bug by taking advantage of the Eclipse built-in debugger. After fixing the bug we take a look at several bug-finding tools for Eclipse and their performance on this project.

The project we look at is a Datalog query implementation. Datalog is a logical language similar to Prolog, but with several simplifications to keep the language decidable (see also Wikipedia).

Datalog

The following is a classical introduction to Datalog and similar logical languages. The purpose of a Datalog program is to derive facts. A fact is a predicate over some values, for example the fact parent(alice, bob) denotes that Alice is a parent of Bob. A database is a collection of facts. For example:

D = { parent(alice, bob) , parent(bob, chuck) , parent(bob, charlie) , parent(chuck, dave) }

A rule is of the form head :- body. A rule states that given some collection facts (referred to as the rule's body) we can derive a new fact (referred to as the head) of the rule. For example, ancestor(alice, bob) :- parent(alice, bob) allows us to derive that if we know that Alice is a parent of Bob, then we also know that Alice is an ancestor of Bob. It is possible to use variables in rules as well. For example, ancestor(X, Y) :- parent(X, Y) generalises the previous rule to any value for the variables X and Y. A Datalog program is a collection of rules, such as:

p = { ancestor(X, Y) :- parent(X,Y). ancestor(X, Y) :- ancestor(X,Z), ancestor(Z,Y). }

This program consists of two rules. One the rule we just saw before, the other stating the transitive nature of the ancestor-relation.

To evaluate a Datalog program means to give the program an input database and let it derive all the facts it is able to. Thus:

p(D) = { parent(alice, bob) , parent(bob, chuck) , parent(bob, charlie) , parent(chuck, dave) , ancestor(alice, bob) , ancestor(bob, chuck) , ancestor(bob, charlie) , ancestor(chuck, dave) , ancestor(alice, chuck) , ancestor(alice, charlie) , ancestor(bob, dave) , ancestor(alice, dave) }

A common way to use Datalog is to query whether the program can, given a database, derive facts of a certain form. For example

query(p, D, ancestor(alice, ?)) = { ancestor(alice, bob) , ancestor(alice, chuck) , ancestor(alice, charlie) , ancestor(alice, dave) }

and query(p, D, ancestor(dave, ?)) = {} .

A buggy Datalog query implementation

We made a simple Java program that allows you to make Datalog queries. Basically, it first derives all the facts p(D) and then filters those that match the query. The implementation involves the following classes:

  Value.java
  Variable.java

Representing values (such as Alice and Bob) and variables (such as X and Y) respectively. Overwrite the equals and hashCode methods such that their names are compared and not the object pointers (i.e. (new Value("alice")).equals(new Value("alice")) is true).

  Predicate.java

Represents predicates (such as parent, ancestor), also overwrites equals and hashCode in similar fashion.

  Argument.java

Can contain either a Value or a Variable.

  Fact.java
  Atom.java

A fact cannot contain any variables and stores a pair of Predicate and Value[]. In rules variables can appear (in both head and body) so there we use atoms instead; pairs of Predicate and Argument[].

  Rule.java

Contains a head of type Atom and a body of type Atom[]. It has a method deriveOnce(Collection<Fact>) which returns all facts this rule could derive from the supplied collection of facts.

  Program.java

Contains a program of type Rule[]. It has a method deriveAll(Fact[] database) which returns all facts this program can derive by calling deriveOnce on each rule until no new facts are derived.
It also has a method query(Atom, Fact[]) which returns all facts that this program can derive that match the supplied atom. It is implemented as a filter on the results of deriveAll.

  Substitution.java

Used internally for mapping atoms to facts (i.e. to see if the body of a rule can be satisfied by the current database of facts). It has a field from of type LinkedList<Variable> and a field to of type LinkedList<Value> -- that represents a substitution that maps the variable on location i in from to the value on location i in to.
It exports some additional methods to expand the substitution with an additional variable/value mapping and a method to apply the substitution on an atom.

The class Bug.java exposes a bug in the current implementation. When performing the query from the above example, the result is:

query(p, D, ancestor(alice, ?)) = { ancestor(alice, bob) , ancestor(alice, chuck) , ancestor(alice, dave) }

That is, ancestor(alice, charlie) is missing.

Download the eclipse project: Datalog.zip

In eclipse, import the project by choosing File -> Import ... -> Existing Project into Workspace.
Select archive file, browse to the downloaded zip and click Finish.

We are now ready to address the bug. You can find references for debugging with Eclipse here and here.

Confirm the bug

First verify that the bug indeed exists by reading the source code in Bug.java, running the program and checking that the result of the query is incorrect.

First simplifications

We know a bit of how the Datalog library works. Does the bug originate from the query or from the deriveAll method? Adjust the code in Bug.java to get closer to the bug.

Add breakpoints in the problematic method and find out where the first errorneous behavior in this method occurs. Which datalog rule is failing?

Conditional breakpoints

Having discovered which datalog rule is the rule that fails, we want to add breakpoints in the spoiler deriveOnce method, but only for this rule. Add a condition on the breakpoints that ensures this.

Detail formatter

There are two likely causes for this application to fail. Either not all substitutions are found, or the found substitutions are not correctly applied. Unfortunately the substitutions have no nice toString() method. Create a new detail formatter so that we can more easily look at the found substitutions. Use the Display view to test your formatter. (See the references on Eclipse debugging on how to do so)

More conditional breakpoints

Circling closer to the buggy part of the code, we find that the spoiler findAllSubstitutions method is static -- how can we make sure the debugger only interrupts execution in the substitution searches that we are interested in? When interrupting the program, check whether things go wrong either when finding the possible substitutions, or in the exending / adding a new substitution to the total result.

Isolating the buggy code

Having found what goes wrong in this method, it makes sense to create a new test that avoids the other classes and long path to the bug, and makes a direct test on the spoiler expandAll method. That is, we are isolating the test unit. Final spoiler It might be good to be remembered of a common Java programming error. When assigning an object of any kind -- in particular, including LinkedList -- to e.g. a field you only copy the pointer, not the elements of that list. So any code that has the same pointer copy, will share the same actual elements. Thus if you add an element to the linkedlist in the field, you also add an element to the linkedlist of any other code that uses the same pointer...

We will have found that expandAll method is buggy. We can generate a failing test case based on the presented bug, isolating the method. We base it on the case that we found, i.e. we try to extend the substitution X -> Alice, Z -> Bob with both Y -> Charlie and Y -> Chuck.
Value alice = new Value("alice");
Value bob = new Value("bob");
Value charlie = new Value("charlie");
Value chuck = new Value("chuck");

Substitution s = new Substitution();
s = s.extend(new Variable("X"), alice);
s = s.extend(new Variable("Z"), bob);

Substitution sCharlie = new Substitution();
sCharlie = sCharlie.extend(new Variable("Y"), charlie);

Substitution sChuck = new Substitution();
sChuck = sChuck.extend(new Variable("Y"), chuck);

LinkedList subs = new LinkedList();
subs.add(sCharlie);
subs.add(sChuck);
LinkedList res = s.extendAll(subs);
System.out.println(res);

Indeed when we run this test, we find that only charlie is added as a substitution. If now put a breakpoint in the extendAll method, we find that for one of the two cases the newS substitution is null. In this particular case, we also see that before calling extend, newS already has a mapping for Y, namely charlie, and therefore chuck is not added!

Looking into the code of the extend method we then see that this method is supposed to return a new Substitution, which is does. What is it that goes wrong?

It turns out that the bug lies in the use of lists: when creating a new substitution, we only copy the pointer to the lists, not the lists themselves. So the new substitution still shares the same lists as the original one, and the newly added mapping is also added to this this substitution, as we can see by observing their values in the debugger. We can fix it by rewriting the constructor:

private Substitution(LinkedList from, LinkedList to) {
	this.from = (LinkedList) from.clone();
	this.to = (LinkedList) to.clone(); 
}

Now locate the bug and fix it!

Bugfinding tools (optional)

We fixed the reported bug for our Datalog program, but are there more? There exist several tools that aim to help you find bugs. Typically these tools come with a collection of known mistakes or bad coding habits and help you to detect these -- indirectly, these are often the cause of bugs.

Although there are no more bugs in our library that we are aware of, there are some bad coding habits that could easily introduce new ones. We take a look at some analysing tools to find them:

You will find that some of these tools would have detected our bug directly! If you walked through the whole debugging exercise you will probably agree that these tools can potentially save you a lot of time.



Home | Course | Schedule | Exam | Exercises | Labs | Evaluation | Tools | Links Atze van der Ploeg, Nov 19, 2015