We can distinguish two kinds of efficiency:
In the following section, we will discuss, in the context of the Fudgets system, how we obtain reasonable execution efficiency. We have not tried to quantify programmer efficiency in a way that permits further comparison or judgement.
Two tiny programs:
The Fudgets counter example from Section 9.4: 13s The Motif counter example from Figure 112: 14s
Some small programs:
The Fudget calculator from Section 9.8 with 15 buttons: 13s The Fudgets calculator Cla with 28 buttons: 16s The calculator xcalc with 40 buttons, from X Windows distribution: 18s
Some larger programs:
The Fudgets WWW browser from Chapter 32: 22s Mosaic 2.6: 75s Netscape 3.0: 137s Netscape 4.04j2: 313s
Figure 101. Comparing the startup times of some programs when running via a 16.8kbps modem connection.
When it comes to event processing, we naturally wanted to to minimise the number of events that has to be handled by the functional program. Fortunately, the X Windows system can do a lot of event processing for the application. By setting event masks and button grabs appropriately, you can often eliminate all insignificant events, i.e., all events that are sent to the application program carry some meaningful information. In simpler window systems, the application has to deal with every little mouse move and button/key presses/releases by itself.
XGrabButton()in the Xlib manual) with an event mask that selects button presses/releases and enter/leave window events (each GUI element is a window), can be used to select exactly these events, with only one small exception: if the mouse pointer enters the button area, the mouse button is pressed, the pointer leaves the button area and the mouse button is released, the program will receive an insignificant button release event. The important thing is that no unnecessary motion events will be received.
We also tried a third representation,
which was slightly less efficient than the continuation-based representation.data SP i o = StepSP [o] (i->SP i o)
In the discussion below, we assume the continuation-based representation (although some of the ideas can be carried over to other representations).
loopThroughRight(Section 18.2) is a general way to adapt an existing stream processor for use in a new context. Another simple and common way to adapt a stream processor is by mapping a function on the elements of the input or output stream:
For example, tagged parallel composition of fudgets, >+<, can be defined likesp -==- mapSP g mapSP f -==- sp
wherefud1 >+< fud2 = mapSP post -==- (fud1 -+- fud2) -==- mapSP pre
postare the appropriate re-tagging functions. However, if implemented directly, such a definition has a rather high overhead.
>+< to a form not involving
mapSP, but instead recursion and
pattern matching on the stream-processor constructors, a
more efficient solution can be obtained.
Programs transformations of this kind are tedious to do by
hand, but it could still be worthwhile if the resulting code
is to be included in a library. The above described
transformation has been done by hand in the Fudget library. We
measured the effect on a communication intensive fudget
program containing a parallel composition of 50 fudgets (the
Space Invaders program described in
Chapter 35). The transformation reduced the CPU
time consumption by over 35%. Encouraged by this result, we
also transformed some more combinators (for example
>=^< discussed in
Section 13.1) in the same way.
It would of course be nicer to have these transformations
done automatically, especially when they are needed in
application programs. The kind of automatic transformation
that would be useful here is deforestation
[Wad90], which eliminates
intermediate data structures (applications of
GetSP in this case) by using certain unfold/fold
The manual work required in this solution is located in the library. The application programmer need not be aware of it. The automatic work required is inlining (unfold) by the compiler. It actually works even without inlining, but the efficiency gain is not as big.
The expressions we wish to optimise are of the kind
illustrated above: a stream-processor combinator applied to
mapSP f, for some
The trick is to make
mapSP a constructor
in the stream-processor data type:
Since the type is abstract, adding constructors to it like this will not be visible to application programmers.data SP a b = PutSP b (SP a b) | GetSP (a -> SP a b) | MapSP (a -> b) | NullSP
MapSP is a constructor, the implementation of
serial composition (as shown in Figure 45) can be
extended to handle the case when one or both arguments are
MapSP in a more efficient way. This
means that an expression like
can evaluate toMapSP f -==- MapSP g
With inlining, this step can be taken by the compiler and the compositionMapSP (f . g)
f . gcan then be optimised further. Without inlining, we have at least eliminated a use of
-==-, and thereby reduced the number of generated applications of the
We have not tested the above ideas in the Fudget library.
The first test measures the efficiency of serial composition
and compares the operators
-==-. We measured the time it took to send
around 5000 messages through a serial composition of a
varying number of identity fudgets (or identity stream
processors). The program used is shown in
Figure 102. It was compiled with HBC and run on a
Pentium Pro 200Mhz under NetBSD 1.2 [Neta]. The
results are shown in Figure 103. We can
see that the time grows roughly linearly with the length of
the composition and that serial composition of fudgets is
much more expensive than serial composition of stream
The last table in Figure 103 shows the
performance of the function composition
map id .
. map id. It is more efficient than the other
import Fudgets main = fudlogue mainF mainF = nullF >==< tstF >==< concatSP >^^=< stdinF tstF = case argReadKey "comb" 1 of 1 -> nest (idF>==<) idF depth 2 -> nest (idSP>^^=<) idF depth 3 -> absF (nest (idSP -==-) idSP depth) nest f z 0 = z nest f z n = f (nest f z (n-1)) depth = argReadKey "depth" 0
Figure 102. A program to measure the efficiency of serial composition.
time ./Internal <testinput1 -S -h8M - -depth n
idF >==< idF >==< ... >==< idF):
n User time GCs GC time Max heap Max stack 0 0.080u 0 0.00 50 4.173u 40 0.10 62432 836 100 8.475u 80 0.25 95840 1410 200 18.418u 163 0.95 159884 3042 400 44.521u 334 3.35 288020 3768
idSP >^^=< idSP >^^=< ... >^^=< idF):
n User time GCs GC time Max heap Max stack 0 0.100u 0 0.00 50 0.755u 8 0.01 35712 210 100 1.452u 15 0.03 42270 272 200 2.869u 30 0.08 53504 642 400 5.706u 59 0.18 76976 1218
idSP -==- idSP -==- ... -==- idSP):
n User time GCs GC time Max heap Max stack 0 0.076u 0 0.00 50 0.411u 4 0.01 32524 171 100 0.812u 7 0.00 35840 348 200 1.574u 15 0.04 41956 651 400 3.047u 30 0.07 54924 1260
map id . map id . ... . map id
n User time GCs GC time Max heap Max stack 0 0.000u 0 0.00 50 0.094u 2 0.00 3956 98 100 0.203u 4 0.00 5616 275 200 0.434u 8 0.00 9712 566 400 0.906u 16 0.01 17616 473
Figure 103. The efficiency of serial composition.
The second test measures the efficiency of parallel
composition. We measured the time it took to send about 70000
messages through one of the fudgets in a parallel
composition of identity fudgets. The program used is shown in
Figure 104. The parallel compositions were created
Figure 105. The depth of the tree, and hence the time it takes to send a message to a particular fudget in tree grows logarithmically with the size of the parallel composition. However, since all that is known about the address type is that it is an instance of the
Eqclass, the table lookup has to be implemented as a linear search and hence the lookup time varies linearly with the position in the list. From the results we see that the time of the table lookup soon becomes the dominating factor. This suggests that it would be a good idea to provide alternative combinators to
listF, which require the address type to be an instance of the
Ordclass, or even the
Ixclass, to reduce the time complexity of the table lookup time to logarithmic or constant, respectively.
import Fudgets main = fudlogue mainF mainF = nullF >==< tstF >==< concatSP >^^=< stdinF tstF = listF [(i,idF) | i<-[1..size]] >=^< (,) sel size = argReadKey "size" 1 sel = argReadKey "sel" 1
Figure 104. A program to measure the efficiency of parallel composition.
time ./Internal2 <testinput2 - -size n -sel k
n 2log n time GCs GC time max heap 1 0 1.352u 11 0.02 29908 32 5 2.179u 18 0.03 34924 256 8 2.457u 22 0.07 66744 2048 11 2.989u 27 0.31 317776n=256
k 2log n time GCs GC time max heap 64 8 4.640u 33 0.11 127 8 7.001u 44 0.12 128 8 6.801u 44 0.16 256 8 11.126u 67 0.24 87880
Figure 105. The efficiency of parallel composition of fudgets.