Program transformation – new programs for old

Program transformation alters the structure of a program without affecting its meaning. It can be applied to re-engineering, program understanding and the year 2000 problem. Mark Harman looks at the many ways of writing the same program.

There are many ways of writing the same program, some are easier to understand, while others are more efficient. Often there is a trade off between understandability and efficiency. We can either make the program faster, but harder to understand or expand the code in a way that makes it easier to understand at the expense of making it slower. Transformation allows us to have the best of both worlds by modifying the syntactic structure of the program to suit our taste. Because the transformation process guarantees that the original and transformed versions have the same effect, the programmer can read whichever version is easiest to understand and execute whichever is fastest.

The idea of program transformation has been around for some time, an early example is the pretty printer. A pretty printer lays out a program according to a set of predefined style rules. These rules tell it by how much to indent nested structures, how much white space to insert between syntactic components, how to handle line wrap around, and so on. A pretty printer deals with the lexical structure of the program (the very lowest syntactic level), by simply rearranging the existing text to make it more attractive on the page. An important property of a pretty printer is that it only removes and inserts white space and therefore cannot have any effect upon the behaviour of the program. This idea is central to all forms of transformation. Transformation changes the syntax of a program without changing its semantics.

Another example of program transformation happens inside the compiler when it walks over the object code that it has produced to tighten up on certain common constructions making the whole program faster. These optimisations are often called peep-hole transformations, because they operate on small sections of object code at any given time (that is, they operate on the program via a peep-hole, through which only these small sections of code are visible).

Source level transformations, like the pretty printer, are used to make programs easier for humans to understand and manipulate, whereas object level transformations, like peep-hole optimisation, are performed on object code in order to make it more suitable for the machine.

A simple example of transformation

A for loop can be rewritten as a while loop. That is, given a for loop of the general form for(e1,e2,e3)s, where e1, e2 and e3 are arbitrary expressions and where s is some arbitrary statement, we could rewrite the loop as e1; while (e2) { s; e3; } without affecting the meaning of the loop. If we applied this transformation rule throughout the program, we would replace all for loops with equivalent while loops. In this way, we can use source level transformation to restructure a program, or to rewrite the program in a form that a particular programmer finds easier to read. All that we are doing, is taking the pretty printing idea a stage further. We do more than simply fiddle with the white space around the program: crucially, we retain the guarantee that the original and the transformed program have the same effect. Replacing for loops with while loops is not a particularly exciting transformation. However, the idea of program transformation can be applied to a surprisingly wide variety of programming problems.

Side effects can be painful

A side effect is an assignment to a variable that occurs when an expression is evaluated. For example, the expression ++i in C has the value i+1, but when it is evaluated, it also has the side effect of assigning i+1 to i. As all programmers know, using side effects can lead to programs which are compact and efficient though often hard to understand.

Consider, for example, the C fragment in Figure 1. Although this is a very small fragment of code, its effect on the variables i and x is not immediately clear. A more understandable version of the fragment is given in Figure 12. The code in Figure 1 is hard to understand because it combines side effects, short-circuit boolean evaluation and the representation of booleans by integers in a single compact fragment.

if(++i && i--) x=1 ;

Figure 1 – A fragment with side effects.

Figure 2 contains a rule for converting statements into a side effect free-form. The rule is written as a function, sefree, which takes an expression and returns a side effect free version of it (an expression which has the same value, but which does not cause a side effect). Of course the side effect must still happen when we execute the transformed version of the program, so we have a second function, side_effects, which takes an expression and yields a sequence of assignment statements that replicate the side effects of the original. For example, ++i, is transformed into a statement i=i+1; by side_effects and into a side effect free expression, i+1, by sefree.

sefree(i--) = i

sefree(--i) = i-1

sefree(i++) = i

sefree(++i) = i+1

side_effects(i--) = i=i-1;

side_effects(--i) = i=i-1;

side_effects(++i) = i=i+1;

side_effects(i++) = i=i+1;

Figure 2 – Simple side effect removal by cases.

Strictly speaking, since assignment is an expression statement in C, the transformation does not completely remove side effects. Instead, it ensures that all side effects occur only in assignment statements. This is natural, as the assignment expression statement is the archetypal side effect statement.

In order to deal with the effect of short circuit evaluation of the && operator we can use the transformation short_circuit in Figure 3, which transforms logical operators into equivalent conditional expressions. In this transformation, and those which follow, e1 and e2 stand for some arbitrary expression and s stands for some arbitrary statement. For the expression ++i && i--, the transformation short_circuit gives us the equivalent expression ++i?i--:0. This is a correct transformation, because the conditional expression ++i && i-- means ‘First, evaluate ++i’. If the result is false, then return false. Otherwise return the result of evaluating i--'. Using short_circuit, the program fragment in Figure 1 can be transformed into the fragment in Figure 4. This new version still contains all the original's side effect, but these will be removed by applying further transformations.

short_circuit(e1&&e2) = e1?e2:0

short_circuit(e1||e2) = e1?1:e2

Figure 3 – A Transformation which makes short circuit evaluation explicit.

In order to transform the fragment in Figure 4 into a side effect free form, we shall need to extend the functions sefree and side_effects from Figure 2 to handle conditional expressions. The rules which do this are given in Figure 5. Here, we need an update function to take account of the side effects which occur when the predicate part of the conditional expression is evaluated. These side effects change the values of expression in the following then and else part of the conditional expression. For example update(i--, ++i) produces i+1, the meaning of the expression i-- when the side effect ++i is taken into account. Applying the function sefree to the conditional expression ++i?i--:0 gives us the expression (i+1)?(i+1):0.

if(++i?i--:0) x=1 ;

Figure 4 – A version of Figure 1 with explicit short circuit evaluation.

The statement in Figure 1 is an if statement, so we need a rule for sefree and side_effects which can be applied to if statements. This rule is given in Figure 6. To make things simpler, I have assumed that the statements in the then part of the if statement do not affect any of the other variables in the predicate of the if statement. Without this assumption, the transformation would be more complicated, but still possible. Using the rule sefree on the program fragment in Figure 4, we obtain the equivalent fragment in Figure 7. Namely, a side effect free version of the fragment we started off with in Figure 1.

side_effects(e1?e2:e3) = if (sefree(e1)) { side_effects(e1); side_effects(e2); }

else { side_effects(e1); side_effects(e3); }

sefree(e1?e2:e3) = sefree(e1)?sefree(update(e2,e1)):sefree(update(e3,e1))

Figure 5 – Side effect removal for conditional expressions.

Using transformation to simplify a program

Although the fragment in Figure 7 contains no side effects, it is still hard to understand its effect, as removing side effects has made it far longer. Fortunately, transformation can help us here too. Often transformation is used as a tool for program simplification, effectively massaging the program into a simpler, but equivalent form. These transformations are easier to apply to side effect free programs than to those which contain side effects, so side effect removal is a natural first step. Having done this we can apply more general transformations to our programs. Applying these transformations is rather like treating the program as a fragment of mathematics and performing algebraic simplification upon it. Luckily for the programmer, this algebraic simplification can be automated, so it will not be necessary to become a mathematician in order to use program transformation.

sefree(if (e) s) = if (sefree(e) s; side_effects(e);

Figure 6 – Transforming if statements into a side effect free form.

Some simple examples are listed in Figure 8. All of them assume that expressions are free of side effects. The transformations in Figure 8 are called axioms because they can always be applied to side effect free expressions and statements – they are axiomatic. The first four axioms apply to expressions, while the last two apply to statements. In the final axiom, the function sub, is used to transform an expression. The function call sub(x,i,y) yields the expression you get when all occurrences of the variable i in the expression x are replaced by the expression y. For example, sub(i*i+p,i,q+1) would yield the transformed expression (q+1)*(q+1) + p. That is, a version of i*i+p in which both occurrences of i have been replaced by the expression q+1.

if ((i+1)?(i+1):0) x=1;

if (i+1) {i=i+1; i=i-1;}

else i=i+1;

Figure 7 – Side effect free version of the fragment in Figure 1.

Figure 9 contains a more complex transformation. This transformation is called a rule because it requires a test to be performed before it can be applied – it is not axiomatic. Specifically, the rule can only be performed provided def(s1) Ç ref(s2) = Æ . The function call def(s1) yields the set of variables defined (assigned a value) in statement s1. The function call ref(s2) yields the variables referenced (mentioned in expressions) in the statement s2. The proviso def(s1) Ç ref(s2) = Æ requires that there are no variables which are both defined in s1 and also referenced in s2. That is, the two sets must have no intersection. If this proviso is not satisfied then the transformation cannot be performed, because it might change the meaning of the program fragment.

axiom1(e?e:0) = e

axiom2(e?1:e) = e

axiom3(e-e) = 0

axiom4(e+0) = e

axiom5(i=i;) = ;

axiom6(i=e1;i=e2;) = i = sub(e2,i,e1);

Figure 8 – Some transformation axioms.

Transformation rules are often defined in this mathematical form, because it allows us to prove that the transformations are correct (that the transformed and original program fragments have the same effect). This proof is the responsibility of the provider of a transformation system and it gives us a perhaps more certain guarantee that we can rely on the transformations it produces.

rule(if(e) s1; if(e) s2; else s3;) = if(e) {s1;s2;} else s3;

provided def(s1) Ç ref(s2) = Æ

Figure 9 – Some transformation rules.

Using the transformation rules in Figure 8 and 9, we can perform significant algebraic simplification on the side effect free fragment we produced in Figure 7. Let's start with the two assignments i=i+1; i=i-1;. Clearly these cancel one another out. A transformation system could realise this fact by applying a sequence of transformations. By applying axiom6, i=i+1;i=i-1; becomes i=i+1-1;, which using axiom3 becomes i=i; which by axiom5 becomes ;. Thus the fragment in Figure 7 can be transformed to the simpler version in Figure 10, which by applying the transformation rule in Figure 9 and axiom1 of Figure 8 becomes the further simplified fragment in Figure 11.

if ((i+1)?(i+1):0) x=1;

if (i+1) ;

else i=i+1;

Figure 10 – Simplified version of Figure 7.

Finally, we could transform the predicate i+1 to i+1!=0, which could be transformed to i!=-1 (by subtracting one from both sides) which would give us the final fragment in Figure 12.

if (i+1) x=1;

else i=i+1;

Figure 11 – Simplified version of Figure 10.

In many cases, the transformation system is simply recognising simplifications which would have been obvious to a well trained programmer. This is true of all the individual transformation steps. Each is extremely simple. However, the combined effect of applying a sequence of simple transformations can be startling. Its rather like an ant colony: each ant is very simple and has a very mundane role, but the action of the entire colony, with all ants acting in concert, can be astonishingly complex and sophisticated. In program transformation, the whole is often far greater than the sum of its parts.

if(i!=-1) i=0; else x=1;

Figure 12 – Final transformed version of the fragment in Figure 1.

Automation

Using transformation, we have transformed the code fragment if (++i && i--) x=1; into the more understandable fragment if (i!=-1) i=0; else x=1;. It has been quite an arduous task, involving many simple transformation steps. No programmer could be expected to perform these transformations by hand, it would be easier to study the original, even with its complex side effects. The whole transformation process could have been automated by a transformation tool, which a programmer could use as an assistant during development and maintenance.

Just such a program transformation tool is available (for free) on the internet. It is called the Maintainers' Assistant and was developed by researchers at the University of Durham's Centre for Software Maintenance, with funding from the Engineering and Physical Sciences Research Council (EPSRC). From the Maintainers' Assistant, an industrial strength tool, FermaT was developed.

The Maintainers' Assistant is currently only available on Unix/Linux systems, running X Window. It works by compiling the source program into a Wide Spectrum Language (WSL), which is sufficiently general that it can represent programs in languages as diverse as C and IBM 370 assembler. Transformations are selected from a menu, using a point and click interface. Many individual transformation steps are combined into higher level transformation tactics, which perform a sequence of steps with the aim of restructuring a program in a particular way. Also, many of the transformations used by the Maintainers' Assistant have been proven correct, using mathematical logic, in research papers and PhD theses. The Maintainers' Assistant (both source code and executable) and research papers describing its design, implementation and application can be downloaded from http://www.dur.ac.uk/CSM/projects/ma/.

Even given the power of tools like the Maintainers' Assistant, it is unlikely that transformation will ever be a completely automated activity. Rather we can expect to see tools which work interactively, alongside the software developer/maintainer, removing much of the tedium from source code analysis, restructuring and simplification. These systems will sometimes need to be guided by the developer, who will know better (using domain knowledge, experience and that all-important programmer insight) which transformation strategy works the best in each case.

The future of program transformation

Program transformation technology continues to be developed. Systems like the Maintainers' Assistant represent the current state of the art. They are capable of automating many simple transformations as well as more sophisticated sequences of transformations, and have already been applied to several real world restructuring problems.

As compilers become ever more sophisticated they will employ more and more transformation technology. In the future, compilers may become much more than mere translators that convert one program notation into another. Using source-level transformation technology, the gap between compilers and CASE tools will continue to diminish to the point where programmers have sophisticated tools at their disposal, capable of integrating analysis, re-engineering, understanding, testing and measurement.

Much of the work on transformation has hitherto been concerned with conventional 3GL programming notations such as Pascal, assembler and C (as well as functional style languages like ML). Transformation technology has yet to catch up with the object oriented paradigm, which presents formidable, though surmountable problems for automated program transformation. We can expect to see significant progress in this area.

Mark Harman is director of research at the School of Computing in the University of North London (http://www.unl.ac.uk/~mark/projproj.html). Together with researchers from the Universities of Durham and Brighton, he is currently seeking funding to apply transformation technology to Ada95 programs, in order to study concurrent transformation and slicing (See EXE, October 1996). Dr. Harman is also working with Chris Kopec to use genetic algorithms (See EXE, April 1997) to achieve improved algorithms for slicing and transformation. You can email him at m.harman@unl.ac.uk.

The ‘Year 2000’ problem

Transformation can be used to isolate and re-engineer system components which are affected by, among other things, the use of two digit date fields. So the technology has a role to play in mitigating software crises which may arise from ramifications of the ‘Year 2000’ problem. The Centre for Software Maintenance at Durham University has set up a ‘Year 2000’ strategy group (http://www.dur.ac.uk/CSM/themes/year2000/) to provide a research-based consultancy service concerning the ‘Year 2000’ problem. Much of the work of the Durham group is based on transformation using the Maintainers' Assistant and its big brother, FermaT.

Bug detection

Because transformation can be used to rewrite a program, and can simplify fragments of code, it is often helpful in detecting errors. It can be particularly effective when combined with program slicing. Slicing is a technique for removing statements from a program which have no effect upon some chosen variable at some point within the program. Thus, slicing is really just another form of program transformation (See EXE, October 1996).

Consider, for example, the following fragment of code:

for(i=0,sum=a[0],biggest=sum;i<20;sum=a[++i])

if (a[i] > biggest) biggest = a[i];

The variable sum is supposed to contain the sum of the first 20 elements of the array a. By focusing on the value of sum and applying transformation and slicing, we can produce a simplified version of the program concerned only with the variable sum. This will require more transformation rules than can be reasonably listed in this article, but each transformation will be defined in a similar way to the ones which we have already met. If we apply transformation to the fragment above, we could produce the simplified fragment sum = a[20];, which has the same effect on sum as the original. We can see clearly that something is not right with the value of sum. We would at least have expected it to be contained within some form of loop. (The error is that the original assignment expression sum=a[++i] should have been written sum=sum+a[i++].)

 

(P)1997, Centaur Communications Ltd. EXE Magazine is a publication of Centaur Communications Ltd. No part of this work may be published, in whole or in part, by any means including electronic, without the express permission of Centaur Communications and the copyright holder where this is a different party.

EXE Magazine, St Giles House, 50 Poland Street, London W1V 4AX, email editorial@dotexe.demon.co.uk