
The ECN No Name Newsletter is no longer being published. This is an archived issue.
You may recall a January 1988 article in the No Name Newsletter regarding floating point performance on Suns. Under the previous release of SunOS (3.5), the best thing you could do was to use the "-f68881" flag when compiling your programs. This instructed the compiler to make calls to code using the Motorola MC68881 Floating Point Co-processor, which is installed in all ECN Sun3 workstations.
Under the current release of SunOS (4.0), you can go a bit further by actually putting MC68881 floating point instructions directly into your program. This technique, called "inlining", saves a considerable amount of time over the normal method of calling a floating point routine. I tested this method with a discrete Fourier transform program I use for benchmarking. I got the following results on an idle Sun 3/50:
% cc -f68881 -O -o no-inline dft.c -lm
% time no-inline < testdata
935.5u 0.5s 15:39 99% 0+128k 1+0io 2pf+0w
% cc -O -o inline dft.c /usr/lib/f68881/libm.il
% time inline < testdata
547.4u 0.3s 9:09 99% 0+120k 1+0io 1pf+0w
The first three lines of the example above show how I compiled my dft program and how long it took to run. This is the "-f68881" method, and the program runs in about 15 and a half minutes.
The second set of lines shows how simple it is to compile the program using the "inline" technique. Note the speed increase! With just a recompilation, my dft program now runs in only about 9 minutes, roughly 1.7 times faster.
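For a rough sense of the gain, the user CPU times reported by "time" in the two runs can be compared directly. The following is only a minimal sketch of that arithmetic; the names T_CALLS, T_INLINE, speedup and percent_saved are hypothetical and are not part of dft.c.

```c
#include <assert.h>

/* User CPU seconds reported by time in the two runs shown above. */
#define T_CALLS  935.5   /* built with -f68881 and -lm */
#define T_INLINE 547.4   /* built with /usr/lib/f68881/libm.il */

/* How many times faster the new run is: old time divided by new time. */
double speedup(double t_old, double t_new)
{
    assert(t_new > 0.0);
    return t_old / t_new;
}

/* Percentage of the old run time that the new run eliminated. */
double percent_saved(double t_old, double t_new)
{
    assert(t_old > 0.0);
    return 100.0 * (t_old - t_new) / t_old;
}
```

Calling speedup(T_CALLS, T_INLINE) gives about 1.71, and percent_saved(T_CALLS, T_INLINE) gives about 41.5.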
The inline file "/usr/lib/f68881/libm.il" is not a library but a set of assembly language routines and replacement rules. An optimization pass of the compiler uses them to replace calls to math routines in your program. The assembly language routines are inserted in the middle of your program in place of the function calls, hence the name "inline".
Inline routines are faster for two reasons. First, they avoid the overhead of making a function call, saving several instructions each time they are used. Second, and more important, they are hand-coded in assembly language, making direct use of the floating point co-processor in the most efficient manner possible.