Compiler Comparison

I have to admit I've not been impressed with some of the benchmark tests found on the internet when comparing one PIC compiler against another. Certainly claims such as 'x number of instructions per second' for a high level language are pretty meaningless unless you have some clear terms of reference. In addition, I've seen tables of execution times and ROM and RAM usage published without any source code, so it is impossible to validate the figures given.

Worse still, benchmark tables tend to be presented by the makers of the compiler with a clear bias. In this respect, the following is no exception! However, I've at least given the code used and some terms of reference so the reader can investigate further, if they wish to do so. The samples were built for an 18F452 running at 20MHz. The following compilers were used.

The Evaluation

The first compiler comparison code set was a very simple Manchester encode and decode implementation. The code used had the virtue of having a number of standard language constructs. It is clear that control flow constitutes the majority of any real world program. For example, 'if' statements and looping constructs, such as 'for...next'.
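
To make the workload concrete, here is a minimal host-side sketch of byte-wise Manchester encoding and decoding in Python. It is not the code used in the tests. The function names and the bit convention (a 1 becomes the pair 10, a 0 becomes 01, most significant bit first) are assumptions chosen for illustration. It does show why the task is dominated by loops and 'if' tests.

```python
def manchester_encode(byte: int) -> int:
    """Encode one byte into a 16-bit Manchester word.

    Assumed convention (not taken from the benchmark code):
    1 -> 0b10, 0 -> 0b01, most significant bit first.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    encoded = 0
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        encoded = (encoded << 2) | (0b10 if bit else 0b01)
    return encoded


def manchester_decode(word: int) -> int:
    """Decode a 16-bit Manchester word back into a byte.

    Raises ValueError if any bit pair is 00 or 11, which cannot
    occur in a valid Manchester stream.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must be in the range 0..65535")
    decoded = 0
    for i in range(7, -1, -1):
        pair = (word >> (2 * i)) & 0b11
        if pair == 0b10:
            bit = 1
        elif pair == 0b01:
            bit = 0
        else:
            raise ValueError("invalid Manchester bit pair")
        decoded = (decoded << 1) | bit
    return decoded
```

Each data bit costs one loop iteration and one or two comparisons, which is the kind of control flow the comparison sets out to measure.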

The code used in the tests was not intended to be optimised in any way. However, each implementation is designed to match the original Swordfish version as closely as possible. ROM and RAM size, in addition to execution times, were measured without the output routines, as specific compilers have different strategies for converting numbers and outputting them. However, this will be discussed later.

What we want to measure is 'like for like' for standard language constructs. I really have to stress the like for like concept. There are many ways to skew the results. For example, using an inbuilt routine that is specific to a single compiler, or rewriting code to overcome a certain limitation. Whilst it is perfectly acceptable to use one or more of these techniques when writing your own code, it seems a little suspect to blindly apply them when trying to obtain any meaningful results in our tests, without putting their use into some context.

Manchester Encode and Decode

The measurements were obtained by using the MCU's 16-bit timer (TMR1), whose start and stop points wrap the code we are interested in measuring. You can view the code used for each compiler test here.
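
For readers who want to sanity check the figures, the following Python sketch shows how a TMR1 tick count might be turned into microseconds. It assumes the timer is clocked from the instruction clock (Fosc/4) with a 1:1 prescaler. The article does not state the timer configuration, so treat both as assumptions; the function names are hypothetical.

```python
FOSC_HZ = 20_000_000  # 18F452 clock stated in the article
PRESCALER = 1         # assumption: TMR1 prescaler 1:1


def elapsed_ticks(start: int, stop: int) -> int:
    """Ticks between two 16-bit timer readings, allowing for one wrap."""
    return (stop - start) & 0xFFFF


def ticks_to_us(ticks: int, fosc_hz: int = FOSC_HZ, prescaler: int = PRESCALER) -> float:
    """Convert timer ticks to microseconds.

    Assumes the timer counts instruction cycles (Fosc/4), so one
    tick lasts 4 * prescaler / fosc_hz seconds.
    """
    return ticks * 4 * prescaler * 1_000_000 / fosc_hz
```

Under these assumptions one tick is 0.2 us, so a reading of 402 ticks corresponds to 80.4 us, and a 16-bit count can cover roughly 13 ms before it wraps.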

Compiler                 ROM Used   RAM Used   Execution Time (us)
Swordfish BASIC          181        35         80.4
CCS C                    222        16         83.0
MikroElectronika BASIC   354        29         202.4
Proton BASIC             202        7          106.2
PICBasic PRO             368        26         390.8

From the above table it can be seen that Swordfish has the fastest execution time for encoding and decoding, closely followed by the CCS C compiler. Swordfish also uses the least amount of program ROM.

However, note that Swordfish performs worst in terms of RAM usage. One reason for this is that Swordfish allocates the first 25 RAM locations as system registers. These are always fixed to allow user ASM routines to access fixed register locations, floating point being one example. It's an advanced technique but very useful.

Perhaps the most important thing to note with RAM usage is that three of the compilers evaluated (Swordfish, CCS and MikroElectronika) utilise recycling algorithms for RAM. This means that RAM is constantly reused. For example, you may have a program which calls just one subroutine and the total cost of RAM is 100 bytes. However, if you then add another 20 subroutines the RAM usage might stay at 100, because RAM is recycled. Unless you allocate large chunks of RAM at the module level, it is unlikely that you would ever exceed most PIC18 series RAM limits when using a compiler that implements an effective RAM recycling strategy.
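
The effect of recycling can be illustrated with a small Python sketch. This is a generic call-graph overlay model, not the algorithm used by Swordfish, CCS or MikroElectronika, and all names and sizes in it are hypothetical. The idea is that routines which are never active at the same time can share the same RAM, so the cost is the deepest call path rather than the sum of every routine.

```python
def overlaid_ram(frames: dict, calls: dict, root: str) -> int:
    """Worst-case RAM when routines that are never active together share memory.

    frames maps a routine name to the RAM its local variables need.
    calls maps a routine name to the routines it calls directly.
    Recursion is rejected, since a static overlay cannot handle it.
    """
    def depth(name: str, active: frozenset) -> int:
        if name in active:
            raise ValueError("recursion is not supported in this sketch")
        callees = calls.get(name, [])
        deepest = max((depth(c, active | {name}) for c in callees), default=0)
        return frames[name] + deepest

    return depth(root, frozenset())


# One subroutine costing 100 bytes, called from main.
frames = {"main": 0, "sub0": 100}
calls = {"main": ["sub0"]}
one_sub = overlaid_ram(frames, calls, "main")

# Add another 20 subroutines of 60 bytes each, also called from main.
for n in range(1, 21):
    frames[f"sub{n}"] = 60
    calls["main"].append(f"sub{n}")
many_subs = overlaid_ram(frames, calls, "main")

# Cost if every routine had its own private RAM.
no_recycling = sum(frames.values())
```

In this model both `one_sub` and `many_subs` come out at 100 bytes, whereas giving every routine private RAM would need 1300 bytes.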

I personally believe that comparing RAM usage for compilers that implement a RAM recycling strategy serves little purpose unless the algorithm used (a) adversely affects ROM usage or (b) increases program execution times. In the figures shown above, it can be seen that the Swordfish ROM usage and execution times are extremely good. In short, the RAM allocation in Swordfish is not impacting on ROM usage or execution time in a negative way.

Manchester Encode and Decode - with Output

The measurements above did not include output routines. This is because each compiler has a different strategy for converting numbers to string values and then outputting them. Including them does skew the results, but they are worth looking at anyway. CCS C and PROTON use a technique which examines the input stream and generates optimised ASM code. PBP uses a hybrid approach. The input stream is examined and a set of pre-defined ASM macros is linked into the build. Swordfish uses an open source library technique. That is, both the string conversion and output routines are written in the Swordfish language itself and can be viewed and edited like any other Swordfish program. The MikroElectronika approach is similar to Swordfish, but the libraries used are precompiled binaries and cannot be viewed or edited by the user.

The measurements were obtained by using the MCU's 16-bit timer (TMR1), whose start and stop points wrap the code we are interested in measuring. The timer start point is the same as the previous encode and decode example code. The timer end point immediately follows the conversion and outputting of the decoded value. The baud rate for all tests was set at 19200. You can view the output code used for each compiler test here.
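
To give a sense of scale for the serial output, this Python sketch works out how long one character occupies the wire at a given baud rate. The article states the baud rate but not the framing, so 8 data bits, no parity and 1 stop bit (8N1) is an assumption, and the function name is hypothetical.

```python
def char_time_us(baud: int, data_bits: int = 8, stop_bits: int = 1) -> float:
    """Time in microseconds to send one character frame.

    Assumes one start bit and no parity bit (8N1 by default).
    """
    frame_bits = 1 + data_bits + stop_bits
    return frame_bits * 1_000_000 / baud


per_char = char_time_us(19200)  # roughly 521 us under the 8N1 assumption
```

At 19200 baud a single character therefore takes about 521 us to transmit. How much of that time a given build actually waits for depends on its output routine and on any hardware buffering, which this sketch does not model.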

Compiler                 ROM Used   RAM Used   Execution Time (us)
Swordfish BASIC          581        104        1534.2
CCS C                    598        19         1117.0
MikroElectronika BASIC   922        64         1897.2
Proton BASIC             412        19         1253.4
PICBasic PRO             610        26         1612.6

Both CCS C and PROTON compilers performed extremely well in terms of execution time. This was expected, as both compilers have specialist conversion and output routines that can be tuned during compile time. The PROTON compiler also generated the smallest ROM footprint of all of the compilers, followed by Swordfish.

The execution speed for Swordfish comes after CCS and PROTON. This is a very respectable time, considering that all conversion and output routines are written in the Swordfish language itself and are only evaluated at runtime.

From the above figures, it could be said that compilers with inbuilt (and therefore optimised) routines will perform better than compilers that depend on runtime libraries, such as Swordfish. Is this true?

Inbuilt or Library?

The next evaluation looked at hardware I2C. Both CCS C and PROTON use 'inbuilt' code generation. Swordfish and MikroElectronika implement I2C using runtime libraries. Like before, the Swordfish libraries are written in the Swordfish language - the MikroElectronika libraries are pre-compiled binaries.

The following measurements were obtained by using the MCU's 16-bit timer (TMR1), whose start and stop points wrap the code we are interested in measuring. You can view the output code used for each compiler test here.

Compiler                 ROM Used   RAM Used   Execution Time (us)
Swordfish BASIC          265        33         4819.6
CCS C                    260        11         4876.6
MikroElectronika BASIC   454        31         4839.6
Proton BASIC             362        11         4949.8

Most of the execution time is spent waiting for a character to be written to external I2C EEPROM. However, it can be seen that the Swordfish version has the fastest execution time for writing and reading. Its ROM footprint is also very small, coming a very close second to the CCS compiler's 260 bytes. It should therefore not be assumed that a compiler with an inbuilt routine for a given task will always outperform a compiler that does not.
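
Because the EEPROM wait dominates, the gap between compilers is small in relative terms. The Python sketch below simply restates the I2C execution times from the table above and expresses each as a percentage slower than the fastest entry. The helper name is hypothetical.

```python
# Execution times in microseconds, copied from the I2C table.
i2c_times_us = {
    "Swordfish BASIC": 4819.6,
    "CCS C": 4876.6,
    "MikroElectronika BASIC": 4839.6,
    "Proton BASIC": 4949.8,
}


def percent_slower(times: dict) -> dict:
    """Return how much slower each entry is than the fastest, in percent."""
    fastest = min(times.values())
    return {
        name: round(100 * (t - fastest) / fastest, 1)
        for name, t in times.items()
    }


spread = percent_slower(i2c_times_us)
```

On these figures every compiler is within 3% of the fastest, which supports the point that the inbuilt routines gain little here over a library written in the high level language.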