Tips for optimising bank switching?

SHughes_Fusion · Post by **SHughes_Fusion** » Wed Oct 28, 2015 4:42 pm

The program I'm working on currently implements two serial buffers, one for receive and one for transmit.

I was rather surprised when I increased the sizes from 16/64 bytes to 32/128 bytes that my code size also increased - by 133 words!

After comparing the before and after .asm files it appears the cause is a load of extra MOVLB commands the compiler has had to insert to handle RAM bank switching.

Presumably the compiler doesn't optimise the location of variables to minimise bank switching, but is there any way to 'help' by forcing variables to be in a particular bank if they are associated?

Jerry Messina · Post by **Jerry Messina** » Wed Oct 28, 2015 5:31 pm

Subroutine parameters, local variables, and temps all come from the stack frame which gets allocated first. They start in bank 0 and continue on using as many banks as required.

After that come the module level static variables and if I remember correctly they mostly get allocated in the order they're seen in the 'include' list of modules. Variables in a module tend to be in the same bank, but there's no guarantee since it'll cross banks if it needs to.

What tends to happen is that your buffers are typically module level, so they end up in a later bank but the routines that access them are using a lot of bank 0 variables for temp things like indices, locals, etc.
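To make the effect concrete, here is a toy model of it (a sketch only, not how Swordfish actually tracks the bank select register). It counts the bank selects a naive code generator would need for a sequence of RAM accesses. The variable names, the addresses and the 0x60 access-bank limit are all assumptions made for illustration; check the datasheet for the real access-bank size on your part.

```python
# Toy model: count the MOVLB instructions a naive compiler would emit for a
# sequence of RAM accesses. Names, addresses and the access-bank limit are
# illustrative assumptions, not values taken from Swordfish output.
BANK_SIZE = 0x100      # PIC18 RAM banks are 256 bytes
ACCESS_LIMIT = 0x60    # assumed size of the access-bank RAM window


def count_movlb(trace, addresses, access_limit=ACCESS_LIMIT):
    """Return how many bank selects are needed to walk `trace` in order."""
    current_bank = None  # bank select register contents unknown on entry
    switches = 0
    for name in trace:
        addr = addresses[name]
        if addr < access_limit:
            continue  # access-bank RAM needs no bank select
        bank = addr // BANK_SIZE
        if bank != current_bank:
            switches += 1
            current_bank = bank
    return switches


# A receive routine that touches an index and the buffer alternately.
trace = ["index", "rxbuf", "index", "rxbuf", "index", "rxbuf"]

split = {"index": 0x080, "rxbuf": 0x120}     # index in bank 0, buffer in bank 1
together = {"index": 0x110, "rxbuf": 0x120}  # both in bank 1

print(count_movlb(trace, split))     # 6
print(count_movlb(trace, together))  # 1
```

In this model the same six accesses cost six bank selects when the index and the buffer sit in different banks, and only one when they share a bank.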

Sometimes there are games you can play with making what would normally be local variables module level static (which increases your total RAM usage), but sometimes that's a game of Whack-A-Mole as you change code, add new modules, etc. and things change. If you include the important modules first that can sometimes help.

There's also a trick of putting your important data declarations into their own file and then using the protected region feature and absolute declarations to locate the data in bank 0. I've used that method in a few cases where I really needed every clock cycle I could squeeze out for a few special functions. It works best if you can locate all the variables into a single structure or two, and you have to do a little manual housekeeping, so it's not generic and it might actually make things worse for other variables. You could use the same trick and put them at the end of memory, but then you run into the locals being in a different bank again.

Thank the PIC18 designers and the marketing folks with their "no banking required" nonsense.

You can always locate everything by hand (That's a joke, son. That's a joke).

SHughes_Fusion · Post by **SHughes_Fusion** » Thu Oct 29, 2015 8:25 am

Thinking back, when I used to use the BKND compiler for the PIC12/16 series, that had the ability to specify which bank a particular variable was stored in.

Does Swordfish offer a similar function? I guess that ideally the compiler would intelligently work out where to put variables to minimise bank switching but I'd guess also that such a function wouldn't be trivial to implement.

I've found that moving the comms module so that it is the last one included saves 25 instructions. The comms module dims both the buffers and the pointers, so hopefully they'd be in the same bank - unless of course they end up crossing a bank boundary...

The processor I'm using only has two RAM banks (14K22) but I can see for bigger processors this could well get very messy!

I'll have a play and see if reordering the declarations helps at all. When it comes down to it I have plenty of code space left so a few extra instructions don't matter, I just don't like to see things not being as efficient as possible...

SHughes_Fusion · Post by **SHughes_Fusion** » Thu Oct 29, 2015 11:57 am

Wow - I've just found one thing that really should go in the 'how to optimise your code' FAQ.

I have a 'Program Globals.bas' in which I define all variables which need to be global rather than sub / function temporary.

I've moved *all* variable definitions into that - I had quite a few which I'd stuck in the main program for ease of reference.

Moved them, made them public, recompiled and the code size reduced from 9915 bytes to 9059 bytes.

Now, the only Dim statements I have in my main program are 'redirectors' - i.e. giving port pins an alias.

It's not ideal, as Swordfish doesn't allow project-level searching, so to check the declaration of a variable you need to change source files, but for a saving of almost 9% in code space I won't argue! (And presumably also a gain in program efficiency, as there are fewer instructions to execute.)

Jerry Messina · Post by **Jerry Messina** » Thu Oct 29, 2015 12:11 pm

> Moved them, made them public, recompiled and the code size reduced from 9915 bytes to 9059 bytes

That's interesting. I wonder why. I do know that moving declarations around can have an effect on things as the banking requirements change.

> ...that had the ability to specify which bank a particular variable was stored in. Does Swordfish offer a similar function?

Not that I'm aware of.

About the closest thing is using the absolute attribute and locate them yourself.
Here's a test program I had showing some different methods of doing this if you want to play around with it... I modified it for the 14K22


device = 18F14K22
// the 14K22 has 512 bytes of ram (bank0 + bank 1)

// enable stack frame variable diagnostics (results in the .asm file)
// note: you won't see anything located 'absolute' in the 'xx variable bytes used' stats 
// you'll have to look at the variable addresses in the .asm file
#option _showvar = true
#option _showalloc = true

// this routine uses a choice of 4 allocation methods... pick one
#option METHOD = 1  

// some dummy allocations to simulate normal program ram use
// this will start in bank 0 
// try changing the array size to 256 so it spills over into bank1 and see what happens
// if you use methods #1 or 2 things will overlay and you won't get any warnings
// if you use method #3 you'll get a 'variable allocation exceeds device maximum' error
// if you use method #4 the array can be as large as what's available
dim var_array(100) as byte
dim var_end as byte         // marker so we can see where var_array() ends

// locate some vars at a fixed location in bank 1
// when you use 'absolute' the compiler doesn't track, check or reserve the memory so you're on your own
// if things overlay or conflict, tough luck
// note: you can't use 'sizeof()' in a compiler directive, so you have to do the locations by hand
#define BANK1 = $100

#if (METHOD = 1)
// method #1 - individual absolute declarations
dim rxix as byte absolute BANK1
dim rxhead as word absolute (BANK1 + 1)         // + sizeof(rxix)
dim rxtail as word absolute (BANK1 + 1 + 2)     // + sizeof(rxix) + sizeof(rxhead)
dim rxbuf(64) as byte absolute (BANK1 + 1 + 2 + 2)    // + sizeof(rxix) + sizeof(rxhead) + sizeof(rxtail)

#elseif (METHOD = 2)
// method #2 - easier to just put everything into a struct and locate that absolute
structure rx_t
    rxix as byte
    rxhead as word
    rxtail as word
    rxbuf(64) as byte
    marker as byte      // so we can find the end of the structure
end structure

// you could just do this to locate the struct, but if you do the compiler won't track the memory
dim rx as rx_t absolute BANK1

#elseif (METHOD = 3)
// method #3 - use 'protected region' to reserve the absolute memory so the compiler won't use it
// (this will also split the memory available to SF into two disjointed regions)
// unfortunately, if you want to limit the block size to only what's required we need to know the 
// sizeof(rx_t) at compile time, but since you can't use 'sizeof()' in a compiler directive
// we have to compute the size manually. later, we'll use a macro to verify things
structure rx_t
    rxix as byte
    rxhead as word
    rxtail as word
    rxbuf(64) as byte
    marker as byte
end structure

#define SIZEOF_RX_T = 1 + 2 + 2 + 64 + 1        // sizeof(rx_t)
#define _protected_a_start = BANK1
#define _protected_a_end   = BANK1 + SIZEOF_RX_T

dim rx as rx_t absolute _protected_a_start

// verify that we reserved the proper size for the protected region
// macros are evaluated at compile time, so this is a sneaky way to get the
// compiler to use 'sizeof()' to check the protected block size
macro check_protected_a_size()
  const _prot_a_start = _protected_a_start
  const _prot_a_end = _protected_a_end
  if ((_prot_a_end - _prot_a_start) < sizeof(rx_t)) then
      CheckParam(0, "_protected_a reserved block too small")
  endif
end macro
// invoke the macro to check the reserved block size
// you'll get a compiler error if the #defines don't match
check_protected_a_size()

#elseif (METHOD = 4)
// method #4 - use 'protected region' to reserve memory, but put it at the end of ram
// instead of at the beginning of BANK 1
// this makes a larger contiguous block available to SF so you can use larger arrays
structure rx_t
    rxix as byte
    rxhead as word
    rxtail as word
    rxbuf(64) as byte
    marker as byte
end structure

#define SIZEOF_RX_T = 1 + 2 + 2 + 64 + 1       // sizeof(rx_t)
#define _protected_a_start = _maxram - SIZEOF_RX_T
#define _protected_a_end   = _maxram

dim rx as rx_t absolute _protected_a_start

// verify that we reserved the proper size for the protected region
// macros are evaluated at compile time, so this is a sneaky way to get the
// compiler to use 'sizeof()' to check the protected block size
macro check_protected_a_size()
  const _prot_a_start = _protected_a_start
  const _prot_a_end = _protected_a_end
  if ((_prot_a_end - _prot_a_start) < sizeof(rx_t)) then
      CheckParam(0, "_protected_a reserved block too small")
  endif
end macro
// invoke the macro to check the reserved block size
// you'll get a compiler error if the #defines don't match
check_protected_a_size()
#endif

main:
// some code to access the vars so they don't get optimized out
var_array(0) = 0
var_end = 0

#if (METHOD <> 1)
rx.rxix = 0
rx.rxhead = 0
rx.rxtail = 0
rx.rxbuf(0) = 0
rx.marker = $55
#endif

SHughes_Fusion · Post by **SHughes_Fusion** » Thu Oct 29, 2015 12:38 pm

I'll have a play with your code and see what I can learn from it.

I had saved the .asm from an older version of my code with a buffer size of 8. I can't recall the exact size this compiled to, around 9600 bytes I believe.

This contains 323 MOVLB instructions.

The .asm with all variable declarations moved in to a separate file contains 87 MOVLB instructions, so a saving of 472 bytes. Also, with this approach the code size does not vary with serial buffer size.
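The 472-byte figure follows directly from the two MOVLB counts, since MOVLB is a single-word (2-byte) PIC18 instruction. A minimal check of that arithmetic, with a hypothetical function name:

```python
MOVLB_BYTES = 2  # MOVLB is a single-word (16-bit) PIC18 instruction


def movlb_bytes_saved(before, after):
    """Program-memory bytes saved by dropping from `before` to `after` MOVLBs."""
    return (before - after) * MOVLB_BYTES


print(movlb_bytes_saved(323, 87))  # 472
```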

So it would appear that the code size reduction I am seeing is mainly, or possibly entirely, due to the compiler not needing to insert as many MOVLBs. And by implication, defining as many variables as possible in the same place - but not in the main source file - seems to allow the compiler to better optimise where they are allocated.

I think it could be useful to start a section in the Wiki for code saving tips like these as it is only by chance I've discovered what savings are possible.

Jerry Messina · Post by **Jerry Messina** » Thu Oct 29, 2015 2:44 pm

It's been a while since I looked at this, but I don't recall the compiler really doing much to optimize variable locations.

From what I remember, they get allocated pretty much like this:
1) SF system variables (25 bytes or so)
2) frame variables (they would start in bank 0, at location 25)
   part of bank 0 is also the special 'access bank' which can be used w/out bank selects
3) module level static variables from the "include xxx.bas" files, in the order they're specified
   (this includes SF libraries, too)
4) variables in the main program module

What you end up with is very dependent on the amount of data, the order they're declared, and how much of that bank 0/access bank frame space gets used.
Putting them all into one file doesn't really change that, except it's a bit easier to control the order they're defined.
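The allocation order described above can be sketched as a simple back-to-back placement. This is only a rough model built from that description, not the compiler's real allocator, and every block size except the "25 bytes or so" of system variables is made up for illustration.

```python
BANK_SIZE = 0x100  # PIC18 RAM banks are 256 bytes


def allocate(blocks, start=0):
    """Place (name, size) blocks back to back.

    Returns a dict mapping each name to (first_bank, last_bank).
    """
    layout = {}
    addr = start
    for name, size in blocks:
        layout[name] = (addr // BANK_SIZE, (addr + size - 1) // BANK_SIZE)
        addr += size
    return layout


# Hypothetical sizes: system variables, frame, then two modules in include order.
order_a = [("system", 25), ("frame", 60), ("comms", 170), ("globals", 40)]
order_b = [("system", 25), ("frame", 60), ("globals", 40), ("comms", 170)]

print(allocate(order_a))  # comms fits in bank 0, globals straddles banks 0 and 1
print(allocate(order_b))  # globals fits in bank 0, comms straddles banks 0 and 1
```

The total RAM used is the same in both cases. Only the include order changes, and with it which module ends up crossing the bank boundary.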

I think if you were to play around with your "globals.bas" you could get things arranged such that you'd end up right back where you started.

SHughes_Fusion · Post by **SHughes_Fusion** » Thu Oct 29, 2015 2:57 pm

You're right.

I tried moving things around in the Globals.bas file. The first variables I moved made no difference.

Then I moved two Integers from the end of the file to the middle. Program size went up by 87 instructions. Searching the file, there were 80 more MOVLB instructions...
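Counting MOVLBs by hand after every rearrangement gets tedious, so a small script helps. The sketch below assumes the generated .asm uses MPASM-style `;` comments and counts at most one match per line. The function name and the sample listing are made up for illustration.

```python
import re


def count_mnemonic(asm_text, mnemonic="MOVLB"):
    """Count lines whose code part (before any ';' comment) uses `mnemonic`."""
    pattern = re.compile(r"\b" + re.escape(mnemonic) + r"\b", re.IGNORECASE)
    count = 0
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0]  # drop MPASM-style comments
        if pattern.search(code):
            count += 1
    return count


sample = """\
    MOVLB 1
    MOVF rxbuf, 0 ; no MOVLB needed here
    movlb 0
    CLRF index
"""
print(count_mnemonic(sample))  # 2
```

To compare two builds, read each listing into a string (for example with `pathlib.Path(...).read_text()`) and subtract the two counts.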

Now I'm wondering whether I've inadvertently hit pretty much the most efficient organisation of variables by chance, or whether I should spend ages trying different arrangements to see if I can save more...

Jerry Messina · Post by **Jerry Messina** » Thu Oct 29, 2015 4:37 pm

I think you probably lucked out on your first attempt. You managed to save almost 10%, so I'd call it quits and go buy a lottery ticket.

This is your lucky day.

SHughes_Fusion · Post by **SHughes_Fusion** » Fri Oct 30, 2015 2:00 pm

I don't give up that easily!

With a bit more messing I've knocked another 40 MOVLBs off. I can't see it getting any smaller though.

I've now started looking at other, bigger programs. One that compiled to 19503 bytes using 2495 bytes of RAM contains 1589 MOVLB instructions. That's over 16% of the total code, just RAM banking... (Why are Swordfish programs always an odd number of bytes long?)

I'd not realised just how inefficient the PIC18 core was in this respect... Are the PIC24 and PIC32 any better?

(As an aside, not a Swordfish question, but we are looking for a core for projects that need more power than the PIC18. We can't decide between PIC24 and PIC32 - any insight?)

Time to start fiddling with my variable definitions I think!

Jerry Messina · Post by **Jerry Messina** » Fri Oct 30, 2015 5:58 pm

> I'd not realised just how inefficient the PIC18 core was in this respect... Are the PIC24 and PIC32 any better?

Neither of them has banked memory, so in that respect both the 24 and 32 are better. (That's not 100% true for the PIC24, but the banks are 32K+.)

A lot of folks seem to skip over the PIC24 series and go straight to the 32-bit series. Personally, I prefer the PIC24. It's pretty similar to the PIC18 architecture-wise (without the banks of course), while the PIC32 is a MIPS-based core so it's totally different.

From what I've seen, on a MHz to MHz comparison performance can be pretty similar between them even though it's 16 vs 32-bits. There's a lot of caveats to that of course so it really depends on what your application is. The kind of stuff I usually do wouldn't benefit much from using 32-bit.

Before you decide on a PIC32, I'd take a few days and read through the Microchip forums. They've made some real boner moves with some of the 32-bit chips (esp the newer MZ series), and they also seem to be taking more of a black-box approach to the development side of things with their Harmony libraries. I'd be real leery of that since up till now Microchip's strong point hasn't been software and they seem to continue bungling their way along with unfinished development tools, code generators that don't work, and updates that are never backwards compatible. In general, I don't know what they're doing over there, but lately it doesn't seem like they know either. Everything "will be addressed in a future update"... unfortunately they've now been saying that for years.

If you don't want to switch to using C, David has his free Firewing compiler that can target either family (http://www.firewing.info) ... very similar to Swordfish.

Jerry Messina · Post by **Jerry Messina** » Fri Oct 30, 2015 6:16 pm

Oh, and the "program memory bytes used" comes from MPASM, and includes the CONFIG bytes.

The number of config bytes varies, but there's an odd number of them for a lot of chips.

David Barker · Post by **David Barker** » Sun Nov 01, 2015 7:07 pm

I'm with Jerry, check out the PIC24 series. I use the 24FJxxGA002 quite a lot in newer projects and it's a really nice chip to use (can build natively in Firewing, so it's free). You get more instructions per KB than the PIC32 (24-bit instruction words rather than 32-bit) and as Jerry says, the ASM is much more agreeable. I use QFN much of the time, about £1.20 GBP in 100's. About £1.35 SPDIP in similar quantities...

SHughes_Fusion · Post by **SHughes_Fusion** » Mon Nov 02, 2015 10:12 am

Thanks, both. I was veering towards the PIC24 anyway so what you've said has confirmed the choice in my mind.

It seems a more 'integrated' option - I get the feeling Microchip decided they had to have a 32 bit core and didn't have the time / resources to design their own so just stuck the MIPS one in there and tacked their own stuff around it. Maybe there is also the feeling that with a 32 bit core you've got so much power you don't need to worry about optimising stuff properly and can just go down the 'black box' route.

Our stuff is all monitoring and control. Absolute power isn't necessary, peripheral integration is more important.

octal · Post by **octal** » Mon Nov 02, 2015 2:46 pm

The 24FJxxGA002 series is very nice. 24FJ devices tend to have a nice and very rich peripheral set. They exist in various packages and are a well-established architecture. The main problem with the PIC24 series is that the silicon errata sheets never seem to get shorter, even with new silicon revisions.
While the PIC24s are very mature and very good chips (the PIC24HJ are very fast), I hate the "still banking" thing that appears on some of them. The PSV thing is "bizarre"... and the banking introduced in the PIC24 chips having more than 64KB of RAM (the PIC24xxDA series for LCD and graphics acceleration) is really strange (I understand the 16-bit limits, but...).
Microchip also does a terrible job of hiding the 24-bit flash addressing in GCC in order to make flash and RAM pointers compatible with each other. Having a section with "big GOTOs" is amazing.

PIC32 are another story. While I don't really share David's and Jerry's opinion of the MIPS core, I think that Microchip is really missing the 32-bit era of microcontrollers. I personally really like MIPS assembly (and the core). It looks like an elementary BASIC language. As for the Harmony framework, you are not forced to use it. Most PIC32 peripherals are identical to those of the PIC24, so you can write your own drivers easily (or use the old Microchip libraries). The only things that can trip you up with the PIC32 are the multilevel cache and the branch instructions with their "delay slots".
The PIC32MZ is another story again. Microchip released completely buggy silicon. Even the peripherals are buggy (ADC channels that return random values, SPI rated to 50MHz that can't go beyond 21MHz, buggy QSPI...). We were waiting for Microchip to introduce a corrected version this year, but surprisingly they released the PIC32MZxxEF version instead (I have never tested it).
While the 28-pin PIC32 parts and the big PIC32MX with their 128KB of RAM are excellent chips, I really hate the rubbish Microchip is doing with its 32-bit chips. I would have preferred to see either ARM chips from Microchip, or the PIC32 released a year later but at least correctly debugged.

As for the price, if you want to stick with Microchip, the PIC24 is a very good upgrade from the PIC18 (PIC16 and PIC12 are useless unless you need CLC or 8/6-pin chips). Most enhancements in current PIC18 and PIC16 parts (like pin remapping, RTC, ...) have existed in the PIC24 for years. But if changing architecture is an option, I think Cortex-M0 chips are unbeatable. You can get a Cortex-M0 as powerful as a PIC24 for only 0.80 euros (60 cents in big quantities), but the change is really costly (steep learning curve, new hardware and debug tools, almost no DIP chips, ...).
