Hi all,
We are building some software modules that call functions and can go deep on the stack with the right combination of interrupts. We are trying to find the combination that is causing stack overflows.
It seems MPLAB doesn't show the stack with our 18F87J11 (or maybe with any PIC18?). We also can't set breakpoints on it, for example "break if the stack hits 25 deep", or even break on a write.
We could work with the stack if we were able to run our code in the MPLAB simulator, but we need to access peripheral controllers, so that's out.
Any suggestions on this?
Cheers,
-Tom
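Since MPLAB won't break on stack depth, one rough workaround is to check it in software. The depth lives in the low five bits of STKPTR (bits 4:0), with the overflow flag in bit 7 (STKFUL in the datasheet, STKOVF in the module further down) and underflow in bit 6. Below is a host-side C model of that check, kept off-target so it can be compiled and tested anywhere. `depth_monitor`, `limit` and the 25-level figure are illustrative names, not anything provided by MPLAB or Swordfish. On the chip you would feed it the live STKPTR value, for example at the top of each ISR and each deep routine, and stop when it returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* STKPTR bit layout on PIC18 */
#define STKPTR_SP_MASK 0x1Fu      /* bits 4:0: current depth (0 = empty) */
#define STKPTR_STKUNF  (1u << 6)  /* underflow flag */
#define STKPTR_STKFUL  (1u << 7)  /* overflow flag (STKOVF in stackdebug.bas) */

/* Current depth in return-address levels, with the flag bits masked off. */
static unsigned stack_depth(uint8_t stkptr)
{
    return stkptr & STKPTR_SP_MASK;
}

/* Hypothetical monitor: remembers the deepest level seen so far. */
typedef struct {
    unsigned high_water;  /* deepest sampled depth */
    unsigned limit;       /* e.g. 25 for "break if the stack hits 25 deep" */
} depth_monitor;

/* Record one STKPTR sample. Returns true when the caller should stop
   (on the target, that is where a software breakpoint would go). */
static bool depth_monitor_sample(depth_monitor *m, uint8_t stkptr)
{
    unsigned depth = stack_depth(stkptr);

    assert(m);
    if (depth > m->high_water)
        m->high_water = depth;
    return depth >= m->limit || (stkptr & STKPTR_STKFUL) != 0;
}
```

Keep in mind it only sees the depth at the moments you sample it, so a short-lived peak between samples slips through. It narrows the search; it doesn't replace a hardware stop-on-overflow.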
How do we debug stack overflows in MPLAB?
-
- Swordfish Developer
- Posts: 1473
- Joined: Fri Jan 30, 2009 6:27 pm
- Location: US
Stack issues are a real pain, and the debuggers don't help much.
Here's a little module that might help. It relies on setting the 'reset on stack error' config bit (STVREN), and captures the stack contents into an array after the reset occurs. If the module detects a stack overflow, it automatically stops the debugger once it has captured the stack.
You have to make sure NOTHING modifies the stack/STKPTR before this module runs, and if you want to capture the whole stack it needs a fair amount of RAM (128 bytes).
stackdebug.bas is below, followed by a main program you can use to test it.
stackdebug.bas:
Code:
module stack_debug
// enable reset on stack fault config setting
config STVREN = ON
// the stack is 21 bits wide, so we use a 32-bit value for the stack() array
// if you have a device with < 64K of flash, you can cheat here and just use
// a 16-bit word (which saves ram). this should work ok unless you have a
// REALLY serious stack error issue. if in doubt, then comment this out and
// just use 'type STACKENTRY_T = longword'
#if (_maxrom > $ffff) then
type STACKENTRY_T = longword
#else
type STACKENTRY_T = word
#endif
// stack fault bits (located in the STKPTR reg)
const STKUNF = 6,
STKOVF = 7
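// the pic18 stack has 31 levels (STKPTR = 1-31). STKPTR = 0 is the 'empty'
// position with no storage, so reading STKPTR = 0-31 gives 32 slots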
const STACK_SIZE = 32
// set this to the beginning stack entry you wish to capture.
// normally this would be 0 to view the entire stack, but if you can't
// spare that much ram then you can use this to skip over the beginning
// entries and just look at the later part of the stack.
// for example, setting STARTING_ENTRY=20 will just capture the last 12
// entries (32-20=12)
const STARTING_ENTRY = 0
// stack contents array
public dim stack(STACK_SIZE-STARTING_ENTRY) as STACKENTRY_T
// software breakpoint instruction
// this is undocumented... it assembles to an opcode of 0x00E0
// it will stop any of the hardware debuggers
public inline sub _trap()
asm
trap
nop // added for breakpoint skidding
end asm
end sub
if (STKPTR.bits(STKOVF) = 1) then
FSR0 = addressof(stack) // get address of the stack array
STKPTR = STARTING_ENTRY // start at the beginning of the stack
repeat
POSTINC0 = TOSL // read contents of the stack into the stack array
POSTINC0 = TOSH
if (sizeof(STACKENTRY_T) = sizeof(longword)) then
POSTINC0 = TOSU
POSTINC0 = 0
endif
STKPTR = STKPTR + 1
until (STKPTR = 0) // and loop until we've wrapped (32 entries)
// the debugger will stop here
// you can inspect the stack() array in the watch window
_trap()
endif
// clear stack error flags
STKPTR.bits(STKOVF) = 0
STKPTR.bits(STKUNF) = 0
// and the stack cache
clear(stack)
end module
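To see what the capture loop above actually collects, here is a host-side C model of it (an illustration, not a port of the module). It assumes the datasheet layout: positions 1 to 31 hold 21-bit return addresses, SP is 5 bits wide so incrementing past 31 wraps to 0, and STKPTR = 0 is the empty position with no storage, which this model assumes reads back as 0. It ignores the flag bits that writing STKPTR clears on the real part. `pic18_stack`, `capture_stack` and `capture_ram_bytes` are made-up names. It also checks the RAM figures: 32 slots of a 4-byte longword is 128 bytes, the word variant halves that, and STARTING_ENTRY = 20 leaves 12 slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 32u  /* STKPTR values 0..31; 0 is the empty position */

/* Host-side model of the PIC18 return stack. */
typedef struct {
    uint32_t slot[STACK_SLOTS];  /* slot[0] is never read */
    uint8_t  stkptr;             /* only SP (bits 4:0) is modeled */
} pic18_stack;

/* TOSU:TOSH:TOSL for the current STKPTR (assumed 0 at the empty position). */
static uint32_t read_tos(const pic18_stack *s)
{
    return (s->stkptr == 0) ? 0u : (s->slot[s->stkptr] & 0x1FFFFFu);
}

/* Mirrors the repeat/until loop in stackdebug.bas: start at starting_entry,
   copy TOS, bump the 5-bit pointer, stop once it wraps back to 0.
   Returns the number of entries written to out. */
static size_t capture_stack(pic18_stack *s, unsigned starting_entry, uint32_t *out)
{
    size_t n = 0;

    assert(starting_entry < STACK_SLOTS);
    s->stkptr = (uint8_t)starting_entry;
    do {
        out[n++] = read_tos(s);
        s->stkptr = (uint8_t)((s->stkptr + 1u) & 0x1Fu);  /* no bit 5, so 31 -> 0 */
    } while (s->stkptr != 0);
    return n;
}

/* RAM the stack() array needs: entry_size is 4 for longword, 2 for word. */
static size_t capture_ram_bytes(unsigned starting_entry, size_t entry_size)
{
    return (STACK_SLOTS - starting_entry) * entry_size;
}
```

The word variant would simply keep the low 16 bits of each entry, which is why it's only safe on parts with 64K or less of flash.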
Code: (test program)
program stack_test
'device = 18F4520
// include this module before all others
// it MUST run before anything modifies the stack in order to capture
// the post-crash/reset info. also, make sure 'config STVREN=ON'
include "stackdebug.bas"
//
// generate a stack overflow. if everything is set correctly, this should
// cause a STVREN reset, and you should end up at the _trap() instruction
// located in stackdebug.bas, where you can inspect the stack() array using
// the debugger watch window
//
// note: the numbers here are dummies, and are ignored... we're not
// really pushing a value, just the current program counter
//
asm
push 1
push 2
push 3
push 4
push 5
push 6
push 7
push 8
push 9
push 10
push 11
push 12
push 13
push 14
push 15
push 16
push 17
push 18
push 19
push 20
push 21
push 22
push 23
push 24
push 25
push 26
push 27
push 28
push 29
push 30
push 31
push 32
push 33
end asm
// you should never get here
while (true)
end while
end program
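The test program works because the overflow reset leaves the stack contents in place: per the PIC18 datasheets, with STVREN set an overflow resets the device, leaves STKFUL set and puts the stack pointer back at 0, which is exactly what stackdebug.bas checks for at startup. The C model below walks through that sequence. One assumption to flag: datasheets word the exact trigger differently (some describe the reset on the push that fills level 31), and this model resets on the first push attempted with all 31 levels already in use. The test pushes 33 times, so it overflows under either reading. `stack_model`, `model_push` and `run_pushes` are hypothetical names, and the PC values are dummies, just like the push operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEVELS 31u

/* Model of the return stack with STVREN = ON (what stackdebug.bas needs). */
typedef struct {
    uint32_t slot[MAX_LEVELS + 1];  /* slot 0 unused: SP = 0 means empty */
    unsigned sp;                    /* 0..31 */
    bool     stkful;                /* overflow flag; survives the reset */
    bool     reset;                 /* set when the modeled STVREN reset fires */
} stack_model;

/* Push a return address. ASSUMPTION: the reset fires on the first push
   attempted while all 31 levels are in use. The reset leaves the slots
   untouched, sets STKFUL and returns SP to 0. */
static void model_push(stack_model *s, uint32_t pc)
{
    assert(s);
    if (s->sp == MAX_LEVELS) {
        s->stkful = true;
        s->sp = 0;
        s->reset = true;
        return;
    }
    s->slot[++s->sp] = pc & 0x1FFFFFu;  /* return addresses are 21 bits */
}

/* Replays stack_test: up to n pushes with dummy PC values.
   Returns the push number that caused the reset, or 0 if none did. */
static unsigned run_pushes(stack_model *s, unsigned n)
{
    for (unsigned i = 1; i <= n; i++) {
        model_push(s, 0x100u + 2u * i);
        if (s->reset)
            return i;
    }
    return 0;
}
```

After the modeled reset, `slot[1]` to `slot[31]` still hold the pushed values, which is what the capture loop in stackdebug.bas then copies out.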
Nice! Thanks, Jerry!
What we did last night was run a number of PUSH instructions in assembler to build the stack up, then ran the rest of our application as a function call. So we started with a stack depth of, say, 20, then simply watched to see where it failed.
Then we watched the stack depth by pausing at critical points. We discovered that if we had multiple interrupts being serviced at the same time, it was possible to overflow the hardware stack. So we added a condition that EUSART1 cannot be serviced while EUSART2 is being serviced. This kept the stack from going an extra 7 levels deep.
Code:
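#if debug_smallStack
// debug build only: pre-load the hardware stack with dummy entries so the
// application starts with less headroom than it normally has
sub stackFillThenRunApp()
   // push until the depth reported by Sys.currentStackHeight is one short of
   // StartingStackSize; the call to runApplication() below adds the last level
   while Sys.currentStackHeight < StartingStackSize - 1
      asm
         push   // PUSH stores PC+2 on the stack as a dummy return address
      end asm
   wend
   // never return from here: this sub's own return address now sits under the dummies
   while true
      runApplication()
   wend
end sub
#endif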
However, the code you posted gives us a much better diagnostic tool, since we can now see exactly where the calls originated. In comparison, what we were doing was the civil engineering equivalent of testing a bridge's maximum load by driving heavier and heavier trucks over new bridges to see what weight causes a collapse!
Thanks,
-Tom
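The pre-fill trick can be made a bit more systematic. If a run with P dummy levels pushed survives and P + 1 doesn't, the application's peak usage on top of the pre-fill is 31 - P levels. Since every trial is a manual run on the hardware, a bisection over P keeps the number of runs down. The C sketch below assumes failures are monotonic (if P overflows, anything larger does too) and uses a model in place of the real trial; `max_safe_prefill`, `trial_fn` and `model_trial` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEVELS 31

/* One trial run: true if the app ran cleanly with `prefill` dummy levels
   pushed first. On real hardware this is you watching the board. */
typedef bool (*trial_fn)(unsigned prefill, void *ctx);

/* Largest prefill in [0, start] that survives, or -1 if none does.
   ASSUMPTION: monotonic, i.e. if prefill p overflows then p + 1 does too. */
static int max_safe_prefill(unsigned start, trial_fn trial, void *ctx)
{
    int lo = -1;              /* largest prefill known to survive */
    int hi = (int)start + 1;  /* smallest prefill treated as failing */

    assert(trial);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (trial((unsigned)mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Stand-in for the hardware: the app needs *ctx levels on top of the
   prefill (a made-up figure for testing). */
static bool model_trial(unsigned prefill, void *ctx)
{
    unsigned app_peak = *(const unsigned *)ctx;
    return prefill + app_peak <= MAX_LEVELS;
}
```

As the bridge analogy suggests, the result is only as good as the interrupt timing that happened to occur during each run; a trial that survives may simply not have hit the bad combination.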
Good detective work! Shame the hardware tools don't help much with this sort of thing. I know you can get the REAL ICE (and I think the ICD3) to stop on a stack overflow, but you STILL can't view the stack contents.
Quote: "We discovered that if we had multiple interrupts being serviced at the same time, it was possible to overflow the hardware stack."
This points out one of the pitfalls with using multiple interrupt priorities, and why it's important to always do as little as possible in the ISR. Add multi-level interrupts to the mix and it can use up a lot of resources quickly.
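To put rough numbers on that: each interrupt that can be in progress at the same time adds one level for its return address plus the deepest call chain inside its handler, on top of the deepest point main can be at when it fires. Here is a worst-case budget as a C sketch. The depths are invented for illustration (get real ones from your call tree, including any compiler helper routines), and `isr_level` and `worst_case_depth` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 31u

/* A handler that can be active at the same time as the ones listed
   before it. call_depth is the deepest call chain inside the handler. */
typedef struct {
    const char *name;
    unsigned    call_depth;
} isr_level;

/* Worst case = main's deepest chain + for every handler that can be
   simultaneously active: 1 level for the interrupt's return address
   plus the handler's own call depth. */
static unsigned worst_case_depth(unsigned main_depth, const isr_level *nest, size_t n)
{
    unsigned total = main_depth;

    for (size_t i = 0; i < n; i++)
        total += 1u + nest[i].call_depth;
    return total;
}
```

With made-up depths of 16 for main, 5 for a low-priority EUSART2 handler, 6 for an EUSART1 handler allowed to nest inside it, and 3 for a high-priority handler, the total is 33, over the 31 available; blocking the EUSART1 nesting drops it to 26.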
Of course, some folks take that to the extreme and recommend that you "just set a flag in the ISR" and handle everything in the main loop. If you're going to do that then there's no point in using the interrupt in the first place! I don't think they understand that in doing that they just turned an interrupt-driven system into a completely polled one... a complete waste of time with a lot more effort.
Anyway, glad you got it sorted out.