Tracing on STM32 discovery

There are two separate trace blocks in the Cortex-M3: the ITM and the ETM, that is, the Instrumentation Trace Macrocell and the Embedded Trace Macrocell. Yeah, the names tell nothing about the difference between the two. The ITM does higher-level, lower-bandwidth, less intrusive traces such as watchpoints, interrupts and periodic program counter sampling. The ETM, on the other hand, traces every single instruction executed by the CPU. In practice it is usually necessary to slow down the CPU to use the ETM, while the ITM can be used without slowdown.

ITM traces

The ITM trace can be configured to contain:

Program counter value every N instructions, N >= 32
Interrupt entry and exit
Watchpoints (up to 4), i.e. data values and address of accessing code
Software events, basically acting as a high-speed serial port for debug prints
DWT events, i.e. a notice every time N instructions/cycles/RAM loads have been executed, useful for code profiling

In the trace above we can see the code toggling the PC8 GPIO pin. I have set a watchpoint on the GPIOC registers, and we get information that the watchpoint was triggered by code at address 0x????1010 which wrote 0x00000100 to the register.

After that we get a Software trace packet, which the code sends by calling ITM_Print(0, "On");. The zero is the port number: there are 32 ITM trace ports, which can be used to keep prints from e.g. interrupts and other code separate from each other. A nice thing is that you can also send 32-bit binary values through the ports without having to format them as text.
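To make the software trace packets more concrete, here is a minimal sketch of the bytes that one stimulus-port write produces in the ITM stream. It assumes the instrumentation packet layout from the ARMv7-M architecture manual: a header byte carrying the port number and payload size, followed by the payload in little-endian order. The function name itm_sw_packet is made up for this illustration and is not part of the example project. On the target the hardware builds these packets by itself, so code like this is only useful for understanding or testing a decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build the bytes the ITM emits for one write of
 * `size` bytes (1, 2 or 4) to stimulus port `port` (0..31).
 * Returns the packet length, or 0 if the arguments are invalid. */
size_t itm_sw_packet(uint8_t port, uint32_t value, size_t size, uint8_t out[5])
{
    uint8_t size_code;

    if (port > 31)
        return 0;

    switch (size) {
    case 1: size_code = 1; break;
    case 2: size_code = 2; break;
    case 4: size_code = 3; break;
    default: return 0;
    }

    /* Header: bits 7..3 = port, bit 2 = 0 (software source),
     * bits 1..0 = payload size code */
    out[0] = (uint8_t)((port << 3) | size_code);

    /* Payload, least significant byte first */
    for (size_t i = 0; i < size; i++)
        out[1 + i] = (uint8_t)(value >> (8 * i));

    return 1 + size;
}
```

Under these assumptions, the character 'O' written as a single byte to port 0 shows up as 0x01 0x4F, and the 32-bit value 0x12345678 written to port 1 as 0x0B 0x78 0x56 0x34 0x12.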

There are no interrupts visible, so the Current Mode is kept as "Thread", i.e. outside interrupts. Also the PC sampling doesn't occur during this zoomed-in view, so the Current Location doesn't change.

ETM traces

The ETM trace always contains:

Program counter value and CPU state periodically, to synchronize the trace viewer
Interrupt entry and exit
Execution information of instructions: e.g. "6 instructions executed, 1 skipped due to condition codes". The viewer software has to have a firmware image to know what the actual instruction was.
Target address of indirect branches, i.e. branches where the address is not known at compile time.
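The "6 instructions executed, 1 skipped" information fits in a single byte, called a P-header in the ETM documentation. The sketch below decodes such a byte. The bit layouts in the comments are an assumption based on the ETMv3 architecture specification for non-cycle-accurate mode, and the function name etm_pheader_decode is invented for this example, so check the details against the specification before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode one ETMv3 P-header byte (non-cycle-accurate mode).
 * On success, *executed and *skipped receive the instruction counts and
 * the function returns true. Other packet types return false. */
bool etm_pheader_decode(uint8_t byte, unsigned *executed, unsigned *skipped)
{
    if ((byte & 0x83u) == 0x80u) {
        /* Format 1, bits 1NEEEE00: 0..15 executed, then 0 or 1 skipped */
        *executed = (byte >> 2) & 0x0Fu;
        *skipped = (byte >> 6) & 0x01u;
        return true;
    }
    if ((byte & 0xF3u) == 0x82u) {
        /* Format 2, bits 1000FF10: two instructions, a set flag bit
         * means that instruction failed its condition code */
        unsigned first_skipped = (byte >> 3) & 1u;
        unsigned second_skipped = (byte >> 2) & 1u;
        *skipped = first_skipped + second_skipped;
        *executed = 2u - *skipped;
        return true;
    }
    return false;
}
```

With this layout, the byte 0xD8 decodes to 6 executed and 1 skipped instruction. The packet only carries counts, which is why the viewer needs the firmware image to show the actual instructions.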

The ETM trace can also be configured to:

Pause the CPU when the trace FIFO is full. This is usually what you want, because if you are tracing at the ETM level then you probably don't want gaps in the trace.
Report every branch address. This slows down the trace and is not needed when a firmware image is available.
Report data values. This could be quite useful, but is not supported by the Cortex-M controllers.

The ETM trace format is quite efficient. Usually you can trace at close to real speed if your TRACESWO pin runs at the same frequency as your CPU. However, a cheap 24 MSps logic analyzer can only capture signals up to 8 MHz. The CPU stalling (pausing) functionality then becomes useful for capturing full traces, at the cost of slowing down the program.
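The relation between analyzer sample rate and usable trace bit rate can be sketched as a small calculation. It assumes that the analyzer needs 3 samples per bit (which is what 24 MSps and 8 MHz imply) and that the TPIU asynchronous clock prescaler divides the trace clock by (value + 1). The function names are invented for this example, and whether the trace clock equals the CPU clock depends on the chip.

```c
#include <stdint.h>

/* Assumed rule of thumb: the analyzer needs this many samples per bit. */
#define SAMPLES_PER_BIT 3u

/* Smallest TPIU prescaler value (bit rate = trace_clk / (value + 1))
 * that keeps the SWO bit rate within what the analyzer can capture.
 * Precondition: sample_rate_hz >= SAMPLES_PER_BIT. */
uint32_t swo_prescaler(uint32_t trace_clk_hz, uint32_t sample_rate_hz)
{
    uint64_t max_baud = sample_rate_hz / SAMPLES_PER_BIT;

    /* Round the divider up so the bit rate never exceeds max_baud */
    uint64_t divider = ((uint64_t)trace_clk_hz + max_baud - 1u) / max_baud;
    if (divider == 0)
        divider = 1;

    return (uint32_t)(divider - 1u);
}

/* Resulting bit rate for a given prescaler value */
uint32_t swo_baud(uint32_t trace_clk_hz, uint32_t prescaler)
{
    return (uint32_t)(trace_clk_hz / ((uint64_t)prescaler + 1u));
}
```

For a 72 MHz trace clock and a 24 MSps analyzer this gives a prescaler value of 8 and a bit rate of 8 MHz, so the trace port runs at one ninth of the CPU speed and stalling is needed to avoid gaps.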

Above is a four-instruction piece of code from my example program. I hope that the display format is easy to understand: it shows the program addresses, the instructions executed and the source code line and function that they correspond to.

But what is actually happening there? The first instruction clears the PC9 pin. Why is it already cleared at the left edge of the graph? This is one of the shortcomings of the ETM trace compared to ITM: there is a significant delay between the event and the trace, typically around 10 microseconds when tracing at 8 MHz. One has to compensate for this manually when reading the graph.

The other three instructions correspond to the ETM_SetupMode(); call, which ends the ETM trace. Because the ETM trace slows down the program, it often makes sense to only enable it for specific parts of the code. This could be done by configuring code address ranges in the ETM, but often it is just easier to insert function calls into the code.

What is not shown here are branches and non-executed instructions. They are pretty straightforward too, and the code display of course follows the branches that are taken or not taken.

TPIU: the link to the outside world

There is still one component standing between the CPU and our logic analyzer: the Trace Port Interface Unit (TPIU). This block formats the data into 16-byte packets, and also interleaves the ETM and ITM traces if both are enabled. However, in practice the ETM trace generates so much data that ITM trace events get lost when ETM is enabled.
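As an illustration of the interleaving, here is a minimal sketch that pulls the bytes of one stream out of a single 16-byte TPIU packet (called a frame in the CoreSight documentation). It assumes the standard CoreSight formatter layout: even-numbered bytes are either a stream ID change or data, odd-numbered bytes are always data, and byte 15 holds one auxiliary bit for each even-numbered byte. The type and function names are invented for this example, and frame synchronization is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoder state: the active stream ID persists across frames. */
typedef struct {
    uint8_t current_id;
} tpiu_state_t;

/* Copy the bytes that belong to stream `want_id` from one 16-byte frame
 * into out[] (at most 15 bytes). Returns the number of bytes copied. */
size_t tpiu_extract(tpiu_state_t *st, const uint8_t frame[16],
                    uint8_t want_id, uint8_t out[15])
{
    size_t n = 0;
    uint8_t aux = frame[15];

    for (int i = 0; i < 15; i += 2) {
        uint8_t b = frame[i];
        uint8_t auxbit = (aux >> (i / 2)) & 1u;
        int pending_id = -1;

        if (b & 1u) {
            /* ID byte: aux bit 0 = switch now, 1 = switch after next byte */
            if (auxbit)
                pending_id = b >> 1;
            else
                st->current_id = b >> 1;
        } else if (st->current_id == want_id) {
            /* Data byte whose lowest bit is stored in the auxiliary byte */
            out[n++] = b | auxbit;
        }

        /* Odd-numbered bytes 1..13 are plain data */
        if (i < 14 && st->current_id == want_id)
            out[n++] = frame[i + 1];

        if (pending_id >= 0)
            st->current_id = (uint8_t)pending_id;
    }
    return n;
}
```

Calling this with want_id set to 1 or 2 corresponds to choosing the TPIU stream in the decoding steps later in this article, where stream 1 carries ITM and stream 2 carries ETM data.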

The TPIU can support a 1-4 bit output bus for the trace data. Currently my sigrok decoders only support the 1-bit UART format, which in my opinion is the easiest to capture with a simple logic analyzer.

Configuring the CPU for tracing

The CPU doesn't just start outputting a trace when it starts up - the ITM/ETM blocks have to be enabled by register writes first. It is possible to do this either through a debugger, or by directly accessing the registers in code.

The STM32 and LPC17xx microcontrollers I have tried so far all disable the debug unit, including the trace blocks, until a debugger has connected. So in practice you have to connect a debugger anyway at least once after a power-on reset, even if you configure tracing in code like in the example.

The trace is output on the TRACESWO pin, which may be shared with GPIO on some chips. It is also typically shared with the JTAG bus, which means that you have to use an SWD debugger when tracing.
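A minimal sketch of what configuring ITM tracing in code can look like is given below. It is not necessarily what the example project does. The register addresses and bit positions are the architectural Cortex-M3 ones as far as I can tell from the ARM documentation, while the struct and function names are invented for this example. The registers are reached through pointers so that the same code can be tried on a PC with ordinary variables. Chip-specific steps, such as releasing the TRACESWO pin from JTAG or enabling the trace pins in a vendor debug register, are left out.

```c
#include <stdint.h>

/* Pointers to the memory-mapped registers involved. */
typedef struct {
    volatile uint32_t *demcr;     /* Debug Exception and Monitor Control */
    volatile uint32_t *tpiu_acpr; /* TPIU asynchronous clock prescaler */
    volatile uint32_t *tpiu_sppr; /* TPIU selected pin protocol */
    volatile uint32_t *tpiu_ffcr; /* TPIU formatter and flush control */
    volatile uint32_t *itm_lar;   /* ITM lock access */
    volatile uint32_t *itm_tcr;   /* ITM trace control */
    volatile uint32_t *itm_ter;   /* ITM trace enable, one bit per port */
} trace_regs_t;

/* Architectural Cortex-M3 addresses, only meaningful on the target */
#define REG(addr) ((volatile uint32_t *)(uintptr_t)(addr))
const trace_regs_t cortex_m3_trace_regs = {
    REG(0xE000EDFCu), REG(0xE0040010u), REG(0xE00400F0u), REG(0xE0040304u),
    REG(0xE0000FB0u), REG(0xE0000E80u), REG(0xE0000E00u),
};

/* Enable ITM output on the SWO pin in UART format with TPIU framing. */
void trace_enable_itm(const trace_regs_t *r, uint32_t swo_prescaler)
{
    *r->demcr |= 1u << 24;         /* TRCENA: enable the trace blocks */
    *r->tpiu_acpr = swo_prescaler; /* bit rate = trace clock / (value + 1) */
    *r->tpiu_sppr = 2;             /* pin protocol: asynchronous NRZ (UART) */
    *r->tpiu_ffcr = 0x102;         /* keep the TPIU formatter enabled */
    *r->itm_lar = 0xC5ACCE55u;     /* unlock the ITM registers */
    *r->itm_tcr = (1u << 16)       /* trace bus ID 1, i.e. TPIU stream 1 */
                | (1u << 3)        /* forward DWT packets */
                | (1u << 2)        /* emit synchronization packets */
                | (1u << 0);       /* ITM main enable */
    *r->itm_ter = 0xFFFFFFFFu;     /* enable all 32 stimulus ports */
}
```

On the target this would be called as trace_enable_itm(&cortex_m3_trace_regs, prescaler), after a debugger has connected at least once as described above.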

Example code and documentation

You can find an example project here. It is designed to run out-of-the-box on the STM32 VL discovery, where the trace data comes out of the PB3 pin. The STM32F100 on the cheapest discovery board, however, only has ITM tracing, so if you want to try ETM, you need an STM32F105 or another higher-end controller.

The sigrok decoders have been merged to the main git repo and should be included in nightly builds. If you want some test files while waiting for your FX2 to arrive, look at the sigrok-dumps repo. There is also some documentation on the wiki: ITM, ETM.

How to decode the traces

As an example, download the sigrok-dumps files. You will find an example capture at arm_trace/stm32f105/trace_example.sr. Open it in PulseView.

You should see the trace data on line 0. Do the following:

From the toolbar menu, add UART decoder, select RX as 'SWO' and set baudrate to 8000000.
Select "Stack decoder" and "ARM TPIU". Let the TPIU stream be "1" for ITM.
Select "Stack decoder" and "ARM ITM". If you want, you can type in the full path to trace_example.elf to get the function names displayed.

Then again for ETM trace:

From the toolbar menu, add UART decoder, select RX as 'SWO' and set baudrate to 8000000.
Select "Stack decoder" and "ARM TPIU". Let the TPIU stream be "2" for ETM.
Select "Stack decoder" and "ARM ETMv3". You probably want to type in the full path to trace_example.elf to get the instructions displayed.

The same steps are also shown in a video, which you can watch on YouTube or download.

– Petteri Aimonen on 4.3.2015
