PN7120 does not respond on the I2C bus

NXP’s PN7120 is NFC controller for contactless communication at 13.56MHz. It interfaces with the host CPU via the I2C bus. The I2C slave address of  PN7120 is 0b010100Lx, where L is a configurable LSB of the address (by pin B2, I2CADDR0), and x is the standard I2C R/W bit. Hence the normal I2C address is either 0x28 or 0x29:

// write to device at address 0x28, R/W=Write
i2c_master_write_byte(cmd, (0x28 << 1) | WRITE_BIT, ACK_CHECK_EN);

According to UM10819 User Manual, Fig. 23 on page 35, the first NCI command after reset must be the CORE_RESET_CMD, which is followed by an appropriate response from PN7120.

The CORE_RESET_CMD is simply this byte sequence:

const uint8_t nci_core_reset_cmd[] = { 0x20, 0x00, 0x01, 0x01 };

What they don’t tell us in the datasheet (or else I could not find it): there is a boot timeout cca 990ms after the reset is removed (pin VEN=1) within which the CORE_RESET_CMD must be recevied. If it is not received, the PN7120 goes in a strange power-down mode and will no longer respond on the I2C bus! Only a new hw reset via VEN can bring it out of the power-down.

This picture provides updated boot state diagram of PN7120:

 

CORE_RESET_CMD on the I2C bus:

Simply no repospone on I2C address 0x28 if there was no CORE_RESET_CMD within 990ms after PN7120 reset:

Our hardware: ESP32 and Mikroelektronika’s NFC Click.

The Tale of a GCC Upgrade

Once upon a time, I decided it is a good time to upgrade the version of gcc compiler we use in our ARM926-based firmware project. Historically, this project uses gcc version 4.6, which is quite outdated nowadays (January 2018). In a different project, built around ARM-Cortex, we already successfully use the gcc version 5.4.1 (Linaro release in January 2017), so I wanted to upgrade our ARM926 project to this newer gcc version.

But some things never go so smoothly as one may imagine at the beginning… 🙂

Issue #1

The first problem we encounter is a simple assembly syntax error:

Error: .size expression for firstLoader.reset does not evaluate to a constant

For some reason, our original assembler code used ‘.’ and ‘_’ interchengeably:

.size firstLoader.reset, . - firstLoader.reset

The correct code must be:

.size firstLoader_reset, . - firstLoader_reset

Issue #2

Now the compilation finishes without warnings, and the final ELF file is produced just fine. Yet, when it is loaded into the embedded system via Lauterbach debugger, there is a strange error during memory verification:

 Hmm… Removing the option /VERIFY from LOAD gets us past the error, however, assembly code displayed by debugger is quite wrong. As shown below, the debugger thinks that the main() starts with ‘nop’ and ‘bx r14’ (subroutine call), which is completely bogus:

It seems as if all symbols are shifted around, and nothing quite matches up. Running the code comes to a crash somewhere in the startup.

Normal disassembly via ‘objdump -s’ shows that itcm_code has an offset: it should start at physical address 0x0000, but it starts from 0x0040:

Disassembly of section .itcm:

00000028 <__start_itcm_code__-0x18>:
...

00000040 <__start_itcm_code__>:
...

00000140 <secondLoader>:
secondLoader:

It is not clear at all why this happens…

Finally, a full disassembly of all sections in the ELF file, including the data sections, reveals where the problem lies:

Disassembly of section .note.gnu.build-id:

00000000 <.note.gnu.build-id>:
0: 00000004 andeq r0, r0, r4
4: 00000014 andeq r0, r0, r4, lsl r0
8: 00000003 andeq r0, r0, r3
c: 00554e47 subseq r4, r5, r7, asr #28
10: 8ede0a2f vfnmshi.f32 s1, s28, s31
14: 72934257 addsvc r4, r3, #1879048197 ; 0x70000005
18: 2fb63d30 svccs 0x00b63d30
1c: a3a01a65 movge r1, #413696 ; 0x65000
20: 12fa8758 rscsne r8, sl, #88, 14 ; 0x1600000

Disassembly of section .itcm:

00000028 <__start_itcm_code__-0x18>:
...

00000040 <__start_itcm_code__>:
...

00000140 <secondLoader>:
140: e59f030c ldr r0, [pc, #780] ; 454 <secondLoader_never+0x10>
144: e3e024ff mvn r2, #-16777216 ; 0xff000000
148: e5802000 str r2, [r0]

The new compiler toolchain inserts a special unique build-ID to the output binary, and if it is explicitly not told where to place it, it ends up in the worst place possible – before the start code and right in the interrupt vector table.

There are at least two solutions to this: 1) use gcc option ‘-Wl,–build-id=none’ to disable the feature completely, or 2) put the build ID in the ELF file at a place where it makes no harm, e.g. after the end of code. This is done by adding the following instructions in a suitable place into the linker script:

.note : {
   KEEP (*(.note*))
} > SDRAM

After the linker script is fixed, the software starts and runs just fine, but strangely only up to the first interrupt…

 Issue #3

We have already fixed two problems: the assembly syntax problem, and the linker build-ID problem. The third problem is the most puzzling one: the program starts and runs correctly, but after the first interrupt is served, no further interrupts are ever served. The processor happily continues running the main application code, but no ISRs are ever entered again. Yet the identical code compiled with the old compiler 4.6 works perfectly. What could be a problem now?

This is execution state before the first interrupt is raised:

This is inside interrupt service route, serving the first IRQ:

The ISR finishes its work, and returns back to the application code:

After that, no further interrupts are ever served…

Perhaps you see the problem right away in the screenshots, but it took me half a day to spot it.

Look in the last picture in the right window labelled ‘REGISTERS’. The left-most column in the REGISTERS window shows from top to bottom the processor flags (N, Z , C, …) and after that the processor state. It shows the processor is running in the… IRQ state! Yet the program counter is already back in the main application code.

Let’s see now how the ISR function is defined in our C code:

volatile t_int main_irq_handler(void) __attribute__ ((interrupt("IRQ"))) __attribute__ ((section(".isr")));

The prototype already has the __attribute__((interrupt(“IRQ”))), and anyway it works in the old gcc-4.6 quite well, so where’s the issue? Let’s compare codes generated by both compilers. This is the ISR function epilogue of main_irq_handler as generated by the good old gcc-4.6:

618: e1a00003 mov r0, r3
61c: e24bd010 sub sp, fp, #16
620: e8fd800f ldm sp!, {r0, r1, r2, r3, pc}^

And now the same epilogue code, but from gcc-5.4.1:

8ac: e1a00003 mov r0, r3
8b0: e24bd018 sub sp, fp, #24
8b4: e8bd980f pop {r0, r1, r2, r3, fp, ip, pc}

Ok, different compilers generate slightly different codes. But what is the peculiar symbol ‘^’ in the good gcc’s version? Google search of ‘arm ldm’ quickly finds the answer:

So the character ‘^’ is a special suffix that must be used exactly at the return from ISR. But the new code generated by gcc-5.4.1 uses the normal return with ‘pop’!

Now it is becoming clear why only the first interrupt request is ever served. Due to a software bug in our ISR epilogue the processor does not recognize that ISR has ended. The CPU remains in the IRQ state even after a formal return to the main application code, and thus it cannot receive any further IRQs.

At this point, it is a good idea to simply test if another compiler works better. And yes, in gcc-7.2.1, which is the current newest Linaro release in January 2018, the code works yet again well. We evidently have hit a bug in gcc-5.4.1 in its ARM926 ISR code generation path.

Summary

In conclusion, what I have learned:

  1. assembly syntax is not rock-stable, and it changes over the time,
  2. default linker behaviour changes over the time,
  3. compiler bugs exist, but they are subtle, and can bite unexpectedly.

 

Did you know: Unaligned accesses in ARM

The X86 has always supported unaligned accesses. In the ARM world the first architecture that supported unaligned accesses in hardware was ARMv6. The architecture was implemented in the ARM11 core around the year 2002 and onward. There is an excellent article at ARM Infocenter giving technical details:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html

The support for unaligned accesses must be enabled in an ARM core explicitly. This is done by setting the bit A in the register SCTLR. Still, unaligned accesses will be allowed only on Normal memory; accesses to  Device memory type are always checked and will throw exceptions on misaligned accesses.

I had to verify that a Cortex-A53 core (ARMv8) correctly implements the support for unaligned accesses. In the beginning this task seemed very simple as only the bit SCTRL.A had to be set, I thought. However, the hidden issue is that all memory is treated as the Device memory type by default! Memory type specification is part of MMU page tables. Each block or page descriptor has a 3-bit field called Attribute Index. This is an index into the Memory Attribute Indirection Register (MAIR), which holds eight descriptors of the memory types used in the system (e.g. normal cacheable, normal non-cacheable, device, etc.). The operating system must know the system memory map and the caching requirements; therefore it can maintain virtual memory tables with correct attributes.

In the end to implement my test case, I had to implement paging tables with a flat mapping of VA to PA, setting the memory types on RAM blocks as needed.

Connecting MCU and FPGA at 100Mbit/s Using Ethernet RMII [Part 2]

This is Part 2 of a two-part series on Ethernet RMII. In Part 1 I described my hardware setup and basic Ethernet operation. In the second and final part I will describe the design of specialized MAC cores I implemented on FPGA, and there will be measurements to see how much throughput and latency the system can achieve.

Continue reading “Connecting MCU and FPGA at 100Mbit/s Using Ethernet RMII [Part 2]”

Connecting MCU and FPGA at 100Mbit/s Using Ethernet RMII [Part 1]

This is Part 1 of the two-part series on Ethernet RMII. Part 2 is also available.

Imagine your application requires a non-standard periphery controlled by an embedded processor. What options do you have? The periphery can be implemented in an FPGA; depending on periphery complexity you can choose an optimal FPGA that fits your budget. Where the processor goes? There are three possibilities: (a) inside FPGA as a soft-core → it will increase the cost of FPGA (larger type needed) and complicate HDL and software design. Or (b) inside FPGA as a hard-core → a nice compact solution and quite possible with heterogeneous FPGA from Xilinx (Zynq) and Altera (SoC). But the cost of these modern devices could still be too high for price sensitive applications. You must fit both your software and HDL to pre-engineered combinations of FPGA and ARM CPU sizes (perhaps a small Cortex-M core would suffice but you must pay for a gigahertz-class Cortex A cores).

The third option (c) is using a stand-alone MCU (maybe even not an ARM) and a standard FPGA. How do you connect them? You are limited to interfaces offered by the MCU. In modern low-end MCUs (by that I mean smaller STM32Fxxx devices) you have I2C (400 kbit/s), UART (115 kbit/s), SPI (~10Mbit/s), Fast Ethernet (100 Mbit/s). So what about the Ethernet core in the MCU? Could it be used to interface with FPGA? Sure it can!

Continue reading “Connecting MCU and FPGA at 100Mbit/s Using Ethernet RMII [Part 1]”

A Fistful of Radios

During pre-Christmas sale on Seeed Studio Bazaar  they offered these digital radio modules with the nRF24L01+ chip for only US$0.81 each. So I bought 10 of them outright 🙂

Fistful of radios

What would YOU  suggest to do with them? Build a wireless flower life-support monitoring network? A mobile voice communications radio system? Retrofit them into talking toasters and robotic vacuum cleaners? Let me know!

Error: jtag status contains invalid mode value – communication failure = SOLVED!

This issue bugged me a long time, finally I solved it this evening. Debugging code on my PIP-Watch using my ST-LINK-v2 JTAG debugger was very painful because the debugger software — OpenOCD and GDB — kept failing randomly during debug sessions with a rather cryptic message:

Error: jtag status contains invalid mode value - communication failure
Polling target stm32f1x.cpu failed, GDB will be halted. Polling again in 100ms

I scratched my head, updated firmware in ST-Link, looked at JTAG/SWDIO signals using a scope… But nothing helped.

Finally I found this message in a discussion forum. The problem is in the low-power mode! When CPU core clock is halted the debugger connection fails and debug session is halted.

I am using low-power mode to halt CPU clock when OS is idle – in vApplicationIdleHook() function I have the __WFI() – wait for interrupt – intrinsic function.

The solution is either to entirely disable low-power modes, or allow low-power debugging in the DBGMCU_CR register:

1
DBGMCU_Config(DBGMCU_SLEEP | DBGMCU_STOP | DBGMCU_STANDBY, ENABLE);
DBGMCU_Config(DBGMCU_SLEEP | DBGMCU_STOP | DBGMCU_STANDBY, ENABLE);

This code must be executed during CPU initialization before any low-power mode is first activated. After this, debugger connections will be kept even in sleep, standby and stop modes. The problem is gone!