Firmware

Think Radio’s Guide to Good Firmware Design
Bootloaders, Reflashes, Device Firmware Upgrades etc.

Nowadays, releasing a product with no means of correcting firmware defects in the field is a serious risk. Pretty much the worst-case scenario is that a significant defect comes to light only once products are actually in the hands of customers.

The cost of replacing the product may be unmanageable, and worse still if a safety-related issue emerges. If the product doesn’t contain a bootloader, the only alternatives may be to recall and replace defective products, or to suffer the economic and reputational damage of not doing so.

Firmware upgrades can’t solve all problems, but a bootloader at least makes it possible to correct the firmware once the product is in the hands of the customer.

Encryption / Security

Almost every product requires some form of encryption now, and fortunately hardware support for AES (Advanced Encryption Standard) is commonplace. However, as they say, implementing some sort of encryption is easy; making it truly secure isn’t.

There are countless ways of unintentionally rendering a “security” system insecure, from storing network passwords in plain text (where they could easily be read with a debugger) in a low-cost, disposable networked device, to making rushed late-in-the-day design changes for the convenience of a particular customer, such as reducing entropy to the point where the system is no more than superficially tamper resistant. Perhaps more than any other aspect of the total product design, security needs to be fully worked out in advance, and not half-heartedly worked out later in a way heavily constrained by ill-considered early design decisions.

For more serious applications (e.g. those that handle real money) it may be necessary to consider a cryptoprocessor, to provide resistance to the more serious and determined attempts to compromise your product, such as attempts to extract the cryptographic keys, to read out and disassemble the firmware, or even to probe the silicon itself.

Interrupt Handling (ISRs)

Well-designed interrupt handlers (also known as Interrupt Service Routines, or ISRs) are a core essential of good firmware design. An interrupt handler is a section of code that is triggered to run either by an external event, such as a button press or data received over comms, or by an internal event, such as a timer completing (counting down to zero).

This section of code may run absolutely exclusively (nothing else can interrupt it), or it may be interruptible only by a higher-priority interrupt. The former is simpler and easier to design. Every interrupt, when triggered, either interrupts the code currently executing in the main event loop or rouses the device from sleep. The main event loop must be designed with this in mind, and must take exceptional care with any data objects shared between the main loop and any interrupt handlers, as that data may change at any time.
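
As a minimal sketch of that care, here is one common way for the main loop to read a value that an interrupt handler updates. The names (timer_isr, ticks_snapshot, irq_disable, irq_enable) are made up for illustration, and the two masking functions are host-side stand-ins for whatever your target actually provides (on Arm Cortex-M, the CMSIS __disable_irq() and __enable_irq() intrinsics would be typical).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt-masking intrinsics.
 * On a host build they only record the mask state. Real code would
 * normally save and restore the previous mask state rather than
 * unconditionally re-enabling. */
static int irq_masked = 0;
static void irq_disable(void) { irq_masked = 1; }
static void irq_enable(void)  { irq_masked = 0; }

/* Written by the interrupt handler, read by the main loop, so it must be
 * volatile: the compiler may not cache it in a register. */
static volatile uint32_t tick_count = 0;

/* Timer interrupt handler: the only writer of tick_count. */
void timer_isr(void)
{
    tick_count++;
}

/* Main-loop side: take a consistent snapshot. On an 8- or 16-bit
 * microcontroller a 32-bit read takes several instructions, so without
 * masking, the interrupt could fire halfway through the read and leave
 * a torn value. */
uint32_t ticks_snapshot(void)
{
    uint32_t copy;

    irq_disable();
    copy = tick_count;
    irq_enable();

    return copy;
}
```

The window with interrupts masked is kept to a single copy, so the interrupt is delayed by only a few instructions.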

The interrupt handler should do absolutely the bare minimum of processing required (such as reading data from an external device and copying it to a buffer), as it is unaware of any delay it may be causing to other pending processing tasks. It’s usual to have an extremely simple system of flags, so that the interrupt handler can signal to the main event loop that it has completed, and that the next stage, usually an event handler, should act on the changes it made.
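
The flag-and-buffer arrangement described above might look something like the following sketch. It assumes a byte-oriented receive interrupt; the names (uart_rx_isr, uart_rx_event_handler, rx_event) and the buffer size are invented for illustration, and the received byte is passed in as a parameter so the sketch can run on a host, where real hardware would read it from a data register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* power of two, so indices can wrap with a mask */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head  = 0;      /* written only by the ISR */
static volatile uint8_t rx_tail  = 0;      /* written only by the main loop */
static volatile bool    rx_event = false;  /* flag: ISR -> main loop */

/* Interrupt handler: copy the byte, raise the flag, return. */
void uart_rx_isr(uint8_t byte)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));

    if (next != rx_tail) {      /* if the buffer is full, drop the byte */
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
    rx_event = true;
}

/* Polled by the main event loop. */
bool uart_rx_pending(void)
{
    return rx_event;
}

/* Event handler, run from the main loop when the flag is set.
 * Copies up to max bytes into out and returns how many were copied. */
size_t uart_rx_event_handler(uint8_t *out, size_t max)
{
    size_t n = 0;

    /* Clear the flag before draining: a byte arriving during the drain
     * simply raises it again, so nothing is missed. */
    rx_event = false;

    while (rx_tail != rx_head && n < max) {
        out[n++] = rx_buf[rx_tail];
        rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    }

    if (rx_tail != rx_head) {   /* out was too small: come back later */
        rx_event = true;
    }
    return n;
}
```

All of the slow work (parsing, acting on the data) stays in the event handler, so the time spent inside the interrupt is a handful of instructions.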

The root cause of odd, erratic and rarely occurring bugs is often found to be an unintentional side effect of a certain combination of interrupts coinciding with the particular events being handled in the main event loop.

The Event Loop (and not busy waiting)

Busy waiting, where the processor spins in a loop polling for something to happen, burns power for no benefit (and yes, I really do occasionally see busy waiting built into embedded code!).

The correct solution is a main event loop, where the last task is the idle task, which runs if there are no other pending events. It’s where the routine, smooth transition to sleep should occur (other than emergency sleep on a critically low battery event).

Conversely, the first event in the event loop (the list order implies priority) should be the battery check: if the battery is critically low, there should be the shortest possible interval between a wake-up event (such as the user repeatedly pummelling the power-on button) and returning to sleep.
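
A rough sketch of such a loop follows, with the battery check first and the idle task last. Everything here is illustrative: the event names, the flags and enter_sleep() are invented, and the flags would in practice be set by interrupt handlers or an ADC reading rather than by hand.

```c
#include <stdbool.h>
#include <stddef.h>

/* List order implies priority: battery first, idle last. */
typedef enum { EV_BATTERY, EV_BUTTON, EV_COMMS, EV_IDLE, EV_COUNT } event_id_t;

typedef struct {
    bool (*pending)(void);   /* is there work for this event? */
    void (*handle)(void);    /* the event handler */
} event_entry_t;

/* Simulated state; on hardware these would be set by ISRs or an ADC. */
static volatile bool battery_critical = false;
static volatile bool button_flag      = false;
static volatile bool comms_flag       = false;
static int sleep_count = 0;  /* how many times sleep was entered */

/* Stand-in for the target's sleep instruction or low-power call. */
static void enter_sleep(void) { sleep_count++; }

static bool battery_pending(void) { return battery_critical; }
static void battery_handle(void)  { enter_sleep(); }  /* emergency sleep */
static bool button_pending(void)  { return button_flag; }
static void button_handle(void)   { button_flag = false; }
static bool comms_pending(void)   { return comms_flag; }
static void comms_handle(void)    { comms_flag = false; }
static bool idle_pending(void)    { return true; }    /* always runnable */
static void idle_handle(void)     { enter_sleep(); }  /* routine sleep */

static const event_entry_t event_table[EV_COUNT] = {
    [EV_BATTERY] = { battery_pending, battery_handle },
    [EV_BUTTON]  = { button_pending,  button_handle  },
    [EV_COMMS]   = { comms_pending,   comms_handle   },
    [EV_IDLE]    = { idle_pending,    idle_handle    },
};

/* One pass of the loop: run the highest-priority pending handler and
 * report which one ran. The firmware's main() would simply call this
 * forever: for (;;) { event_loop_step(); } */
event_id_t event_loop_step(void)
{
    for (size_t i = 0; i < EV_COUNT; i++) {
        if (event_table[i].pending()) {
            event_table[i].handle();
            return (event_id_t)i;
        }
    }
    return EV_IDLE;  /* not reached: the idle entry is always pending */
}
```

Because each pass restarts from the top of the table, a critically low battery sends the device back to sleep before the button press that woke it is even looked at, and nowhere does the loop spin waiting for anything.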

Watchdogs

Watchdogs are very simple things. Once started, the software must reset the watchdog regularly (called kicking, or patting, the watchdog), or the watchdog forces the system to reboot. They are a distinct piece of hardware in the microcontroller, rather than a software timer. Ideally they should never be triggered, but they do provide a last-resort recovery mechanism where rarely occurring but serious bugs are present.
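
To make the behaviour concrete, here is a small host-side simulation of a watchdog. It is only a model: the real thing is a hardware counter configured through vendor-specific registers, and every name here (wdt_start, wdt_kick, wdt_hw_tick, wdt_simulate) and the 100-tick timeout are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_TIMEOUT_TICKS 100u   /* assumed timeout, in hardware ticks */

/* Simulated watchdog peripheral. */
static uint32_t wdt_counter = WDT_TIMEOUT_TICKS;
static bool     wdt_running = false;
static int      reset_count = 0;   /* forced reboots so far */

/* Start the watchdog. Many real watchdogs cannot be stopped once started. */
void wdt_start(void)
{
    wdt_counter = WDT_TIMEOUT_TICKS;
    wdt_running = true;
}

/* Kick (pat) the watchdog: reload the countdown. */
void wdt_kick(void)
{
    wdt_counter = WDT_TIMEOUT_TICKS;
}

/* One tick of the watchdog's own clock. On silicon this happens in
 * hardware, independently of whatever the software is doing. */
void wdt_hw_tick(void)
{
    if (!wdt_running) {
        return;
    }
    if (wdt_counter > 0u) {
        wdt_counter--;
    }
    if (wdt_counter == 0u) {
        reset_count++;                    /* the forced reboot */
        wdt_counter = WDT_TIMEOUT_TICKS;  /* system restarts, watchdog too */
    }
}

/* Run for the given number of ticks, with the main loop kicking every
 * kick_every ticks (0 models a hung main loop that never kicks).
 * Returns the number of forced reboots during the run. */
int wdt_simulate(uint32_t ticks, uint32_t kick_every)
{
    int before = reset_count;

    for (uint32_t t = 1; t <= ticks; t++) {
        if (kick_every != 0u && (t % kick_every) == 0u) {
            wdt_kick();
        }
        wdt_hw_tick();
    }
    return reset_count - before;
}
```

With a healthy loop kicking every 50 ticks the model never reboots; with a hung loop it reboots once per timeout. In real firmware the kick belongs in the main event loop rather than in a timer interrupt, since a timer interrupt can carry on firing happily while the main loop is stuck.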

Bad Design

Ignoring compilation warnings.
No testing regime, or an inadequate or incorrect one.
Busy waiting (consumes unnecessary power).
Misuse of interrupt handlers (doing more than the essential minimum).
Build system dependency errors (not everything that should be rebuilt is, some things are rebuilt unnecessarily).
Battery management errors (unexpected loss of power, and data loss, even fire hazards from severe errors).
Lack of effective diagnostics and tracing.