Secure boot does not compose well

Composable security properties of a system are great: they allow you to define a security property locally on a subsystem and then this property will hold in whatever environment this subsystem is placed. This greatly simplifies the analysis, as you can deal with it chunk by chunk instead of having to tackle the whole system all at once.

Unfortunately, secure boot is not an easily composable property. This is a pretty straightforward result, familiar to anyone who has spent an evening thinking about it. Let’s dive into it.

Secure boot basics #

What is secure boot? The usual understanding is this: a system has secure boot if, once it boots, it is in a known-good state. (This isn’t a mathematically rigorous definition, but we’ll live with it.) This property is, of course, implicitly tied to the system boundary (typically where untrusted data crosses).

As an example, say your system is a smart watch. The main processor implements secure boot, for example by authenticating the firmware it boots. (How the processor does this is another question we’re not dealing with here.) Once this processor boots, it is in a known-good state since it only booted authentic firmware. This processor is ready to process untrusted data from the outer world.

              system S1
             ┌────────────────┐
outer world  │                │
        ◄───►│   processor    │
             │                │
             └────────────────┘

So far so good. Why do we want secure boot? The processor may process some untrusted data that triggers a run-time vulnerability. This isn’t great, and secure boot doesn’t help you here at all. The main value of secure boot is to make an attacker’s persistence much harder. This ultimately helps you survive a compromise. If someone breaks into S1, they’ll be kicked out on the next boot (by definition of secure boot). That is very valuable. If the attacker wants to persist, they must either repeat the attack on every boot or persist outside of the boundary. This has consequences:

it (hopefully) disrupts the attacker’s economics. The need to persist outside of the boundary may make the attack not scalable. (Roughly, the marginal cost of attacking the (N+1)th unit makes the whole attack venture unprofitable.) For example, in the previous watch case, it might imply that the attack requires physical proximity every time the attacker wants to do something bad to your fancy watch.
it may not be possible at all to persist outside that boundary, or it may be more expensive, or it may be way easier to detect.

This is how secure boot helps you: it kicks the attacker out on every boot.

What can go wrong composing systems? #

Naturally, when you change that boundary, the secure boot property may no longer hold. Continuing with our previous example, say we want to add WiFi connectivity to our smart watch. For this, we add an additional adjacent processor B that implements the WiFi functionality, and we call the original main processor A. B now sits between the outer world and A. We can model this case by the following system, slightly more complex than the previous one:

              system S2
             ┌───────────────────────────────────────────────────────┐
             │                                                       │
             │       ┌───────────────┐       ┌───────────────┐       │
outer world  │       │               │       │               │       │
      ◄──────┼──────►│  processor B  │◄─────►│  processor A  │       │
             │       │               │       │               │       │
             │       └───────────────┘       └───────────────┘       │
             │                                                       │
             └───────────────────────────────────────────────────────┘

Assume that B is an MCU without secure boot. (After all, all the “security critical” functionality lives in A.) Assume there’s an exploitable vulnerability in B. Since there’s no secure boot, that means the attacker can persist in B. They can arbitrarily modify B’s firmware. In particular, they can use B as a launchpad to attack A, at runtime, on every boot. This does not strictly violate A’s secure boot property, but the upshot is that A will be compromised on every boot. This is indistinguishable from persistence!
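
To make the scenario concrete, here is a toy simulation of it. This is only a sketch under loud assumptions: the `Component` model, the `attacker_exploits` helper, and the idea that the run-time exploit against A always succeeds are all made up for illustration and are not taken from any real device.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    secure_boot: bool
    compromised: bool = False  # run-time state, lost on reboot
    implanted: bool = False    # attacker-modified firmware in persistent storage

    def boot(self) -> None:
        # With secure boot, `implanted` can never become True (see below),
        # so the component always comes up clean. Without it, whatever
        # sits in persistent storage simply runs.
        self.compromised = self.implanted


def attacker_exploits(c: Component) -> None:
    """Hypothetical run-time exploit, assumed to always succeed."""
    c.compromised = True
    if not c.secure_boot:
        # Nothing authenticates the firmware, so the attacker can persist.
        c.implanted = True


def boot_system(a: Component, b: Component) -> None:
    a.boot()
    b.boot()
    if b.compromised:
        # B's implant replays the run-time attack against A on every boot.
        attacker_exploits(a)


def simulate(b_secure_boot: bool, reboots: int = 3) -> List[bool]:
    a = Component("A", secure_boot=True)
    b = Component("B", secure_boot=b_secure_boot)
    boot_system(a, b)
    attacker_exploits(b)  # one single attack from the outer world...
    attacker_exploits(a)  # ...using B as a launchpad against A
    history = []
    for _ in range(reboots):
        boot_system(a, b)  # the outer-world attacker is gone by now
        history.append(a.compromised)
    return history
```

In this model, `simulate(False)` reports A as compromised after every reboot even though A's own firmware is never modified, while `simulate(True)` reports a clean A after every reboot.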

This is a cute example of this phenomenon: secure boot does not necessarily compose well. Let’s think about this more thoroughly.

Composing rules #

We can reason about the following rules:

1. When the system is composed entirely of secure-boot-enabled components, the system as a whole has secure boot. (This is a very rough rule. In reality we need to be precise: we need to define the boundary clearly, we need to assume legitimate FW from A doesn’t try to compromise legitimate FW from B, we need to ensure all components boot at the same time, etc.)
2. As soon as at least one processor does not have secure boot, the system cannot “automatically” claim secure boot.
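
The two rules above can be written down as a tiny check. This is a rough sketch, not a real analysis tool: the `Component` type and both function names are hypothetical, and it deliberately ignores the caveats listed in the first rule (boundary definition, boot ordering, and so on).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    secure_boot: bool


def can_claim_secure_boot(components: List[Component]) -> bool:
    # Rule 1: every component has secure boot -> the system may claim it.
    # Rule 2: a single component without it -> no automatic claim.
    # Note: an empty list returns True vacuously.
    return all(c.secure_boot for c in components)


def blockers(components: List[Component]) -> List[str]:
    # Names of the components that prevent the automatic claim.
    return [c.name for c in components if not c.secure_boot]


s1 = [Component("A", secure_boot=True)]
s2 = [Component("A", secure_boot=True), Component("B", secure_boot=False)]
```

Here `s1` passes the check and `s2` does not, with `blockers(s2)` pointing at B as the component that needs a closer look.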

Practical design principles #

Rule 2 is too strict in real life. In a moderately complex system, it’s not realistic to demand that 100% of the components have secure boot. (For example, if you’re an OEM, there might be peripherals with MCUs running opaque (and likely shitty!) FW beyond your control, such as screen controllers, network cards, etc.)

We need to relax a bit and be pragmatic. Things that can help:

capability of the non-secure boot device B: does it have enough memory/power to host an attack against A?
connectivity: is B well connected to A?
what’s A’s attack surface to B? Can you put some extra eyes on that code to mitigate the risk of a run-time compromise of A from B? Is it parsing packets? How? Does it make sense to trigger a reboot after handling them?
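
As an illustration of shrinking A's attack surface to B, here is a sketch of a deliberately strict parser that A could run on everything arriving from B. The frame layout (1-byte type, 2-byte big-endian length, payload), the size limit, and the allowed types are all invented for this example; a real link between A and B will look different.

```python
import struct
from typing import Tuple

# Hypothetical frame layout: 1-byte type, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")
MAX_PAYLOAD = 256             # assumed upper bound for this link
ALLOWED_TYPES = {0x01, 0x02}  # assumed message types A is willing to handle


class RejectedFrame(ValueError):
    """Raised for any frame that is not exactly what A expects."""


def parse_frame(frame: bytes) -> Tuple[int, bytes]:
    if len(frame) < HEADER.size:
        raise RejectedFrame("truncated header")
    msg_type, length = HEADER.unpack_from(frame)
    if msg_type not in ALLOWED_TYPES:
        raise RejectedFrame("unknown message type")
    if length > MAX_PAYLOAD:
        raise RejectedFrame("payload too large")
    if len(frame) != HEADER.size + length:
        # Never trust the declared length on its own.
        raise RejectedFrame("length mismatch")
    return msg_type, frame[HEADER.size:]
```

The design choice is to reject anything unexpected instead of trying to recover from it: B is treated as untrusted input, exactly like the outer world.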

Sometimes it’s helpful to think of secure boot as something other than a binary property…

Recovery from a compromise #

This is always useful to think about:

in the event of a B compromise: can you still push an update to A?
can B ever block A’s network path to prevent A from getting FW updates? Can B tamper with A’s “last FW version available” packets? I’m assuming A has some freshness guarantees on this information… :-)
can A update B? Ideally A can control B’s firmware.
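
One way to get a freshness guarantee on the “last FW version available” information is a nonce-based challenge, sketched below. This is an assumption-heavy illustration: it uses an HMAC with a key shared between A and the update server purely to stay self-contained (a real design would more likely use asymmetric signatures), and the function names and 4-byte version encoding are made up.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def server_reply(key: bytes, nonce: bytes, latest_version: int) -> Tuple[int, bytes]:
    # Hypothetical update server: binds the answer to A's nonce.
    msg = nonce + latest_version.to_bytes(4, "big")
    return latest_version, hmac.new(key, msg, hashlib.sha256).digest()


def verify_reply(key: bytes, nonce: bytes, version: int, tag: bytes) -> bool:
    # Runs on A: accepts only an untampered answer to *this* nonce.
    msg = nonce + version.to_bytes(4, "big")
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


key = secrets.token_bytes(32)    # assumed to be provisioned out of band
nonce = secrets.token_bytes(16)  # A picks a fresh nonce for every query
version, tag = server_reply(key, nonce, 7)
```

With this, a compromised B relaying the traffic cannot change the version or replay an old answer without A noticing. B can still drop the packets altogether, so A also needs to treat a missing answer as a signal and not as “no update available”.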

Takeaways #

secure boot should not be considered in total isolation, component by component: it is a system property, tied to a boundary. The interconnection between secure boot-enabled processors is important. This means secure boot doesn’t compose automagically
roughly, if there’s any component that doesn’t have secure boot, the whole system cannot claim secure boot
even if this is a “theoretical” result, we have ways to qualify this result, live with it and live happily with imperfect systems.

Instances of this situation #

tbd