<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-03-26T16:57:03+00:00</updated><id>/feed.xml</id><title type="html">Oscar Reparaz</title><entry><title type="html">A big-key public key encryption scheme</title><link href="/oscar/misc/bigpk.html" rel="alternate" type="text/html" title="A big-key public key encryption scheme" /><published>2025-12-23T00:00:00+00:00</published><updated>2025-12-23T00:00:00+00:00</updated><id>/oscar/misc/bpk</id><content type="html" xml:base="/oscar/misc/bigpk.html"><![CDATA[<p>Big-key cryptography is regular cryptography with huge private keys. The main idea is to make key exfiltration harder. Unlike standard cryptography, which uses (say) 32-byte secret keys, big-key cryptography deliberately uses massive keys (gigabytes), so that an attacker who gains temporary access to your system has a hard time exfiltrating all of it. This increases the likelihood that you detect the attacker, or at least slows them down.</p>

<p><strong>Big-key public key encryption</strong>. Here we design a public-key encryption system with “short” public keys (32 bytes) and very long private keys. The basic idea is to use an identity-based encryption scheme (like Boneh–Franklin) under the hood. The following explanation assumes you have a standard IBE with <code class="language-plaintext highlighter-rouge">IBE.Setup()</code>, <code class="language-plaintext highlighter-rouge">IBE.Extract()</code>, <code class="language-plaintext highlighter-rouge">IBE.Encrypt()</code> and <code class="language-plaintext highlighter-rouge">IBE.Decrypt()</code> operations.</p>

<h3 id="setup-phase">Setup phase</h3>

<ol>
  <li>Master Key Generation (Alice)
    <ul>
      <li>Alice runs <code class="language-plaintext highlighter-rouge">IBE.Setup()</code> → generates <code class="language-plaintext highlighter-rouge">(MPK, MSK)</code></li>
      <li><code class="language-plaintext highlighter-rouge">MPK</code> becomes Alice’s public key (published/distributed to everyone)</li>
      <li><code class="language-plaintext highlighter-rouge">MSK</code> is kept temporarily for key extraction only</li>
    </ul>
  </li>
  <li>Identity Pre-Agreement (Alice &amp; Bob)
    <ul>
      <li>They agree on a large set of identities, e.g.:
        <ul>
          <li>“alice-001”, “alice-002”, …, “alice-100000”</li>
          <li>Could be: sequential, random strings, UUIDs, etc.</li>
          <li>Each identity represents one decryption capability</li>
        </ul>
      </li>
      <li>Both parties store this list of valid identities</li>
    </ul>
  </li>
  <li>Big Private Key Generation (Alice)
    <ul>
      <li>For each agreed identity <code class="language-plaintext highlighter-rouge">ID_i</code>, Alice runs:
<code class="language-plaintext highlighter-rouge">SK_i = IBE.Extract(MSK, ID_i)</code></li>
      <li>The “big private key” is the collection:
<code class="language-plaintext highlighter-rouge">BigSK = {SK_001, SK_002, ..., SK_100000}</code></li>
      <li>CRITICAL: Alice then securely wipes <code class="language-plaintext highlighter-rouge">MSK</code> from memory/storage</li>
      <li><code class="language-plaintext highlighter-rouge">MSK</code> is never stored long-term, only used during setup</li>
    </ul>
  </li>
</ol>

<h3 id="encryption-bob--alice">Encryption (Bob → Alice)</h3>

<p>To send message <code class="language-plaintext highlighter-rouge">M</code> to Alice:</p>

<ol>
  <li>Bob picks an identity at random from the pre-agreed list
    <ul>
      <li>E.g., randomly select <code class="language-plaintext highlighter-rouge">"alice-047851"</code></li>
    </ul>
  </li>
  <li>
    <p>Bob encrypts using IBE:
<code class="language-plaintext highlighter-rouge">C = IBE.Encrypt(MPK, "alice-047851", M)</code></p>
  </li>
  <li>
    <p>Bob sends to Alice:
<code class="language-plaintext highlighter-rouge">(C, "alice-047851")</code></p>

    <p>The identity must be included so Alice knows which key component to use</p>
  </li>
</ol>

<h3 id="decryption-alice">Decryption (Alice)</h3>

<p>When Alice receives <code class="language-plaintext highlighter-rouge">(C, ID)</code>:</p>

<ol>
  <li>Alice looks up the corresponding private key component from her big key
    <ul>
      <li>E.g., if <code class="language-plaintext highlighter-rouge">ID = "alice-047851"</code>, retrieve <code class="language-plaintext highlighter-rouge">SK_047851</code></li>
    </ul>
  </li>
  <li>
    <p>Alice decrypts using IBE:
<code class="language-plaintext highlighter-rouge">M = IBE.Decrypt(SK_047851, C)</code></p>
  </li>
  <li>(Optional) Alice can delete <code class="language-plaintext highlighter-rouge">SK_047851</code> after use for forward secrecy</li>
</ol>
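<p>To make the moving parts concrete, here is a runnable toy sketch of the whole flow. Everything crypto-related is a stand-in: the “IBE” below sets <code class="language-plaintext highlighter-rouge">MPK = MSK</code> and derives keys by hashing, which is wildly insecure and serves only to illustrate the bookkeeping (identity list, big-key table, MSK wipe, per-identity decryption). A real deployment would plug in an actual pairing-based IBE such as Boneh–Franklin.</p>

```python
import hashlib
import os
import secrets

# Toy stand-in for an IBE scheme -- NOT real IBE, NOT secure.
# In a real scheme Encrypt needs only MPK; here MPK equals MSK so
# the flow runs without a pairing library.
def ibe_setup():
    msk = os.urandom(32)
    return msk, msk          # (MPK, MSK); toy only

def ibe_extract(msk, identity):
    return hashlib.sha256(msk + identity.encode()).digest()

def _pad(key_for_id):
    return hashlib.sha256(key_for_id + b"pad").digest()

def ibe_encrypt(mpk, identity, msg):      # msg up to 32 bytes
    return bytes(a ^ b for a, b in zip(msg, _pad(ibe_extract(mpk, identity))))

def ibe_decrypt(sk_id, ct):
    return bytes(a ^ b for a, b in zip(ct, _pad(sk_id)))

# Setup phase (Alice)
mpk, msk = ibe_setup()
identities = [f"alice-{i:06d}" for i in range(1, 1001)]  # scale up for a truly big key
big_sk = {ident: ibe_extract(msk, ident) for ident in identities}
del msk                                   # CRITICAL: wipe MSK after extraction

# Encryption (Bob -> Alice)
ident = secrets.choice(identities)
ct = ibe_encrypt(mpk, ident, b"attack at dawn")   # Bob sends (ct, ident)

# Decryption (Alice)
pt = ibe_decrypt(big_sk[ident], ct)
del big_sk[ident]                         # optional: forward secrecy
assert pt == b"attack at dawn"
```

<p>Note how the structure forces the properties described above: after setup, nothing short of the full <code class="language-plaintext highlighter-rouge">big_sk</code> table lets you decrypt an arbitrary incoming ciphertext.</p>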

<h2 id="drawbacks">Drawbacks</h2>

<ul>
  <li><strong>It’s not “all or nothing”.</strong> An attacker who exfiltrates a portion of the private key has a non-zero chance of decrypting ciphertexts sent to Alice, with probability proportional to the fraction of key components they managed to exfiltrate.</li>
</ul>

<h2 id="prototype-implementation">Prototype implementation</h2>

<p>Can be found here: <a href="https://github.com/oreparaz/bigkey-pke">https://github.com/oreparaz/bigkey-pke</a></p>

<p>Some notes:</p>

<ul>
  <li>This implementation uses vuvuzela crypto (<a href="https://github.com/vuvuzela/crypto/blob/master/ibe/ibe.go">https://github.com/vuvuzela/crypto/blob/master/ibe/ibe.go</a>), which uses Barreto-Naehrig curves. These are weaker than expected: see <a href="https://moderncrypto.org/mail-archive/curves/2016/000740.html">https://moderncrypto.org/mail-archive/curves/2016/000740.html</a></li>
  <li>We could use other libraries (maybe over BLS curves), like <a href="https://github.com/encryption4all/ibe/">https://github.com/encryption4all/ibe/</a></li>
</ul>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Big-key cryptography is regular cryptography with huge private keys. The main idea is to make key exfiltration harder. Unlike standard cryptography, which uses (say) 32-byte secret keys, big-key cryptography uses on purpose massive keys (gigabytes) so that an attacker gaining temporary access to your system has a tough time exfiltrating all this. This increases the likelihood you detect the attacker or slow them down.]]></summary></entry><entry><title type="html">light: secure messaging</title><link href="/oscar/misc/light.html" rel="alternate" type="text/html" title="light: secure messaging" /><published>2025-12-21T00:00:00+00:00</published><updated>2025-12-21T00:00:00+00:00</updated><id>/oscar/misc/light</id><content type="html" xml:base="/oscar/misc/light.html"><![CDATA[<p>Here’s a secure messaging system that you can’t hack remotely. You can use this to talk to your friends in a way that even mercenary spyware can’t access your messages. This is guaranteed by hardware-level primitives. Let’s call it <em>light</em>.</p>

<p>Our threat model is <strong>remote-only attacks</strong>, with the ability to perform zero-click remote code execution (mercenary spyware).</p>

<h2 id="the-basic-idea">The basic idea</h2>

<p><em>light</em> is an endpoint architecture: three independent nodes connected by unidirectional links. On top of this endpoint architecture, we implement public-key cryptography to encrypt+authenticate messages between peers. The nuance here is how each “endpoint” is constructed.</p>

<p>A conventional secure messenger does everything in one machine: parse untrusted network input, run crypto, handle keys, render + store plaintext, accept keyboard input.</p>

<p><em>light</em> doesn’t do this. Instead, light carefully decomposes this monolithic networking + cryptographic stack into 3 different hardware-isolated nodes, connected via unidirectional links (data diodes).</p>

<h2 id="system-architecture">System architecture</h2>

<p>Each endpoint is partitioned into three discrete nodes:</p>

<ul>
  <li>
    <p>a TX computer (“the source”): takes plaintext you want to send, encrypts/authenticates it, outputs ciphertext and sends it to the GW. This computer isn’t connected to the internet. Even though it can send messages to GW, it cannot receive any bit of information from GW.</p>
  </li>
  <li>
<p>a RX computer (“the sink”): receives messages from GW, decrypts and displays plaintext on a dedicated display. RX can only receive messages from GW; it cannot send any messages to GW.</p>
  </li>
  <li>
<p>a GW computer (an untrusted “proxy”): this computer is connected to the internet, sends to RX, and receives from TX. Its task is to talk to other peers’ GWs and send/receive encrypted messages. The GW can use any transport layer: Signal messaging, XMPP, or more sophisticated systems that provide better anonymity (tor, etc).</p>
  </li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                              INTERNET
                                 │
                                 │ (bidirectional)
                                 │
                            ┌────▼────┐
                            │    GW   │ (untrusted proxy)
                            │         │
                            └─┬───────┘───────┬
                              │               ▲
                              │               │
                   encrypted  │               │ encrypted
                   (to RX)    │               │ (from TX)
                              ▼               │
                       ┌──────────┐   ┌──────────┐
                       │    RX    │   │    TX    │
                       │ (sink)   │   │ (source) │
                       └────┬─────┘   └─────▲────┘
                            │               │
                            │               │ plaintext
                            │               │ (input)
                            ▼               │
                      ┌──────────┐    ┌──────────┐
                      │  Screen  │    │ Keyboard │
                      │          │    │          │
                      └──────────┘    └──────────┘
                                            ▲
                                            │
                             ┌─────────────┐│
                             │    Human    ││
                             │  Operator   │┘
                             └─────────────┘
</code></pre></div></div>

<p>There’s an additional unidirectional link from TX to RX. This is used to send direct keystrokes and have a smooth UI.</p>

<p>Note that the three computers work together as a set, and this is abstracted away from the user. The user experience is identical to that of a regular computer (one keyboard, one display).</p>

<h2 id="system-security-properties">System security properties</h2>

<p>The trick is that the links are unidirectional. Observe that:</p>

<ul>
  <li>
    <p>TX never sees untrusted input. This means this computer will always execute the original code, and it’s impossible to remotely exploit it. Egress messages will always be encrypted to the intended recipient.</p>
  </li>
  <li>
    <p>RX does not have an egress path to the internet. This means that if this computer gets compromised, it’s impossible to exfiltrate keys or plaintext from this computer. The only “output” from this computer is a screen.</p>
  </li>
</ul>

<h2 id="an-attack-human-in-the-loop-social-engineering">An attack: human-in-the-loop social engineering</h2>

<p>Note that there is an exfiltration path to the internet: via the human operating the node. A compromised RX node can’t send bits to the GW, but it can send bits to the user’s brain (by lying on the screen). For example, a compromised RX could display “<em>Sync error. Please type the following ‘session recovery code’ into your TX terminal to reconnect: [hex-encoded private key]</em>”.</p>

<p>There are two ways to mitigate this:</p>
<ol>
  <li>Prevent it with operator discipline: train operators to recognize the ways a compromised node might try to trick them.</li>
  <li>Even if it happens, make exfiltration astronomically slow by using big-key cryptography</li>
</ol>

<p>Big-key cryptography means using very long private keys. Instead of 32-byte keys, use (say) a 10MB private key. Exfiltrating a 10MB key with a human in the loop is so expensive and slow that exfiltration isn’t practical.</p>
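<p>A back-of-envelope calculation makes this concrete. The channel rate below is an assumption: a few bits per second is roughly what “read a code off the screen, retype it elsewhere” achieves.</p>

```python
# Rough exfiltration time for a big key through a human operator.
# The 5 bits/s figure is an assumed rate for the screen -> brain ->
# keyboard channel, not a measured one.
KEY_BYTES = 10 * 1024 * 1024    # 10 MB private key
BITS_PER_SECOND = 5

seconds = KEY_BYTES * 8 / BITS_PER_SECOND
print(f"~{seconds / 86400:.0f} days of continuous typing")
```

<p>That is on the order of half a year of uninterrupted operator effort, versus seconds for a 32-byte key.</p>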

<h2 id="cryptographic-constructions">Cryptographic constructions</h2>

<p>The baseline construction is just public-key authenticated encryption (for example, using ECIES). This works well. Your peer’s public key can be hardcoded in the firmware of your TX (predistributed). Or alternatively you can introduce new peers by punching / typing the public key on the keyboard (which is connected to your TX). Note that TX has a unidirectional link to RX, so you can see your own messages you sent.</p>

<p>The full protocol goes like this:</p>
<ul>
  <li>Alice TX generates $s:= (s_1, s_2)$ at random, uses it to key a symmetric big-key cipher and encrypt message $m$.</li>
  <li>Alice TX sends $c := (\text{pk-enc}(\text{pk-bob}, s_1), \text{big-pk-enc}(\text{pk-bob}, s_2))$ where pk-enc is regular ECIES and big-pk-enc is a big-key public key encryption scheme.</li>
  <li>Bob RX receives and decrypts $s$. Uses $s$ to decrypt $m$.</li>
  <li>When Alice wants to send subsequent messages, she sets $s \leftarrow h(s)$ and uses $s$ to encrypt a new message.</li>
</ul>

<p>(Gateways omitted in the description above.)</p>

<p>A wrinkle is that you can’t do a full Diffie-Hellman exchange, since there is no bidirectional channel by design. This prevents us from getting the traditional “perfect forward secrecy” properties. We can get slightly weaker post-compromise security by hashing the key material on each message round. This way, a compromise of an endpoint at time N can’t reveal anything about round N-1.</p>
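<p>The key evolution is just a one-way hash chain. A minimal sketch (the initial secret is a placeholder; in the real protocol it comes from the big-key exchange above):</p>

```python
import hashlib

def ratchet(s: bytes) -> bytes:
    # One-way step: given the current s you cannot recover earlier
    # values, so compromise at round N reveals nothing about N-1.
    return hashlib.sha256(s).digest()

s = hashlib.sha256(b"initial shared secret (placeholder)").digest()
round_keys = []
for _ in range(3):
    round_keys.append(s)   # key the symmetric cipher for this round
    s = ratchet(s)         # evolve before the next message

assert len(set(round_keys)) == 3   # each round uses a fresh key
```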

<p>(You can still provide some security for future messages if each message carries fresh material to be folded into the shared secret. That only works if the adversary doesn’t get a copy of every message, which is a pretty strong assumption.)</p>

<p><strong>Big-key public key encryption</strong>. For a concrete scheme, see <a href="bigpk.html">the accompanying post</a>.</p>

<h2 id="slowhigh-domains">Slow/high domains</h2>

<p>In addition to the segregation into RX/TX/GW explained above, there’s a hardware-enforced, bandwidth-limited channel between the keyboard and TX, and between RX and the display. The goal is to restrict the information that can flow over those links to a few bits per second.</p>

<p>We do this to provide exfiltration resistance. We do not speed-limit the unidirectional links TX -&gt; GW, TX -&gt; RX nor GW -&gt; RX. These links can carry considerable bandwidth (even though they are unidirectional), since big-key cryptography is expensive and can produce large ciphertexts.</p>

<h2 id="unidirectional-links-consideration">Unidirectional links consideration</h2>

<p>You can’t use ARQ or selective retransmission. We resort to the simplest forward error correcting code, repetition: send the same thing a bunch of times and hope for the best.</p>
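<p>A sketch of such a repetition scheme, with a per-frame checksum so the receiver can drop corrupted copies. The framing format here is made up for illustration, not taken from the prototype:</p>

```python
import hashlib

REPEATS = 5

def encode(seq: int, payload: bytes) -> list[bytes]:
    # Frame: 4-byte seq || payload || 8-byte truncated digest,
    # repeated REPEATS times; any one surviving copy suffices.
    frame = seq.to_bytes(4, "big") + payload
    frame += hashlib.sha256(frame).digest()[:8]
    return [frame] * REPEATS

def decode(received: list[bytes]) -> dict[int, bytes]:
    out = {}
    for frame in received:
        body, tag = frame[:-8], frame[-8:]
        if hashlib.sha256(body).digest()[:8] != tag:
            continue                      # corrupted copy: drop it
        seq = int.from_bytes(body[:4], "big")
        out.setdefault(seq, body[4:])     # keep first good copy per seq
    return out

frames = encode(7, b"hello")
frames[0] = b"garbage"                    # simulate corruption of one copy
assert decode(frames) == {7: b"hello"}
```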

<p>A more elegant solution would be fountain codes: constantly resend the most recent messages, in case the data link temporarily goes down.</p>

<h2 id="hardware">Hardware</h2>

<p>I’m prototyping the three nodes with 3 raspberry pis linked with unidirectional serial links. I’m using two UARTs in each raspberry: the hardware one and a soft UART via the kernel module <a href="https://github.com/oreparaz/soft_uart">https://github.com/oreparaz/soft_uart</a>.</p>

<p>You can also implement all 3 nodes within a single FPGA. This is left for future work.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Here’s a secure messaging system that you can’t hack remotely. You can use this to talk to your friends in a way that even mercenary spyware can’t access your messages. This is guaranteed by hardware-level primitives. Let’s call it light.]]></summary></entry><entry><title type="html">What $10 of GPU time buys you in 2025</title><link href="/oscar/misc/gpu-2025.html" rel="alternate" type="text/html" title="What $10 of GPU time buys you in 2025" /><published>2025-11-01T00:00:00+00:00</published><updated>2025-11-01T00:00:00+00:00</updated><id>/oscar/misc/gpu</id><content type="html" xml:base="/oscar/misc/gpu-2025.html"><![CDATA[<p>We are in 2025. Compute is cheap. With \$2 you can rent a beefy GPU by the minute and nerd out all evening. This entertainment is cheaper than going to the movies (and depending who you ask, more fun). Let’s see how many discrete logarithms we can break on a \$10 budget…</p>

<h2 id="the-problem">The problem</h2>

<p>We want to solve the discrete logarithm problem in an interval. Our group is an elliptic curve over a 256-bit prime field. The interval size is about 70 bits. This problem has interesting applications (for example, when key generation didn’t use full entropy), but let’s talk about that another time.</p>

<p>The canonical algorithm to solve this is Pollard’s kangaroo algorithm (extended to distributed search by van Oorschot and Wiener). It takes about $(2+o(1))\sqrt{N}$ group operations, where $N$ is the interval size. That is an asymptotic measure; here we want to pin down the concrete cost of breaking (say) a 70-bit discrete log challenge, in 2025 USD.</p>
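<p>As a quick sanity check on the expected amount of work (the throughput figure is the Titan Xp rate reported in the logs later in this post):</p>

```python
from math import sqrt

# Expected work for Pollard's kangaroo over an interval of size N:
# about 2*sqrt(N) group operations.
def kangaroo_ops(interval_bits: int) -> float:
    return 2 * sqrt(2 ** interval_bits)

ops = kangaroo_ops(70)      # exactly 2^36: about 6.9e10 group operations
secs = ops / 575e6          # at ~575 MK/s (Titan Xp throughput, see logs)
print(f"{ops:.2e} ops, ~{secs:.0f} s expected per 70-bit solve")
```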

<h2 id="the-code">The code</h2>

<p>We take a GPU implementation for Pollard’s Kangaroo, with minimal modifications to make it work on different GPU architectures.</p>

<h2 id="the-platform">The platform</h2>

<p>We have the following GPUs available for rent:</p>

<table>
  <thead>
    <tr>
      <th>GPU name</th>
      <th>$ / hour</th>
      <th>Where hosted</th>
      <th>Year</th>
      <th>TFLOPS (FP32)</th>
      <th>Compute Capability</th>
      <th>VRAM</th>
      <th>Memory Bandwidth</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Titan Xp</td>
      <td>$0.049/hr</td>
      <td>Korea</td>
      <td>2017</td>
      <td>11.7</td>
      <td>6.1 (Pascal)</td>
      <td>12 GB</td>
      <td>547 GB/s</td>
    </tr>
    <tr>
      <td>RTX PRO 6000 (Blackwell)</td>
      <td>$0.676/hr</td>
      <td>USA</td>
      <td>2025</td>
      <td>119.0</td>
      <td><strong>12.0 (Blackwell)</strong></td>
      <td>96 GB</td>
      <td>1.79 TB/s (≈ 1792 GB/s)</td>
    </tr>
    <tr>
      <td>H200 (Hopper)</td>
      <td>$2.800/hr</td>
      <td>France</td>
      <td>2024</td>
      <td>197.0 (est.)</td>
      <td>9.0 (Hopper)</td>
      <td>141 GB</td>
      <td>4.8 TB/s</td>
    </tr>
  </tbody>
</table>

<p>The machines are rented from vast.ai, a marketplace for GPU compute. GPU owners can rent out GPU time and set a floor bid price. I don’t know much about vast.ai, but the experience overall was great: top up $5 and connect five minutes later to random machines with big GPUs (some with a residential IP address, yay).</p>

<h2 id="results">Results</h2>

<ul>
  <li>How many 70-bit logarithms can you get on a $10 budget?
    <ul>
      <li>Titan Xp: about <strong>4500</strong></li>
      <li>RTX PRO 6000: about <strong>710</strong>. This looks surprisingly low, but note that this GPU costs about 14x more per hour than the Titan Xp.</li>
      <li>H200: TODO</li>
    </ul>
  </li>
  <li>How many 64-bit logarithms can you get on a $10 budget?
    <ul>
      <li>Titan Xp: about <strong>13428</strong></li>
    </ul>
  </li>
</ul>
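<p>These counts follow from simple budget arithmetic. Reproducing the Titan Xp numbers from the rental price and the mean per-solve times in the benchmark logs below:</p>

```python
def solves_per_budget(budget_usd: float, usd_per_hour: float,
                      secs_per_solve: float) -> int:
    gpu_seconds = budget_usd / usd_per_hour * 3600
    return int(gpu_seconds / secs_per_solve)

# Titan Xp at $0.049/hr, mean times of 17.7 s (50-bit) and 54.7 s (64-bit):
print(solves_per_budget(10, 0.049, 17.7))   # 50-bit problems
print(solves_per_budget(10, 0.049, 54.7))   # 64-bit problems
```

<p>This ignores the variance and success probability of the kangaroo method, so treat these as ballpark figures.</p>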

<h3 id="conclusions">Conclusions</h3>

<ul>
  <li>For mid-size problems (around 64-bit problems), older and cheaper GPUs are great. They are pretty efficient and can be optimal for <em>$ / computation</em> (not <em>$ / speed</em>)</li>
  <li>For smaller problems you’re probably better off with just a CPU.
    <ul>
      <li>A 50-bit problem takes just around 11.5s on an Intel E5-2680 v4 @ 2.40GHz. On a Titan Xp it takes about 17s (203 per hour, or about <strong>41000</strong> for \$10)</li>
    </ul>
  </li>
</ul>

<h3 id="eye-candy">eye candy</h3>

<p>Example problem for 70-bit ECDLP on an interval:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Start : 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297110CEFE6DF0F00AF193D4
Stop  : 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F32971110EFE6DF0F00AF193D3
[1] Priv: 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297111067279BA00590DBE69
    Pub: 0364FBFDFF900733243009E571BC503762EC7A809B132D50F2558B2C431E6E4AE0

&gt;&gt;&gt; bin(0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F32971110EFE6DF0F00AF193D3 - 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297110CEFE6DF0F00AF193D4 + 1)
'0b10000000000000000000000000000000000000000000000000000000000000000000000'

</code></pre></div></div>

<p>RTX PRO 6000 drawing 300 Watts:
<img src="/assets/gpu/candy1.png" alt="RTX PRO 6000" width="800" /></p>

<p>Titan Xp drawing about 250 Watts:
<img src="/assets/gpu/candy2.png" alt="Titan Xp" width="800" /></p>

<p>Log for the Titan Xp:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NVIDIA Titan Xp - 50-bit ECDLP Benchmark Results
Date: Fri Nov  7 06:07:01 PM CET 2025
========================================

Run 1: 21s
Run 2: 17s
Run 3: 17s
Run 4: 18s
Run 5: 17s
Run 6: 18s
Run 7: 17s
Run 8: 17s
Run 9: 18s
Run 10: 17s

All tests completed!
Statistics:
===========
Total runs: 10
Total time: 177s (2m 57s)

Mean time: 17.70s
Median time: 17.0s
Min time: 17s (Runs 2, 3, 5, 7, 8, 10)
Max time: 21s (Run 1)

Sorted times: 17, 17, 17, 17, 17, 17, 18, 18, 18, 21

10 GPU-Hour Throughput Estimate:
=================================
10 GPU-hours = 36,000 seconds
Average time per challenge: 17.70s
Estimated challenges in 10 GPU-hours: ~2,033 challenges

Per hour rate: ~203 challenges/hour
Per day rate (24h): ~4,881 challenges/day

Hardware:
=========
GPU: NVIDIA Titan Xp (Pascal, Compute 6.1)
CPU: 4 threads
CUDA: 8.0
Configuration: GPU Grid(60x256), DP size=9
GPU Throughput: ~565-690 MK/s (typical: ~575 MK/s)


NVIDIA Titan Xp - 64-bit ECDLP Benchmark Results
========================================

Run 1: 40s
Run 2: 51s
Run 3: 43s
Run 4: 74s
Run 5: 79s
Run 6: 46s
Run 7: 65s
Run 8: 43s
Run 9: 66s
Run 10: 40s

All tests completed!

Statistics:
===========
Total runs: 10
Total time: 547s (9m 7s)

Mean time: 54.70s
Median time: 48.5s
Min time: 40s (Runs 1, 10)
Max time: 79s (Run 5)

Sorted times: 40, 40, 43, 43, 46, 51, 65, 66, 74, 79

10 GPU-Hour Throughput Estimate:
=================================
10 GPU-hours = 36,000 seconds
Average time per challenge: 54.70s
Estimated challenges in 10 GPU-hours: ~658 challenges

Per hour rate: ~66 challenges/hour
Per day rate (24h): ~1,584 challenges/day

Hardware:
=========
GPU: NVIDIA Titan Xp (Pascal, Compute 6.1)
CPU: 4 threads
CUDA: 8.0
Configuration: GPU Grid(60x256), DP size=9
GPU Throughput: ~565-690 MK/s (typical: ~575 MK/s)
</code></pre></div></div>

<p>Log for the RTX PRO 6000:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Statistics:
================================================
Total runs:    10
All solved:    10/10 (100% success rate)
Total time:    758 seconds (12m 38s)

Minimum time:  49 seconds
Maximum time:  126 seconds
Mean time:     75.8 seconds
Median time:   75 seconds

Sorted times: 49, 51, 57, 72, 72, 78, 81, 86, 86, 126

Performance Metrics:
================================================
Average GPU throughput: ~3.9 GK/s
Range searched per run: 2^70 (1,180,591,620,717,411,303,424 keys)
Keys per second (avg):  1.56e+19 (2^64.09)

</code></pre></div></div>

<p>CPU-only is faster for 50-bit problems, slower for 64-bit problems</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
  Key Observations:

  1. Huge variance in CPU times: The probabilistic nature of Pollard's Kangaroo shows extreme variance for CPU-only mode:
    - Fastest: 104s
    - Slowest so far: 625s
    - That's a 6x difference between best and worst case!
  2. CPU vs GPU comparison (preliminary):
    - CPU average so far: ~393s (based on first 4 tests)
    - GPU average (from earlier): 54.7s
    - GPU is ~7.2x FASTER than CPU for 64-bit problems ✅
  3. This confirms the break-even point:
    - 50-bit: CPU is 1.54x faster (11.5s vs 17.7s)
    - 64-bit: GPU is 7.2x faster (55s vs ~393s)
    - The crossover happens somewhere between 50-64 bits
</code></pre></div></div>

<h3 id="discussion--todo">Discussion / TODO</h3>

<ul>
  <li>code: there are no modifications to the code for different GPUs. this is probably very suboptimal. experiment with varying parameters</li>
  <li>this is probabilistic. factor success probability</li>
  <li>make an automated discovery of what’s the most economic platform to solve this with the live json bidding from vast.ai</li>
  <li>algorithmic: There are improvements by Galbraith, Pollard and Ruprai that bring the runtime down to $(1.661 + o(1))\sqrt{N}$ <a href="https://eprint.iacr.org/2010/617.pdf">https://eprint.iacr.org/2010/617.pdf</a></li>
  <li>how to rent out your GPUs <a href="https://cloud.vast.ai/host/setup">https://cloud.vast.ai/host/setup</a></li>
</ul>]]></content><author><name></name></author><category term="minor" /><summary type="html"><![CDATA[We are in 2025. Compute is cheap. With \$2 you can rent a beefy GPU by the minute and nerd out all evening. This entertainment is cheaper than going to the movies (and depending who you ask, more fun). Let’s see how many discrete logarithms we can break on a \$10 budget…]]></summary></entry><entry><title type="html">punchsig: cryptographic signatures, data diodes and trusted input</title><link href="/oscar/misc/punchsig.html" rel="alternate" type="text/html" title="punchsig: cryptographic signatures, data diodes and trusted input" /><published>2025-05-25T00:00:00+00:00</published><updated>2025-05-25T00:00:00+00:00</updated><id>/oscar/misc/punchsig</id><content type="html" xml:base="/oscar/misc/punchsig.html"><![CDATA[<p>Look at this beauty. This box is a small USB dongle I built over the weekend with a bunch of junk electronics I had lying around. The dongle cryptographically signs short text inputs directly from your USB keyboard.</p>

<p><img src="/assets/punchsig/punchsig_1.jpeg" alt="punchsig" /></p>

<p><strong>What is in this box?</strong> The dongle acts as an interposer: keyboard plugs into one side, computer into the other. It forwards keystrokes transparently, and can cryptographically sign text you type. To the computer, it just looks like a regular USB keyboard. Internally, the dongle has a “secure processor” which holds the signing keys and generates the ed25519 signature.</p>

<h3 id="the-twist">the twist</h3>

<p>So far, all pretty standard. <strong>The twist here is</strong> that the dongle’s secure processor is isolated by design – it’s airgapped in one direction. The dongle’s secure processor can send data to the computer, but not the other way around. This means the only input the security processor ever sees comes directly from the keyboard – which we assume to be trusted. Malware on the computer can’t talk to it at all. The unidirectional airgap is enforced by means of a “data diode”.</p>

<iframe src="https://player.vimeo.com/video/1087953008?context=Vimeo%5CController%5CApi%5CResources%5CVideoController.&amp;h=1e8a1b1e2d&amp;s=9761ece5770d16dc876ae125450f231fce9e9d78_1748434583" width="640" height="564" frameborder="0" allow="autoplay; fullscreen" allowfullscreen=""></iframe>

<p>In the demo video you’ll see the message “abc” being signed. You type the message between <code class="language-plaintext highlighter-rouge">F5</code> (start) and <code class="language-plaintext highlighter-rouge">F6</code> (end). The signature gets appended directly after your message, prefixed with <code class="language-plaintext highlighter-rouge">psig-</code>. This makes it easy to type directly into emails, chats, or whatever.</p>

<p>Example message:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hi, hello world from punchsig! psig-1d70ddcfd35e35b78a5a4cbbf7844a7c405c1dac725c7bac48bd7093555cf7130e2a04d563aa0ace0b330c440601524d4c434ad3298ec02e862ef0750eee4c00
</code></pre></div></div>

<h3 id="security">security</h3>

<p><img src="/assets/punchsig/diagram.svg" alt="punchsig" /></p>

<p><strong>How the sausage is made</strong>. Inside, this thing is made of two processors:</p>
<ul>
  <li>The <em>red</em> side of things is connected to the keyboard, reads keystrokes, performs signing and forwards them to the blue board over a unidirectional serial link.</li>
  <li>The <em>blue</em> board just receives keystrokes over the serial link and forwards them to a PC. It emulates a USB HID keyboard.</li>
</ul>

<p><strong>This works differently from a YubiKey.</strong> With a YubiKey, the message to be signed comes from the computer — so malware could tamper with it before it’s signed. The user might notice the final message isn’t what they intended, but malware can still batch two signing requests: the real one and a sneaky extra on the side. While YubiKeys can require a user tap for confirmation, it’s easy for malware to fake an error and trick the user into tapping twice.</p>

<p>We go a bit further than the conventional wisdom of “keys are in hardware, they are inextractable”. Hardware dongles don’t live in a vacuum: they must get the input to sign from somewhere. In the majority of cases, this means getting the input from a host computer. If this host computer is compromised, then malware can sign arbitrary messages. That’s bad: even if malware can’t access key material, it can sign whatever it wants. By forcing the signer’s only input path to be the trusted keyboard, we close off this attack entirely.</p>

<p><strong>Are screens enough?</strong> One way of dealing with this is putting a screen on the signer. This provides “secure output”. The user is expected to verify what is on the screen against a trusted reference. There are multiple sharp edges with this: a) how does the user know what a “trusted reference” is? b) how is the user expected to diligently verify 128 bits (and not be lazy and check only the first few and last few bits)?</p>

<h3 id="future">future</h3>

<p>The current implementation of punchsig is a prototype I built in a couple of hours – it does have rough edges and is not suitable for production. Directions for improvement:</p>

<ul>
  <li>Implement proper key generation. Implement some kind of lightweight “console mode” to print public keys, generate new keys, etc</li>
  <li>Secure boot</li>
  <li><a href="https://github.com/oreparaz/punchsig-red/pull/1">Firmware builds today are reproducible by means of nix</a>, but they lack documentation</li>
  <li>Implement forward-secure signatures. This would be nice to protect signatures against future compromise. Some scheme based on certs or <a href="https://eprint.iacr.org/2001/042">Krawczyk’s trick</a> would be relatively easy to implement.</li>
  <li>Implement an attestation CA for locally assembled punchsig</li>
  <li>Protect against replays. Today the user can type by themselves a timestamp. It would be great if this is better handled. One option is adding an RTC (but this adds components + a coin cell battery). A GPS module is another alternative.</li>
  <li>Improve reliability of the USB PIO stack. Not all keyboards enumerate yet.</li>
</ul>

<h3 id="hardware">hardware</h3>

<p>The two boards are based on Raspberry Pi Pico, connected <a href="https://github.com/oreparaz/punchsig/tree/main/hw#connections">like this</a>.</p>

<h3 id="code">code</h3>

<p>Firmware is pretty straightforward and largely taken from sample code. There is great code for the raspberry pi pico that implements USB HID host and client. A very simple framing protocol forwards the HID reports from one board to the other.</p>

<p>Source available at 
<a href="https://github.com/oreparaz/punchsig">https://github.com/oreparaz/punchsig</a>.</p>

<h3 id="verification">verification</h3>

<p>This is the javascript verification page I’ve been using for testing: <a href="https://www.reparaz.net/punchsig-verify">https://www.reparaz.net/punchsig-verify</a>. Some messages to try:</p>

<ul>
  <li>
    <p>Yo, does this thing work? psig-f1424b7587150bb83ac8a895bb4f3424a6228b747083369638d5bfd7f24bd752f1fcbc3a140d431295ecdb163d224c608a0a7d633684a5cc23d1c1791821e30c</p>
  </li>
  <li>
    <p>1, 2, 3, tango! psig-b6291b8c9724429822816e486cda12a690ba861ef7567769d4d24a9f42270941d40e9e4282ac0517bfce033c992a136603f1751ba8c77396a52e3e427fd6b206</p>
  </li>
  <li>
    <p>ahem, aham, ahum psig-4ca4a688595e70b1907442c2541a2e11a51257344815acfaf7acde2eb437b06244dfb9b44939520a44a1436c77fc0f3800e4144d9e6a10d29ecd975cf2b4aa06</p>
  </li>
</ul>

<p>Parameters:</p>
<ul>
  <li>All use the ed25519 testing public key <code class="language-plaintext highlighter-rouge">34bc5d83dd91bfa5df1ecada9630e3646fdb497afb4353d7d9f6ba2bb9ac41c0</code>.</li>
  <li>All use the global context <code class="language-plaintext highlighter-rouge">punchsig v2025-05-18 context</code>. This is prepended to the message.</li>
</ul>

<h3 id="build">build</h3>

<p>Some pictures to inspire (or scare) you: the “case” is just a styrofoam sandwich. Terrible thermal properties (though the microcontrollers don’t get hot anyway), but great physical durability.</p>

<p><img src="/assets/punchsig/punchsig_2.jpeg" alt="punchsig" /></p>

<p>There are just two boards that are embedded in the styrofoam:</p>

<p><img src="/assets/punchsig/punchsig_assembly.jpeg" alt="punchsig" /></p>

<p>Connected unidirectionally with 3 wires:</p>

<p><img src="/assets/punchsig/punchsig_assembly_2.jpeg" alt="punchsig" /></p>

<!--
<img src="/assets/punchsig/punchsig_sandwich.jpeg" alt="punchsig" />
The outer is just paper from a leaflet:
<img src="/assets/punchsig/punchsig_box.jpeg" alt="punchsig" />
<img src="/assets/punchsig/punchsig_1.jpeg" alt="punchsig" />
-->]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Look at this beauty. This box is a small USB dongle I built over the weekend with a bunch of junk electronics I had lying around. The dongle cryptographically signs short text inputs directly from your USB keyboard.]]></summary></entry><entry><title type="html">ANSSI x509 cert parser</title><link href="/oscar/misc/anssi-x509.html" rel="alternate" type="text/html" title="ANSSI x509 cert parser" /><published>2025-03-21T00:00:00+00:00</published><updated>2025-03-21T00:00:00+00:00</updated><id>/oscar/misc/x509</id><content type="html" xml:base="/oscar/misc/anssi-x509.html"><![CDATA[<p>This is a fantastic piece of software:</p>

<blockquote>
  <p>Arnaud Ebalard. <strong>x509-parser: a RTE-free X.509 parser</strong>. <a href="https://github.com/ANSSI-FR/x509-parser">https://github.com/ANSSI-FR/x509-parser</a></p>
</blockquote>

<p>This implements an X.509 parser with profuse ACSL annotations, so that the whole parser is verifiable with Frama-C:</p>
<ul>
  <li>This guarantees the parser is free from run-time errors (invalid memory accesses, signed integer overflows, undefined behavior)</li>
  <li>It is suitable for embedded devices (no malloc, no floats, no dependencies)</li>
  <li>The current ACSL annotations cannot prove the code adheres to a functional specification (correctness), but it’s nevertheless a great piece of engineering</li>
  <li>It is a good resource to learn frama-c and ACSL in a non-trivial project.</li>
</ul>

<p>This is an excerpt from the paper:</p>

<p><img src="/assets/x509/acsl.png" alt="ACSL" width="800" /></p>

<p>References:</p>
<ul>
  <li>Arnaud Ebalard, Patricia Mouy, and Ryad Benadjila. <strong>Journey to a RTE-free X.509 parser</strong>. A pretty didactic paper: <a href="https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Article-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila_3cUxSCv.pdf">https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Article-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila_3cUxSCv.pdf</a></li>
  <li>Presentation slides: <a href="https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Slides-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila.pdf">https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Slides-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila.pdf</a></li>
  <li>Presentation at SSTIC 2019, with video (in french) <a href="https://www.sstic.org/2019/presentation/journey-to-a-rte-free-x509-parser/">https://www.sstic.org/2019/presentation/journey-to-a-rte-free-x509-parser/</a></li>
  <li>Code: <a href="https://github.com/ANSSI-FR/x509-parser">https://github.com/ANSSI-FR/x509-parser</a></li>
</ul>

<p>Related:</p>
<ul>
  <li>RWC 2023 slides: <a href="https://iacr.org/submit/files/slides/2023/rwc/rwc2023/46/slides.pdf">https://iacr.org/submit/files/slides/2023/rwc/rwc2023/46/slides.pdf</a></li>
</ul>

<p>Other projects from ANSSI:</p>
<ul>
  <li>libdrbg (no frama-c verification) <a href="https://github.com/ANSSI-FR/libdrbg">https://github.com/ANSSI-FR/libdrbg</a></li>
  <li>libecc (well engineered) <a href="https://github.com/libecc/libecc">https://github.com/libecc/libecc</a></li>
</ul>]]></content><author><name></name></author><category term="minor" /><summary type="html"><![CDATA[This is a fantastic piece of software:]]></summary></entry><entry><title type="html">Omron blood pressure monitor: reading the internal EEPROM</title><link href="/oscar/misc/omron.html" rel="alternate" type="text/html" title="Omron blood pressure monitor: reading the internal EEPROM" /><published>2023-12-28T00:00:00+00:00</published><updated>2023-12-28T00:00:00+00:00</updated><id>/oscar/misc/omron</id><content type="html" xml:base="/oscar/misc/omron.html"><![CDATA[<p>This note explains how to read the internal EEPROM of an <em>Omron Upper Arm Blood Pressure Monitor 3 Series</em> (model BP710N) to achieve interoperability. You’ll be able to read your blood pressure measurements with your own microcontroller.</p>

<h3 id="hardware">Hardware</h3>

<p>This is the hardware we’re dealing with. It sells for about $40 on Amazon in the U.S. as of 2019. There are more expensive models from the same manufacturer that have Bluetooth connectivity. Here we are dealing with the simplest model: no Bluetooth, no WiFi, no USB.</p>

<p><img src="/assets/omron/unit.png" alt="Omron BP710N" width="500" /></p>

<h3 id="pcb">PCB</h3>

<p>This is a picture of the main board. We can identify some components:</p>

<ul>
  <li>the main Toshiba processor (IC1 in the picture) is in an LQFP64 package. Markings on the chip: 1904 HAL, T5DE 1UG, 916549. I couldn’t find any datasheet.</li>
  <li>the small IC on the right (IC5) is a 4-kbit I2C EEPROM (markings on the chip: 4G08 82753). This is a 3.3 V EEPROM and behaves like a 24C04.</li>
</ul>

<p>The I2C bus for the EEPROM is conveniently exposed as two through-hole vias, annotated in the picture. It is very easy to solder this bus to your favorite microcontroller, which can then drive the EEPROM itself (I2C is a multi-master bus).</p>

<p><img src="/assets/omron/pcb.jpg" alt="Omron BP710N PCB" width="800" /></p>

<h3 id="high-level-behavior">High-level behavior</h3>

<p>After the unit completes measuring your blood pressure (which takes about a minute), the microcontroller stores the whole measurement into the EEPROM. This is done without any user interaction. The EEPROM memory is large enough to store the last 14 measurements.</p>

<h3 id="eeprom-memory-map">EEPROM memory map</h3>

<p>This section provides the information necessary to achieve interoperability:</p>

<ul>
  <li>The EEPROM is 512 bytes long and holds 14 files.</li>
  <li>Each file is 14 bytes long and stores a single measurement.</li>
  <li>The first file starts at offset <code class="language-plaintext highlighter-rouge">0xAC</code>, the second one starts at <code class="language-plaintext highlighter-rouge">0xBA</code>, and so on up to the last file, which starts at <code class="language-plaintext highlighter-rouge">0x162</code>.</li>
  <li>Each file stores:
    <ul>
      <li>1 byte for the systolic pressure, stored as (measured_systolic_mmHg - 25). For example, a systolic pressure of 125 mmHg will be stored as <code class="language-plaintext highlighter-rouge">0x64</code>. (This encoding thus can represent values from 25 mmHg to 280 mmHg.)</li>
      <li>1 byte for the diastolic pressure in mmHg units. For example, a diastolic pressure of 80 mmHg will be stored as <code class="language-plaintext highlighter-rouge">0x50</code>.</li>
      <li>1 byte for the pulse measurement in BPM “units” (min^-1). For example, a pulse of 53 bpm will be stored as <code class="language-plaintext highlighter-rouge">0x35</code>.</li>
      <li>5 bytes that appear to be nearly constant. In my unit these are <code class="language-plaintext highlighter-rouge">0E 20 04 3F 10</code>.</li>
      <li>6 more bytes whose purpose I don’t understand. The last two look pretty random / uniformly distributed and could be a CRC, although I didn’t dig deeper.</li>
    </ul>
  </li>
  <li>There are other interesting addresses:
    <ul>
      <li>Address <code class="language-plaintext highlighter-rouge">0x60</code> stores the “last measurement file index”.  It is a pointer to the measurement that was last written to EEPROM. The mapping “pointer value → file offset” is pretty regular with an exception for the pointer value <code class="language-plaintext highlighter-rouge">0x00</code>:</li>
    </ul>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   mem[addr=0x60] -&gt; file offset
   -----------------------------
   0x01 -&gt; 0xAC (0xAC + 14*0)
   0x02 -&gt; 0xBA (0xAC + 14*1)
   0x03 -&gt; 0xC8 (0xAC + 14*2)
   0x04 -&gt; 0xD6 (0xAC + 14*3)
        ... regular ...
   0x0C -&gt; 0x146 (0xAC + 14*11)
   0x0D -&gt; 0x154 (0xAC + 14*12)
   0x00 -&gt; 0x162 (0xAC + 14*13) // last one, NB: discontinuity!
</code></pre></div>    </div>

    <ul>
      <li>Address <code class="language-plaintext highlighter-rouge">0x04</code> and <code class="language-plaintext highlighter-rouge">0x05</code> store the total measurement count in little endian. (This is duplicated in addresses <code class="language-plaintext highlighter-rouge">0x06-0x07</code>.) This counter advances even if the measurement is bad. Bad measurements aren’t written to a file.</li>
    </ul>
  </li>
</ul>
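<p>The memory map above translates directly into code. Here is a small Go sketch that decodes the first three bytes of a measurement file and maps the pointer stored at <code>0x60</code> to a file offset, including the <code>0x00</code> discontinuity:</p>

```go
package main

import "fmt"

// Direct transcription of the memory map: decode the first three bytes of a
// measurement file, and map the "last measurement" pointer at 0x60 to a file
// offset (pointer 0x01 is the first slot; pointer 0x00 wraps to the last).

func decode(file []byte) (systolic, diastolic, pulse int) {
	return int(file[0]) + 25, int(file[1]), int(file[2])
}

func fileOffset(ptr byte) int {
	slot := (int(ptr) + 13) % 14 // ptr 0x01 -> slot 0, ..., ptr 0x00 -> slot 13
	return 0xAC + 14*slot
}

func main() {
	s, d, p := decode([]byte{0x6f, 0x45, 0x36})
	fmt.Println(s, d, p)                   // 136 69 54
	fmt.Printf("0x%X\n", fileOffset(0x0A)) // 0x12A
}
```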

<p><strong>How to figure this out for yourself.</strong> The easiest approach is to dump the EEPROM contents before and after a measurement and study how the contents change. You can read this EEPROM with any 24C04 driver. I had success with <a href="https://github.com/nopnop2002/esp-idf-24c">https://github.com/nopnop2002/esp-idf-24c</a>. The Toshiba microcontroller talks to the EEPROM at 320 kbit/s (but of course you can talk to it at a slower bitrate). For example, after a measurement the following EEPROM contents changed:</p>

<ul>
  <li>the pointer to last measurement (stored at <code class="language-plaintext highlighter-rouge">0x60</code>) advanced from <code class="language-plaintext highlighter-rouge">0x09</code> to <code class="language-plaintext highlighter-rouge">0x0A</code></li>
  <li>the file starting at address <code class="language-plaintext highlighter-rouge">0x12A</code> (corresponding to <code class="language-plaintext highlighter-rouge">0x0A</code>) changed</li>
  <li>the first 3 bytes of the file starting at <code class="language-plaintext highlighter-rouge">0x12A</code> are <code class="language-plaintext highlighter-rouge">6f 45 36</code> which mean:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">0x6f</code>: systolic pressure of  0x6f+25 = 136 mmHg</li>
      <li><code class="language-plaintext highlighter-rouge">0x45</code>: diastolic pressure of 0x45 = 69 mmHg</li>
      <li><code class="language-plaintext highlighter-rouge">0x36</code>: pulse of 54 bpm</li>
    </ul>
  </li>
  <li>the contents at <code class="language-plaintext highlighter-rouge">0x04</code> advanced by one (total number of measurements).</li>
</ul>

<p><strong>Other references.</strong> The unit I have has the following markings: MODEL: BP710N, REF HEM-7121-Z. As of 2024 it looks like Omron is selling an updated model BP7100. I don’t know what the BP7100 looks like inside. Chances are that it is very similar.</p>

<p><strong>Warning</strong>. You are on your own. Any modification will likely void your warranty. THE INSTRUCTIONS ARE PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE INSTRUCTIONS.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[This note explains how to read the internal EEPROM of an Omron Upper Arm Blood Pressure Monitor 3 Series (model BP710N) to achieve interoperability. You’ll be able to read your blood pressure measurements with your own microcontroller.]]></summary></entry><entry><title type="html">A cryptographic desk clock: agreement protocols</title><link href="/oscar/misc/quorum-clock" rel="alternate" type="text/html" title="A cryptographic desk clock: agreement protocols" /><published>2022-09-03T00:00:00+00:00</published><updated>2022-09-03T00:00:00+00:00</updated><id>/oscar/misc/quorum-clock</id><content type="html" xml:base="/oscar/misc/quorum-clock"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong><a href="/oscar/misc/desk-clock-protocol">protocol for a cryptographic desk clock</a></strong></li>
  <li>Part II: <strong><a href="/oscar/misc/time-reference-home">time reference</a></strong></li>
  <li>Part III: <strong>agreement protocols</strong> (this page)</li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Say you have 3 imperfect clock measurements. They do not tell the exact same time.
Maybe they are drifting apart, or worse, maybe one is totally broken.
Here we look at this basic problem: given several
clock measurements, how do you establish the time? We’ll see three fundamental works
in fault-tolerant agreement protocols.</p>

<h4 id="agreement">Agreement</h4>

<p>Establishing a notion of <em>agreement</em> on discrete variables is easy.
For example, we can vote to decide where a group should go have dinner
(= quorum-based technique). Or we can trust a certain message
when at least a certain threshold of parties trusts it (and express
this via cryptographic signatures).
This works nicely. We can build very robust systems with this approach,
a la triple-modular redundancy. Commercial planes autonomously land
hundreds of people safely thanks to this.</p>

<p>Establishing agreement in continuous variables is a bit harder. After all,
what are outliers? Five-sigma deviation from the mean? Why not \(42 \sigma\)?
And how do you take into account the measurement confidence? Or the sensor
accuracy?</p>

<blockquote>
  <p>Never go to sea with two chronometers; take one or three</p>
</blockquote>

<h4 id="marzullo-1983">Marzullo 1983</h4>

<p>Keith Marzullo proposed a <a href="http://infolab.stanford.edu/pub/cstr/reports/csl/tr/83/247/CSL-TR-83-247.pdf">synchronization algorithm</a> based on interval intersection.
Marzullo’s algorithm provides accurate time synchronization, but does not really deal
with faulty clocks. This is the basic idea:</p>

<p><img src="/assets/agreement/marzullo_definition.png" alt="Marzullo" width="500" /></p>

<p>NTP uses Marzullo’s algorithm as inspiration.</p>
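<p>The core of Marzullo’s algorithm is simple enough to sketch in a few lines: collect all interval endpoints, sweep over them in order, and keep the sub-interval covered by the most sources. (A compact illustrative sketch, not the production variant NTP uses:)</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Marzullo-style interval intersection: each clock reports an interval
// [lo, hi] it believes contains the true time; sweep over the endpoints and
// return a sub-interval covered by the largest number of sources. One wildly
// wrong clock out of three gets outvoted.

type interval struct{ lo, hi float64 }

func marzullo(in []interval) (best interval, count int) {
	type edge struct {
		t     float64
		delta int
	}
	var edges []edge
	for _, iv := range in {
		edges = append(edges, edge{iv.lo, +1}, edge{iv.hi, -1})
	}
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].t != edges[j].t {
			return edges[i].t < edges[j].t
		}
		return edges[i].delta > edges[j].delta // open before close at ties
	})
	cur := 0
	for i, e := range edges {
		cur += e.delta
		if cur > count {
			count = cur
			best = interval{e.t, edges[i+1].t}
		}
	}
	return best, count
}

func main() {
	clocks := []interval{{10, 12}, {11, 13}, {25, 27}} // third clock is broken
	best, n := marzullo(clocks)
	fmt.Println(best, n) // {11 12} 2
}
```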

<h4 id="lamport-1987">Lamport 1987</h4>

<p>Leslie Lamport wrote in 1987 the absolutely delightful <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.4001&amp;rep=rep1&amp;type=pdf">Synchronizing Time Servers</a>.</p>

<p>The main advantage is that Lamport’s technique can simultaneously deal with faulty clocks
<em>and</em> provide similar time to nodes that are relatively close:
<img src="/assets/agreement/lamport_review.png" alt="Synchronizing Time Servers" width="500" /></p>

<p>Lamport noticed there was something off with Marzullo’s approach.
He expresses with astonishing clarity that two nodes that receive
slightly different views of the same clocks may end up with very different
end results. This is represented in this picture:</p>

<p><img src="/assets/agreement/lamport_marzullo.png" alt="Problem with Marzullo" width="500" /></p>

<p>In the context of distributed systems, this is important. You’d expect “close” nodes to have a “similar” view of the time.
Technically, this is because the agreement function is not “Lipschitz continuous”.</p>

<p>The solution Lamport provides is essentially averaging the midpoints after the outliers are thrown away. In precise terms:</p>

<p><img src="/assets/agreement/lamport_definition.png" alt="Lamport's average" width="500" /></p>

<p><strong>Drawback</strong>: Lamport could not give an “optimal” solution for this.
(Admitting as much is telling of Lamport’s honesty, rarely seen in other authors.)</p>

<p><img src="/assets/agreement/lamport_average.png" alt="Problem with Lamport" width="500" /></p>
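<p>To convey the flavor of this kind of fault-tolerant average (Lamport’s precise definition is in the excerpt above; the rough sketch below just drops the <em>f</em> lowest and <em>f</em> highest midpoints and averages the rest):</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Rough sketch of a fault-tolerant average: sort the clock midpoints, discard
// the f lowest and f highest readings (the potentially faulty ones), and
// average what remains. Illustrative only; Lamport's actual definition in the
// paper is more careful.

func faultTolerantAvg(mid []float64, f int) float64 {
	s := append([]float64(nil), mid...) // work on a copy
	sort.Float64s(s)
	s = s[f : len(s)-f]
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	return sum / float64(len(s))
}

func main() {
	// three honest clocks around t=100, one broken clock at t=250
	fmt.Println(faultTolerantAvg([]float64{99.8, 100.1, 100.3, 250.0}, 1))
}
```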

<h4 id="schmid-and-schossmaier-1999">Schmid and Schossmaier 1999</h4>

<p>Fast forward a dozen years: in 1999, Schmid and Schossmaier addressed Lamport’s concern
in their paper
<a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.4702&amp;rep=rep1&amp;type=pdf">How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition</a>.</p>

<p>They define a very simple agreement function that <em>is</em> Lipschitz continuous, and that takes into account all available information:</p>

<p><img src="/assets/agreement/def_lipschitz.png" alt="Definition: How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition" width="400" /></p>

<p>This is a comparison of Schmid and Schossmaier vs Lamport function showing Lipschitz continuity in action:</p>

<p><img src="/assets/agreement/example_lipschitz.png" alt="Example Lipschitz in action" width="500" /></p>

<p><img src="/assets/agreement/example_marzullo.png" alt="Example what can happen to Marzullo" width="500" /></p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock Part II: time reference Part II: agreement protocols (this page) … sometime in the future … client implementation]]></summary></entry><entry><title type="html">A cryptographic desk clock: time reference</title><link href="/oscar/misc/time-reference-home" rel="alternate" type="text/html" title="A cryptographic desk clock: time reference" /><published>2022-08-05T00:00:00+00:00</published><updated>2022-08-05T00:00:00+00:00</updated><id>/oscar/misc/time-reference-home</id><content type="html" xml:base="/oscar/misc/time-reference-home"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong><a href="/oscar/misc/desk-clock-protocol">protocol for a cryptographic desk clock</a></strong></li>
  <li>Part II: <strong>time reference</strong> (this page)</li>
  <li>Part III: <strong><a href="/oscar/misc/quorum-clock">agreement protocols</a></strong></li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Here we will talk about the roughtime time server, and touch
a bit on the <strong>architecture</strong> and <strong>security properties</strong>.
The time reference itself (the time server) is this beautiful mess of wires, currently living behind a couch. It has a small cryptographic processor, a GPS and a Raspberry Pi. This reference propagates the House Standard Time via WiFi.</p>

<p><img src="/assets/reference/Screen_Shot_2022-03-09_at_12.10.55_AM.png" alt="" /></p>

<h3 id="server-architecture">Server architecture</h3>

<p>Time to build! We partition the time server into two clearly distinct blocks:</p>

<ul>
  <li><strong>trusted domain</strong>: comprised of a GPS module plus a small RTOS-based microcontroller. Here we implement the following functionality: parsing the GPS module’s NMEA data, storing keys, and generating cryptographic signatures</li>
  <li>
    <p><strong>untrusted domain</strong>: a raspberry pi running a golang binary implementing the following functionality: talking to network peers over UDP, and constructing signing requests for the trusted domain by bundling multiple client requests (Merkle tree construction)</p>

    <p><img src="/assets/reference/Screen_Shot_2022-03-27_at_8.08.58_PM.png" alt="" /></p>
  </li>
</ul>

<p>Why? The whole point of partitioning this way is to minimize attack surface and compact all the security-critical functionality in a small block. This partition is in a sense optimal: the trusted domain does the minimum amount of work needed, minimizing the trusted codebase. All networking code is offloaded to the raspberry (none of which is security critical), while all security-sensitive functionality is in the trusted domain. No secrets live in the raspberry. The worst possible impact of a compromised untrusted domain is availability.</p>

<p>The interface between those two blocks looks essentially like this (less important fields omitted):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
  <span class="kt">uint8_t</span> <span class="n">merkle_root</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span> <span class="c1">// (merkle-)hash of all client nonces</span>
<span class="p">}</span> <span class="n">signing_request_t</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
  <span class="kt">uint64_t</span> <span class="n">midpoint</span><span class="p">;</span> <span class="c1">// time at the moment of signing</span>
  <span class="kt">uint8_t</span> <span class="n">signature</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span> <span class="c1">// essentially, signature over merkle_root + midpoint</span>
<span class="p">}</span> <span class="n">signing_response_t</span><span class="p">;</span>
</code></pre></div></div>

<p>Note that data in <code class="language-plaintext highlighter-rouge">signing_request</code> can be 100% attacker controlled — this is fine and actually a scenario we accept. This means the raspi can be 100% popped (the “only” consequence being reduced availability). Time is <em>not</em> chosen by the untrusted domain, the time reference (the GNSS) is directly fed into the trusted domain. The communication between those two domains is a simple serial line, with no framer (yolo), and fixed size packets (difficult, but not impossible, to screw up). The raspi cannot reflash the microcontroller; a human has to plug a programmer to the microcontroller.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock Part II: time reference (this page) Part II: agreement protocols … sometime in the future … client implementation]]></summary></entry><entry><title type="html">A cryptographic desk clock: time transfer protocol</title><link href="/oscar/misc/desk-clock-protocol" rel="alternate" type="text/html" title="A cryptographic desk clock: time transfer protocol" /><published>2022-08-01T00:00:00+00:00</published><updated>2022-08-01T00:00:00+00:00</updated><id>/oscar/misc/roughtime</id><content type="html" xml:base="/oscar/misc/desk-clock-protocol"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong>protocol for a cryptographic desk clock</strong> (this page)</li>
  <li>Part II: <strong><a href="/oscar/misc/time-reference-home">time reference</a></strong></li>
  <li>Part III: <strong><a href="/oscar/misc/quorum-clock">agreement protocols</a></strong></li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Slightly annoying fact: clocks are normally synced with a protocol (NTP - Network Time Protocol) that provides <strong>zero</strong> <strong>cryptographic guarantees</strong>. On the surface this is totally fine, but when we start composing security-sensitive protocols (think certificate expiration checks, audit logs or time-bound credentials), we need to turn a blind eye…</p>

<p>Not today! Let’s build a simple <strong>cryptographic desk clock</strong> to set the House Standard Time in proper cryptographic fashion.
Here is the limited edition, mission-critical clock sitting in my home office:</p>

<p><img src="/assets/reference/3D31F19A-935D-447B-A217-57A42699BD3C.jpeg" alt="" /></p>

<h3 id="protocol">Protocol</h3>

<p>The protocol we’ll be using for clock synchronization is <em>roughtime</em>. This is a pretty new protocol introduced by Adam Langley (Google). It’s pretty barebones, and simple enough that you can write a client in an evening — so much fun.</p>

<p><em>roughtime</em> is essentially a challenge-response protocol. The client sends a random 32-byte challenge to the server, which replies with a signed message containing the client challenge and the current time. (There’s some machinery to make this efficient, like packing requests in a Merkle tree, but this isn’t important now.)</p>

<p>Since you’re probably familiar with NTP, here are the key differences:</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Roughtime</th>
      <th>NTP</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Target time precision</td>
      <td>Coarse: ~seconds. Fine for human consumption, certificate expiration</td>
      <td>Excellent. Sub-second</td>
    </tr>
    <tr>
      <td>Security model</td>
      <td>Excellent. Architecture tolerates a few actively malicious servers, no SPOF. Network is considered untrusted: packets are cryptographically protected</td>
      <td>Bad. Network is assumed trusted (there are some extensions for adding a secure transport layer)</td>
    </tr>
    <tr>
      <td>Protocol maturity</td>
      <td>Very new, although very simple and solid</td>
      <td>Gold standard</td>
    </tr>
    <tr>
      <td>Software/libraries availability</td>
      <td>Bad. DIY land, mostly one-off efforts that get abandoned. Mostly targeting beefy machines, almost nothing for embedded targets</td>
      <td>Excellent, very mature. Reference implementation has been maintained for 20 years, multiple implementations</td>
    </tr>
    <tr>
      <td>Software/libraries quality</td>
      <td>Good. Mostly written in memory safe languages</td>
      <td>Varies a lot</td>
    </tr>
    <tr>
      <td>Server ecosystem maturity</td>
      <td>Very bad. Highly concentrated, few players. Main developer (Google) seems to have abandoned development. Cloudflare runs servers. Google is still offering this without any availability guarantees</td>
      <td>Very good. There are global pools of NTP servers.</td>
    </tr>
    <tr>
      <td>Who uses it? Current deployments</td>
      <td>??? probably a few nerds, unclear</td>
      <td>Virtually everywhere. time.apple.com, etc</td>
    </tr>
    <tr>
      <td>Client code complexity</td>
      <td>Apart from the crypto, easy</td>
      <td>Very easy if high precision isn’t required, high otherwise. The reference implementation is extremely complex: ~100k LoC of C</td>
    </tr>
    <tr>
      <td>Suitable for embedded?</td>
      <td>Suboptimal. The public-key signature scheme is ed25519, which is notoriously RAM-hungry, plus needs SHA512 code</td>
      <td>yes</td>
    </tr>
    <tr>
      <td>History, context, motivation</td>
      <td>Introduced in 2016 by Adam Langley (Google) to solve Google needs. Little public involvement after initial release</td>
      <td>Introduced around 1985 by Dave L. Mills, who continued improving it on a life-long effort</td>
    </tr>
    <tr>
      <td>Algorithms</td>
      <td>Marzullo, ed25519, Merkle trees</td>
      <td>Marzullo, phase-locked loop</td>
    </tr>
  </tbody>
</table>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock (this page) Part II: time reference Part II: agreement protocols … sometime in the future … client implementation]]></summary></entry><entry><title type="html">nonce-sanitizer: using authenticated encryption without fear</title><link href="/oscar/misc/nonce-sanitizer.html" rel="alternate" type="text/html" title="nonce-sanitizer: using authenticated encryption without fear" /><published>2022-07-28T00:00:00+00:00</published><updated>2022-07-28T00:00:00+00:00</updated><id>/oscar/misc/nonce-sanitizer</id><content type="html" xml:base="/oscar/misc/nonce-sanitizer.html"><![CDATA[<p>This post describes <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>: a <strong>very simple tool</strong> that prevents the major screw-up everyone is scared to make: 😱 repeated nonces under the same key 😱. In short, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> provides seat belts as a thin wrapper around the AEAD code that adds a hard assert that nonces don’t repeat.</p>

<p><strong>But that’s such a n00b’s mistake!</strong> If you think “I’m a good programmer, I won’t ever make that mistake”, think harder. Even the <a href="https://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html">crypto grandmasters make this mistake</a>. It can happen to anyone anytime —even during an unrelated refactor. Do you have tests that would catch this bug? Do they run automatically every time you touch code?</p>

<h2 id="working-principle">Working principle</h2>

<p><code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> implements an AEAD interface. The input/output behavior is identical to the AEAD mode selected. (Currently, ChaCha20Poly1305 and AES-GCM.) In addition, it will check in the background if the nonce passed is sane, and bail if it isn’t. For this, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> keeps the passed nonces in an internal state.
The current definition of sane is: the same combination (key, nonce) hasn’t been passed before.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>Functionality-wise, this tool internally looks something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> func encryptAEAD_NonceSanitizer(key, nonce, plaintext) -&gt; ciphertext {
    if isRepeatedNonce(key, nonce) {
       bail();
    }
    recordNonce(key, nonce);
    return encryptAEAD(key, nonce, plaintext);
 }
</code></pre></div></div>
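<p>The pseudocode above can be fleshed out into a toy Go version of the bookkeeping (illustrative only — this is not the actual go-nonce-sanitizer code, which also has to worry about persistence and memory growth):</p>

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Toy illustration of the nonce bookkeeping: remember a hash of every
// (key, nonce) pair seen, and flag any repeat.

type sanitizer struct {
	seen map[[32]byte]bool
}

func newSanitizer() *sanitizer {
	return &sanitizer{seen: map[[32]byte]bool{}}
}

// ok reports whether this (key, nonce) combination is fresh, and records it.
func (s *sanitizer) ok(key, nonce []byte) bool {
	h := sha256.New()
	h.Write(key)
	h.Write(nonce)
	var id [32]byte
	copy(id[:], h.Sum(nil))
	if s.seen[id] {
		return false
	}
	s.seen[id] = true
	return true
}

func main() {
	s := newSanitizer()
	key := []byte("k1")
	fmt.Println(s.ok(key, []byte("nonce-1"))) // true: fresh
	fmt.Println(s.ok(key, []byte("nonce-2"))) // true: fresh
	fmt.Println(s.ok(key, []byte("nonce-1"))) // false: repeat!
}
```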

<h2 id="developer-ergonomics">Developer ergonomics</h2>

<p>To use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>, you just replace the calls to AEAD encrypt with the implementation provided by <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>. Everything else should just work. This is 100% backwards compatible: the encryption behavior remains identical, so there’s no need to bump any protocol version.</p>

<ul>
  <li>In the happy path, if your nonces behave, once you set this up you can forget about it. So this is a purely additive security precaution</li>
  <li>In the sad case (naughty nonces), <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> will bail and prevent the confidentiality loss.</li>
</ul>

<p>There’s no need to configure anything. Performance-wise, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> takes a small hit. Check if this is significant in your application.</p>

<h2 id="golang-implementation">Golang implementation</h2>

<p>A PoC in golang is available at <a href="https://github.com/oreparaz/go-nonce-sanitizer">https://github.com/oreparaz/go-nonce-sanitizer</a>. This implementation wraps <code class="language-plaintext highlighter-rouge">golang.org/x/crypto/chacha20poly1305</code>.</p>

<p><strong>How to use it</strong>. The interface transparently wraps the AEAD mode, so you can use it as a drop-in replacement. This is the single-line code modification needed to add <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> to <a href="https://github.com/FiloSottile/age">age</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/internal/stream/stream.go b/internal/stream/stream.go
index 7cf02c4..bc8a321 100644
--- a/internal/stream/stream.go
+++ b/internal/stream/stream.go
@@ -11,7 +11,7 @@ import (
        "fmt"
        "io"

-       "golang.org/x/crypto/chacha20poly1305"
+       "github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305"
        "golang.org/x/crypto/poly1305"
 )
</code></pre></div></div>

<p><strong>Raw performance</strong>. On a GCP e2-micro instance:</p>

<ul>
  <li>for long packets (8 kB), the overhead is small (less than a 10% loss of throughput)</li>
  <li>for IP-sized packets (1350 bytes), the overhead is about a 25% loss of throughput</li>
  <li>for very short packets (64 bytes), the overhead is roughly a 2.7x slowdown</li>
</ul>

<p>Raw data: output of <code class="language-plaintext highlighter-rouge">go test -bench=.</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>goos: linux
goarch: amd64
pkg: github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
Benchmark/Seal-WithoutNonceSanitizer-64-2     293.65 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-64-2        109.59 MB/s      49 B/op     2 allocs/op
Benchmark/Seal-WithoutNonceSanitizer-1350-2  1147.98 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-1350-2      857.35 MB/s      50 B/op     2 allocs/op
Benchmark/Seal-WithoutNonceSanitizer-8192-2  1407.33 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-8192-2     1323.52 MB/s      55 B/op     2 allocs/op
PASS
ok      github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305      8.727s
</code></pre></div></div>

<p><strong>Performance of age</strong>. In a file encryption application like <a href="https://github.com/FiloSottile/age">age</a>,
the overhead is imperceptible to the human eye. With instrumentation, encryption of a 650 MB file takes 2.0 seconds. Without instrumentation, it takes 1.9 seconds.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># with instrumentation
$ time ./age -r age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p &lt; /tmp/junk &gt; /tmp/junk.age

real	0m2.082s
user	0m0.645s
sys	0m1.001s

# without instrumentation
$ time ./age -r age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p &lt; /tmp/junk &gt; /tmp/junk.age

real	0m1.969s
user	0m0.605s
sys	0m0.944s
</code></pre></div></div>

<h2 id="faq">FAQ</h2>

<p><strong>When should I use it? Is it good for me?</strong> If you are picking nonces, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> is for you. If your library picks nonces for you, you’re fine.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p><strong>I’m using libsodium, do I have to worry?</strong> You can use a well-established library like libsodium or tink and still misuse nonces. So refer to the previous point: if you’re picking nonces, yes, you can use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>.</p>

<p><strong>Why should I <em>not</em> use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>?</strong> Perhaps if you’re struggling for performance or are very tight on RAM. But so many people have tripped up here before that it’s worth thinking twice.</p>

<p><strong>Will this eat all my RAM?</strong> The internal state that stores nonces grows as more nonces are passed. This grows until a threshold is hit; from there on, old nonces are discarded. So RAM usage is capped and won’t grow unbounded. The threshold is a tunable parameter.</p>

<p><strong>Is it going to ruin performance?</strong> There are too many different applications of AEAD to make general statements about the impact of this instrumentation. In general, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> has a small impact on client-side code (where increased memory usage is tolerable) and on applications that run in human time (which aren’t affected by a slight increase in latency). Busy servers are trickier, but the overhead is probably acceptable for debug builds/deployments.</p>

<p><strong>I’m using random nonces, I don’t need this?</strong> You can still screw it up, see <a href="https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/1592990">Google’s ECDSA bug in the Chromebook’s embedded controller for U2F</a> (<a href="https://www.chromium.org/chromium-os/u2f-ecdsa-vulnerability/">vuln page</a>). Would you notice in that case?</p>

<p><strong>Tell me about misuse-resistant modes</strong>. You should use them! But sometimes it’s not easy to retrofit those.</p>

<p><strong>Why haven’t I heard about this before?</strong>
That’s a good question, and we can only speculate. This isn’t rocket science; keeping track of all nonces was probably too expensive in the past, but memory in phones and computers is cheap today.</p>

<p><strong>Inspiration</strong>. This tool takes inspiration from memory sanitizers for non-memory-safe languages. Memory sanitizers shadow every byte of memory in your program to detect memory misuse before it becomes a real problem. Thanks to tools like AddressSanitizer or Valgrind we can write C and not stress too much about it. Tools help us sleep well at night. (Note that <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> applies to both non-memory-safe <em>and</em> memory-safe languages – memory safety doesn’t have anything to do with misusing nonces.)</p>

<h2 id="limitations">Limitations</h2>

<p>The tool keeps some state in RAM. The tool won’t detect all nonce collisions since this state is pruned from time to time.</p>

<h2 id="extensions">Extensions</h2>

<p>I’d love to see <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> in other languages, or integrated into existing libraries. I’m not planning to work on this but ping me if you want some pointers to work on this.</p>

<h2 id="optimizations--internals">Optimizations / internals</h2>

<p>The following design decisions might be useful if you want to reimplement this:</p>

<ul>
  <li><em>Should I store the cryptographic key itself?</em> You can make all the data structures secret-free by storing a hash of the key if you need it. Unsure whether this is worth it if the key lives in the same memory space.</li>
  <li>
    <p><em>Should I include plaintexts also in the map?</em> The only reason to do so is to avoid false alerts when the same (key, nonce, plaintext) triple is passed. This isn’t a violation of AEAD usage: you’re performing exactly the same computation.</p>
  </li>
  <li><em>What data structure should we use for storing nonces?</em> It just needs to be very fast. In golang we resort to a hash map. This PoC implementation can surely be optimized; PRs are welcome.</li>
  <li><em>What pruning strategy should we use?</em> Right now we store the last 1000 nonces and prune the cache from time to time, so that memory usage doesn’t grow unbounded. You could use a circular buffer for consistent performance; what the sweet-spot thresholds are is TBD.
    <ul>
      <li>You <em>could</em> probably use <em>distinguished points</em> to (probabilistically) detect collisions without enormous amounts of memory. (Keeping track only of those nonces with a certain prefix.) Feels like a combined method would be a good choice: keep the last N nonces and keep the last N “distinguished” nonces.</li>
    </ul>
  </li>
  <li><em>Should we compress IVs?</em> Too fancy for a v1.0.</li>
  <li><em>Should we check for repeated nonces upon receiving?</em> Not duplicating nonces is a <em>sender</em> responsibility (as the <em>sender</em> bears the consequences of the confidentiality loss), but it seems cheap enough to also do it on the receiving side. This might help spot buggy implementations <em>on the remote peer</em>.</li>
</ul>
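<p>As a sketch of this combined pruning strategy (a bounded ring of recent nonces plus a set of “distinguished” ones), with all names hypothetical and not the PoC’s code:</p>

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// nonceCache is a hypothetical sketch of the combined pruning strategy:
// a ring buffer of the last `capacity` nonce hashes (so memory stays
// bounded), plus a set of "distinguished" nonces -- those whose hash
// starts with a zero byte -- kept to probabilistically catch collisions
// with nonces that were already evicted from the ring.
type nonceCache struct {
	capacity      int
	ring          [][32]byte // oldest entry is overwritten first
	next          int
	recent        map[[32]byte]struct{}
	distinguished map[[32]byte]struct{}
}

func newNonceCache(capacity int) *nonceCache {
	return &nonceCache{
		capacity:      capacity,
		recent:        make(map[[32]byte]struct{}),
		distinguished: make(map[[32]byte]struct{}),
	}
}

// seenBefore records the nonce and reports whether it was already seen.
func (c *nonceCache) seenBefore(nonce []byte) bool {
	h := sha256.Sum256(nonce)
	if _, ok := c.recent[h]; ok {
		return true
	}
	if _, ok := c.distinguished[h]; ok {
		return true
	}
	if len(c.ring) < c.capacity {
		c.ring = append(c.ring, h)
	} else {
		// Evict the oldest hash so memory stays capped.
		delete(c.recent, c.ring[c.next])
		c.ring[c.next] = h
		c.next = (c.next + 1) % c.capacity
	}
	c.recent[h] = struct{}{}
	if h[0] == 0 { // roughly 1 in 256 nonces is "distinguished"
		c.distinguished[h] = struct{}{}
	}
	return false
}

func main() {
	c := newNonceCache(1000)
	fmt.Println(c.seenBefore([]byte("nonce-1"))) // false: first sighting
	fmt.Println(c.seenBefore([]byte("nonce-1"))) // true: repeat detected
}
```

<p>The zero-byte prefix is just one way to pick distinguished points; a longer prefix trades detection probability for memory.</p>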
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>There’s an exception to this: the same (key, nonce) combination is allowed if the plaintext is also the same. This exception isn’t implemented yet. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>There are cases (like TLS 1.3) in which a buggy implementation that reuses nonces just won’t work at all (won’t be interoperable) because the protocol mandates an implicit nonce (such as a sequence number). This is a good design principle that <em>by design</em> makes reusing nonces very difficult. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[This post describes nonce-sanitizer: a very simple tool that prevents the major screw-up everyone is scared to make: 😱 repeated nonces under the same key 😱. In short, nonce-sanitizer provides seat belts as a thin wrapper around the AEAD code that adds a hard assert that nonces don’t repeat.]]></summary></entry></feed>