<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-03-26T16:57:03+00:00</updated><id>/feed.xml</id><title type="html">Oscar Reparaz</title><entry><title type="html">A big-key public key encryption scheme</title><link href="/oscar/misc/bigpk.html" rel="alternate" type="text/html" title="A big-key public key encryption scheme" /><published>2025-12-23T00:00:00+00:00</published><updated>2025-12-23T00:00:00+00:00</updated><id>/oscar/misc/bpk</id><content type="html" xml:base="/oscar/misc/bigpk.html"><![CDATA[<p>Big-key cryptography is regular cryptography with huge private keys. The main idea is to make key exfiltration harder. Unlike standard cryptography, which uses (say) 32-byte secret keys, big-key cryptography deliberately uses massive keys (gigabytes), so that an attacker who gains temporary access to your system has a hard time exfiltrating all of it. This increases the likelihood that you detect the attacker, or at least slows them down.</p>

<p><strong>Big-key public key encryption</strong>. Here we design a public-key encryption system with “short” public keys (32 bytes) and very long private keys. The basic idea is to use an identity-based encryption scheme (like Boneh–Franklin) under the hood. The following explanation assumes you have a standard IBE with <code class="language-plaintext highlighter-rouge">IBE.Setup()</code>, <code class="language-plaintext highlighter-rouge">IBE.Extract()</code>, <code class="language-plaintext highlighter-rouge">IBE.Encrypt()</code> and <code class="language-plaintext highlighter-rouge">IBE.Decrypt()</code> operations.</p>

<h3 id="setup-phase">Setup phase</h3>

<ol>
  <li>Master Key Generation (Alice)
    <ul>
      <li>Alice runs <code class="language-plaintext highlighter-rouge">IBE.Setup()</code> → generates <code class="language-plaintext highlighter-rouge">(MPK, MSK)</code></li>
      <li><code class="language-plaintext highlighter-rouge">MPK</code> becomes Alice’s public key (published/distributed to everyone)</li>
      <li><code class="language-plaintext highlighter-rouge">MSK</code> is kept temporarily for key extraction only</li>
    </ul>
  </li>
  <li>Identity Pre-Agreement (Alice &amp; Bob)
    <ul>
      <li>They agree on a large set of identities, e.g.:
        <ul>
          <li>“alice-001”, “alice-002”, …, “alice-100000”</li>
          <li>Could be: sequential, random strings, UUIDs, etc.</li>
          <li>Each identity represents one decryption capability</li>
        </ul>
      </li>
      <li>Both parties store this list of valid identities</li>
    </ul>
  </li>
  <li>Big Private Key Generation (Alice)
    <ul>
      <li>For each agreed identity <code class="language-plaintext highlighter-rouge">ID_i</code>, Alice runs:
<code class="language-plaintext highlighter-rouge">SK_i = IBE.Extract(MSK, ID_i)</code></li>
      <li>The “big private key” is the collection:
<code class="language-plaintext highlighter-rouge">BigSK = {SK_001, SK_002, ..., SK_100000}</code></li>
      <li>CRITICAL: Alice then securely wipes <code class="language-plaintext highlighter-rouge">MSK</code> from memory/storage</li>
      <li><code class="language-plaintext highlighter-rouge">MSK</code> is never stored long-term, only used during setup</li>
    </ul>
  </li>
</ol>

<h3 id="encryption-bob--alice">Encryption (Bob → Alice)</h3>

<p>To send message <code class="language-plaintext highlighter-rouge">M</code> to Alice:</p>

<ol>
  <li>Bob picks an identity at random from the pre-agreed list
    <ul>
      <li>E.g., randomly select <code class="language-plaintext highlighter-rouge">"alice-047851"</code></li>
    </ul>
  </li>
  <li>
    <p>Bob encrypts using IBE:
<code class="language-plaintext highlighter-rouge">C = IBE.Encrypt(MPK, "alice-047851", M)</code></p>
  </li>
  <li>
    <p>Bob sends to Alice:
<code class="language-plaintext highlighter-rouge">(C, "alice-047851")</code></p>

    <p>The identity must be included so Alice knows which key component to use</p>
  </li>
</ol>

<h3 id="decryption-alice">Decryption (Alice)</h3>

<p>When Alice receives <code class="language-plaintext highlighter-rouge">(C, ID)</code>:</p>

<ol>
  <li>Alice looks up the corresponding private key component from her big key
    <ul>
      <li>E.g., if <code class="language-plaintext highlighter-rouge">ID = "alice-047851"</code>, retrieve <code class="language-plaintext highlighter-rouge">SK_047851</code></li>
    </ul>
  </li>
  <li>
    <p>Alice decrypts using IBE:
<code class="language-plaintext highlighter-rouge">M = IBE.Decrypt(SK_047851, C)</code></p>
  </li>
  <li>(Optional) Alice can delete <code class="language-plaintext highlighter-rouge">SK_047851</code> after use for forward secrecy</li>
</ol>
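<p>To make the moving parts concrete, here is a runnable toy sketch of the whole flow. Everything crypto-related is a stand-in: the “IBE” below sets <code class="language-plaintext highlighter-rouge">MPK = MSK</code> and derives keys by hashing, which is wildly insecure and serves only to illustrate the bookkeeping (identity list, big-key table, MSK wipe, per-identity decryption). A real deployment would plug in an actual pairing-based IBE such as Boneh–Franklin.</p>

```python
import hashlib
import os
import secrets

# Toy stand-in for an IBE scheme -- NOT real IBE, NOT secure.
# In a real scheme Encrypt needs only MPK; here MPK equals MSK so
# the flow runs without a pairing library.
def ibe_setup():
    msk = os.urandom(32)
    return msk, msk          # (MPK, MSK); toy only

def ibe_extract(msk, identity):
    return hashlib.sha256(msk + identity.encode()).digest()

def _pad(key_for_id):
    return hashlib.sha256(key_for_id + b"pad").digest()

def ibe_encrypt(mpk, identity, msg):      # msg up to 32 bytes
    return bytes(a ^ b for a, b in zip(msg, _pad(ibe_extract(mpk, identity))))

def ibe_decrypt(sk_id, ct):
    return bytes(a ^ b for a, b in zip(ct, _pad(sk_id)))

# Setup phase (Alice)
mpk, msk = ibe_setup()
identities = [f"alice-{i:06d}" for i in range(1, 1001)]  # scale up for a truly big key
big_sk = {ident: ibe_extract(msk, ident) for ident in identities}
del msk                                   # CRITICAL: wipe MSK after extraction

# Encryption (Bob -> Alice)
ident = secrets.choice(identities)
ct = ibe_encrypt(mpk, ident, b"attack at dawn")   # Bob sends (ct, ident)

# Decryption (Alice)
pt = ibe_decrypt(big_sk[ident], ct)
del big_sk[ident]                         # optional: forward secrecy
assert pt == b"attack at dawn"
```

<p>Note how the structure forces the properties described above: after setup, nothing short of the full <code class="language-plaintext highlighter-rouge">big_sk</code> table lets you decrypt an arbitrary incoming ciphertext.</p>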

<h2 id="drawbacks">Drawbacks</h2>

<ul>
  <li><strong>It’s not “all or nothing”.</strong> An attacker who exfiltrates a portion of the private key has a non-zero chance of decrypting ciphertexts sent to Alice, with probability proportional to the fraction of key components they managed to exfiltrate.</li>
</ul>

<h2 id="prototype-implementation">Prototype implementation</h2>

<p>Can be found here: <a href="https://github.com/oreparaz/bigkey-pke">https://github.com/oreparaz/bigkey-pke</a></p>

<p>Some notes:</p>

<ul>
  <li>This implementation uses vuvuzela crypto (<a href="https://github.com/vuvuzela/crypto/blob/master/ibe/ibe.go">https://github.com/vuvuzela/crypto/blob/master/ibe/ibe.go</a>), which uses Barreto-Naehrig curves. These are weaker than expected: see <a href="https://moderncrypto.org/mail-archive/curves/2016/000740.html">https://moderncrypto.org/mail-archive/curves/2016/000740.html</a></li>
  <li>We could use other libraries (maybe over BLS curves), like <a href="https://github.com/encryption4all/ibe/">https://github.com/encryption4all/ibe/</a></li>
</ul>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Big-key cryptography is regular cryptography with huge private keys. The main idea is to make key exfiltration harder. Unlike standard cryptography, which uses (say) 32-byte secret keys, big-key cryptography uses on purpose massive keys (gigabytes) so that an attacker gaining temporary access to your system has a tough time exfiltrating all this. This increases the likelihood you detect the attacker or slow them down.]]></summary></entry><entry><title type="html">light: secure messaging</title><link href="/oscar/misc/light.html" rel="alternate" type="text/html" title="light: secure messaging" /><published>2025-12-21T00:00:00+00:00</published><updated>2025-12-21T00:00:00+00:00</updated><id>/oscar/misc/light</id><content type="html" xml:base="/oscar/misc/light.html"><![CDATA[<p>Here’s a secure messaging system that you can’t hack remotely. You can use this to talk to your friends in a way that even mercenary spyware can’t access your messages. This is guaranteed by hardware-level primitives. Let’s call it <em>light</em>.</p>

<p>Our threat model is <strong>remote-only attacks</strong>, with the ability to perform zero-click remote code execution (mercenary spyware).</p>

<h2 id="the-basic-idea">The basic idea</h2>

<p><em>light</em> is an endpoint architecture: three independent nodes connected by unidirectional links. On top of this endpoint architecture, we implement public-key cryptography to encrypt+authenticate messages between peers. The nuance here is how each “endpoint” is constructed.</p>

<p>A conventional secure messenger does everything in one machine: parse untrusted network input, run crypto, handle keys, render + store plaintext, accept keyboard input.</p>

<p><em>light</em> doesn’t do this. Instead, light carefully decomposes this monolithic networking + cryptographic stack into 3 different hardware-isolated nodes, connected via unidirectional links (data diodes).</p>

<h2 id="system-architecture">System architecture</h2>

<p>Each endpoint is partitioned into three discrete nodes:</p>

<ul>
  <li>
    <p>a TX computer (“the source”): takes plaintext you want to send, encrypts/authenticates it, outputs ciphertext and sends it to the GW. This computer isn’t connected to the internet. Even though it can send messages to GW, it cannot receive any bit of information from GW.</p>
  </li>
  <li>
<p>a RX computer (“the sink”): receives messages from GW, decrypts and displays plaintext on a dedicated display. RX can only receive messages from GW; it cannot send any messages to GW.</p>
  </li>
  <li>
<p>a GW computer (an untrusted “proxy”): this computer is connected to the internet, sends to RX, and receives from TX. Its task is to talk to other peers’ GWs and send/receive encrypted messages. The GW can use any transport layer: Signal messaging, XMPP, or more sophisticated systems that provide better anonymity (tor, etc).</p>
  </li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                              INTERNET
                                 │
                                 │ (bidirectional)
                                 │
                            ┌────▼────┐
                            │    GW   │ (untrusted proxy)
                            │         │
                            └─┬───────┘───────┬
                              │               ▲
                              │               │
                   encrypted  │               │ encrypted
                   (to RX)    │               │ (from TX)
                              ▼               │
                       ┌──────────┐   ┌──────────┐
                       │    RX    │   │    TX    │
                       │ (sink)   │   │ (source) │
                       └────┬─────┘   └─────▲────┘
                            │               │
                            │               │ plaintext
                            │               │ (input)
                            ▼               │
                      ┌──────────┐    ┌──────────┐
                      │  Screen  │    │ Keyboard │
                      │          │    │          │
                      └──────────┘    └──────────┘
                                            ▲
                                            │
                             ┌─────────────┐│
                             │    Human    ││
                             │  Operator   │┘
                             └─────────────┘
</code></pre></div></div>

<p>There’s an additional unidirectional link from TX to RX. This is used to send direct keystrokes and have a smooth UI.</p>

<p>Note that the three computers work together as a set, and this is abstracted away from the user. The user experience is identical to that of a regular computer (one keyboard, one display).</p>

<h2 id="system-security-properties">System security properties</h2>

<p>The trick is that the links are unidirectional. Observe that:</p>

<ul>
  <li>
    <p>TX never sees untrusted input. This means this computer will always execute the original code, and it’s impossible to remotely exploit it. Egress messages will always be encrypted to the intended recipient.</p>
  </li>
  <li>
    <p>RX does not have an egress path to the internet. This means that if this computer gets compromised, it’s impossible to exfiltrate keys or plaintext from this computer. The only “output” from this computer is a screen.</p>
  </li>
</ul>

<h2 id="an-attack-human-in-the-loop-social-engineering">An attack: human-in-the-loop social engineering</h2>

<p>Note that there is an exfiltration path to the internet: via the human operating the node. A compromised RX node can’t send bits to the GW, but it can send bits to the user’s brain (by lying on the screen). For example, a compromised RX could display “<em>Sync error. Please type the following ‘session recovery code’ into your TX terminal to reconnect: [hex-encoded private key]</em>”.</p>

<p>There are two ways to mitigate this:</p>
<ol>
  <li>Prevent it with operator discipline: train operators to recognize the ways a compromised node might try to trick them.</li>
  <li>Even if it happens, make exfiltration astronomically slow by using big-key cryptography</li>
</ol>

<p>Big-key cryptography means using very long private keys. Instead of 32-byte keys, use (say) a 10MB private key. Exfiltrating a 10MB key with a human in the loop is so expensive and slow that exfiltration isn’t practical.</p>
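<p>A back-of-envelope calculation makes this concrete. The channel rate below is an assumption: a few bits per second is roughly what “read a code off the screen, retype it elsewhere” achieves.</p>

```python
# Rough exfiltration time for a big key through a human operator.
# The 5 bits/s figure is an assumed rate for the screen -> brain ->
# keyboard channel, not a measured one.
KEY_BYTES = 10 * 1024 * 1024    # 10 MB private key
BITS_PER_SECOND = 5

seconds = KEY_BYTES * 8 / BITS_PER_SECOND
print(f"~{seconds / 86400:.0f} days of continuous typing")
```

<p>That is on the order of half a year of uninterrupted operator effort, versus seconds for a 32-byte key.</p>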

<h2 id="cryptographic-constructions">Cryptographic constructions</h2>

<p>The baseline construction is just public-key authenticated encryption (for example, using ECIES). This works well. Your peer’s public key can be hardcoded in the firmware of your TX (predistributed). Or alternatively you can introduce new peers by punching / typing the public key on the keyboard (which is connected to your TX). Note that TX has a unidirectional link to RX, so you can see your own messages you sent.</p>

<p>The full protocol goes like this:</p>
<ul>
  <li>Alice TX generates $s:= (s_1, s_2)$ at random, uses it to key a symmetric big-key cipher and encrypt message $m$.</li>
  <li>Alice TX sends $c := (\text{pk-enc}(\text{pk-bob}, s_1), \text{big-pk-enc}(\text{pk-bob}, s_2))$ where pk-enc is regular ECIES and big-pk-enc is a big-key public key encryption scheme.</li>
  <li>Bob RX receives and decrypts $s$. Uses $s$ to decrypt $m$.</li>
  <li>When Alice wants to send subsequent messages, she sets $s \leftarrow h(s)$ and uses $s$ to encrypt a new message.</li>
</ul>

<p>(Gateways omitted in the description above.)</p>

<p>A wrinkle is that you can’t do a full Diffie-Hellman exchange, since there is no bidirectional channel by design. This prevents us from getting the traditional “perfect forward secrecy” properties. We can get slightly weaker post-compromise security by hashing the key material on each message round. This way, a compromise of an endpoint at time N can’t reveal anything about round N-1.</p>
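<p>The key evolution is just a one-way hash chain. A minimal sketch (the initial secret is a placeholder; in the real protocol it comes from the big-key exchange above):</p>

```python
import hashlib

def ratchet(s: bytes) -> bytes:
    # One-way step: given the current s you cannot recover earlier
    # values, so compromise at round N reveals nothing about N-1.
    return hashlib.sha256(s).digest()

s = hashlib.sha256(b"initial shared secret (placeholder)").digest()
round_keys = []
for _ in range(3):
    round_keys.append(s)   # key the symmetric cipher for this round
    s = ratchet(s)         # evolve before the next message

assert len(set(round_keys)) == 3   # each round uses a fresh key
```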

<p>(You can still provide some security for future messages if each message carries fresh material to be folded into the shared secret. That only works if the adversary doesn’t get a copy of every message, which is a pretty strong assumption.)</p>

<p><strong>Big-key public key encryption</strong>. For a concrete scheme, see <a href="bigpk.html">the accompanying post</a>.</p>

<h2 id="slowhigh-domains">Slow/high domains</h2>

<p>In addition to the segregation into RX/TX/GW explained above, there’s a hardware-enforced, bandwidth-limited channel between the keyboard and TX, and between RX and the display. The goal is to restrict the information that can flow over those links to a few bits per second.</p>

<p>We do this to provide exfiltration resistance. We do not speed-limit the unidirectional links TX -&gt; GW, TX -&gt; RX nor GW -&gt; RX. These links can carry considerable bandwidth (even though they are unidirectional), since big-key cryptography is expensive and can produce large ciphertexts.</p>

<h2 id="unidirectional-links-consideration">Unidirectional links consideration</h2>

<p>You can’t use ARQ or selective retransmission. We resort to the simplest forward error correcting code, repetition: send the same thing a bunch of times and hope for the best.</p>
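<p>A sketch of such a repetition scheme, with a per-frame checksum so the receiver can drop corrupted copies. The framing format here is made up for illustration, not taken from the prototype:</p>

```python
import hashlib

REPEATS = 5

def encode(seq: int, payload: bytes) -> list[bytes]:
    # Frame: 4-byte seq || payload || 8-byte truncated digest,
    # repeated REPEATS times; any one surviving copy suffices.
    frame = seq.to_bytes(4, "big") + payload
    frame += hashlib.sha256(frame).digest()[:8]
    return [frame] * REPEATS

def decode(received: list[bytes]) -> dict[int, bytes]:
    out = {}
    for frame in received:
        body, tag = frame[:-8], frame[-8:]
        if hashlib.sha256(body).digest()[:8] != tag:
            continue                      # corrupted copy: drop it
        seq = int.from_bytes(body[:4], "big")
        out.setdefault(seq, body[4:])     # keep first good copy per seq
    return out

frames = encode(7, b"hello")
frames[0] = b"garbage"                    # simulate corruption of one copy
assert decode(frames) == {7: b"hello"}
```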

<p>A more elegant solution would be fountain codes: constantly resend the most recent messages, in case the data link temporarily goes down.</p>

<h2 id="hardware">Hardware</h2>

<p>I’m prototyping the three nodes with 3 raspberry pis linked with unidirectional serial links. I’m using two UARTs in each raspberry: the hardware one and a soft UART via the kernel module <a href="https://github.com/oreparaz/soft_uart">https://github.com/oreparaz/soft_uart</a>.</p>

<p>You can also implement all 3 nodes within a single FPGA. This is left for future work.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Here’s a secure messaging system that you can’t hack remotely. You can use this to talk to your friends in a way that even mercenary spyware can’t access your messages. This is guaranteed by hardware-level primitives. Let’s call it light.]]></summary></entry><entry><title type="html">What $10 of GPU time buys you in 2025</title><link href="/oscar/misc/gpu-2025.html" rel="alternate" type="text/html" title="What $10 of GPU time buys you in 2025" /><published>2025-11-01T00:00:00+00:00</published><updated>2025-11-01T00:00:00+00:00</updated><id>/oscar/misc/gpu</id><content type="html" xml:base="/oscar/misc/gpu-2025.html"><![CDATA[<p>We are in 2025. Compute is cheap. With \$2 you can rent a beefy GPU by the minute and nerd out all evening. This entertainment is cheaper than going to the movies (and depending who you ask, more fun). Let’s see how many discrete logarithms we can break on a \$10 budget…</p>

<h2 id="the-problem">The problem</h2>

<p>We want to solve the discrete logarithm problem in an interval. Our group is an elliptic curve over a 256-bit prime field. The interval size is about 70 bits. This problem has interesting applications (for example, when key generation didn’t use full entropy), but let’s talk about that another time.</p>

<p>The canonical algorithm to solve this is Pollard’s kangaroo algorithm (extended to distributed search by van Oorschot and Wiener). It takes about $(2+o(1))\sqrt{N}$ group operations, where $N$ is the interval size. That is an asymptotic measure; here we want to pin down the concrete cost of breaking (say) a 70-bit discrete log challenge, in 2025 USD.</p>
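<p>As a quick sanity check on the expected amount of work (the throughput figure is the Titan Xp rate reported in the logs later in this post):</p>

```python
from math import sqrt

# Expected work for Pollard's kangaroo over an interval of size N:
# about 2*sqrt(N) group operations.
def kangaroo_ops(interval_bits: int) -> float:
    return 2 * sqrt(2 ** interval_bits)

ops = kangaroo_ops(70)      # exactly 2^36: about 6.9e10 group operations
secs = ops / 575e6          # at ~575 MK/s (Titan Xp throughput, see logs)
print(f"{ops:.2e} ops, ~{secs:.0f} s expected per 70-bit solve")
```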

<h2 id="the-code">The code</h2>

<p>We take a GPU implementation for Pollard’s Kangaroo, with minimal modifications to make it work on different GPU architectures.</p>

<h2 id="the-platform">The platform</h2>

<p>We have the following GPUs available for rent:</p>

<table>
  <thead>
    <tr>
      <th>GPU name</th>
      <th>$ / hour</th>
      <th>Where hosted</th>
      <th>Year</th>
      <th>TFLOPS (FP32)</th>
      <th>Compute Capability</th>
      <th>VRAM</th>
      <th>Memory Bandwidth</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Titan Xp</td>
      <td>$0.049/hr</td>
      <td>Korea</td>
      <td>2017</td>
      <td>11.7</td>
      <td>6.1 (Pascal)</td>
      <td>12 GB</td>
      <td>547 GB/s</td>
    </tr>
    <tr>
      <td>RTX PRO 6000 (Blackwell)</td>
      <td>$0.676/hr</td>
      <td>USA</td>
      <td>2025</td>
      <td>119.0</td>
      <td><strong>12.0 (Blackwell)</strong></td>
      <td>96 GB</td>
      <td>1.79 TB/s (≈ 1792 GB/s)</td>
    </tr>
    <tr>
      <td>H200 (Hopper)</td>
      <td>$2.800/hr</td>
      <td>France</td>
      <td>2024</td>
      <td>197.0 (est.)</td>
      <td>9.0 (Hopper)</td>
      <td>141 GB</td>
      <td>4.8 TB/s</td>
    </tr>
  </tbody>
</table>

<p>The machines are rented from vast.ai, a marketplace for GPU compute. GPU owners can rent out GPU time and set a floor bid price. I don’t know much about vast.ai, but the experience overall was great: top up $5 and connect five minutes later to random machines with big GPUs (some with a residential IP address, yay).</p>

<h2 id="results">Results</h2>

<ul>
  <li>How many 70-bit logarithms can you get on a $10 budget?
    <ul>
      <li>Titan Xp: about <strong>4500</strong></li>
      <li>RTX PRO 6000: about <strong>710</strong>. This looks surprisingly low, but note that this GPU costs about 14x more per hour than the Titan Xp.</li>
      <li>H200: TODO</li>
    </ul>
  </li>
  <li>How many 64-bit logarithms can you get on a $10 budget?
    <ul>
      <li>Titan Xp: about <strong>13428</strong></li>
    </ul>
  </li>
</ul>
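<p>These counts follow from simple budget arithmetic. Reproducing the Titan Xp numbers from the rental price and the mean per-solve times in the benchmark logs below:</p>

```python
def solves_per_budget(budget_usd: float, usd_per_hour: float,
                      secs_per_solve: float) -> int:
    gpu_seconds = budget_usd / usd_per_hour * 3600
    return int(gpu_seconds / secs_per_solve)

# Titan Xp at $0.049/hr, mean times of 17.7 s (50-bit) and 54.7 s (64-bit):
print(solves_per_budget(10, 0.049, 17.7))   # 50-bit problems
print(solves_per_budget(10, 0.049, 54.7))   # 64-bit problems
```

<p>This ignores the variance and success probability of the kangaroo method, so treat these as ballpark figures.</p>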

<h3 id="conclusions">Conclusions</h3>

<ul>
  <li>For mid-size problems (around 64-bit problems), older and cheaper GPUs are great. They are pretty efficient and can be optimal for <em>$ / computation</em> (not <em>$ / speed</em>)</li>
  <li>For smaller problems you’re probably better off with just a CPU.
    <ul>
      <li>A 50-bit problem takes just around 11.5s on an Intel E5-2680 v4 @ 2.40GHz. On a Titan Xp it takes about 17s (203 per hour, or about <strong>41000</strong> for \$10)</li>
    </ul>
  </li>
</ul>

<h3 id="eye-candy">eye candy</h3>

<p>Example problem for 70-bit ECDLP on an interval:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Start : 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297110CEFE6DF0F00AF193D4
Stop  : 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F32971110EFE6DF0F00AF193D3
[1] Priv: 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297111067279BA00590DBE69
    Pub: 0364FBFDFF900733243009E571BC503762EC7A809B132D50F2558B2C431E6E4AE0

&gt;&gt;&gt; bin(0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F32971110EFE6DF0F00AF193D3 - 0x8C33238D4C8F2F7FACD4FF3A5BCB73550C7045F3297110CEFE6DF0F00AF193D4 + 1)
'0b10000000000000000000000000000000000000000000000000000000000000000000000'

</code></pre></div></div>

<p>RTX PRO 6000 drawing 300 Watts:
<img src="/assets/gpu/candy1.png" alt="RTX PRO 6000" width="800" /></p>

<p>Titan Xp drawing about 250 Watts:
<img src="/assets/gpu/candy2.png" alt="Titan Xp" width="800" /></p>

<p>Log for the Titan Xp:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NVIDIA Titan Xp - 50-bit ECDLP Benchmark Results
Date: Fri Nov  7 06:07:01 PM CET 2025
========================================

Run 1: 21s
Run 2: 17s
Run 3: 17s
Run 4: 18s
Run 5: 17s
Run 6: 18s
Run 7: 17s
Run 8: 17s
Run 9: 18s
Run 10: 17s

All tests completed!
Statistics:
===========
Total runs: 10
Total time: 177s (2m 57s)

Mean time: 17.70s
Median time: 17.0s
Min time: 17s (Runs 2, 3, 5, 7, 8, 10)
Max time: 21s (Run 1)

Sorted times: 17, 17, 17, 17, 17, 17, 18, 18, 18, 21

10 GPU-Hour Throughput Estimate:
=================================
10 GPU-hours = 36,000 seconds
Average time per challenge: 17.70s
Estimated challenges in 10 GPU-hours: ~2,033 challenges

Per hour rate: ~203 challenges/hour
Per day rate (24h): ~4,881 challenges/day

Hardware:
=========
GPU: NVIDIA Titan Xp (Pascal, Compute 6.1)
CPU: 4 threads
CUDA: 8.0
Configuration: GPU Grid(60x256), DP size=9
GPU Throughput: ~565-690 MK/s (typical: ~575 MK/s)


NVIDIA Titan Xp - 64-bit ECDLP Benchmark Results
========================================

Run 1: 40s
Run 2: 51s
Run 3: 43s
Run 4: 74s
Run 5: 79s
Run 6: 46s
Run 7: 65s
Run 8: 43s
Run 9: 66s
Run 10: 40s

All tests completed!

Statistics:
===========
Total runs: 10
Total time: 547s (9m 7s)

Mean time: 54.70s
Median time: 48.5s
Min time: 40s (Runs 1, 10)
Max time: 79s (Run 5)

Sorted times: 40, 40, 43, 43, 46, 51, 65, 66, 74, 79

10 GPU-Hour Throughput Estimate:
=================================
10 GPU-hours = 36,000 seconds
Average time per challenge: 54.70s
Estimated challenges in 10 GPU-hours: ~658 challenges

Per hour rate: ~66 challenges/hour
Per day rate (24h): ~1,584 challenges/day

Hardware:
=========
GPU: NVIDIA Titan Xp (Pascal, Compute 6.1)
CPU: 4 threads
CUDA: 8.0
Configuration: GPU Grid(60x256), DP size=9
GPU Throughput: ~565-690 MK/s (typical: ~575 MK/s)
</code></pre></div></div>

<p>Log for the RTX PRO 6000:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Statistics:
================================================
Total runs:    10
All solved:    10/10 (100% success rate)
Total time:    758 seconds (12m 38s)

Minimum time:  49 seconds
Maximum time:  126 seconds
Mean time:     75.8 seconds
Median time:   75 seconds

Sorted times: 49, 51, 57, 72, 72, 78, 81, 86, 86, 126

Performance Metrics:
================================================
Average GPU throughput: ~3.9 GK/s
Range searched per run: 2^70 (1,180,591,620,717,411,303,424 keys)
Keys per second (avg):  1.56e+19 (2^64.09)

</code></pre></div></div>

<p>CPU-only is faster for 50-bit problems, slower for 64-bit problems</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
  Key Observations:

  1. Huge variance in CPU times: The probabilistic nature of Pollard's Kangaroo shows extreme variance for CPU-only mode:
    - Fastest: 104s
    - Slowest so far: 625s
    - That's a 6x difference between best and worst case!
  2. CPU vs GPU comparison (preliminary):
    - CPU average so far: ~393s (based on first 4 tests)
    - GPU average (from earlier): 54.7s
    - GPU is ~7.2x FASTER than CPU for 64-bit problems ✅
  3. This confirms the break-even point:
    - 50-bit: CPU is 1.54x faster (11.5s vs 17.7s)
    - 64-bit: GPU is 7.2x faster (55s vs ~393s)
    - The crossover happens somewhere between 50-64 bits
</code></pre></div></div>

<h3 id="discussion--todo">Discussion / TODO</h3>

<ul>
  <li>code: there are no modifications to the code for different GPUs. this is probably very suboptimal. experiment with varying parameters</li>
  <li>this is probabilistic. factor success probability</li>
  <li>make an automated discovery of what’s the most economic platform to solve this with the live json bidding from vast.ai</li>
  <li>algorithmic: There are improvements by Galbraith, Pollard and Ruprai that bring the runtime down to $(1.661 + o(1))\sqrt{N}$ <a href="https://eprint.iacr.org/2010/617.pdf">https://eprint.iacr.org/2010/617.pdf</a></li>
  <li>how to rent out your GPUs <a href="https://cloud.vast.ai/host/setup">https://cloud.vast.ai/host/setup</a></li>
</ul>]]></content><author><name></name></author><category term="minor" /><summary type="html"><![CDATA[We are in 2025. Compute is cheap. With \$2 you can rent a beefy GPU by the minute and nerd out all evening. This entertainment is cheaper than going to the movies (and depending who you ask, more fun). Let’s see how many discrete logarithms we can break on a \$10 budget…]]></summary></entry><entry><title type="html">punchsig: cryptographic signatures, data diodes and trusted input</title><link href="/oscar/misc/punchsig.html" rel="alternate" type="text/html" title="punchsig: cryptographic signatures, data diodes and trusted input" /><published>2025-05-25T00:00:00+00:00</published><updated>2025-05-25T00:00:00+00:00</updated><id>/oscar/misc/punchsig</id><content type="html" xml:base="/oscar/misc/punchsig.html"><![CDATA[<p>Look at this beauty. This box is a small USB dongle I built over the weekend with a bunch of junk electronics I had lying around. The dongle cryptographically signs short text inputs directly from your USB keyboard.</p>

<p><img src="/assets/punchsig/punchsig_1.jpeg" alt="punchsig" /></p>

<p><strong>What is in this box?</strong> The dongle acts as an interposer: keyboard plugs into one side, computer into the other. It forwards keystrokes transparently, and can cryptographically sign text you type. To the computer, it just looks like a regular USB keyboard. Internally, the dongle has a “secure processor” which holds the signing keys and generates the ed25519 signature.</p>

<h3 id="the-twist">the twist</h3>

<p>So far, all pretty standard. <strong>The twist here is</strong> that the dongle’s secure processor is isolated by design – it’s airgapped in one direction. The dongle’s secure processor can send data to the computer, but not the other way around. This means the only input the security processor ever sees comes directly from the keyboard – which we assume to be trusted. Malware on the computer can’t talk to it at all. The unidirectional airgap is enforced by means of a “data diode”.</p>

<iframe src="https://player.vimeo.com/video/1087953008?context=Vimeo%5CController%5CApi%5CResources%5CVideoController.&amp;h=1e8a1b1e2d&amp;s=9761ece5770d16dc876ae125450f231fce9e9d78_1748434583" width="640" height="564" frameborder="0" allow="autoplay; fullscreen" allowfullscreen=""></iframe>

<p>In the demo video you’ll see the message “abc” being signed. You type the message between <code class="language-plaintext highlighter-rouge">F5</code> (start) and <code class="language-plaintext highlighter-rouge">F6</code> (end). The signature gets appended directly after your message, prefixed with <code class="language-plaintext highlighter-rouge">psig-</code>. This makes it easy to type directly into emails, chats, or whatever.</p>

<p>Example message:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Hi, hello world from punchsig! psig-1d70ddcfd35e35b78a5a4cbbf7844a7c405c1dac725c7bac48bd7093555cf7130e2a04d563aa0ace0b330c440601524d4c434ad3298ec02e862ef0750eee4c00
</code></pre></div></div>

<h3 id="security">security</h3>

<p><img src="/assets/punchsig/diagram.svg" alt="punchsig" /></p>

<p><strong>How the sausage is made</strong>. Inside, this thing is made of two processors:</p>
<ul>
  <li>The <em>red</em> side of things is connected to the keyboard, reads keystrokes, performs signing and forwards them to the blue board over a unidirectional serial link.</li>
  <li>The <em>blue</em> board just receives keystrokes over the serial link and forwards them to a PC. It emulates a USB HID keyboard.</li>
</ul>

<p><strong>This works differently from a YubiKey.</strong> With a YubiKey, the message to be signed comes from the computer — so malware could tamper with it before it’s signed. The user might notice the final message isn’t what they intended, but malware can still batch two signing requests: the real one and a sneaky extra on the side. While YubiKeys can require a user tap for confirmation, it’s easy for malware to fake an error and trick the user into tapping twice.</p>

<p>We go a bit further than the conventional wisdom of “keys are in hardware, they are inextractable”. Hardware dongles don’t live in a vacuum: they must get the input to sign from somewhere. In the majority of cases, this means getting the input from a host computer. If this host computer is compromised, then malware can sign arbitrary messages. That’s bad: even if malware can’t access key material, it can sign whatever it wants. By forcing the signer’s only input path to be the trusted keyboard, we close off this attack entirely.</p>

<p><strong>Are screens enough?</strong> One way of dealing with this is putting a screen on the signer. This provides “secure output”. The user is expected to verify what is on the screen against a trusted reference. There are multiple sharp edges with this: a) how does the user know what a “trusted reference” is? b) how is the user expected to diligently verify 128 bits (and not be lazy and check only the first few and last few bits)?</p>

<h3 id="future">future</h3>

<p>The current implementation of punchsig is a prototype I built in a couple of hours – it does have rough edges and is not suitable for production. Directions for improvement:</p>

<ul>
  <li>Implement proper key generation. Implement some kind of lightweight “console mode” to print public keys, generate new keys, etc</li>
  <li>Secure boot</li>
  <li><a href="https://github.com/oreparaz/punchsig-red/pull/1">Firmware builds today are reproducible by means of nix</a>, but they lack documentation</li>
  <li>Implement forward-secure signatures. This would be nice to protect signatures against future compromise. Some scheme based on certs or <a href="https://eprint.iacr.org/2001/042">Krawczyk’s trick</a> would be relatively easy to implement.</li>
  <li>Implement an attestation CA for locally assembled punchsig</li>
  <li>Protect against replays. Today the user can type by themselves a timestamp. It would be great if this is better handled. One option is adding an RTC (but this adds components + a coin cell battery). A GPS module is another alternative.</li>
  <li>Improve reliability of the USB PIO stack. Not all keyboards enumerate yet.</li>
</ul>

<h3 id="hardware">hardware</h3>

<p>The two boards are based on Raspberry Pi Pico, connected <a href="https://github.com/oreparaz/punchsig/tree/main/hw#connections">like this</a>.</p>

<h3 id="code">code</h3>

<p>Firmware is pretty straightforward and largely taken from sample code. There is great code for the raspberry pi pico that implements USB HID host and client. A very simple framing protocol forwards the HID reports from one board to the other.</p>

<p>Source available at 
<a href="https://github.com/oreparaz/punchsig">https://github.com/oreparaz/punchsig</a>.</p>

<h3 id="verification">verification</h3>

<p>This is the javascript verification page I’ve been using for testing: <a href="https://www.reparaz.net/punchsig-verify">https://www.reparaz.net/punchsig-verify</a>. Some messages to try:</p>

<ul>
  <li>
    <p>Yo, does this thing work? psig-f1424b7587150bb83ac8a895bb4f3424a6228b747083369638d5bfd7f24bd752f1fcbc3a140d431295ecdb163d224c608a0a7d633684a5cc23d1c1791821e30c</p>
  </li>
  <li>
    <p>1, 2, 3, tango! psig-b6291b8c9724429822816e486cda12a690ba861ef7567769d4d24a9f42270941d40e9e4282ac0517bfce033c992a136603f1751ba8c77396a52e3e427fd6b206</p>
  </li>
  <li>
    <p>ahem, aham, ahum psig-4ca4a688595e70b1907442c2541a2e11a51257344815acfaf7acde2eb437b06244dfb9b44939520a44a1436c77fc0f3800e4144d9e6a10d29ecd975cf2b4aa06</p>
  </li>
</ul>

<p>Parameters:</p>
<ul>
  <li>All use the ed25519 testing public key <code class="language-plaintext highlighter-rouge">34bc5d83dd91bfa5df1ecada9630e3646fdb497afb4353d7d9f6ba2bb9ac41c0</code>.</li>
  <li>All use the global context <code class="language-plaintext highlighter-rouge">punchsig v2025-05-18 context</code>. This is prepended to the message.</li>
</ul>

<h3 id="build">build</h3>

<p>Some pictures to inspire (or scare) you: the “case” is just a styrofoam sandwich. Terrible thermal properties (though the microcontrollers don’t get hot anyway), but great physical durability.</p>

<p><img src="/assets/punchsig/punchsig_2.jpeg" alt="punchsig" /></p>

<p>There are just two boards that are embedded in the styrofoam:</p>

<p><img src="/assets/punchsig/punchsig_assembly.jpeg" alt="punchsig" /></p>

<p>Connected unidirectionally with 3 wires:</p>

<p><img src="/assets/punchsig/punchsig_assembly_2.jpeg" alt="punchsig" /></p>

<!--
<img src="/assets/punchsig/punchsig_sandwich.jpeg" alt="punchsig" />
The outer is just paper from a leaflet:
<img src="/assets/punchsig/punchsig_box.jpeg" alt="punchsig" />
<img src="/assets/punchsig/punchsig_1.jpeg" alt="punchsig" />
-->]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Look at this beauty. This box is a small USB dongle I built over the weekend with a bunch of junk electronics I had lying around. The dongle cryptographically signs short text inputs directly from your USB keyboard.]]></summary></entry><entry><title type="html">ANSSI x509 cert parser</title><link href="/oscar/misc/anssi-x509.html" rel="alternate" type="text/html" title="ANSSI x509 cert parser" /><published>2025-03-21T00:00:00+00:00</published><updated>2025-03-21T00:00:00+00:00</updated><id>/oscar/misc/x509</id><content type="html" xml:base="/oscar/misc/anssi-x509.html"><![CDATA[<p>This is a fantastic piece of software:</p>

<blockquote>
  <p>Arnaud Ebalard. <strong>x509-parser: a RTE-free X.509 parser</strong>. <a href="https://github.com/ANSSI-FR/x509-parser">https://github.com/ANSSI-FR/x509-parser</a></p>
</blockquote>

<p>This implements an X.509 parser with profuse ACSL annotations, so that the whole parser is verifiable with Frama-C:</p>
<ul>
  <li>This guarantees the parser is free from run-time errors (invalid memory accesses, signed integer overflows, undefined behavior)</li>
  <li>It is suitable for embedded devices (no malloc, no floats, no dependencies)</li>
  <li>The current ACSL annotations cannot prove the code adheres to a functional specification (correctness), but it’s nevertheless a great piece of engineering</li>
  <li>It is a good resource to learn frama-c and ACSL in a non-trivial project.</li>
</ul>

<p>This is an excerpt from the paper:</p>

<p><img src="/assets/x509/acsl.png" alt="ACSL" width="800" /></p>

<p>References:</p>
<ul>
  <li>Arnaud Ebalard, Patricia Mouy, and Ryad Benadjila. <strong>Journey to a RTE-free X.509 parser</strong>. A pretty didactic paper: <a href="https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Article-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila_3cUxSCv.pdf">https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Article-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila_3cUxSCv.pdf</a></li>
  <li>Presentation slides: <a href="https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Slides-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila.pdf">https://www.sstic.org/media/SSTIC2019/SSTIC-actes/journey-to-a-rte-free-x509-parser/SSTIC2019-Slides-journey-to-a-rte-free-x509-parser-ebalard_mouy_benadjila.pdf</a></li>
  <li>Presentation at SSTIC 2019, with video (in french) <a href="https://www.sstic.org/2019/presentation/journey-to-a-rte-free-x509-parser/">https://www.sstic.org/2019/presentation/journey-to-a-rte-free-x509-parser/</a></li>
  <li>Code: <a href="https://github.com/ANSSI-FR/x509-parser">https://github.com/ANSSI-FR/x509-parser</a></li>
</ul>

<p>Related:</p>
<ul>
  <li>RWC 2023 slides: <a href="https://iacr.org/submit/files/slides/2023/rwc/rwc2023/46/slides.pdf">https://iacr.org/submit/files/slides/2023/rwc/rwc2023/46/slides.pdf</a></li>
</ul>

<p>Other projects from ANSSI:</p>
<ul>
  <li>libdrbg (no frama-c verification) <a href="https://github.com/ANSSI-FR/libdrbg">https://github.com/ANSSI-FR/libdrbg</a></li>
  <li>libecc (well engineered) <a href="https://github.com/libecc/libecc">https://github.com/libecc/libecc</a></li>
</ul>]]></content><author><name></name></author><category term="minor" /><summary type="html"><![CDATA[This is a fantastic piece of software:]]></summary></entry><entry><title type="html">Omron blood pressure monitor: reading the internal EEPROM</title><link href="/oscar/misc/omron.html" rel="alternate" type="text/html" title="Omron blood pressure monitor: reading the internal EEPROM" /><published>2023-12-28T00:00:00+00:00</published><updated>2023-12-28T00:00:00+00:00</updated><id>/oscar/misc/omron</id><content type="html" xml:base="/oscar/misc/omron.html"><![CDATA[<p>This note explains how to read the internal EEPROM of an <em>Omron Upper Arm Blood Pressure Monitor 3 Series</em> (model BP710N) to achieve interoperability. You’ll be able to read your blood pressure measurements with your own microcontroller.</p>

<h3 id="hardware">Hardware</h3>

<p>This is the hardware we’re dealing with. It sells for about $40 on Amazon in the U.S. as of 2019. There are more expensive models from the same manufacturer that have Bluetooth connectivity. Here we are dealing with the simplest model: no Bluetooth, no WiFi, no USB.</p>

<p><img src="/assets/omron/unit.png" alt="Omron BP710N" width="500" /></p>

<h3 id="pcb">PCB</h3>

<p>This is a picture of the main board. We can identify some components:</p>

<ul>
  <li>the main Toshiba processor (IC1 in the picture) is in an LQFP64 package. Markings on the chip: 1904 HAL, T5DE 1UG, 916549. I couldn’t find any datasheet.</li>
  <li>the small IC on the right (IC5) is a 4-kbit I2C EEPROM (markings on the chip: 4G08 82753). This is a 3.3 V EEPROM and behaves like a 24C04.</li>
</ul>

<p>The I2C bus for the EEPROM is conveniently exposed as two through-hole vias, annotated in the picture. It is very easy to solder this bus to your favorite microcontroller, which can then drive the EEPROM itself (I2C is a multi-master bus).</p>

<p><img src="/assets/omron/pcb.jpg" alt="Omron BP710N PCB" width="800" /></p>

<h3 id="high-level-behavior">High-level behavior</h3>

<p>After the unit completes measuring your blood pressure (which takes about a minute), the microcontroller stores the whole measurement into the EEPROM. This is done without any user interaction. The EEPROM memory is large enough to store the last 14 measurements.</p>

<h3 id="eeprom-memory-map">EEPROM memory map</h3>

<p>This section provides the information necessary to achieve interoperability:</p>

<ul>
  <li>The EEPROM is 512 bytes long and holds 14 files.</li>
  <li>Each file is 14 bytes long and stores a single measurement.</li>
  <li>The first file starts at offset <code class="language-plaintext highlighter-rouge">0xAC</code>, the second one starts at <code class="language-plaintext highlighter-rouge">0xBA</code>, and so on up to the last file, which starts at <code class="language-plaintext highlighter-rouge">0x162</code>.</li>
  <li>Each file stores:
    <ul>
      <li>1 byte for the systolic pressure, stored as (measured_systolic_mmHg - 25). For example, a systolic pressure of 125 mmHg will be stored as <code class="language-plaintext highlighter-rouge">0x64</code>. (This encoding thus can represent values from 25 mmHg to 280 mmHg.)</li>
      <li>1 byte for the diastolic pressure in mmHg units. For example, a diastolic pressure of 80 mmHg will be stored as <code class="language-plaintext highlighter-rouge">0x50</code>.</li>
      <li>1 byte for the pulse measurement in BPM “units” (min^-1). For example, a pulse of 53 bpm will be stored as <code class="language-plaintext highlighter-rouge">0x35</code>.</li>
      <li>5 bytes that appear to be nearly constant. In my unit these are <code class="language-plaintext highlighter-rouge">0E 20 04 3F 10</code>.</li>
      <li>6 more bytes whose purpose I don’t understand. The last two look pretty random / uniformly distributed and could be a CRC, although I didn’t dig deeper.</li>
    </ul>
  </li>
  <li>There are other interesting addresses:
    <ul>
      <li>Address <code class="language-plaintext highlighter-rouge">0x60</code> stores the “last measurement file index”.  It is a pointer to the measurement that was last written to EEPROM. The mapping “pointer value → file offset” is pretty regular with an exception for the pointer value <code class="language-plaintext highlighter-rouge">0x00</code>:</li>
    </ul>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   mem[addr=0x60] -&gt; file offset
   -----------------------------
   0x01 -&gt; 0xAC (0xAC + 14*0)
   0x02 -&gt; 0xBA (0xAC + 14*1)
   0x03 -&gt; 0xC8 (0xAC + 14*2)
   0x04 -&gt; 0xD6 (0xAC + 14*3)
        ... regular ...
   0x0C -&gt; 0x146 (0xAC + 14*11)
   0x0D -&gt; 0x154 (0xAC + 14*12)
   0x00 -&gt; 0x162 (0xAC + 14*13) // last one, NB: discontinuity!
</code></pre></div>    </div>

    <ul>
      <li>Address <code class="language-plaintext highlighter-rouge">0x04</code> and <code class="language-plaintext highlighter-rouge">0x05</code> store the total measurement count in little endian. (This is duplicated in addresses <code class="language-plaintext highlighter-rouge">0x06-0x07</code>.) This counter advances even if the measurement is bad. Bad measurements aren’t written to a file.</li>
    </ul>
  </li>
</ul>
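<p>The memory map above translates directly into code. Here is a small Go sketch that decodes the first three bytes of a measurement file and maps the pointer stored at <code>0x60</code> to a file offset, including the <code>0x00</code> discontinuity:</p>

```go
package main

import "fmt"

// Direct transcription of the memory map: decode the first three bytes of a
// measurement file, and map the "last measurement" pointer at 0x60 to a file
// offset (pointer 0x01 is the first slot; pointer 0x00 wraps to the last).

func decode(file []byte) (systolic, diastolic, pulse int) {
	return int(file[0]) + 25, int(file[1]), int(file[2])
}

func fileOffset(ptr byte) int {
	slot := (int(ptr) + 13) % 14 // ptr 0x01 -> slot 0, ..., ptr 0x00 -> slot 13
	return 0xAC + 14*slot
}

func main() {
	s, d, p := decode([]byte{0x6f, 0x45, 0x36})
	fmt.Println(s, d, p)                   // 136 69 54
	fmt.Printf("0x%X\n", fileOffset(0x0A)) // 0x12A
}
```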

<p><strong>How to figure this out for yourself.</strong> The easiest approach is to dump the EEPROM contents before and after a measurement and study how the contents change. You can read this EEPROM with any 24C04 driver. I had success with <a href="https://github.com/nopnop2002/esp-idf-24c">https://github.com/nopnop2002/esp-idf-24c</a>. The Toshiba microcontroller talks to the EEPROM at 320 kbit/s (but of course you can talk to it at a slower bitrate). For example, after a measurement the following EEPROM contents changed:</p>

<ul>
  <li>the pointer to last measurement (stored at <code class="language-plaintext highlighter-rouge">0x60</code>) advanced from <code class="language-plaintext highlighter-rouge">0x09</code> to <code class="language-plaintext highlighter-rouge">0x0A</code></li>
  <li>the file starting at address <code class="language-plaintext highlighter-rouge">0x12A</code> (corresponding to <code class="language-plaintext highlighter-rouge">0x0A</code>) changed</li>
  <li>the first 3 bytes of the file starting at <code class="language-plaintext highlighter-rouge">0x12A</code> are <code class="language-plaintext highlighter-rouge">6f 45 36</code> which mean:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">0x6f</code>: systolic pressure of  0x6f+25 = 136 mmHg</li>
      <li><code class="language-plaintext highlighter-rouge">0x45</code>: diastolic pressure of 0x45 = 69 mmHg</li>
      <li><code class="language-plaintext highlighter-rouge">0x36</code>: pulse of 54 bpm</li>
    </ul>
  </li>
  <li>the contents at <code class="language-plaintext highlighter-rouge">0x04</code> advanced by one (total number of measurements).</li>
</ul>

<p><strong>Other references.</strong> The unit I have has the following markings: MODEL: BP710N, REF HEM-7121-Z. As of 2024 it looks like Omron is selling an updated model BP7100. I don’t know what the BP7100 looks like inside. Chances are that it is very similar.</p>

<p><strong>Warning</strong>. You are on your own. Any modification will likely void your warranty. THE INSTRUCTIONS ARE PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE INSTRUCTIONS.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[This note explains how to read the internal EEPROM of an Omron Upper Arm Blood Pressure Monitor 3 Series (model BP710N) to achieve interoperability. You’ll be able to read your blood pressure measurements with your own microcontroller.]]></summary></entry><entry><title type="html">A cryptographic desk clock: agreement protocols</title><link href="/oscar/misc/quorum-clock" rel="alternate" type="text/html" title="A cryptographic desk clock: agreement protocols" /><published>2022-09-03T00:00:00+00:00</published><updated>2022-09-03T00:00:00+00:00</updated><id>/oscar/misc/quorum-clock</id><content type="html" xml:base="/oscar/misc/quorum-clock"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong><a href="/oscar/misc/desk-clock-protocol">protocol for a cryptographic desk clock</a></strong></li>
  <li>Part II: <strong><a href="/oscar/misc/time-reference-home">time reference</a></strong></li>
  <li>Part III: <strong>agreement protocols</strong> (this page)</li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Say you have 3 imperfect clock measurements. They do not tell the exact same time.
Maybe they are drifting apart, or worse, maybe one is totally broken.
Here we look at this basic problem: given several
clock measurements, how do you establish the time? We’ll see three fundamental works
in fault-tolerant agreement protocols.</p>

<h4 id="agreement">Agreement</h4>

<p>Establishing a notion of <em>agreement</em> on discrete variables is easy.
For example, we can vote to decide where a group should go have dinner
(= quorum-based technique). Or we can trust a certain message
when at least a certain threshold of parties trusts it (and express
this via cryptographic signatures).
This works nicely. We can build very robust systems with this approach,
a la triple-modular redundancy. Commercial planes autonomously land
hundreds of people safely thanks to this.</p>

<p>Establishing agreement in continuous variables is a bit harder. After all,
what are outliers? Five-sigma deviation from the mean? Why not \(42 \sigma\)?
And how do you take into account the measurement confidence? Or the sensor
accuracy?</p>

<blockquote>
  <p>Never go to sea with two chronometers; take one or three</p>
</blockquote>

<h4 id="marzullo-1983">Marzullo 1983</h4>

<p>Keith Marzullo proposed a <a href="http://infolab.stanford.edu/pub/cstr/reports/csl/tr/83/247/CSL-TR-83-247.pdf">synchronization algorithm</a> based on interval intersection.
Marzullo’s algorithm provides accurate time synchronization, but does not really deal
with faulty clocks. This is the basic idea:</p>

<p><img src="/assets/agreement/marzullo_definition.png" alt="Marzullo" width="500" /></p>

<p>NTP uses Marzullo’s algorithm as inspiration.</p>
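<p>The core of Marzullo’s algorithm is simple enough to sketch in a few lines: collect all interval endpoints, sweep over them in order, and keep the sub-interval covered by the most sources. (A compact illustrative sketch, not the production variant NTP uses:)</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Marzullo-style interval intersection: each clock reports an interval
// [lo, hi] it believes contains the true time; sweep over the endpoints and
// return a sub-interval covered by the largest number of sources. One wildly
// wrong clock out of three gets outvoted.

type interval struct{ lo, hi float64 }

func marzullo(in []interval) (best interval, count int) {
	type edge struct {
		t     float64
		delta int
	}
	var edges []edge
	for _, iv := range in {
		edges = append(edges, edge{iv.lo, +1}, edge{iv.hi, -1})
	}
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].t != edges[j].t {
			return edges[i].t < edges[j].t
		}
		return edges[i].delta > edges[j].delta // open before close at ties
	})
	cur := 0
	for i, e := range edges {
		cur += e.delta
		if cur > count {
			count = cur
			best = interval{e.t, edges[i+1].t}
		}
	}
	return best, count
}

func main() {
	clocks := []interval{{10, 12}, {11, 13}, {25, 27}} // third clock is broken
	best, n := marzullo(clocks)
	fmt.Println(best, n) // {11 12} 2
}
```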

<h4 id="lamport-1987">Lamport 1987</h4>

<p>Leslie Lamport wrote in 1987 the absolutely delightful <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.4001&amp;rep=rep1&amp;type=pdf">Synchronizing Time Servers</a>.</p>

<p>The main advantage is that Lamport’s technique can simultaneously deal with faulty clocks
<em>and</em> provide similar time to nodes that are relatively close:
<img src="/assets/agreement/lamport_review.png" alt="Synchronizing Time Servers" width="500" /></p>

<p>Lamport noticed there was something off with Marzullo’s approach.
He expresses with astonishing clarity that two nodes that receive
slightly different views of the same clocks may end up with very different
end results. This is represented in this picture:</p>

<p><img src="/assets/agreement/lamport_marzullo.png" alt="Problem with Marzullo" width="500" /></p>

<p>In the context of distributed systems, this is important. You’d expect “close” nodes to have a “similar” view of the time.
Technically, this is because the agreement function is not “Lipschitz continuous”.</p>

<p>The solution Lamport provides is essentially averaging the midpoints after the outliers are thrown away. In precise terms:</p>

<p><img src="/assets/agreement/lamport_definition.png" alt="Lamport's average" width="500" /></p>

<p><strong>Drawback</strong>: Lamport could not give an “optimal” solution for this.
(Admitting as much is telling of Lamport’s honesty, rarely seen in other authors.)</p>

<p><img src="/assets/agreement/lamport_average.png" alt="Problem with Lamport" width="500" /></p>
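<p>To convey the flavor of this kind of fault-tolerant average (Lamport’s precise definition is in the excerpt above; the rough sketch below just drops the <em>f</em> lowest and <em>f</em> highest midpoints and averages the rest):</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Rough sketch of a fault-tolerant average: sort the clock midpoints, discard
// the f lowest and f highest readings (the potentially faulty ones), and
// average what remains. Illustrative only; Lamport's actual definition in the
// paper is more careful.

func faultTolerantAvg(mid []float64, f int) float64 {
	s := append([]float64(nil), mid...) // work on a copy
	sort.Float64s(s)
	s = s[f : len(s)-f]
	sum := 0.0
	for _, v := range s {
		sum += v
	}
	return sum / float64(len(s))
}

func main() {
	// three honest clocks around t=100, one broken clock at t=250
	fmt.Println(faultTolerantAvg([]float64{99.8, 100.1, 100.3, 250.0}, 1))
}
```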

<h4 id="schmid-and-schossmaier-1999">Schmid and Schossmaier 1999</h4>

<p>Fast forward a dozen years: in 1999, Schmid and Schossmaier addressed Lamport’s concern
in their paper
<a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.4702&amp;rep=rep1&amp;type=pdf">How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition</a>.</p>

<p>They define a very simple agreement function that <em>is</em> Lipschitz continuous, and that takes into account all available information:</p>

<p><img src="/assets/agreement/def_lipschitz.png" alt="Definition: How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition" width="400" /></p>

<p>This is a comparison of Schmid and Schossmaier vs Lamport function showing Lipschitz continuity in action:</p>

<p><img src="/assets/agreement/example_lipschitz.png" alt="Example Lipschitz in action" width="500" /></p>

<p><img src="/assets/agreement/example_marzullo.png" alt="Example what can happen to Marzullo" width="500" /></p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock Part II: time reference Part II: agreement protocols (this page) … sometime in the future … client implementation]]></summary></entry><entry><title type="html">A cryptographic desk clock: time reference</title><link href="/oscar/misc/time-reference-home" rel="alternate" type="text/html" title="A cryptographic desk clock: time reference" /><published>2022-08-05T00:00:00+00:00</published><updated>2022-08-05T00:00:00+00:00</updated><id>/oscar/misc/time-reference-home</id><content type="html" xml:base="/oscar/misc/time-reference-home"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong><a href="/oscar/misc/desk-clock-protocol">protocol for a cryptographic desk clock</a></strong></li>
  <li>Part II: <strong>time reference</strong> (this page)</li>
  <li>Part III: <strong><a href="/oscar/misc/quorum-clock">agreement protocols</a></strong></li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Here we will talk about the roughtime time server, and touch
a bit on the <strong>architecture</strong> and <strong>security properties</strong>.
The time reference itself (the time server) is this beautiful mess of wires, currently living behind a couch. It has a small cryptographic processor, a GPS and a Raspberry Pi. This reference propagates the House Standard Time via WiFi.</p>

<p><img src="/assets/reference/Screen_Shot_2022-03-09_at_12.10.55_AM.png" alt="" /></p>

<h3 id="server-architecture">Server architecture</h3>

<p>Time to build! We partition the time server into two clearly distinct blocks:</p>

<ul>
  <li><strong>trusted domain</strong>: comprised of a GPS module plus a small RTOS-based microcontroller. Here we implement the following functionality: parsing the GPS module’s NMEA data, storing keys, and generating cryptographic signatures</li>
  <li>
    <p><strong>untrusted domain</strong>: a raspberry pi running a golang binary implementing the following functionality: talking to network peers over UDP, and constructing signing requests for the trusted domain by bundling multiple client requests (Merkle tree construction)</p>

    <p><img src="/assets/reference/Screen_Shot_2022-03-27_at_8.08.58_PM.png" alt="" /></p>
  </li>
</ul>

<p>Why? The whole point of partitioning this way is to minimize attack surface and compact all the security-critical functionality in a small block. This partition is in a sense optimal: the trusted domain does the minimum amount of work needed, minimizing the trusted codebase. All networking code is offloaded to the raspberry (none of which is security critical), while all security-sensitive functionality is in the trusted domain. No secrets live in the raspberry. The worst possible impact of a compromised untrusted domain is availability.</p>

<p>The interface between those two blocks looks essentially like this (less important fields omitted):</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
  <span class="kt">uint8_t</span> <span class="n">merkle_root</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span> <span class="c1">// (merkle-)hash of all client nonces</span>
<span class="p">}</span> <span class="n">signing_request_t</span><span class="p">;</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
  <span class="kt">uint64_t</span> <span class="n">midpoint</span><span class="p">;</span> <span class="c1">// time at the moment of signing</span>
  <span class="kt">uint8_t</span> <span class="n">signature</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span> <span class="c1">// essentially, signature over merkle_root + midpoint</span>
<span class="p">}</span> <span class="n">signing_response_t</span><span class="p">;</span>
</code></pre></div></div>

<p>Note that data in <code class="language-plaintext highlighter-rouge">signing_request</code> can be 100% attacker controlled — this is fine and actually a scenario we accept. This means the raspi can be 100% popped (the “only” consequence being reduced availability). Time is <em>not</em> chosen by the untrusted domain, the time reference (the GNSS) is directly fed into the trusted domain. The communication between those two domains is a simple serial line, with no framer (yolo), and fixed size packets (difficult, but not impossible, to screw up). The raspi cannot reflash the microcontroller; a human has to plug a programmer to the microcontroller.</p>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock Part II: time reference (this page) Part II: agreement protocols … sometime in the future … client implementation]]></summary></entry><entry><title type="html">A cryptographic desk clock: time transfer protocol</title><link href="/oscar/misc/desk-clock-protocol" rel="alternate" type="text/html" title="A cryptographic desk clock: time transfer protocol" /><published>2022-08-01T00:00:00+00:00</published><updated>2022-08-01T00:00:00+00:00</updated><id>/oscar/misc/roughtime</id><content type="html" xml:base="/oscar/misc/desk-clock-protocol"><![CDATA[<p>Check out the different parts:</p>
<ul>
  <li>Part I: <strong>protocol for a cryptographic desk clock</strong> (this page)</li>
  <li>Part II: <strong><a href="/oscar/misc/time-reference-home">time reference</a></strong></li>
  <li>Part III: <strong><a href="/oscar/misc/quorum-clock">agreement protocols</a></strong></li>
  <li>… sometime in the future … client implementation</li>
</ul>

<p>Slightly annoying fact: clocks are normally synced with a protocol (NTP - Network Time Protocol) that provides <strong>zero</strong> <strong>cryptographic guarantees</strong>. On the surface this is totally fine, but when we start composing security-sensitive protocols (think certificate expiration checks, audit logs or time-bound credentials), we need to turn a blind eye…</p>

<p>Not today! Let’s build a simple <strong>cryptographic desk clock</strong> to set the House Standard Time in proper cryptographic fashion.
Here is the limited edition, mission-critical clock sitting in my home office:</p>

<p><img src="/assets/reference/3D31F19A-935D-447B-A217-57A42699BD3C.jpeg" alt="" /></p>

<h3 id="protocol">Protocol</h3>

<p>The protocol we’ll be using for clock synchronization is <em>roughtime</em>. This is a pretty new protocol introduced by Adam Langley (Google). It’s pretty barebones, and simple enough that you can write a client in an evening — so much fun.</p>

<p><em>roughtime</em> is essentially a challenge-response protocol. The client sends a random 32-byte challenge to the server, which replies with a signed message containing the client challenge and the current time. (There’s some machinery to make this efficient, like packing requests in a Merkle tree, but this isn’t important now.)</p>

<p>Since you’re probably familiar with NTP, here are the key differences:</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Roughtime</th>
      <th>NTP</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Target time precision</td>
      <td>Coarse: ~seconds. Fine for human consumption, certificate expiration</td>
      <td>Excellent. Sub-second</td>
    </tr>
    <tr>
      <td>Security model</td>
      <td>Excellent. Architecture tolerates a few actively malicious servers, no SPOF. Network is considered untrusted: packets are cryptographically protected</td>
      <td>Bad. Network is assumed trusted (there are some extensions for adding a secure transport layer)</td>
    </tr>
    <tr>
      <td>Protocol maturity</td>
      <td>Very new, although very simple and solid</td>
      <td>Gold standard</td>
    </tr>
    <tr>
      <td>Software/libraries availability</td>
      <td>Bad. DIY land, mostly one-off efforts that get abandoned. Mostly targeting beefy machines, almost nothing for embedded targets</td>
      <td>Excellent, very mature. Reference implementation has been maintained for 20 years, multiple implementations</td>
    </tr>
    <tr>
      <td>Software/libraries quality</td>
      <td>Good. Mostly written in memory safe languages</td>
      <td>Varies a lot</td>
    </tr>
    <tr>
      <td>Server ecosystem maturity</td>
      <td>Very bad. Highly concentrated, few players. Main developer (Google) seems to have abandoned development. Cloudflare runs servers. Google is still offering this without any availability guarantees</td>
      <td>Very good. There are global pools of NTP servers.</td>
    </tr>
    <tr>
      <td>Who uses it? Current deployments</td>
      <td>??? probably a few nerds, unclear</td>
      <td>Virtually everywhere. time.apple.com, etc</td>
    </tr>
    <tr>
      <td>Client code complexity</td>
      <td>Apart from the crypto, easy</td>
      <td>Very easy if high precision isn’t required, high otherwise. The reference implementation is extremely complex: ~100k LoC of C</td>
    </tr>
    <tr>
      <td>Suitable for embedded?</td>
      <td>Suboptimal. The public-key signature scheme is ed25519, which is notoriously RAM-hungry, plus needs SHA512 code</td>
      <td>yes</td>
    </tr>
    <tr>
      <td>History, context, motivation</td>
      <td>Introduced in 2016 by Adam Langley (Google) to solve Google needs. Little public involvement after initial release</td>
      <td>Introduced around 1985 by Dave L. Mills, who continued improving it on a life-long effort</td>
    </tr>
    <tr>
      <td>Algorithms</td>
      <td>Marzullo, ed25519, Merkle trees</td>
      <td>Marzullo, phase-locked loop</td>
    </tr>
  </tbody>
</table>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[Check out the different parts: Part I: protocol for a cryptographic desk clock (this page) Part II: time reference Part II: agreement protocols … sometime in the future … client implementation]]></summary></entry><entry><title type="html">nonce-sanitizer: using authenticated encryption without fear</title><link href="/oscar/misc/nonce-sanitizer.html" rel="alternate" type="text/html" title="nonce-sanitizer: using authenticated encryption without fear" /><published>2022-07-28T00:00:00+00:00</published><updated>2022-07-28T00:00:00+00:00</updated><id>/oscar/misc/nonce-sanitizer</id><content type="html" xml:base="/oscar/misc/nonce-sanitizer.html"><![CDATA[<p>This post describes <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>: a <strong>very simple tool</strong> that prevents the major screw-up everyone is scared to make: 😱 repeated nonces under the same key 😱. In short, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> provides seat belts as a thin wrapper around the AEAD code that adds a hard assert that nonces don’t repeat.</p>

<p><strong>But that’s such a n00b’s mistake!</strong> If you think “I’m a good programmer, I won’t ever make that mistake”, think harder. Even the <a href="https://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html">crypto grandmasters make this mistake</a>. It can happen to anyone anytime —even during an unrelated refactor. Do you have tests that would catch this bug? Do they run automatically every time you touch code?</p>

<h2 id="working-principle">Working principle</h2>

<p><code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> implements an AEAD interface. The input/output behavior is identical to the AEAD mode selected. (Currently, ChaCha20Poly1305 and AES-GCM.) In addition, it will check in the background if the nonce passed is sane, and bail if it isn’t. For this, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> keeps the passed nonces in an internal state.
The current definition of sane is: the same combination (key, nonce) hasn’t been passed before.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>Functionality-wise, this tool internally looks something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> func encryptAEAD_NonceSanitizer(key, nonce, plaintext) -&gt; ciphertext {
    if isRepeatedNonce(key, nonce) {
       bail();
    }
    recordNonce(key, nonce);
    return encryptAEAD(key, nonce, plaintext);
 }
</code></pre></div></div>
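<p>The pseudocode above can be fleshed out into a toy Go version of the bookkeeping (illustrative only — this is not the actual go-nonce-sanitizer code, which also has to worry about persistence and memory growth):</p>

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Toy illustration of the nonce bookkeeping: remember a hash of every
// (key, nonce) pair seen, and flag any repeat.

type sanitizer struct {
	seen map[[32]byte]bool
}

func newSanitizer() *sanitizer {
	return &sanitizer{seen: map[[32]byte]bool{}}
}

// ok reports whether this (key, nonce) combination is fresh, and records it.
func (s *sanitizer) ok(key, nonce []byte) bool {
	h := sha256.New()
	h.Write(key)
	h.Write(nonce)
	var id [32]byte
	copy(id[:], h.Sum(nil))
	if s.seen[id] {
		return false
	}
	s.seen[id] = true
	return true
}

func main() {
	s := newSanitizer()
	key := []byte("k1")
	fmt.Println(s.ok(key, []byte("nonce-1"))) // true: fresh
	fmt.Println(s.ok(key, []byte("nonce-2"))) // true: fresh
	fmt.Println(s.ok(key, []byte("nonce-1"))) // false: repeat!
}
```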

<h2 id="developer-ergonomics">Developer ergonomics</h2>

<p>To use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>, you just replace the calls to AEAD encrypt with the implementation provided by <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>. Everything else should just work. This is 100% backwards compatible: the encryption behavior remains identical, so there’s no need to bump any protocol version.</p>

<ul>
  <li>In the happy path, if your nonces behave, once you set this up you can forget about it. So this is a purely additive security precaution</li>
  <li>In the sad case (naughty nonces), <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> will bail and prevent the confidentiality loss.</li>
</ul>

<p>There’s no need to configure anything. Performance-wise, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> takes a small hit. Check if this is significant in your application.</p>

<h2 id="golang-implementation">Golang implementation</h2>

<p>A PoC in golang is available at <a href="https://github.com/oreparaz/go-nonce-sanitizer">https://github.com/oreparaz/go-nonce-sanitizer</a>. This implementation wraps <code class="language-plaintext highlighter-rouge">golang.org/x/crypto/chacha20poly1305</code>.</p>

<p><strong>How to use it</strong>. The interface transparently wraps the AEAD mode, so you can use it as a drop-in replacement. This is the single-line code modification needed to add <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> to <a href="https://github.com/FiloSottile/age">age</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff --git a/internal/stream/stream.go b/internal/stream/stream.go
index 7cf02c4..bc8a321 100644
--- a/internal/stream/stream.go
+++ b/internal/stream/stream.go
@@ -11,7 +11,7 @@ import (
        "fmt"
        "io"

-       "golang.org/x/crypto/chacha20poly1305"
+       "github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305"
        "golang.org/x/crypto/poly1305"
 )
</code></pre></div></div>

<p><strong>Raw performance</strong>. On a GCP e2-micro instance:</p>

<ul>
  <li>for long packets (8 kB), the overhead is small (less than a 10% loss of throughput)</li>
  <li>for IP-sized packets (1350 bytes), the overhead is about a 25% loss of throughput</li>
  <li>for very short packets (64 bytes), the overhead is roughly a 2.7x slowdown</li>
</ul>

<p>Raw data: output of <code class="language-plaintext highlighter-rouge">go test -bench=.</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>goos: linux
goarch: amd64
pkg: github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
Benchmark/Seal-WithoutNonceSanitizer-64-2     293.65 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-64-2        109.59 MB/s      49 B/op     2 allocs/op
Benchmark/Seal-WithoutNonceSanitizer-1350-2  1147.98 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-1350-2      857.35 MB/s      50 B/op     2 allocs/op
Benchmark/Seal-WithoutNonceSanitizer-8192-2  1407.33 MB/s       0 B/op     0 allocs/op
Benchmark/Seal-WithNonceSanitizer-8192-2     1323.52 MB/s      55 B/op     2 allocs/op
PASS
ok      github.com/oreparaz/go-nonce-sanitizer/chacha20poly1305      8.727s
</code></pre></div></div>

<p><strong>Performance of age</strong>. In a file encryption application like <a href="https://github.com/FiloSottile/age">age</a>,
the overhead is imperceptible to the human eye. With instrumentation, encryption of a 650 MB file takes 2.0 seconds. Without instrumentation, it takes 1.9 seconds.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># with instrumentation
$ time ./age -r age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p &lt; /tmp/junk &gt; /tmp/junk.age

real	0m2.082s
user	0m0.645s
sys	0m1.001s

# without instrumentation
$ time ./age -r age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p &lt; /tmp/junk &gt; /tmp/junk.age

real	0m1.969s
user	0m0.605s
sys	0m0.944s
</code></pre></div></div>

<h2 id="faq">FAQ</h2>

<p><strong>When should I use it? Is it good for me?</strong> If you are picking nonces, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> is for you. If your library picks nonces for you, you’re fine.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p><strong>I’m using libsodium, do I have to worry?</strong> You can use a well-established library like libsodium or tink and still misuse nonces. So refer to the previous point: if you’re picking nonces, yes, you can use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>.</p>

<p><strong>Why should I <em>not</em> use <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code>?</strong> Perhaps if you’re struggling for performance or are very tight on RAM. But so many people have tripped up here before that it’s worth thinking twice.</p>

<p><strong>Will this eat all my RAM?</strong> The internal state that stores nonces grows as more nonces are passed. This grows until a threshold is hit; from there on, old nonces are discarded. So RAM usage is capped and won’t grow unbounded. The threshold is a tunable parameter.</p>

<p><strong>Is it going to ruin performance?</strong> There are too many different applications of AEAD to make general statements about the impact of this instrumentation. In general, <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> has a small impact on client-side code (where increased memory usage is tolerable) and on applications that run in human time (which aren’t affected by a slight increase in latency). Busy servers are trickier, but the overhead is probably acceptable for debug builds/deployments.</p>

<p><strong>I’m using random nonces, I don’t need this?</strong> You can still screw it up, see <a href="https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/1592990">Google’s ECDSA bug in the Chromebook’s embedded controller for U2F</a> (<a href="https://www.chromium.org/chromium-os/u2f-ecdsa-vulnerability/">vuln page</a>). Would you notice in that case?</p>

<p><strong>Tell me about misuse-resistant modes</strong>. You should use them! But sometimes it’s not easy to retrofit those.</p>

<p><strong>Why haven’t I heard about this before?</strong>
That’s a good question, and we can only speculate. This isn’t rocket science; keeping track of all nonces was probably too expensive in the past, but memory in phones and computers is cheap today.</p>

<p><strong>Inspiration</strong>. This tool takes inspiration from memory sanitizers for non-memory-safe languages. Memory sanitizers shadow every byte of memory in your program to detect memory misuse before it becomes a real problem. Thanks to tools like AddressSanitizer or Valgrind we can write C and not stress too much about it. Tools help us sleep well at night. (Note that <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> applies to both non-memory-safe <em>and</em> memory-safe languages – memory safety doesn’t have anything to do with misusing nonces.)</p>

<h2 id="limitations">Limitations</h2>

<p>The tool keeps some state in RAM. The tool won’t detect all nonce collisions since this state is pruned from time to time.</p>

<h2 id="extensions">Extensions</h2>

<p>I’d love to see <code class="language-plaintext highlighter-rouge">nonce-sanitizer</code> in other languages, or integrated into existing libraries. I’m not planning to work on this but ping me if you want some pointers to work on this.</p>

<h2 id="optimizations--internals">Optimizations / internals</h2>

<p>The following design decisions might be useful if you want to reimplement this:</p>

<ul>
  <li><em>Should I store the cryptographic key itself?</em> You can make all the data structures secret-free by storing a hash of the key if you need it. Unsure whether this is worth it if the key lives in the same memory space.</li>
  <li>
    <p><em>Should I include plaintexts also in the map?</em> The only reason to do so is to avoid false alerts when the same (key, nonce, plaintext) triple is passed. This isn’t a violation of AEAD usage: you’re performing exactly the same computation.</p>
  </li>
  <li><em>What data structure should we use for storing nonces?</em> It just needs to be very fast. In golang we resort to a hash map. This PoC implementation can surely be optimized; PRs are welcome.</li>
  <li><em>What pruning strategy should we use?</em> Right now we store the last 1000 nonces and prune the cache from time to time, so that memory usage doesn’t grow unbounded. You could use a circular buffer for consistent performance; what the sweet-spot thresholds are is TBD.
    <ul>
      <li>You <em>could</em> probably use <em>distinguished points</em> to (probabilistically) detect collisions without enormous amounts of memory. (Keeping track only of those nonces with a certain prefix.) Feels like a combined method would be a good choice: keep the last N nonces and keep the last N “distinguished” nonces.</li>
    </ul>
  </li>
  <li><em>Should we compress IVs?</em> Too fancy for a v1.0.</li>
  <li><em>Should we check for repeated nonces upon receiving?</em> Not duplicating nonces is a <em>sender</em> responsibility (as the <em>sender</em> bears the consequences of the confidentiality loss), but it seems cheap enough to also do it on the receiving side. This might help spot buggy implementations <em>on the remote peer</em>.</li>
</ul>
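<p>As a sketch of this combined pruning strategy (a bounded ring of recent nonces plus a set of “distinguished” ones), with all names hypothetical and not the PoC’s code:</p>

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// nonceCache is a hypothetical sketch of the combined pruning strategy:
// a ring buffer of the last `capacity` nonce hashes (so memory stays
// bounded), plus a set of "distinguished" nonces -- those whose hash
// starts with a zero byte -- kept to probabilistically catch collisions
// with nonces that were already evicted from the ring.
type nonceCache struct {
	capacity      int
	ring          [][32]byte // oldest entry is overwritten first
	next          int
	recent        map[[32]byte]struct{}
	distinguished map[[32]byte]struct{}
}

func newNonceCache(capacity int) *nonceCache {
	return &nonceCache{
		capacity:      capacity,
		recent:        make(map[[32]byte]struct{}),
		distinguished: make(map[[32]byte]struct{}),
	}
}

// seenBefore records the nonce and reports whether it was already seen.
func (c *nonceCache) seenBefore(nonce []byte) bool {
	h := sha256.Sum256(nonce)
	if _, ok := c.recent[h]; ok {
		return true
	}
	if _, ok := c.distinguished[h]; ok {
		return true
	}
	if len(c.ring) < c.capacity {
		c.ring = append(c.ring, h)
	} else {
		// Evict the oldest hash so memory stays capped.
		delete(c.recent, c.ring[c.next])
		c.ring[c.next] = h
		c.next = (c.next + 1) % c.capacity
	}
	c.recent[h] = struct{}{}
	if h[0] == 0 { // roughly 1 in 256 nonces is "distinguished"
		c.distinguished[h] = struct{}{}
	}
	return false
}

func main() {
	c := newNonceCache(1000)
	fmt.Println(c.seenBefore([]byte("nonce-1"))) // false: first sighting
	fmt.Println(c.seenBefore([]byte("nonce-1"))) // true: repeat detected
}
```

<p>The zero-byte prefix is just one way to pick distinguished points; a longer prefix trades detection probability for memory.</p>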
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>There’s an exception to this: the same (key, nonce) combination is allowed if the plaintext is also the same. This exception isn’t implemented yet. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>There are cases (like TLS 1.3) in which a buggy implementation that reuses nonces just won’t work at all (won’t be interoperable) because the protocol mandates an implicit nonce (such as a sequence number). This is a good design principle that <em>by design</em> makes reusing nonces very difficult. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="misc" /><summary type="html"><![CDATA[This post describes nonce-sanitizer: a very simple tool that prevents the major screw-up everyone is scared to make: 😱 repeated nonces under the same key 😱. In short, nonce-sanitizer provides seat belts as a thin wrapper around the AEAD code that adds a hard assert that nonces don’t repeat.]]></summary></entry></feed>