Set 6: RSA and DSA¶

This is the last of our original crypto challenges.

This set exclusively covers number-theoretic cryptography, and, in particular, RSA and DSA.

This set is hard. The concepts are again new. The attacks involve some math --- but nothing you didn't learn in 9th grade --- and a significant amount of programming.

But they're worth it. Two of these attacks in particular are among the most valuable in real-world cryptography.

  • Preliminaries
  • Challenge 41: Implement unpadded message recovery oracle
  • Challenge 42: Bleichenbacher's e=3 RSA Attack
  • Challenge 43: DSA key recovery from nonce
  • Challenge 44: DSA nonce recovery from repeated nonce
  • Challenge 45: DSA parameter tampering
  • Challenge 46: RSA parity oracle
  • Challenge 47: Bleichenbacher's PKCS 1.5 Padding Oracle (Simple Case)
  • Challenge 48: Bleichenbacher's PKCS 1.5 Padding Oracle (Complete Case)

Preliminaries¶

In [1]:
from base64 import b64decode
from collections import namedtuple
from fractions import Fraction
from itertools import chain, combinations
from random import randint

# From pyca/cryptography
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.asymmetric.padding import PKCS1v15
from cryptography.hazmat.primitives.hashes import Hash, SHA1

def i2b(n):
    # Integer to bytes
    num_bytes = 1
    while True:
        try:
            return n.to_bytes(num_bytes, "big")
        except OverflowError:
            num_bytes += 1

def b2i(b):
    # Bytes to integer
    return int.from_bytes(b, "big")

def rsa_encrypt_unpadded(ptext, public_key):
    e = public_key.public_numbers().e
    n = public_key.public_numbers().n
    p = b2i(ptext)
    assert p < n
    return i2b(pow(p, e, n))

def rsa_decrypt_unpadded(ctext, private_key):
    d = private_key.private_numbers().d
    n = private_key.public_key().public_numbers().n
    return i2b(pow(b2i(ctext), d, n))

def modinv(a, n):
    # Inverse of a modulo n (extended Euclidean algorithm)
    t, new_t = 0, 1
    r, new_r = n, a
    while new_r != 0:
        q = r//new_r
        t, new_t = new_t, t-q*new_t
        r, new_r = new_r, r-q*new_r
    if r > 1:
        raise ZeroDivisionError("no modular inverse")
    if t < 0:
        t += n
    return t

def sha1(message):
    h = Hash(SHA1())
    h.update(message)
    return h.finalize()

def ceil(a, b):
    # Return ceil(a/b) where integer arguments may be large
    return (a+b-1)//b

Challenge 41: Implement unpadded message recovery oracle¶

Nate Lawson says we should stop calling it "RSA padding" and start calling it "RSA armoring". Here's why.

Imagine a web application, again with the Javascript encryption, taking RSA-encrypted messages which (again: Javascript) aren't padded before encryption at all.

You can submit an arbitrary RSA blob and the server will return plaintext. But you can't submit the same message twice: let's say the server keeps hashes of previous messages for some liveness interval, and that the message has an embedded timestamp:

{
  time: 1356304276,
  social: '555-55-5555'
}

You'd like to capture other people's messages and use the server to decrypt them. But when you try, the server takes the hash of the ciphertext and uses it to reject the request. Any bit you flip in the ciphertext irrevocably scrambles the decryption.

This turns out to be trivially breakable:

  • Capture the ciphertext C
  • Let N and E be the public modulus and exponent respectively
  • Let S be a random number > 1 mod N. Doesn't matter what.
  • Now:
    C' = ((S**E mod N) C) mod N
    
  • Submit C', which appears totally different from C, to the server, recovering P', which appears totally different from P
  • Now:
          P'
    P = -----  mod N
          S
    

Oops!

Implement that attack.

Careful about division in cyclic groups.

Remember: you don't simply divide mod N; you multiply by the multiplicative inverse mod N. So you'll need a modinv() function.


With RSA, given plaintext $P$, we encrypt $C = P^e \bmod N$ and decrypt $P = C^d \bmod N$. This works because the basic RSA identity is $P^{ed} \equiv P \bmod N$.

For this exploit, $S$ need not be a random number; it just need be relatively prime to $N$. The mathematics work out as

$$ \begin{eqnarray*} C' &=& S^e \, C \bmod N \\ P' &=& (C')^d = (S^e \, C)^d = S^{ed} \, C^d = S \, C^d = SP \bmod N \end{eqnarray*} $$

which is how we arrive at $P = P'/S \bmod N$.

Background Write $N = pq$ as a product of primes. The identity $P^{ed} \equiv P \bmod N$ makes it look like $e$ and $d$ are some kind of inverses modulo $N$. But $d$ was chosen such that $e$ and $d$ are multiplicative inverses modulo $(p-1)(q-1)$, i.e., $ed \equiv 1 \bmod (p-1)(q-1)$. What's going on here? The RSA proof goes as follows. First, for the case when $P$ is coprime to $N$, because $N = pq$ and $p$ and $q$ are prime, Euler's totient function $\varphi(N) = (p-1)(q-1)$. Thus we can write $ed \equiv 1 \bmod \varphi(N)$, or equivalently, $ed = 1 + k\varphi(N)$ for some $k$. By Euler's theorem, because $P$ is coprime to $N$, $P^{\varphi(N)} \equiv 1 \bmod N$. Then

$$ P^{ed} = P^{1+k\varphi(N)} = P \, (P^{\varphi(N)})^k \equiv P \bmod N $$

The case when $P$ is a multiple of $p$ or $q$ (it can't be both, as $P < N$) is handled fairly easily, see RSA Theory.

In [2]:
key = rsa.generate_private_key(3, 1024)
plaintext = b"{time: 1356304276, social: '555-55-5555'}"
ciphertext = rsa_encrypt_unpadded(plaintext, key.public_key())

def recover_by_multiplying(ctext, public_key, decrypt_fn):
    S = 2  # arbitrary
    n = public_key.public_numbers().n
    e = public_key.public_numbers().e
    C_p = (pow(S, e, n)*b2i(ctext))%n
    P_p = b2i(decrypt_fn(i2b(C_p)))
    return i2b((P_p*modinv(S, n))%n)

recover_by_multiplying(
    ciphertext,
    key.public_key(),
    lambda ctext: rsa_decrypt_unpadded(ctext, key)
)
Out[2]:
b"{time: 1356304276, social: '555-55-5555'}"

How would padding help here? Understand that the purpose and form of RSA padding is entirely different from block cipher padding. In the latter, PKCS#7 padding is known and predictable (e.g., 0x030303) and is needed only for technical reasons. By contrast, RSA padding is random and serves to obscure the mathematical simplicity of the algorithm. PKCS#1 v1.5 padding prepends to the plaintext:

  • bytes 0x00 and 0x02;
  • a number of nonzero random bytes, but at least 8 bytes, such that the total message length is the same as that of the modulus; and
  • a terminating 0x00 byte.

Multiplying plaintext thusly padded by $S$ will tend to destroy that structure. (But, given modular arithmetic, isn't it possible that a carefully crafted $S$ could recreate the right delimiting bytes in the right places?) In any case, we can confirm the protection afforded by padding by repeating the above using regular RSA encryption. Note that because the padding is random, every execution will differ.

In [3]:
ciphertext = key.public_key().encrypt(plaintext, PKCS1v15())

result = recover_by_multiplying(
    ciphertext,
    key.public_key(),
    lambda ctext: key.decrypt(ctext, PKCS1v15())
)

result == plaintext
Out[3]:
False

Challenge 42: Bleichenbacher's e=3 RSA Attack¶

Crypto-tourism informational placard.

This attack broke Firefox's TLS certificate validation several years ago. You could write a Python script to fake an RSA signature for any certificate. We find new instances of it every other year or so.

RSA with an encrypting exponent of 3 is popular, because it makes the RSA math faster.

With e=3 RSA, encryption is just cubing a number mod the public encryption modulus:

c = m ** 3 % n

e=3 is secure as long as we can make assumptions about the message blocks we're encrypting. The worry with low-exponent RSA is that the message blocks we process won't be large enough to wrap the modulus after being cubed. The block 00:02 (imagine sufficient zero-padding) can be "encrypted" in e=3 RSA; it is simply 00:08.

When RSA is used to sign, rather than encrypt, the operations are reversed; the verifier "decrypts" the message by cubing it. This produces a "plaintext" which the verifier checks for validity.

When you use RSA to sign a message, you supply it a block input that contains a message digest. The PKCS1.5 standard formats that block as:

00h 01h ffh ffh ... ffh ffh 00h ASN.1 GOOP HASH

As intended, the ffh bytes in that block expand to fill the whole block, producing a "right-justified" hash (the last byte of the hash is the last byte of the message).

There was, 7 years ago, a common implementation flaw with RSA verifiers: they'd verify signatures by "decrypting" them (cubing them modulo the public exponent) and then "parsing" them by looking for 00h 01h ... ffh 00h ASN.1 HASH.

This is a bug because it implies the verifier isn't checking all the padding. If you don't check the padding, you leave open the possibility that instead of hundreds of ffh bytes, you have only a few, which if you think about it means there could be squizzilions of possible numbers that could produce a valid-looking signature.

How to find such a block? Find a number that when cubed (a) doesn't wrap the modulus (thus bypassing the key entirely) and (b) produces a block that starts "00h 01h ffh ... 00h ASN.1 HASH".

There are two ways to approach this problem:

  • You can work from Hal Finney's writeup, available on Google, of how Bleichenbacher explained the math "so that you can do it by hand with a pencil".
  • You can implement an integer cube root in your language, format the message block you want to forge, leaving sufficient trailing zeros at the end to fill with garbage, then take the cube-root of that block.

Forge a 1024-bit RSA signature for the string "hi mom". Make sure your implementation actually accepts the signature!


ASN.1 is a standard binary syntax for serializing structured data, and technically the hash is part of the ASN.1 packet. The aforementioned "goop" is the prefix that describes what and how many bytes come next, i.e., the hash. It would be understandable (though not entirely blameless) if an application that parses ASN.1 neglects to look at anything after a self-describing and self-delimiting ASN.1 packet.

Hal Finney's writeup is here. We use the second approach, where the key word is "garbage." We don't need to find a number that, when cubed, matches a given, entire padded signature. Assuming that an ASN.1-parsing application does not look beyond the ASN.1 packet, the number cubed need match only up to and including the packet; the remaining bytes can be anything that makes the cube root work. That means, at minimum, if we use only a single 0xFF byte and SHA1 for the hash, matching just 39 bytes: the initial 0x0001FF00, a 15-byte ASN.1 prefix, and a 20-byte SHA1 hash. The solution below does just that. Minimizing what must be matched is critical for this to work. Our initial approach of using SHA256, which required matching the first 55 bytes, failed unless we used a 2,048-bit modulus.

Below we use RSA to sign our string, then decrypt the signature and extract the ASN.1 prefix and SHA1 hash. These are just conveniences; we could in principle construct both ourselves. The ASN.1 packet decodes (courtesy of this service) as:

EncryptedPrivateKeyInfo SEQUENCE (2 elem)
    encryptionAlgorithm AlgorithmIdentifier SEQUENCE (2 elem)
        algorithm OBJECT IDENTIFIER 1.3.14.3.2.26 sha1 (OIW)
        parameters ANY NULL
    encryptedData EncryptedData OCTET STRING (20 byte) hash...

Note that our integer-to-byte conversion function does not preserve the leading 0x00 byte.

In [4]:
def approx_integer_cube_root(x):
    # For positive x, return the closest integer cube root
    def f(n):
        return n**3
    r = 1
    while f(r) <= x:
        r *= 2
    l = r//2
    while l <= r:
        m = (l+r)//2
        if f(m) < x:
            l = m+1
        elif f(m) > x:
            r = m-1
        else:
            return m
    return l if abs(f(l)-x) < abs(f(r)-x) else r

message = b"hi mom"
signature = key.sign(message, PKCS1v15(), SHA1())

# Extract ASN.1 packet from signature
# N.B.: leading 0x00 byte is missing
p = rsa_encrypt_unpadded(signature, key.public_key())
i = p.index(0x00)
assert p[0] == 0x01 and all(p[j] == 0xFF for j in range(1, i))
asn1 = p[i+1:]

# Forge a new signature from desired value + 0.5
p = bytes([0x00, 0x01, 0xFF, 0x00]) + asn1 + bytes([0x80])
p += bytes(len(signature)-len(p))
forged = i2b(approx_integer_cube_root(b2i(p)))

Unfortunately our OpenSSL-based implementation is too smart to be fooled by our forged signature. But in lieu of that, we follow the same procedure as before by extracting the ASN.1 packet from the signature, and verify that the forged and original packets match.

In [5]:
# As before, but with forged signature
p = rsa_encrypt_unpadded(forged, key.public_key())
i = p.index(0x00)
assert p[0] == 0x01 and all(p[j] == 0xFF for j in range(1, i))
forged_asn1 = p[i+1:i+1+len(asn1)]

forged_asn1 == asn1
Out[5]:
True

Challenge 43: DSA key recovery from nonce¶

Step 1: Relocate so that you are out of easy travel distance of us.

Step 2: Implement DSA, up to signing and verifying, including parameter generation.

Hah-hah you're too far away to come punch us.

Just kidding you can skip the parameter generation part if you want; if you do, use these params:

p = 800000000000000089e1855218a0e7dac38136ffafa72eda7
    859f2171e25e65eac698c1702578b07dc2a1076da241c76c6
    2d374d8389ea5aeffd3226a0530cc565f3bf6b50929139ebe
    ac04f48c3c84afb796d61e5a4f9a8fda812ab59494232c7d2
    b4deb50aa18ee9e132bfa85ac4374d7f9091abc3d015efc87
    1a584471bb1

q = f4f47f05794b256174bba6e9b396a7707e563c5b

g = 5958c9d3898b224b12672c0b98e06c60df923cb8bc999d119
    458fef538b8fa4046c8db53039db620c094c9fa077ef389b5
    322a559946a71903f990f1f7e0e025e2d7f7cf494aff1a047
    0f5b64c36b625a097f1651fe775323556fe00b3608c887892
    878480e99041be601a62166ca6894bdd41a7054ec89f756ba
    9fc95302291

("But I want smaller params!" Then generate them yourself.)

The DSA signing operation generates a random subkey "k". You know this because you implemented the DSA sign operation.

This is the first and easier of two challenges regarding the DSA "k" subkey.

Given a known "k", it's trivial to recover the DSA private key "x":

    (s * k) - H(msg)
x = ----------------  mod q
            r

Do this a couple times to prove to yourself that you grok it. Capture it in a function of some sort.

Now then. I used the parameters above. I generated a keypair. My pubkey is:

y = 84ad4719d044495496a3201c8ff484feb45b962e7302e56a392aee4
    abab3e4bdebf2955b4736012f21a08084056b19bcd7fee56048e004
    e44984e2f411788efdc837a0d2e5abb7b555039fd243ac01f0fb2ed
    1dec568280ce678e931868d23eb095fde9d3779191b8c0299d6e07b
    bb283e6633451e535c45513b2d33c99ea17

I signed

For those that envy a MC it can be hazardous to your health
So be friendly, a matter of life and death, just like a etch-a-sketch

(My SHA1 for this string was d2d0714f014a9784047eaeccf956520045c45265; I don't know what NIST wants you to do, but when I convert that hash to an integer I get: 0xd2d0714f014a9784047eaeccf956520045c45265).

I get:

r = 548099063082341131477253921760299949438196259240
s = 857042759984254168557880549501802188789837994940

I signed this string with a broken implemention of DSA that generated "k" values between 0 and 2^16. What's my private key?

Its SHA1 fingerprint (after being converted to hex) is:

0954edd5e0afe5542a4adf012611a91912a3ec16

Obviously, it also generates the same signature for that string.


DSA is defined in FIPS 186-4: Digital Signature Standard (DSS).

DSA employs randomness, but in a different way from RSA. As we saw above, RSA encryption adds randomness in the form of random padding bytes which can be structurally identified and removed during decryption. By contrast, DSA signing uses a per-signing random $k$ value that is directly incorporated into the mathematical computation. Perhaps this explains why a DSA signature is a pair $(r, s)$: $k$ factors into each component (in different ways), and through a fancy verification computation $k$ is effectively factored back out. An implication of this approach is that signing the same message with DSA results in a different signature every time.

In [6]:
DSAParams = namedtuple("DSAParams", "p q g")

params = DSAParams(
    p=int(
        "800000000000000089e1855218a0e7dac38136ffafa72eda7"
        "859f2171e25e65eac698c1702578b07dc2a1076da241c76c6"
        "2d374d8389ea5aeffd3226a0530cc565f3bf6b50929139ebe"
        "ac04f48c3c84afb796d61e5a4f9a8fda812ab59494232c7d2"
        "b4deb50aa18ee9e132bfa85ac4374d7f9091abc3d015efc87"
        "1a584471bb1",
        16
    ),
    q=int("f4f47f05794b256174bba6e9b396a7707e563c5b", 16),
    g=int(
        "5958c9d3898b224b12672c0b98e06c60df923cb8bc999d119"
        "458fef538b8fa4046c8db53039db620c094c9fa077ef389b5"
        "322a559946a71903f990f1f7e0e025e2d7f7cf494aff1a047"
        "0f5b64c36b625a097f1651fe775323556fe00b3608c887892"
        "878480e99041be601a62166ca6894bdd41a7054ec89f756ba"
        "9fc95302291",
        16
    )
)

def dsa_generate_keys(params):
    # Return private key, public key
    x = randint(1, params.q-1)
    y = pow(params.g, x, params.p)
    return x, y

def dsa_sign(message, params, x):
    while True:
        while True:
            k = randint(1, params.q-1)
            r = pow(params.g, k, params.p) % params.q
            if r != 0:
                break
        h = b2i(sha1(message))
        s = (modinv(k, params.q) * (h + x*r)) % params.q
        if s != 0:
            break
    return r, s

def dsa_verify(message, r, s, params, y):
    if r <= 0 or r >= params.q or s <= 0 or s >= params.q:
        return False
    w = modinv(s, params.q)
    h = b2i(sha1(message))
    u1 = (h*w) % params.q
    u2 = (r*w) % params.q
    v = (
        (pow(params.g, u1, params.p) * pow(y, u2, params.p))
        % params.p % params.q
    )
    return v == r

message = b"""For those that envy a MC it can be hazardous to your health
So be friendly, a matter of life and death, just like a etch-a-sketch
"""

# Try it out
x, y = dsa_generate_keys(params)
r, s = dsa_sign(message, params, x)
dsa_verify(message, r, s, params, y)
Out[6]:
True

Notice that in a DSA signature, $r = g^k \bmod p \bmod q$ is a function of $k$ only (other than the public, application-wide parameters), and thus if a repeated $r$ value is ever observed it's a good indication that $k$ values are being reused. If there is a small pool of possible $k$ values as in this challenge, this becomes a simple dictionary attack. We identify the private key by seeing which candidate key produces the given public key.

In [7]:
def recover_x(message, r, s, k, params):
    return (modinv(r, params.q) * (s*k - b2i(sha1(message)))) % params.q

given_y = int(
    "84ad4719d044495496a3201c8ff484feb45b962e7302e56a392aee4"
    "abab3e4bdebf2955b4736012f21a08084056b19bcd7fee56048e004"
    "e44984e2f411788efdc837a0d2e5abb7b555039fd243ac01f0fb2ed"
    "1dec568280ce678e931868d23eb095fde9d3779191b8c0299d6e07b"
    "bb283e6633451e535c45513b2d33c99ea17",
    16
)
given_r = 548099063082341131477253921760299949438196259240
given_s = 857042759984254168557880549501802188789837994940

# Rotate range just to make this notebook faster
for k in chain(range(2**14, 2**16), range(2**14)):
    x = recover_x(message, given_r, given_s, k, params)
    y = pow(params.g, x, params.p)
    if y == given_y:
        print(
            "Key found, SHA1 fingerprint is",
            sha1(i2b(x).hex().encode("ASCII")).hex()
        )
        break
Key found, SHA1 fingerprint is 0954edd5e0afe5542a4adf012611a91912a3ec16

Challenge 44: DSA nonce recovery from repeated nonce¶

Cryptanalytic MVP award.

This attack (in an elliptic curve group) broke the PS3. It is a great, great attack.

In this file find a collection of DSA-signed messages. (NB: each msg has a trailing space.)

These were signed under the following pubkey:

y = 2d026f4bf30195ede3a088da85e398ef869611d0f68f07
    13d51c9c1a3a26c95105d915e2d8cdf26d056b86b8a7b8
    5519b1c23cc3ecdc6062650462e3063bd179c2a6581519
    f674a61f1d89a1fff27171ebc1b93d4dc57bceb7ae2430
    f98a6a4d83d8279ee65d71c1203d2c96d65ebbf7cce9d3
    2971c3de5084cce04a2e147821

(using the same domain parameters as the previous exercise)

It should not be hard to find the messages for which we have accidentally used a repeated "k". Given a pair of such messages, you can discover the "k" we used with the following formula:

    (m1 - m2)
k = --------- mod q
    (s1 - s2)

9th Grade Math: Study It!

If you want to demystify this, work out that equation from the original DSA equations.

Basic cyclic group math operations want to screw you

Remember all this math is mod q; s2 may be larger than s1, for instance, which isn't a problem if you're doing the subtraction mod q. If you're like me, you'll definitely lose an hour to forgetting a paren or a mod q. (And don't forget that modular inverse function!)

What's my private key? Its SHA1 (from hex) is:

ca8f6f7c66fa362d40760d135b763eb8527d3d52

Both this challenge and the previous challenge exploit the computation of $s$:

$$ s = \frac{H + xr}{k} \bmod q $$

The previous challenge rewrote this equation to solve for $x$ given known $k$. Here, on the assumption that $k$ is repeated (and therefore $r$ is also repeated), we have

$$ \begin{eqnarray*} s_1 &=& (H_1 + xr)/k \bmod q \\ s_2 &=& (H_2 + xr)/k \bmod q \end{eqnarray*} $$

and from there

$$ k = \frac{H_1 - H_2}{s_1 - s_2} \bmod q $$

This quantity can be computed regardless whether $k$ was reused or not (as long as $s_1 \ne s_2$), so calling it $k$ is a slight misnomer as it may or may not actually be $k$. But if $k$ was reused, then it can be computed thusly, and from there, $x$ as before.

There's something odd with this challenge that we weren't able to figure out. The input file contains 11 signed messages, of which 3 pairs have the same $r$ value. Yet of those, only one pair reflects a reuse of $k$. How can that be? Recall that $r = g^k \bmod p \bmod q$. The uncomfortable conclusion we come to is that the other two pairs use different $k$ values, and get different values for $g^k \bmod p$, that are nevertheless equivalent modulo $q$.

In [8]:
given_y = int(
    "2d026f4bf30195ede3a088da85e398ef869611d0f68f07"
    "13d51c9c1a3a26c95105d915e2d8cdf26d056b86b8a7b8"
    "5519b1c23cc3ecdc6062650462e3063bd179c2a6581519"
    "f674a61f1d89a1fff27171ebc1b93d4dc57bceb7ae2430"
    "f98a6a4d83d8279ee65d71c1203d2c96d65ebbf7cce9d3"
    "2971c3de5084cce04a2e147821",
    16
)

DSASignedMessage = namedtuple("DSASignedMessage", "message r s hash")

lines = [l[:-1] for l in open("44.in").readlines()]
messages = [
    DSASignedMessage(
        msg.split(": ")[1].encode("ASCII"),
        int(r.split(": ")[1]),
        int(s.split(": ")[1]),
        int(m.split(": ")[1], 16)
    )
    for msg, s, r, m in [lines[i:i+4] for i in range(0, len(lines), 4)]
]

for m1, m2 in combinations(messages, 2):
    k = (modinv(m1.s-m2.s, params.q) * (m1.hash-m2.hash)) % params.q
    x = recover_x(m1.message, m1.r, m1.s, k, params)
    if pow(params.g, x, params.p) == given_y:
        print(
            "Key found, SHA1 fingerprint is",
            sha1(i2b(x).hex().encode("ASCII")).hex()
        )
        break
Key found, SHA1 fingerprint is ca8f6f7c66fa362d40760d135b763eb8527d3d52

Challenge 45: DSA parameter tampering¶

Take your DSA code from the previous exercise. Imagine it as part of an algorithm in which the client was allowed to propose domain parameters (the p and q moduli, and the g generator).

This would be bad, because attackers could trick victims into accepting bad parameters. Vaudenay gave two examples of bad generator parameters: generators that were 0 mod p, and generators that were 1 mod p.

Use the parameters from the previous exercise, but substitute 0 for "g". Generate a signature. You will notice something bad. Verify the signature. Now verify any other signature, for any other string.

Now, try (p+1) as "g". With this "g", you can generate a magic signature s, r for any DSA public key that will validate against any string. For arbitrary z:

r = ((y**z) % p) % q

      r
s =  --- % q
      z

Sign "Hello, world". And "Goodbye, world".


If $g = 0$ then the signing operation, per the DSA algorithm, goes into an infinite loop. But if it were somehow possible that $g = 0$, then public key $y = 0$ and every signature would have $r = 0$, and then every signature verification would succeed because $v = r = 0$. For the second problem, we're not sure what the above equations are about. If $g = p+1$, then $y = 1$ and every signature would have $r = 1$, and then every signature verification would succeed because $v = r = 1$.

(In DSA, not only must $g > 1$, but $g$ must be "a generator of the order $q$ in the multiplicative group of $\textrm{GF}(p)$.")

Challenge 46: RSA parity oracle¶

When does this ever happen?

This is a bit of a toy problem, but it's very helpful for understanding what RSA is doing (and also for why pure number-theoretic encryption is terrifying). Trust us, you want to do this before trying the next challenge. Also, it's fun.

Generate a 1024 bit RSA key pair.

Write an oracle function that uses the private key to answer the question "is the plaintext of this message even or odd" (is the last bit of the message 0 or 1). Imagine for instance a server that accepted RSA-encrypted messages and checked the parity of their decryption to validate them, and spat out an error if they were of the wrong parity.

Anyways: function returning true or false based on whether the decrypted plaintext was even or odd, and nothing else.

Take the following string and un-Base64 it in your code (without looking at it!) and encrypt it to the public key, creating a ciphertext:

VGhhdCdzIHdoeSBJIGZvdW5kIHlvdSBkb24ndCBwbGF5IGFyb3VuZCB3aXRoIHRoZSBGdW5reSBDb2xkIE1lZGluYQ==

With your oracle function, you can trivially decrypt the message.

Here's why:

  • RSA ciphertexts are just numbers. You can do trivial math on them. You can for instance multiply a ciphertext by the RSA-encryption of another number; the corresponding plaintext will be the product of those two numbers.
  • If you double a ciphertext (multiply it by (2**e)%n), the resulting plaintext will (obviously) be either even or odd.
  • If the plaintext after doubling is even, doubling the plaintext didn't wrap the modulus --- the modulus is a prime number. That means the plaintext is less than half the modulus.

You can repeatedly apply this heuristic, once per bit of the message, checking your oracle function each time.

Your decryption function starts with bounds for the plaintext of [0,n].

Each iteration of the decryption cuts the bounds in half; either the upper bound is reduced by half, or the lower bound is.

After log2(n) iterations, you have the decryption of the message.

Print the upper bound of the message as a string at each iteration; you'll see the message decrypt "hollywood style".

Decrypt the string (after encrypting it to a hidden private key) above.


In more detail, if plaintext $P = C^d \bmod n$, then

$$(2^e \, C)^d \bmod n = 2^{ed} \, C^d \bmod n = 2P \bmod n$$

Now, if $0 < P < n/2$, then $0 < 2P < n$ and $2P \bmod n = 2P$ is even. Otherwise, if $n/2 < P < n$, then $n < 2P < 2n$ and $2P \bmod n = 2P-n$ is odd because $n$ is odd. In this way the parity of $2P \bmod n$ indicates in which half of $[0,n]$ $P$ resides. Next consider $4P$:

  • If $0 < P < n/4$, then $0 < 4P < n$ and $4P \bmod n = 4P$ is even.
  • If $n/4 < P < n/2$, then $n < 4P < 2n$ and $4P \bmod n = 4P-n$ is odd.
  • If $n/2 < P < 3n/4$, then $2n < 4P < 3n$ and $4P \bmod n = 4P-2n$ is again even.
  • And if $3n/4 < P < n$, then $3n < 4P < 4n$ and $4P \bmod n = 4P-3n$ is again odd.

Combining the $2P$ and $4P$ tests, we can determine in which fractional quarter of $[0,n]$ $P$ resides, and so on iteratively.

The loop below executes 1,024 times corresponding to the modulus bit length. We attempted to perform the search using integer bounds, but found that after 1,018 iterations the bounds drifted away from the solution, perhaps because enough roundoff error had accumulated to cause the bounds to deviate from the neat fractional power-of-2 subdivisions of $[0,n]$. Using fractions instead of integers solves the problem. Because $n$ is coprime to $2$, the bounds will always be pure fractions. The loop terminates when the bounds differ by less than a unit, i.e., the bounds straddle the integer solution which is retrievable by taking the floor of the upper or right bound.

In [9]:
from IPython.display import HTML

plaintext = b64decode(
    "VGhhdCdzIHdoeSBJIGZvdW5kIHlvdSBkb24ndCBwbGF5IGFyb3"
    "VuZCB3aXRoIHRoZSBGdW5reSBDb2xkIE1lZGluYQ=="
)
ciphertext = rsa_encrypt_unpadded(plaintext, key.public_key())

def display_line(line):
    rp = repr(plaintext)
    i = 0
    while i < len(rp):
        if rp[i] != line[i]:
            break
        i += 1
    return line[:i] + "<span style='color: red'>" + line[i:] + "</span>"

n = key.public_key().public_numbers().n
e = key.public_key().public_numbers().e
l, r = Fraction(0), Fraction(n)
c = b2i(ciphertext)
i = 0
output = []
while r-l > 1:
    m = (l+r)/2
    c2e = (c*(2**e)) % n
    if b2i(rsa_decrypt_unpadded(i2b(c2e), key)) % 2 == 0:
        r = m
    else:
        l = m
    c = c2e
    if i >= 1000:
        output.append(display_line(repr(i2b(int(r)))))
    i += 1

display(HTML("<code>" + "\n".join(output) + "</code>"))
b"That's why I found you don't play around with the Funky Cold Med\xbeu\x92" b"That's why I found you don't play around with the Funky Cold Med\x82\xca'" b"That's why I found you don't play around with the Funky Cold Med\x82\xca'" b"That's why I found you don't play around with the Funky Cold Meds\xdfM" b"That's why I found you don't play around with the Funky Cold Medli\xe0" b"That's why I found you don't play around with the Funky Cold Medli\xe0" b"That's why I found you don't play around with the Funky Cold Medj\x8c\x84" b"That's why I found you don't play around with the Funky Cold Medi\x9d\xd7" b"That's why I found you don't play around with the Funky Cold Medi\x9d\xd7" b"That's why I found you don't play around with the Funky Cold Medi\x9d\xd7" b"That's why I found you don't play around with the Funky Cold Medi\x80\x01" b"That's why I found you don't play around with the Funky Cold Mediq\x16" b"That's why I found you don't play around with the Funky Cold Mediq\x16" b"That's why I found you don't play around with the Funky Cold Mediq\x16" b"That's why I found you don't play around with the Funky Cold Medio9" b"That's why I found you don't play around with the Funky Cold Medio9" b"That's why I found you don't play around with the Funky Cold Medin\xc1" b"That's why I found you don't play around with the Funky Cold Medin\x86" b"That's why I found you don't play around with the Funky Cold Medinh" b"That's why I found you don't play around with the Funky Cold Medinh" b"That's why I found you don't play around with the Funky Cold Medinh" b"That's why I found you don't play around with the Funky Cold Medind" b"That's why I found you don't play around with the Funky Cold Medinb" b"That's why I found you don't play around with the Funky Cold Medina"

And for confirmation:

In [10]:
i2b(int(r)) == plaintext
Out[10]:
True

Challenge 47: Bleichenbacher's PKCS 1.5 Padding Oracle (Simple Case)¶

Degree of difficulty: moderate

These next two challenges are the hardest in the entire set.

This is Bleichenbacher from CRYPTO '98: Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1.

Read the paper. It describes a padding oracle attack on PKCS#1v1.5. The attack is similar in spirit to the CBC padding oracle you built earlier; it's an "adaptive chosen ciphertext attack", which means you start with a valid ciphertext and repeatedly corrupt it, bouncing the adulterated ciphertexts off the target to learn things about the original.

This is a common flaw even in modern cryptosystems that use RSA.

It's also the most fun you can have building a crypto attack. It involves 9th grade math, but also has you implementing an algorithm that is complex on par with finding a minimum cost spanning tree.

The setup:

  • Build an oracle function, just like you did in the last exercise, but have it check for plaintext[0] == 0 and plaintext[1] == 2.
  • Generate a 256 bit keypair (that is, p and q will each be 128 bit primes), [n, e, d].
  • Plug d and n into your oracle function.
  • PKCS1.5-pad a short message, like "kick it, CC", and call it "m". Encrypt to to get "c".
  • Decrypt "c" using your padding oracle.

For this challenge, we've used an untenably small RSA modulus (you could factor this keypair instantly). That's because this exercise targets a specific step in the Bleichenbacher paper --- Step 2c, which implements a fast, nearly O(log n) search for the plaintext.

Things you want to keep in mind as you read the paper:

  • RSA ciphertexts are just numbers.
  • RSA is "homomorphic" with respect to multiplication, which means you can multiply c * RSA(2) to get a c' that will decrypt to plaintext * 2. This is mindbending but easy to see if you play with it in code --- try multiplying ciphertexts with the RSA encryptions of numbers so you know you grok it.
  • What you need to grok for this challenge is that Bleichenbacher uses multiplication on ciphertexts the way the CBC oracle uses XORs of random blocks.
  • A PKCS#1v1.5 conformant plaintext, one that starts with 00:02, must be a number between 02:00:00...00 and 02:FF:FF..FF --- in other words, 2B and 3B-1, where B is the bit size of the modulus minus the first 16 bits. When you see 2B and 3B, that's the idea the paper is playing with.

To decrypt "c", you'll need Step 2a from the paper (the search for the first "s" that, when encrypted and multiplied with the ciphertext, produces a conformant plaintext), Step 2c, the fast O(log n) search, and Step 3.

Your Step 3 code is probably not going to need to handle multiple ranges.

We recommend you just use the raw math from paper (check, check, double check your translation to code) and not spend too much time trying to grok how the math works.


Our OpenSSL-based implementation refuses to create such a small key, so we reprise our own RSA code from challenge 39 in set 5, and also add support for PKCS#1 v1.5 padding.

In [11]:
def is_prime(n, rounds=40):
    # Miller-Rabin primality test for n >= 2
    if n < 2: return False
    if n <= 3: return True
    if n%2 == 0: return False
    s, d = 0, n-1
    while d%2 == 0:
        s += 1
        d //= 2
    for _ in range(rounds):
        a = randint(2, n-1)
        x = pow(a, d, n)
        for _ in range(s):
            y = pow(x, 2, n)
            if y == 1 and x != 1 and x != n-1:
                return False
            x = y
        if y != 1:
            return False
    return True

def random_rsa_prime(num_bits, e):
    # Return prime p such that gcd(p-1, e) == 0
    n = randint(2**(num_bits-1), 2**num_bits-1)
    if n%2 == 0:
        n += 1
    while True:
        if (n-1)%e != 0:
            if is_prime(n):
                return n
        n += 2

def rsa_encrypt(e, n, ptext):
    # Accept bytes, return integer
    return pow(b2i(ptext), e, n)

def rsa_decrypt(d, n, ctext_val):
    # Accept integer, return bytes
    return i2b(pow(ctext_val, d, n))

def pad_pkcs1v15(ptext, length):
    n = length-len(ptext)-3
    assert n >= 8, "message too long; insufficient padding"
    return (
        bytes([0x00, 0x02])
        + bytes(randint(1, 255) for _ in range(n))
        + bytes([0x00])
        + ptext
    )

def check_pkcs1v15_padding(ptext, length):
    # Return true if padding is valid
    # Assumes no leading 0x00 byte
    if len(ptext) != length-1 or 0x00 not in ptext:
        return False
    i = ptext.index(0x00)
    return ptext[0] == 0x02 and all(ptext[j] != 0x00 for j in range(1, i))

def unpad_pkcs1v15(ptext):
    # Assumes no leading 0x00 byte and that padding is valid
    i = ptext.index(0x00)
    return ptext[i+1:]

Below is a complete solution for both this challenge and the next. The implementation is directly from Bleichenbacher's paper. We omit step 1, since we are starting from a message that is already PKCS conforming, and therefore in step 4 we do not need to undo any adjustment made in step 1.

The role of intervals in the algorithm is perplexing. If there is one interval we search within it (see step 2.c), in which case we care what that interval is. But if there are multiple intervals then they are ignored entirely (step 2.b); it matters not what the intervals actually are. A mystery. But how can multiple intervals be formed in the first place? Looking at the formula it doesn't seem possible, except perhaps (and presumably) as a result of the effects of discrete arithmetic due to dividing by $s$ and due to the floor and ceiling operations. Another mystery. In our observation of all runs of this code, a set of multiple intervals is formed exactly once in the algorithm's first loop, then one interval remains thereafter. Which is all to say, we're not really understanding the mathematics behind the algorithm here.

Running time of the following is highly variable, ranging from seconds to minutes.

In [12]:
# Set up global parameters
K = 256//8  # modulus size in bytes
E = 3
p = random_rsa_prime(K*4, E)
q = random_rsa_prime(K*4, E)
N = p*q
D = modinv(E, (p-1)*(q-1))

m = b"kick it, CC"
c = rsa_encrypt(E, N, pad_pkcs1v15(m, K))

def union_interval(intervals, a, b):
    # Merge closed interval [a, b] into list of disjoint intervals and
    # maintain disjointness.  List `intervals` is modified.
    for i, (ia, ib) in enumerate(intervals):
        if ia <= b+1 and a <= ib+1:
            del intervals[i]
            union_interval(intervals, min(a, ia), max(b, ib))
            return
    intervals.append((a, b))

def solve(c0):
    def pkcs_conforming(s):
        return check_pkcs1v15_padding(
            rsa_decrypt(D, N, (c0 * pow(s, E, N)) % N),
            K
        )
    B = 2**(8*(K-2))
    M = [(2*B, 3*B-1)]
    i = 1
    while True:
        # Step 2: Searching for PKCS conforming messages
        if i == 1:
            # Step 2.a: Starting the search
            s = ceil(N, 3*B)
            while not pkcs_conforming(s):
                s += 1
        elif len(M) >= 2:
            # Step 2.b: Searching with more than one interval left
            s += 1
            while not pkcs_conforming(s):
                s += 1
        elif len(M) == 1:
            # Step 2.c: Searching with one interval left
            a, b = M[0]
            r = ceil(2*(b*s - 2*B), N)
            found = False
            while not found:
                s = ceil(2*B + r*N, b)
                s_limit = ceil(3*B + r*N, a)
                while s < s_limit:
                    if pkcs_conforming(s):
                        found = True
                        break
                    s += 1
                r += 1
        # Step 3: Narrowing the set of solutions
        M_new = []
        for a, b in M:
            for r in range(ceil(a*s - 3*B + 1, N), (b*s - 2*B)//N + 1):
                l = max(a, ceil(2*B + r*N, s))
                u = min(b, (3*B - 1 + r*N)//s)
                union_interval(M_new, l, u)
        M = M_new
        # Step 4: Computing the solution
        if len(M) == 1 and M[0][0] == M[0][1]:
            return M[0][0]
        else:
            i += 1

unpad_pkcs1v15(i2b(solve(c)))
Out[12]:
b'kick it, CC'

Challenge 48: Bleichenbacher's PKCS 1.5 Padding Oracle (Complete Case)¶

Cryptanalytic MVP award

This is an extraordinarily useful attack. PKCS#1v15 padding, despite being totally insecure, is the default padding used by RSA implementations. The OAEP standard that replaces it is not widely implemented. This attack routinely breaks SSL/TLS.

This is a continuation of challenge #47); it implements the complete BB'98 attack.

Set yourself up the way you did in #47, but this time generate a 768 bit modulus.

To make the attack work with a realistic RSA keypair, you need to reproduce step 2b from the paper, and your implementation of Step 3 needs to handle multiple ranges.

The full Bleichenbacher attack works basically like this:

  • Starting from the smallest 's' that could possibly produce a plaintext bigger than 2B, iteratively search for an 's' that produces a conformant plaintext.
  • For our known 's1' and 'n', solve m1=m0s1-rn (again: just a definition of modular multiplication) for 'r', the number of times we've wrapped the modulus.
  • 'm0' and 'm1' are unknowns, but we know both are conformant PKCS#1v1.5 plaintexts, and so are between [2B,3B].
  • We substitute the known bounds for both, leaving only 'r' free, and solve for a range of possible 'r' values. This range should be small!
  • Solve m1=m0s1-rn again but this time for 'm0', plugging in each value of 'r' we generated in the last step. This gives us new intervals to work with. Rule out any interval that is outside 2B,3B.
  • Repeat the process for successively higher values of 's'. Eventually, this process will get us down to just one interval, whereupon we're back to exercise #47.

What happens when we get down to one interval is, we stop blindly incrementing 's'; instead, we start rapidly growing 'r' and backing it out to 's' values by solving m1=m0s1-rn for 's' instead of 'r' or 'm0'. So much algebra! Make your teenage son do it for you! *Note: does not work well in practice*


All that's needed for this challenge is to set K = 768//8 and re-run the cell above. However, it takes order 10 minutes to complete so we omit it here.