Analyzing and coping with an SSDP amplification DDoS attack

A while ago we were hit by an amplification/reflection DDoS attack against our public-facing network. I was familiar with NTP and DNS based reflection DDoS attacks, but this one employed the Simple Service Discovery Protocol (SSDP) to flood our tubes, a name I had heard before and had occasionally seen in packet traces, but honestly knew very little about.
SSDP is a UDP-based protocol for service discovery and UPnP functionality with an HTTP-like syntax. It’s deployed by modern operating systems and embedded systems like home routers, where it is sometimes enabled even on their external interfaces, which makes this kind of attack possible.
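
For reference, a discovery query as a client would multicast it (and as an attacker spoofs it) looks roughly like this; the exact header values vary between implementations:

M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 2
ST: ssdp:all

A single such request of around 100 bytes can trigger several response packets of 300+ bytes each (one per advertised service or device), which is where the amplification factor comes from.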

The Shadowserver Foundation has a nice website with lots of information and statistics about publicly reachable SSDP-enabled devices. While the number of open or vulnerable DNS and NTP servers is steadily going down, there are currently around 14 million IPs around the world that respond to SSDP requests, and that number is declining only very slowly.

Due to this we can expect that SSDP will be abused for DDoS attacks more often in the future.

Details of the Attack

The attack came as a surprise on a Sunday night around 01:00 AM and lasted for approximately one hour. Fortunately this was well outside our business hours, so it had no real impact. However, it did not stop there: there was another instance around Sunday noon that lasted only 10 minutes.
On Monday another short flood took place in the evening, which I was able to observe live. Our Check Point firewall showed a huge spike in the number of packets on the external interface: normally we see about 130 logged connections per second during business hours, but now it was 36,000 per second.

I managed to grab a few sample packets during one of the attack windows. The packets' destination IP was that of a web server of ours which hosts our most popular site. The destination port was UDP/80 and the source port UDP/1900, meaning the attacker had sent queries to the SSDP devices with a spoofed source IP (our web server) and source port 80, and the devices responded accordingly.
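If you want to grab similar samples yourself, a quick capture of the reflected traffic could look something like this (eth0 and the packet count are placeholders; this assumes tcpdump is available on the gateway or on a mirror port):

# tcpdump -ni eth0 -c 200 -w ssdp-flood.pcap 'udp and src port 1900 and dst port 80'
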
The packets were around 320-370 bytes in size and carried payloads like the following, typical responses to an SSDP M-SEARCH request:

HTTP/1.1 200 OK
Cache-Control: max-age=120
EXT:
Location: http://192.168.0.1:65535/rootDesc.xml
Server: Linux/2.4.22-1.2115.nptl UPnP/1.0 miniupnpd/1.0
ST: urn:schemas-upnp-org:device:WANDevice:
USN: uuid:22222222-2222-2222-2222-222222222222::urn:schemas-upnp-org:device:WANDevice:

HTTP/1.1 200 OK
Cache-Control: max-age=120
EXT:
Location: http://192.168.10.1:65535/rootDesc.xml
Server: Linux/2.4.22-1.2115.nptl UPnP/1.0 miniupnpd/1.0
ST: urn:schemas-upnp-org:service:Layer3Forwarding:
USN: uuid:11111111-1111-1111-1111-111111111111::urn:schemas-upnp-org:service:Layer3Forwarding:

HTTP/1.1 200 OK
Cache-Control: max-age=120
EXT:
Location: http://192.168.10.1:65535/rootDesc.xml
Server: Linux/2.4.22-1.2115.nptl UPnP/1.0 miniupnpd/1.0
ST: urn:schemas-upnp-org:service:WANPPPConnection:
USN: uuid:33333333-3333-3333-3333-333333333333::urn:schemas-upnp-org:service:WANPPPConnection:

HTTP/1.1 200 OK
CACHE-CONTROL: max-age=1800
DATE: Tue, 11 Dec 2007 09:13:18 GMT
EXT: 
LOCATION: http://192.168.2.88:49152/description.xml
SERVER: Linux/2.6.5-it0, UPnP/1.0, Intel SDK for UPnP devices /1.2
ST: upnp:rootdevice
USN: uuid:{59765E24-0C12-4d20-BsdsEE-71578AE02CB9}_e0:61:b2:12:16:c1::upnp:rootdevice

The next day we were left alone, but after that it took off again during Wednesday night and in the morning hours. I made some statistics with self-written Perl scripts and the flot library from exported firewall logs to show the scale of the attack through the number of logged connections per minute:

(Graph: logged firewall connections per minute during the attack)
I should emphasize that this statistic is based on logged connection objects from the firewall's point of view, not on raw IP packets. Multiple packets with the same source/destination IP and port within the UDP session timeout are counted as a single dropped connection, and usually multiple packets were sent from one source at the same time, as I could confirm with packet traces.
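My scripts were Perl, but stripped down to its essentials the aggregation behind the graph is just a group-by-minute over the exported logs, roughly like this ($2 is assumed to hold the time stamp as HH:MM:SS and $4 the destination port; the semicolon separator and field positions are assumptions about the export format and will differ in practice):

# awk -F';' '$4 == "1900" { print substr($2, 1, 5) }' fw_export.csv | sort | uniq -c
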
When I checked the active firewall interfaces during the live attack, I saw 200,000-350,000 packets per second on the external interface. For comparison, we usually have 50,000-60,000 during peak hours.
The flood of packets also caused a huge number of interface RX drops (visible with netstat -ni; the NIC driver's RX ring buffers were overflowing), and those dropped packets were probably not even included in the packet rates above.
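To check for this yourself, the counters can be read on the gateway like this (eth0 is just an example, and not every NIC driver exposes the same statistics):

# netstat -ni
# ethtool -S eth0 | grep -i drop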

As an interesting side note, and as you can see in the above graph, the number of ICMP packets also jumped up considerably. These are ICMP Port Unreachable messages generated by systems that received one of the attacker's spoofed UDP/1900 SSDP packets but do not actually have a service listening on that port.
The attack caused our firewalls to log locally instead of sending the logs to the centralized management, and they even had to delete older log files to cope with the steady stream of new entries (blue bar).

On this Wednesday we had a total of 263 million firewall log entries, of which 247 million were UDP/1900 SSDP and 6 million were ICMP traffic. This does not include the data from the deleted log files, which should push the numbers up by at least another 25%.

Analyzing the logs in more detail, I found that on this day a total of 3.7 million unique IP addresses were abused in this DDoS attack. That's quite something. Of these IPs, 3.2 million had at least 10 SSDP log entries, 800,000 had at least 100, and only 42 IPs had at least 1000 log entries. This does not include IPs that only responded with ICMP Port Unreachable messages.
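The per-source statistics are again a simple aggregation over the exported logs, conceptually something like this ($3 as the source IP column is an assumption, same hypothetical export layout as before). The first pipeline builds the per-IP counts, wc gives the number of unique IPs, and the last command counts the IPs with at least 100 entries:

# awk -F';' '$4 == "1900" { print $3 }' fw_export.csv | sort | uniq -c | sort -rn > ssdp_sources.txt
# wc -l < ssdp_sources.txt
# awk '$1 >= 100' ssdp_sources.txt | wc -l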
So the problem wasn't that a small group of IPs in particular bombarded us during the attacks, but rather the sheer number of endpoints, each of which "only" sent maybe a few hundred packets on average. I've compiled a breakdown of the top 100,000 IPs by country:

Rank   IPs in top 100,000   Country
   1   51807                China
   2   16042                Argentina
   3   14104                Bulgaria
   4    1743                United States
   5    1602                Japan
   6    1345                Colombia
   7    1263                Panama
   8    1138                Taiwan
   9    1077                Turkey
  10    1075                India
  11    1025                Romania
  12    1012                Ecuador
  13     738                Tunisia
  14     669                Mexico
  15     576                Bolivia
  16     533                Ukraine
  17     458                United Kingdom
  18     342                Korea, Republic of
  19     322                Australia
  20     257                Italy
 […]     […]                […]
  38      51                Germany
 […]     […]                […]
 112       1                Zambia
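
For anyone wanting to reproduce such a breakdown, a GeoIP lookup over the source IPs is enough; with the legacy geoiplookup tool and a hypothetical file top_ips.txt containing one IP per line, a rough equivalent would be:

# while read ip; do geoiplookup "$ip"; done < top_ips.txt | cut -d: -f2 | sort | uniq -c | sort -rn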

Impact of the Attack and Mitigation

As mentioned earlier, the packets were destined for port UDP/80 on our web server. Since we obviously only allow TCP/80 for normal HTTP to our web servers, the packets were dropped right at our Check Point internet firewall (R77.20, for the record).

The high rate of packets begging to be thrown into the digital bin caused the firewall CPU utilization to jump straight to 100% during each attack window. During the first attack, ClusterXL even performed a failover. Subsequently the CUL (Cluster Under Load) mechanism automatically prevented unnecessary flapping between the gateways, logging messages like:

[cul_load_freeze][CUL - Cluster] Setting CUL FREEZE_ON, high kernel CPU usage (99%) on local Member 0, threshold = 80%

The SSDP packets generated a good amount of traffic, approximately 300-600 Mbit/s, but not enough to completely fill our 1 Gbit/s uplink (which even during peak hours is only utilized to about 40%).
The firewall, however, was under heavy load and mainly busy discarding the SSDP junk that was being thrown at it, causing legitimate connections to be slow or to fail arbitrarily. Connectivity to the internet was fortunately not completely down, but moderately slow and unstable. It was clear that some kind of action was needed.

After the first attacks on Monday I activated the SecureXL optimized drops feature on the firewall cluster, hoping it would enable the systems to better cope with these floods should they occur again. I made sure it was enabled on the nodes after installing the policy:

# fwaccel stat
Accelerator Status : on
Accept Templates   : enabled
Drop Templates     : enabled

Unfortunately this didn't help much in our case. When the attacks came back on Wednesday, I saw that packets were being dropped by SecureXL drop templates (fwaccel stats -d), but the load didn't change much. It seemed that only a fraction of the packets were dropped on the accelerated SecureXL layer.
I assume the generic optimized drops feature, which dynamically creates drop templates, wasn't very effective because of the large number of source IPs participating in this DDoS attack.

Next I activated a SecureXL accelerated drop rule to handle the traffic, and this did the trick. The CPU utilization immediately declined to acceptable levels and connectivity became stable again. Meanwhile I could see that the attack was still ongoing, with cpview showing that all of the attack traffic was now being dropped on the SecureXL layer.
In the graph further up, the activation of the accelerated drop rule is indicated by the green bar. As you can see, no more UDP/1900 packets were logged, since we did not enable logging for these packets, but the ICMP log volume shows that the attack continued unsuccessfully for almost an hour until it finally subsided.

The configuration of these accelerated drop rules is fairly simple. In my case I just set it to filter any UDP traffic with destination port 80 and activated the rule on the external interface only:

# cat /root/dropcfg
dport 80 proto 17

# sim dropcfg -e -f /root/dropcfg

Into the trash it goes.
The rule is active immediately and works well; I haven't noticed any negative impact. I could have included a destination directive as well, but we don't have any such legitimate traffic at all, and I wanted to keep it simple, also to spare a busy firewall from having to match the packets' destination IP.
Unfortunately there seems to be a bug with displaying the current sim dropcfg destination, but I can live with that for now:

# sim dropcfg -l
ioctl getdropcfg#1 failed

Keep in mind that you should configure these rules on the passive cluster members as well, so they can make use of them once they become active, and that the rules don't survive a reboot, so put the command in a startup script (/etc/rc.d/rc.local) as described in the SK article.
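In our setup that simply means appending the same command shown above to the startup script, for example:

# echo 'sim dropcfg -e -f /root/dropcfg' >> /etc/rc.d/rc.local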

As a general performance optimization, I later also increased the RX ring buffer size on the Check Point firewall NICs from the default of 256 to 1024. This helps with driver RX drops and decreases the softIRQ burden on the CPU during high packet rate situations.
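On a standard Linux system the ring sizes can be inspected and changed with ethtool (eth0 is an example, and the change is not persistent across reboots unless you script it):

# ethtool -g eth0
# ethtool -G eth0 rx 1024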

Lessons Learned

SSDP is a protocol that few admins know about, and probably even fewer suspect it of being usable in a DDoS attack. Expect it to be used more frequently for DDoS attacks in the future.

Check Point already provides a number of basic but effective out-of-the-box features you can use to protect against some DDoS threats without having to buy additional licenses/software blades or dedicated DDoS Protector appliances.
These are:
– Optimized Drops
– Accelerated Drop Rules
– Penalty Box – this can be especially useful when dealing with zombies that each send large amounts of traffic (only if their traffic is dropped or triggers the IPS)
– Rate Limiting
These and other general performance optimizations especially in relation to DDoS attacks are also described in this Check Point whitepaper.

The defining characteristic of this DDoS was that it abused a huge number of reflectors, 3.7 million IPs, to amplify the attack, but most of these IPs were only used for a relatively small number of requests each.

We have no idea who was behind this or why. Since the attacker sent spoofed packets to the SSDP-enabled devices, not even the owners of those devices can tell where the attack actually originated. There are other odd things, like how the attack was launched only sporadically and never lasted very long, or how it hit at hours where it could do pretty much no damage. Whatever the attacker's goal was, they weren't very determined about it. Besides, I see absolutely no point in targeting the web site of a non-profit, apolitical organization such as ours.

Too many ISPs around the world still haven't implemented BCP 38 (Network Ingress Filtering) to prevent IP spoofing from within their networks. Without the ability to spoof the victim's IP address, such reflection/amplification attacks using DNS, NTP, SSDP or other connectionless (UDP-based) protocols would be impossible.
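
For a Linux-based access router, the simplest building block of such ingress filtering is strict reverse-path filtering, a minimal sketch:

# sysctl -w net.ipv4.conf.all.rp_filter=1

Strict reverse-path filtering only works where routing is symmetric, which is why carrier networks typically implement BCP 38 with uRPF or explicit ACLs at the customer edge instead.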

