Bug 734991

Summary: Large numbers of "TCP: Possible SYN flooding on port X. Dropping request"
Product: Fedora
Reporter: Naoki <naoki>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, nhorman
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2011-09-27 10:41:07 UTC

Description Naoki 2011-09-01 05:37:43 UTC
Description of problem:
A high volume of "kernel: [6289924.871236] TCP: Possible SYN flooding on port 8080. Dropping request" messages is being logged on a high-traffic server, even though SYN cookies are supposedly disabled:

# cat /proc/sys/net/ipv4/tcp_syncookies 
0
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
65536


This appears to be the same issue as in this thread (different server software): http://squid-web-proxy-cache.1019090.n4.nabble.com/possible-SYN-flooding-on-port-3128-Sending-cookies-td2242687.html

Version-Release number of selected component (if applicable):
2.6.38.8-32.fc15.x86_64 & 2.6.40.3-0.fc15.x86_64 (probably others)

How reproducible:
Reliably when sessions exceed ~2000. This is _not_ deemed to be a DoS of any kind.

Steps to Reproduce:
1. Install Apache, Squid, Tomcat, etc.
2. Send a large amount of traffic.

  
Actual results:
Dropped connections due to SYN flood protection (incorrectly?) activating.

Expected results:
Connections handled as normal.

Comment 1 Dave Jones 2011-09-01 15:54:02 UTC
If syncookies were enabled, you should also be seeing "sending cookies" messages.

Comment 2 Chuck Ebbert 2011-09-06 11:58:28 UTC
Was the max syn backlog set before starting the network daemon, or afterward?

Comment 3 Naoki 2011-09-12 03:21:59 UTC
With cookies enabled we see "Sending cookies"; the solution proposed in various forums is to set tcp_syncookies=0. After doing so, the message simply changes to "Dropping request", which results in the same thing: failed network connections and an unreliable service. My traffic is valid, and I'd really prefer to just turn off this protection, as it does me more harm than good.

Here are the relevant sysctl sections:

net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_syn_retries = 5
net.ipv6.conf.all.max_desync_factor = 600
net.ipv6.conf.default.max_desync_factor = 600
net.ipv6.conf.dummy0.max_desync_factor = 600
net.ipv6.conf.em1/203.max_desync_factor = 600
net.ipv6.conf.em1.max_desync_factor = 600
net.ipv6.conf.em2.max_desync_factor = 600
net.ipv6.conf.lo.max_desync_factor = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120

net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 16384
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 10442
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_expect_max = 256
net.netfilter.nf_conntrack_frag6_high_thresh = 262144
net.netfilter.nf_conntrack_frag6_low_thresh = 196608
net.netfilter.nf_conntrack_frag6_timeout = 60
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_icmpv6_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 262144

Dead-end threads on the subject of TCP sessions being affected by this include:
- https://bbs.archlinux.org/viewtopic.php?id=123807
- http://www.spinics.net/lists/squid/msg54852.html

Comment 4 Naoki 2011-09-12 03:24:20 UTC
Ah, sorry Chuck, I missed the comment - we've restarted the daemon but, I think, not the entire network service (assuming "service network restart"). We will try that and report back.

Comment 5 Naoki 2011-09-12 06:58:09 UTC
I get the feeling this isn't a syn_cookie issue at all but rather a conntrack issue, which would mean the kernel message is perhaps misleading. I cannot see a tunable limit that we might be hitting that would result in "Dropping request" errors, though, so it's all a bit of a guess.

Comment 6 Chuck Ebbert 2011-09-12 14:35:43 UTC
Before 2.6.40, connections would be silently dropped due to a bug in the code. That was fixed, so maybe the connections were always being dropped and you just weren't aware of it until now.

Comment 7 Naoki 2011-09-13 01:54:56 UTC
That's possible, but we did not previously have iptables (and therefore conntrack) loaded, so we wouldn't have run into any old bugs. We certainly are now, though.

The question is why I (and seemingly others) am overloading conntrack to the point of dropping connections, due to a false-positive SYN flood attack response, with 300-400 Mbps of legitimate traffic.

Load is nominal:
# uptime
 10:46:10 up 19:23,  2 users,  load average: 0.41, 0.46, 0.43

SYNs aren't even particularly high:
# netstat -tan | grep ':8080 ' | awk '{print $6}' | sort | uniq -c
      2 CLOSE_WAIT
   1175 ESTABLISHED
      2 FIN_WAIT1
     55 LAST_ACK
      1 LISTEN
     41 SYN_RECV
    125 TIME_WAIT

Yet we have an obvious degradation of service (resulting in monitoring checks timing out and fail-over responses), especially at higher-traffic times of day:
# for i in `seq -w 0 23`; do echo -en "Hour $i:\t"; grep " $i:" /var/log/messages| grep -c "Possible SYN "; done
Hour 00:	7197
Hour 01:	242
Hour 02:	0
Hour 03:	0
Hour 04:	0
Hour 05:	0
Hour 06:	0
Hour 07:	0
Hour 08:	0
Hour 09:	0
Hour 10:	56
Hour 11:	19
Hour 12:	192
Hour 13:	10
Hour 14:	0
Hour 15:	100
Hour 16:	1281
Hour 17:	1390
Hour 18:	355
Hour 19:	151
Hour 20:	1375
Hour 21:	9108
Hour 22:	11933
Hour 23:	9306
(This matches our normal traffic distribution.)

Our next step is to test a system without iptables loaded - but I do not see that as a long-term solution, because a) it's poor security practice and b) it doesn't help resolve the issue for anybody else seeing it.

Comment 8 Naoki 2011-09-13 02:33:53 UTC
Oh dear. I see this is actually part of ipv{4,6}/tcp_ipv{4,6}.c and not conntrack, which means we're stuck with it.

Comment 9 Naoki 2011-09-13 02:43:40 UTC
So I think you are totally correct that we were hitting it before; however, the layout of the older service was many servers on 100 Mbit, while the new setup is fewer servers on 1 Gbit, so these boxes are much more prone to hit it. Sadly, of course, they all hit this wall at the same time.

Comment 10 Naoki 2011-09-13 04:35:38 UTC
A relevant code section in tcp_ipv4.c:

        /* TW buckets are converted to open requests without
         * limitations, they conserve resources and peer is
         * evidently real one.
         */
        if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
                if (net_ratelimit())
                        syn_flood_warning(skb);
#ifdef CONFIG_SYN_COOKIES
                if (sysctl_tcp_syncookies) {
                        want_cookie = 1;
                } else
#endif
                goto drop;
        }


Our TW buckets are set to net.ipv4.tcp_max_tw_buckets = 262144, which I had assumed would be high enough. What other parameters come into play when deciding whether to drop or not?

Comment 11 Chuck Ebbert 2011-09-14 11:07:30 UTC
(In reply to comment #10)
> Our TW buckets are set to = net.ipv4.tcp_max_tw_buckets = 262144 which I had
> assumed would be high enough. What other parameters come into play when
> deciding on dropping or not?

/proc/sys/net/core/somaxconn ?

The application should also have settings for that, but it seems to be capped by the sysctl value.
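For reference, a rough user-space sketch of how the per-socket SYN (request) queue limit appears to have been derived in 2.6.x-era kernels: the application's listen() backlog is capped by net.core.somaxconn inside the listen() syscall, then by net.ipv4.tcp_max_syn_backlog when the request queue is allocated, and finally rounded up to a power of two. The program below is an illustration of that reading of reqsk_queue_alloc(), not verbatim kernel code; the exact rounding and the example backlog value of 100 are assumptions.

/* Illustrative only: approximates how the per-listen-socket SYN queue
 * size seems to be computed, given the caps described above.
 */
#include <stdio.h>

static unsigned int roundup_pow_of_two(unsigned int n)
{
        unsigned int p = 1;

        while (p < n)
                p <<= 1;
        return p;
}

static unsigned int effective_syn_queue(unsigned int app_backlog,
                                        unsigned int somaxconn,
                                        unsigned int max_syn_backlog)
{
        unsigned int n = app_backlog;

        if (n > somaxconn)
                n = somaxconn;          /* cap applied in listen() */
        if (n > max_syn_backlog)
                n = max_syn_backlog;    /* cap on the request queue */
        if (n < 8)
                n = 8;
        return roundup_pow_of_two(n + 1);
}

int main(void)
{
        /* Hypothetical application backlog of 100: even with
         * somaxconn=3000 and tcp_max_syn_backlog=65536 the effective
         * queue stays small (prints 128). */
        printf("%u\n", effective_syn_queue(100, 3000, 65536));
        return 0;
}

In that reading, pushing somaxconn or tcp_max_syn_backlog up has no effect once the application's own backlog is the smallest of the three, which is consistent with the results reported in the later comments.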

Comment 12 Naoki 2011-09-15 01:15:53 UTC
Hi Chuck, I'd not even come across somaxconn before, and it's at the default of 128. Looking this up, I see recommendations of up to 3000, which makes 128 seem incredibly low. I've pushed it up to 1024, will give it 24 hours for comparison, and shall report back.

I get the feeling from all of this that the TCP stack defaults in Fedora (RHEL 6) are too low for even half-gigabit speeds - unless, of course, it's just our particular brand of short-lived, low-packet-count traffic that's throwing it off.

Your help is much appreciated, by the way.

Comment 13 Naoki 2011-09-15 06:21:52 UTC
Note that I changed the bug subject to hopefully better reflect the issue. Also, we still had "Dropping" at somaxconn 1024, so I pushed it up to 3000. We'll see how we go.

Comment 14 Naoki 2011-09-16 01:52:24 UTC
Sadly, no improvement. I have the same number of drops recorded in messages on a 'tuned' machine with somaxconn=3000 as on an untuned one of the same spec set to the default of 128.

srv1
# cat /proc/sys/net/core/somaxconn
3000
# for i in `seq -w 10 23`; do echo -en "Hour $i:\t"; grep "Sep 15 $i:" /var/log/messages| grep -c "Possible SYN "; done
Hour 10:	10
Hour 11:	94
Hour 12:	0
Hour 13:	0
Hour 14:	0
Hour 15:	0
Hour 16:	0
Hour 17:	0
Hour 18:	20
Hour 19:	208
Hour 20:	3570
Hour 21:	6711
Hour 22:	7000
Hour 23:	6452


srv2
# cat /proc/sys/net/core/somaxconn
128
# for i in `seq -w 10 23`; do echo -en "Hour $i:\t"; grep "Sep 15 $i:" /var/log/messages| grep -c "Possible SYN "; done
Hour 10:	44
Hour 11:	80
Hour 12:	0
Hour 13:	0
Hour 14:	0
Hour 15:	0
Hour 16:	0
Hour 17:	0
Hour 18:	53
Hour 19:	188
Hour 20:	3708
Hour 21:	6940
Hour 22:	7027
Hour 23:	6778

(I am aware these aren't total numbers due to rate-limit consolidation, but it's indicative of the pattern.)

A rundown of the parameters thus far:

net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.ip_local_port_range = 32768    61000
net.ipv4.tcp_mem = 144888       193186  289776
net.ipv4.tcp_wmem = 4096        16384   4194304
net.ipv4.tcp_rmem = 4096        87380   4194304
net.nf_conntrack_max = 262144


Additionally, there are no errors on the Ethernet side (peak 17 MB/s), and packets are around 6k/second at peak.

The application server running is Tomcat, with:
        connectionTimeout="60000" 
        maxThreads="1024" 
        socket.directBuffer="true"      
        socket.appReadBufSize="1024" 
        socket.appWriteBufSize="1024" 
And using APR. 

I wouldn't mind just disabling SYN flood protection entirely and comparing the results.

Comment 15 Naoki 2011-09-27 01:50:05 UTC
After all the kernel tuning, what we found actually worked was altering "acceptCount" in Tomcat. Obviously that's not a kernel bug, so we can probably just close this bz. However, I must note that a generic kernel warning about being unable to handle traffic volumes - which could be due to any number of things - probably should reflect this in the log message instead of "Possible SYN flood". Well, yes, but if we listed all the things it "possibly" could be, we'd quickly run out of log space.

Comment 16 Neil Horman 2011-09-27 10:41:07 UTC
That's exactly what you should do. This isn't a problem at all; it's a reflection of the backlog parameter passed to the listen call in an application, and the rate at which the application then accepts connections on that socket. That backlog parameter defines how many outstanding requests can be queued on a listening socket before they start to get dropped. All that's happening here is that the application, Tomcat in this case, is saying it wants no more than X outstanding connection requests on its listening socket at any one time, but it's calling accept on that socket at a sufficiently slow rate that the queue is overflowing and dropping subsequent requests. The fix is to increase the queue length (as you've done with the acceptCount parameter), or to make Tomcat accept requests more quickly.
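To make that concrete, here is a minimal sketch of the failure mode (not Tomcat's actual code; the port 8080, backlog of 100, and sleep are made up for illustration): a server with a small listen() backlog and a slow accept loop, which is the combination that fills the request queue and triggers the "Possible SYN flooding" warning under load.

/* Minimal illustration of the failure mode described above: a small
 * listen() backlog (Tomcat exposes this as acceptCount) combined with a
 * slow accept() loop.  Once the queue of not-yet-accepted connections
 * fills, the kernel drops further SYNs (or sends cookies) and logs the
 * warning.
 */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int lsn = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);

        if (bind(lsn, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("bind");
                return 1;
        }

        /* Small application backlog; the kernel additionally caps this
         * value with net.core.somaxconn. */
        listen(lsn, 100);

        for (;;) {
                int c = accept(lsn, NULL, NULL);

                if (c >= 0)
                        close(c);
                sleep(1);       /* accepting slower than SYNs arrive */
        }
}

Raising the backlog (acceptCount) or accepting faster both relieve the pressure, which matches what comment 15 reports.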

Comment 17 Naoki 2011-09-28 04:53:26 UTC
Hello Neil, yes, that's exactly right; we've come to the end of the trail here and implemented a solution, so this can be closed. I do wonder, though, whether the incorrect "SYN flood" message doesn't constitute a trivial-level bug of its own.