Bug 734991
| Field | Value |
|---|---|
| Summary | Large numbers of "TCP: Possible SYN flooding on port X. Dropping request" |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 16 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED NOTABUG |
| Severity | medium |
| Priority | unspecified |
| Reporter | Naoki <naoki> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, nhorman |
| Target Milestone | --- |
| Target Release | --- |
| Doc Type | Bug Fix |
| Last Closed | 2011-09-27 10:41:07 UTC |
Description
Naoki
2011-09-01 05:37:43 UTC
If syncookies were enabled, you should also be seeing "sending cookies" messages. Was the max syn backlog set before starting the network daemon, or afterward?

---

With cookies enabled we see "Sending cookies"; the proposed solution to this, as read in various forums, is to set cookies=0. After doing so the message simply changes to "dropping", which results in the same thing, namely failed network connections and an unreliable service. My traffic is valid and I'd really prefer to just turn off this protection as it does me more harm than good.

Here are the relevant sysctl sections:

```
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_syn_retries = 5
net.ipv6.conf.all.max_desync_factor = 600
net.ipv6.conf.default.max_desync_factor = 600
net.ipv6.conf.dummy0.max_desync_factor = 600
net.ipv6.conf.em1/203.max_desync_factor = 600
net.ipv6.conf.em1.max_desync_factor = 600
net.ipv6.conf.em2.max_desync_factor = 600
net.ipv6.conf.lo.max_desync_factor = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 16384
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 10442
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_expect_max = 256
net.netfilter.nf_conntrack_frag6_high_thresh = 262144
net.netfilter.nf_conntrack_frag6_low_thresh = 196608
net.netfilter.nf_conntrack_frag6_timeout = 60
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_icmpv6_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 1
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.nf_conntrack_max = 262144
```

Dead-end threads on the subject of TCP sessions being affected by this include:

- https://bbs.archlinux.org/viewtopic.php?id=123807
- http://www.spinics.net/lists/squid/msg54852.html

---

Ah sorry Chuck, I missed the comment - we've restarted the daemon but I think not the entire network service (assuming "service network restart"). We will try that and report back. I get the feeling this isn't a syn_cookie issue at all but rather a conntrack issue, which would mean the kernel message is perhaps misleading. I cannot see a variable limit that we might be hitting that would result in "Dropping request" errors though, so it's all a bit of a guess.

---

Before 2.6.40, connections would be silently dropped due to a bug in the code. That was fixed, so maybe the connections were always being dropped and you just weren't aware of it until now.
That's possible, but we did not previously have iptables (and therefore conntrack wasn't even loaded), so we wouldn't have run into any old bugs. We certainly are now though. The question is why I (and seemingly others) am overloading conntrack to the point of dropping connections, due to a false-positive SYN flood attack response, with 300-400 Mbps of legitimate traffic.

Load is nominal:

```
# uptime
10:46:10 up 19:23, 2 users, load average: 0.41, 0.46, 0.43
```

SYNs aren't even particularly high:

```
# netstat -tan | grep ':8080 ' | awk '{print $6}' | sort | uniq -c
      2 CLOSE_WAIT
   1175 ESTABLISHED
      2 FIN_WAIT1
     55 LAST_ACK
      1 LISTEN
     41 SYN_RECV
    125 TIME_WAIT
```

Yet we have an obvious degradation of service (resulting in monitoring checks timing out and fail-over responses), especially at higher-traffic times of day:

```
# for i in `seq -w 0 23`; do echo -en "Hour $i:\t"; grep " $i:" /var/log/messages | grep -c "Possible SYN "; done
Hour 00: 7197
Hour 01: 242
Hour 02: 0
Hour 03: 0
Hour 04: 0
Hour 05: 0
Hour 06: 0
Hour 07: 0
Hour 08: 0
Hour 09: 0
Hour 10: 56
Hour 11: 19
Hour 12: 192
Hour 13: 10
Hour 14: 0
Hour 15: 100
Hour 16: 1281
Hour 17: 1390
Hour 18: 355
Hour 19: 151
Hour 20: 1375
Hour 21: 9108
Hour 22: 11933
Hour 23: 9306
```

(This matches our normal traffic distribution.)

Our next step is to test a system without iptables loaded, but I do not see that as a long-term solution because a) it's poor security form, and b) it doesn't help resolve the issue for anybody else seeing it.

---

Oh dear. I see this is actually part of ipv{4,6}/tcp_ipv{4,6}.c and not conntrack, which means we're stuck with it. So I think you are totally correct that we were hitting it before; however, the layout of the older service was many servers on 100 Mbit, while the new setup is fewer servers on 1 Gbit, so these boxes are much more prone to hit it. Sadly, of course, they all hit this wall at the same time.

Some relevant code sections in tcp_ipv4.c:

```c
	/* TW buckets are converted to open requests without
	 * limitations, they conserve resources and peer is
	 * evidently real one.
	 */
	if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
		if (net_ratelimit())
			syn_flood_warning(skb);
#ifdef CONFIG_SYN_COOKIES
		if (sysctl_tcp_syncookies) {
			want_cookie = 1;
		} else
#endif
		goto drop;
	}
```

Our TW buckets are set to net.ipv4.tcp_max_tw_buckets = 262144, which I had assumed would be high enough. What other parameters come into play when deciding on dropping or not?

---

(In reply to comment #10)
> Our TW buckets are set to net.ipv4.tcp_max_tw_buckets = 262144 which I had
> assumed would be high enough. What other parameters come into play when
> deciding on dropping or not?

/proc/sys/net/core/somaxconn ? The application should also have settings for that, but it seems to be capped by the sysctl value.

---

Hi Chuck, I'd not even come across somaxconn before, and it's at the default 128. Looking this up I see recommendations up to 3000, which makes 128 seem incredibly low. I've pushed it up to 1024, will give it 24 hours for comparison, and shall report back.

I get the feeling from all of this that the TCP stack defaults in Fedora (RHEL6) are too low for even half-gig speeds, unless it's just our particular brand of short-lived, low-packet-count traffic that's throwing it off course. Your help is much appreciated, by the way. Note I changed the bug subject to hopefully better reflect the issue.

---

Also still had "Dropping" at somaxconn 1024, so pushed it up to 3000. We'll see how we go.

---

Sadly no improvement.
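As Chuck points out, the backlog an application passes to listen() is silently capped by net.core.somaxconn, and, as far as I can tell from kernels of this vintage, that same capped value also sizes the request queue that inet_csk_reqsk_queue_is_full() checks in the code quoted above, so a large tcp_max_syn_backlog on its own may not widen anything. Below is a minimal sketch of the cap, not taken from the bug report; it assumes an ordinary Linux host, the port number simply mirrors the 8080 service discussed here, and the 4096 figure is illustrative:

```c
/* Illustrative only: the listen queue a server actually gets is bounded by
 * min(requested backlog, net.core.somaxconn), so with somaxconn = 128 the
 * 4096 requested below is silently clamped to 128. */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family      = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port        = htons(8080);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");
		return 1;
	}

	/* Ask for a 4096-entry queue; the kernel uses
	 * min(4096, net.core.somaxconn) as the real limit. */
	if (listen(fd, 4096) < 0) {
		perror("listen");
		return 1;
	}

	pause();	/* hold the socket open; no accept() happens here */
	close(fd);
	return 0;
}
```

Raising the sysctl alone (as tried above) only lifts the ceiling; the application also has to request a larger backlog in its own listen() call.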
I have the same number of drops recorded in messages on a 'tuned' machine with somaxconn=3000 as on an untuned one of the same spec set to the default 128.

srv1:

```
# cat /proc/sys/net/core/somaxconn
3000
# for i in `seq -w 10 23`; do echo -en "Hour $i:\t"; grep "Sep 15 $i:" /var/log/messages | grep -c "Possible SYN "; done
Hour 10: 10
Hour 11: 94
Hour 12: 0
Hour 13: 0
Hour 14: 0
Hour 15: 0
Hour 16: 0
Hour 17: 0
Hour 18: 20
Hour 19: 208
Hour 20: 3570
Hour 21: 6711
Hour 22: 7000
Hour 23: 6452
```

srv2:

```
# cat /proc/sys/net/core/somaxconn
128
# for i in `seq -w 10 23`; do echo -en "Hour $i:\t"; grep "Sep 15 $i:" /var/log/messages | grep -c "Possible SYN "; done
Hour 10: 44
Hour 11: 80
Hour 12: 0
Hour 13: 0
Hour 14: 0
Hour 15: 0
Hour 16: 0
Hour 17: 0
Hour 18: 53
Hour 19: 188
Hour 20: 3708
Hour 21: 6940
Hour 22: 7027
Hour 23: 6778
```

(I am aware these aren't total numbers due to ratelimit consolidation, but it's indicative of the pattern.)

A rundown of the parameters thus far:

```
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_mem = 144888 193186 289776
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
net.nf_conntrack_max = 262144
```

Additionally, there are no errors on the Ethernet side (peak 17 MB/s) and packets are around 6k/second at peak. The application server running is Tomcat with:

```
connectionTimeout="60000"
maxThreads="1024"
socket.directBuffer="true"
socket.appReadBufSize="1024"
socket.appWriteBufSize="1024"
```

and using APR. I wouldn't mind just disabling SYN flood protection entirely and comparing the results.

---

After all the kernel tuning, what we found actually worked was altering "acceptCount" in Tomcat. Now obviously not a kernel bug, so we can probably just close this bz. However, I must note that a generic kernel warning about being unable to handle traffic volumes (which could be due to any number of things) probably should reflect this in the log message instead of "Possible SYN flood".

---

Well yes, but if we listed all the things it "possibly" could be, we'd quickly run out of log space.

---

That's exactly what you should do. This isn't a problem at all; it's a reflection of the backlog parameter passed to the listen call in an application, and the rate at which the application then accepts connections on that socket. That backlog parameter defines how many outstanding requests can be queued on a listening socket before they start to get dropped. All that's happening here is that the application, Tomcat in this case, is saying it wants to have no more than X outstanding connection requests on its listening socket at any one time, but it's calling accept on that socket at a sufficiently slow rate that the queue is overflowing and dropping subsequent requests. The fix is to increase the queue length (as you've done with the acceptCount parameter), or to make Tomcat accept requests more quickly.
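To make the explanation above concrete, here is a rough sketch of the failure mode, illustrative rather than Tomcat's actual code: a listener that requests a small backlog and then accepts more slowly than connections arrive, so the listen queue overflows and later SYNs are dropped, which is when the "Possible SYN flooding" warning fires. The backlog of 16 and the one-second accept delay are made-up values; the port again mirrors the 8080 service in the report.

```c
/* Rough sketch of the overflow described above (illustrative values only):
 * a small backlog plus a slow accept() loop means completed connections
 * pile up in the listen queue; once it is full, further SYNs are dropped
 * and the kernel logs "Possible SYN flooding on port ...". */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family      = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port        = htons(8080);

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("bind");
		return 1;
	}

	/* Small queue: at most ~16 completed connections may wait to be
	 * accepted (further capped by net.core.somaxconn). */
	if (listen(lfd, 16) < 0) {
		perror("listen");
		return 1;
	}

	for (;;) {
		int cfd = accept(lfd, NULL, NULL);

		if (cfd < 0)
			continue;
		/* Accepting roughly one connection per second: if clients
		 * connect any faster, the queue fills and new SYNs are
		 * dropped, much like the busy Tomcat instance above. */
		sleep(1);
		close(cfd);
	}
}
```

The two levers mentioned above map directly onto this sketch: request a bigger backlog in the listen() call (what acceptCount controls in Tomcat, as noted in the thread), or accept connections faster.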
Hello Neil, yes that's exactly right. We've come to the end of the trail here and implemented a solution, so this can be closed. I do wonder, though, whether the incorrect "SYN flood" message doesn't constitute a trivial-level bug of its own.