Sorry, I must have had network problems or something; resubmitting:

Description of problem:
The gkrellmd process eats around 75% of CPU time. Found on the x86_64 arch.

Version-Release number of selected component (if applicable):
2.3.1-3.fc9

How reproducible:
Start the gkrellm daemon.

Steps to Reproduce:
1. service gkrellmd start
2. ps aux | grep gkrellmd
3. Look at the third column, which indicates CPU usage.

Actual results:
gkrellmd 30301 76.9 0.0 50924 1100 ? Ss 12:14 12:07 /usr/sbin/gkrellmd -d

Expected results:
Some reasonable amount of CPU time.

Additional info:
Not able to reproduce the problem on a dual-core i386 machine; there it eats about 2-3%.
Thanks for reporting this. Some stracing of gkrellmd has found that reading from /proc/net/tcp and from /proc/net/tcp6 is the culprit. Try this on your x86_64 machine to confirm: "time cat /proc/net/tcp"

To give you an idea, on my rawhide x86_64 machine:

[hans@localhost devel]$ time cat /proc/net/tcp
<snip>
real    0m0.520s
user    0m0.000s
sys     0m0.446s

That's amazingly slow, especially as I only have 8 TCP connections open. Some possibly useful info: top reports a very high load (50%) from soft IRQs.

Anyway, changing this to a kernel bug.
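In case it helps anyone reproduce this outside of gkrellmd, here is a minimal sketch (illustrative only, not gkrellmd's actual code) of the kind of read a monitor does on every update cycle; it measures the same thing "time cat /proc/net/tcp" does:

/* Illustrative only -- not gkrellmd's actual source. Times one full read of
 * /proc/net/tcp, which is roughly what "time cat /proc/net/tcp" measures and
 * what a monitor like gkrellmd ends up repeating on every update cycle. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    char buf[4096];
    struct timespec t0, t1;
    size_t total = 0, n;

    FILE *fp = fopen("/proc/net/tcp", "r");
    if (!fp) {
        perror("fopen /proc/net/tcp");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += n;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(fp);

    printf("read %zu bytes in %.1f ms\n", total,
           (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
    return 0;
}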
Got similar results:

[root@becks ~]# time cat /proc/net/tcp
<snip>
real    0m0.486s
user    0m0.002s
sys     0m0.482s

And I forgot to say that I'm using the rawhide kernel, kernel-2.6.27-0.244.rc2.git1.fc10.
Can you try the rc4 kernel at http://koji.fedoraproject.org/koji/buildinfo?buildID=60268
Not able to boot even the rc5 kernel build. Both end with a white screen shortly after the beginning of kernel initialization.
Got the new rc6 kernel version installed... the problem still persists.

[root@becks ~]# cat /proc/version && time cat /proc/net/tcp
Linux version 2.6.27-0.323.rc6.fc10.x86_64 (mockbuild.phx.redhat.com) (gcc version 4.3.2 20080905 (Red Hat 4.3.2-3) (GCC) ) #1 SMP Thu Sep 11 06:07:35 EDT 2008
<snip>
real    0m0.535s
user    0m0.000s
sys     0m0.534s
Does the same thing happen if you do: time cat /proc/net/tcp6

Also, can you provide the contents of /proc/net/tcp? I'm interested to know how many time-wait sockets you have. Lastly, could you take a quick tcpdump of all the interfaces on your system? It would be nice to see what your incoming traffic looks like.

We may need to run some stap scripts to figure out where you're blocking on locks, but my guess is you have a time-wait socket whose lock is being contended (time-wait sockets use the same lock that established sockets use). It's also possible that your box is getting pummeled with a SYN flood, and that's upping the contention for the icsk_accept_queue, which could be slowing down access to data on listening sockets. Thanks.
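To save some back and forth on the time-wait question, here is a small hypothetical helper (not part of gkrellmd or the kernel; the name is mine) that tallies the sockets in /proc/net/tcp by state. The "st" column is the hex value of the kernel's TCP state enum, where 01 is ESTABLISHED, 06 is TIME_WAIT, and 0A is LISTEN:

/* Hypothetical helper: counts the sockets in /proc/net/tcp by TCP state,
 * to answer the question above about how many TIME_WAIT entries exist. */
#include <stdio.h>

static const char *state_names[] = {
    "INVALID", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1",
    "FIN_WAIT2", "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK",
    "LISTEN", "CLOSING"
};

int main(void)
{
    char line[512];
    unsigned int counts[12] = { 0 };

    FILE *fp = fopen("/proc/net/tcp", "r");
    if (!fp) {
        perror("fopen /proc/net/tcp");
        return 1;
    }

    while (fgets(line, sizeof(line), fp)) {
        unsigned int st;
        /* "  sl: local_address rem_address st ..." -- grab the 4th field */
        if (sscanf(line, "%*d: %*s %*s %x", &st) != 1)
            continue;               /* skips the header line */
        if (st < 12)
            counts[st]++;
    }
    fclose(fp);

    for (unsigned int i = 1; i < 12; i++)
        if (counts[i])
            printf("%-12s %u\n", state_names[i], counts[i]);
    return 0;
}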
Created attachment 316745 [details] quickdump of all interfaces
[root@becks ~]# time cat /proc/net/tcp6
sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000000000000000000000000000:CD4E 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 27266 1 ffff880072099400 299 0 0 2 -1
1: 00000000000000000000000000000000:4ACE 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 496 0 8579 1 ffff8800758a0a00 299 0 0 2 -1
2: 00000000000000000000000000000000:1771 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 26818 1 ffff880072098a00 299 0 0 2 -1
3: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 8046 1 ffff8800758a0000 299 0 0 2 -1
4: 00000000000000000000000000000000:93F7 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 42 0 9784 1 ffff8800758a1400 299 0 0 2 -1
5: 00000000000000000000000001000000:177A 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36413 1 ffff880072099e00 299 0 0 2 -1
6: 00000000000000000000000001000000:177B 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 20745 1 ffff880072098000 299 0 0 2 -1
7: 00000000000000000000000001000000:177C 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 39678 1 ffff88007209a800 299 0 0 2 -1

real    0m0.746s
user    0m0.000s
sys     0m0.504s

[root@becks ~]# time cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:EA62 00000000:0000 0A 00000000:00000000 00:00000000 00000000 42 0 9785 1 ffff880077974280 299 0 0 2 -1
1: 00000000:E8E6 00000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 27267 1 ffff880077894280 299 0 0 2 -1
2: 00000000:EC26 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6738 1 ffff880077970980 299 0 0 2 -1
3: 00000000:170D 00000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 26825 1 ffff880077893900 299 0 0 2 -1
4: 00000000:4ACE 00000000:0000 0A 00000000:00000000 00:00000000 00000000 496 0 8578 1 ffff880077971c80 299 0 0 2 -1
5: 00000000:006F 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 6679 1 ffff880077970000 299 0 0 2 -1
6: 00000000:1771 00000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 26819 1 ffff880077892f80 299 0 0 2 -1
7: 017AA8C0:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 8850 1 ffff880077972f80 299 0 0 2 -1
8: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 8044 1 ffff880077971300 299 0 0 2 -1
9: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 8859 1 ffff880077973900 299 0 0 2 -1
10: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 8329 1 ffff880077890980 299 0 0 2 -1
11: 0100007F:177A 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36414 1 ffff880077895f00 299 0 0 2 -1
12: 0100007F:177B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 708 0 20746 1 ffff880077892600 299 0 0 2 -1
13: 0100007F:177C 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 39679 1 ffff880077896880 299 0 0 2 -1
14: 7B20220A:0016 1620220A:C006 01 00000000:00000000 02:00094619 00000000 0 0 20667 3 ffff880077891c80 20 4 30 4 -1
15: 0100007F:0016 0100007F:ED8B 01 00000000:00000000 02:000A7A43 00000000 0 0 39595 2 ffff880077975580 20 3 29 4 -1
16: 0100007F:ED8B 0100007F:0016 01 00000000:00000000 02:000A7A36 00000000 708 0 39594 2 ffff880077974c00 20 3 26 4 -1
17: 0100007F:EB7D 0100007F:170D 01 00000000:00000000 00:00000000 00000000 708 0 29882 1 ffff880077894c00 20 3 31 3 -1
18: 0100007F:170D 0100007F:EB7D 01 00000000:00000000 00:00000000 00000000 708 0 29883 1 ffff880077895580 20 3 8 3 19
19: 7B20220A:4ACE 1620220A:B3CD 01 00000000:00000000 00:00000000 00000000 496 0 10568 1 ffff880077890000 20 3 0 5 -1
20: 7B20220A:0016 1620220A:AFE3 01 00000000:00000000 02:0004FF6D 00000000 0 0 36338 2 ffff880077891300 20 3 7 5 -1

real    0m0.775s
user    0m0.002s
sys     0m0.769s
This is a work desktop, where only the ssh, vnc, and gkrellmd ports, and sometimes the iceccd ports, are used and open. There is no traffic other than normal stuff like browsing, NFS use, or messaging through the company XMPP server. There are some problems with the IPv6 network in our office, so it's not used right now.
Note to self. The problem appears to be happening in tcp_seq_next (which is common to both the v4 and v6 versions of the file). It doesn't always take a long time to run, but it appears that it occasionally takes about 100ms to execute. Subsequent calls to tcp_seq_stop and tcp4_seq_show are both seen afterwards, so it doesn't seem related to the end of a list traversal or anything obvious like that.
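For anyone following along: those three functions are the seq_file hooks behind /proc/net/tcp, so a slow ->next gets multiplied across every socket in the table on each read. Roughly (paraphrased from the 2.6.27-era net/ipv4/tcp_ipv4.c; the struct shown here is illustrative, since the real ops are filled in through struct tcp_seq_afinfo):

/* Paraphrased, not the verbatim kernel source. Each read() of the proc file
 * drives start -> (show, next)* -> stop, so an occasional ~100ms ->next adds
 * up quickly. */
static const struct seq_operations tcp4_seq_ops_sketch = {
    .start = tcp_seq_start,   /* position the iterator */
    .next  = tcp_seq_next,    /* advance one socket -- the slow step above */
    .show  = tcp4_seq_show,   /* format one line of /proc/net/tcp */
    .stop  = tcp_seq_stop,    /* drop any locks held during the walk */
};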
Further note to self. It looks like we start getting 100ms delays when we traverse a list of sockets and the sockets we traverse are in the established state.
It appears that each pass through established_get_next requires anywhere from 50-100ms. Looking into why that is.
Looks like we're always taking the longest path through established_get_next, going through sk_for_each_node, setting our state to TIMED_WAIT, and repeating the loop. Need to figure out why.
Note to self: we seem to be going through the loop defined by "goto get_tw" thousands of times even on a cleanly booted system. It appears the tcp_hashinfo ehash size can be rather large (I think).
OK, this isn't looking good. The latency is the result of needing to do a read_lock on every hash bucket in the ehash set of the tcp_hashinfo structure. Since that's 65536 entries long on large-memory systems, it's about 131,000 lock operations regardless of how many entries are actually used, and that accounts for most of the time. I need to do some looking to see if we can improve on that, but I don't think it looks good.
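For reference, here is a simplified paraphrase of the walk described above (trimmed for illustration from my reading of the 2.6.27-era net/ipv4/tcp_ipv4.c, not the verbatim source). The point is that the per-bucket rwlock is taken and dropped for every bucket, whether or not the bucket holds any sockets:

/* Simplified paraphrase, not the verbatim kernel source. With a 65536-entry
 * ehash this loop performs roughly 131,000 lock/unlock operations per read
 * of /proc/net/tcp, even when almost every bucket is empty. */
static void *established_get_next_sketch(struct seq_file *seq)
{
    struct tcp_iter_state *st = seq->private;

    /* ... finish the chain we are currently on, then: */
    while (++st->bucket < tcp_hashinfo.ehash_size) {
        rwlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, st->bucket);

        read_lock_bh(lock);           /* taken for EVERY bucket ... */
        /* walk established sockets in this bucket (sk_for_each),
         * then set st->state to the time-wait state and "goto get_tw"
         * to walk the bucket's time-wait chain as well */
        read_unlock_bh(lock);         /* ... and released again */
    }
    return NULL;                      /* end of table */
}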
Neil, the locking problem has been discussed extensively on the Linux networking (netdev) list, and a patch for it has been written by someone from SUSE, so you'd better go and check the netdev mailing list archives.
I found it after I posted yesterday afternoon:
http://lists.openwall.net/netdev/2008/08/26/104

Quite honestly, at first I didn't quite trust it, but since we still have to take the lock before we dereference anything, I think we're good. I'll backport the patch.
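For the record, my reading of the idea in that patch (a sketch only, not the patch as submitted): empty buckets can be skipped without taking their lock at all, and peeking at the list heads unlocked is safe precisely because any bucket we do walk is still locked before its entries are dereferenced.

/* Sketch of the idea from the netdev thread above, not the actual patch:
 * skip buckets with neither established nor time-wait sockets before taking
 * the per-bucket lock. Reading the list heads unlocked is racy but harmless,
 * since the lock is still taken before any entry is dereferenced. */
static inline int ehash_bucket_is_empty(unsigned int bucket)
{
    return hlist_empty(&tcp_hashinfo.ehash[bucket].chain) &&
           hlist_empty(&tcp_hashinfo.ehash[bucket].twchain);
}

/* ... in the walk:
 *
 *     while (++st->bucket < tcp_hashinfo.ehash_size) {
 *         if (ehash_bucket_is_empty(st->bucket))
 *             continue;                          // no lock for empty buckets
 *         read_lock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket));
 *         ...
 *         read_unlock_bh(...);
 *     }
 */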
Scratch the backporting; it's early and I was thinking this was RHEL-5. Davej should get this in the next F-9 and rawhide kernel updates. Closing as NEXTRELEASE.