Bug 459782

Summary: cat /proc/net/tcp takes 0.5 seconds on x86_64, 0.5 seconds !!
Product: Fedora
Component: kernel
Version: 9
Hardware: x86_64
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: low
Priority: low
Reporter: Zbysek MRAZ <zmraz>
Assignee: Neil Horman <nhorman>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ebenes, hdegoede, nhorman
Doc Type: Bug Fix
Last Closed: 2008-09-18 13:19:49 UTC
Attachments: quickdump of all interfaces

Description Zbysek MRAZ 2008-08-22 10:28:08 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Zbysek MRAZ 2008-08-22 11:37:50 UTC
Sorry, I must have had network problems or something:

Description of problem:
The gkrellmd process eats around 75% CPU time.
Found on the x86_64 arch.

Version-Release number of selected component (if applicable):
2.3.1-3.fc9

How reproducible:
start the gkrellmd daemon

Steps to Reproduce:
1. service gkrellmd start
2. ps aux | grep gkrellmd
3. look for the third column, which indicates CPU usage
  
Actual results:
gkrellmd 30301 76.9  0.0  50924  1100 ?        Ss   12:14  12:07 /usr/sbin/gkrellmd -d

Expected results:
Some reasonable amount of CPU time

Additional info:
Not able to reproduce the error on a dual-core i386 machine; there it eats about 2-3%.

Comment 2 Hans de Goede 2008-08-26 15:49:30 UTC
Thanks for reporting this. Some stracing of gkrellmd has shown that reading from /proc/net/tcp and /proc/net/tcp6 is the culprit. Try this on your x86_64 machine to confirm:

"time cat /proc/net/tcp"

To give you an idea on my rawhide x86_64 machine:
[hans@localhost devel]$ time cat /proc/net/tcp
<snip>
real    0m0.520s
user    0m0.000s
sys     0m0.446s

That's amazingly slow, especially as I only have 8 TCP connections open.

Some possibly useful info: top reports a very high load (50%) from soft IRQs.
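
To see that per CPU, something like this should show where the softirq time is going (assuming the sysstat package is installed; watch the %soft column):

mpstat -P ALL 1 5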

Anyway, changing this to a kernel bug.

Comment 3 Zbysek MRAZ 2008-08-26 16:41:31 UTC
Got similar results

[root@becks ~]# time cat /proc/net/tcp
<snip>
real	0m0.486s
user	0m0.002s
sys	0m0.482s

And I forgot to say that I'm using the rawhide kernel, kernel-2.6.27-0.244.rc2.git1.fc10.

Comment 4 Dave Jones 2008-08-26 16:54:12 UTC
Can you try the rc4 kernel at 
http://koji.fedoraproject.org/koji/buildinfo?buildID=60268

Comment 5 Zbysek MRAZ 2008-09-01 08:35:21 UTC
Not able to boot even the rc5 kernel build. Both end with a white screen shortly after the beginning of kernel initialization.

Comment 6 Zbysek MRAZ 2008-09-15 10:12:54 UTC
Got the new rc6 kernel version installed... the problem still persists...

[root@becks ~]# cat /proc/version && time cat /proc/net/tcp
Linux version 2.6.27-0.323.rc6.fc10.x86_64 (mockbuild.phx.redhat.com) (gcc version 4.3.2 20080905 (Red Hat 4.3.2-3) (GCC) ) #1 SMP Thu Sep 11 06:07:35 EDT 2008
<snip>                   

real	0m0.535s
user	0m0.000s
sys	0m0.534s

Comment 7 Neil Horman 2008-09-15 11:04:27 UTC
Does the same thing happen if you do:
time cat /proc/net/tcp6

Also, can you provide the contents of /proc/net/tcp? I'm interested to know how many time-wait sockets you have.

Lastly, could you take a quick tcpdump of all the interfaces on your system? It would be nice to see what your incoming traffic looks like.
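
Something along these lines should be plenty (the packet count and filename are just examples):

tcpdump -i any -s 0 -c 2000 -w all-interfaces.pcap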



We may need to run some stap scripts to figure out where you're blocking, but my guess is you have a socket whose time-wait lock is being contended for (time-wait sockets use the same lock that established sockets use). It's also possible that your box is getting pummeled with a SYN flood, and that's upping the contention for the icsk_accept_queue, which could be slowing down access to data on listening sockets.
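
A quick, rough check for the SYN flood theory would be something like this (the exact counter names vary a bit between kernel/net-tools versions, but a listen-drop or SYN-related counter climbing quickly would point that way):

netstat -st | grep -i -e syn -e listen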

Thanks.

Comment 8 Zbysek MRAZ 2008-09-15 12:41:10 UTC
Created attachment 316745 [details]
quickdump of all interfaces

Comment 9 Zbysek MRAZ 2008-09-15 12:43:02 UTC
[root@becks ~]# time cat /proc/net/tcp6 
  sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000000000000000000000000000:CD4E 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 27266 1 ffff880072099400 299 0 0 2 -1
   1: 00000000000000000000000000000000:4ACE 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000   496        0 8579 1 ffff8800758a0a00 299 0 0 2 -1
   2: 00000000000000000000000000000000:1771 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 26818 1 ffff880072098a00 299 0 0 2 -1
   3: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 8046 1 ffff8800758a0000 299 0 0 2 -1
   4: 00000000000000000000000000000000:93F7 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000    42        0 9784 1 ffff8800758a1400 299 0 0 2 -1
   5: 00000000000000000000000001000000:177A 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36413 1 ffff880072099e00 299 0 0 2 -1
   6: 00000000000000000000000001000000:177B 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 20745 1 ffff880072098000 299 0 0 2 -1
   7: 00000000000000000000000001000000:177C 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 39678 1 ffff88007209a800 299 0 0 2 -1

real	0m0.746s
user	0m0.000s
sys	0m0.504s
[root@becks ~]# time cat /proc/net/tcp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   0: 00000000:EA62 00000000:0000 0A 00000000:00000000 00:00000000 00000000    42        0 9785 1 ffff880077974280 299 0 0 2 -1                      
   1: 00000000:E8E6 00000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 27267 1 ffff880077894280 299 0 0 2 -1                     
   2: 00000000:EC26 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 6738 1 ffff880077970980 299 0 0 2 -1                      
   3: 00000000:170D 00000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 26825 1 ffff880077893900 299 0 0 2 -1                     
   4: 00000000:4ACE 00000000:0000 0A 00000000:00000000 00:00000000 00000000   496        0 8578 1 ffff880077971c80 299 0 0 2 -1                      
   5: 00000000:006F 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 6679 1 ffff880077970000 299 0 0 2 -1                      
   6: 00000000:1771 00000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 26819 1 ffff880077892f80 299 0 0 2 -1                     
   7: 017AA8C0:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 8850 1 ffff880077972f80 299 0 0 2 -1                      
   8: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 8044 1 ffff880077971300 299 0 0 2 -1                      
   9: 0100007F:0277 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 8859 1 ffff880077973900 299 0 0 2 -1                      
  10: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 8329 1 ffff880077890980 299 0 0 2 -1                      
  11: 0100007F:177A 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36414 1 ffff880077895f00 299 0 0 2 -1                     
  12: 0100007F:177B 00000000:0000 0A 00000000:00000000 00:00000000 00000000   708        0 20746 1 ffff880077892600 299 0 0 2 -1                     
  13: 0100007F:177C 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 39679 1 ffff880077896880 299 0 0 2 -1                     
  14: 7B20220A:0016 1620220A:C006 01 00000000:00000000 02:00094619 00000000     0        0 20667 3 ffff880077891c80 20 4 30 4 -1                     
  15: 0100007F:0016 0100007F:ED8B 01 00000000:00000000 02:000A7A43 00000000     0        0 39595 2 ffff880077975580 20 3 29 4 -1                     
  16: 0100007F:ED8B 0100007F:0016 01 00000000:00000000 02:000A7A36 00000000   708        0 39594 2 ffff880077974c00 20 3 26 4 -1                     
  17: 0100007F:EB7D 0100007F:170D 01 00000000:00000000 00:00000000 00000000   708        0 29882 1 ffff880077894c00 20 3 31 3 -1                     
  18: 0100007F:170D 0100007F:EB7D 01 00000000:00000000 00:00000000 00000000   708        0 29883 1 ffff880077895580 20 3 8 3 19                      
  19: 7B20220A:4ACE 1620220A:B3CD 01 00000000:00000000 00:00000000 00000000   496        0 10568 1 ffff880077890000 20 3 0 5 -1                      
  20: 7B20220A:0016 1620220A:AFE3 01 00000000:00000000 02:0004FF6D 00000000     0        0 36338 2 ffff880077891300 20 3 7 5 -1                      

real	0m0.775s
user	0m0.002s
sys	0m0.769s

Comment 10 Zbysek MRAZ 2008-09-15 12:46:01 UTC
This is a work desktop, where only the ssh, vnc, and gkrellmd ports (and sometimes the iceccd ports) are used and open. No traffic other than normal stuff like browsing, NFS use, or messaging through the company XMPP server...
There are some problems with the IPv6 network in our office, so it's not being used right now.

Comment 11 Neil Horman 2008-09-15 18:59:27 UTC
Note to self: the problem appears to be happening in tcp_seq_next (which is common to both the v4 and v6 versions of the file). It doesn't always take a long time to run, but it appears that it occasionally takes about 100 ms to execute. Subsequent calls to tcp_seq_stop and tcp4_seq_show are both seen afterwards, so it doesn't seem related to the end of a list traversal or anything obvious like that.
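
For reference, a rough way to get per-call timings like this is a quick stap run (untested sketch; it assumes systemtap is installed and that tcp_seq_next hasn't been inlined away in this build):

stap -e '
global t
probe kernel.function("tcp_seq_next") { t[tid()] = gettimeofday_us() }
probe kernel.function("tcp_seq_next").return {
  if (tid() in t) {
    printf("tcp_seq_next: %d us\n", gettimeofday_us() - t[tid()])
    delete t[tid()]
  }
}' -c 'cat /proc/net/tcp > /dev/null'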

Comment 12 Neil Horman 2008-09-15 19:05:40 UTC
Further note to self: it looks like we start getting 100 ms delays when we traverse a list of sockets and the sockets we traverse are in the established state.

Comment 13 Neil Horman 2008-09-15 19:41:42 UTC
It appears that each pass through established_get_next takes anywhere from 50 to 100 ms. Looking into why that is.

Comment 14 Neil Horman 2008-09-15 20:52:28 UTC
Looks like we're always taking the longest path through established_get_next, going through sk_for_each_node, setting our state to TIMED_WAIT, and repeating the loop. Need to figure out why.

Comment 15 Neil Horman 2008-09-17 18:04:08 UTC
Note to self: we seem to be going through the loop defined by goto get_tw thousands of times on a cleanly booted system. It appears the tcp_hashinfo ehash size can be rather large (I think).
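
A quick way to confirm the ehash size on the reporter's box would be grepping the boot messages (the exact wording of the line varies between kernel versions, so the pattern may need adjusting):

dmesg | grep -i -e 'hash tables configured' -e 'established hash'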

Comment 16 Neil Horman 2008-09-17 20:49:54 UTC
OK, this isn't looking good. The latency is the result of needing to do a read_lock on every hash bucket in the ehash set of the tcp_hashinfo structure. Since that's 65536 entries long on large-memory systems, that's roughly 131,000 lock operations (a read_lock plus a read_unlock per bucket) regardless of how many entries are actually used, and that accounts for most of the time. I need to do some looking to see if we can improve on that, but I don't think it looks good.

Comment 17 Hans de Goede 2008-09-18 12:36:34 UTC
Neil,

The locking problem has been discussed intensively on the Linux networking (netdev) list, and a patch for it has been written by someone from SUSE, so you'd better go and check the netdev mailing list archives.

Comment 18 Neil Horman 2008-09-18 13:17:54 UTC
I found it after I posted yesterday afternoon:
http://lists.openwall.net/netdev/2008/08/26/104
And quite honestly, at first I didn't quite trust it, but since we still have to take the lock before we dereference anything, I think we're good. I'll backport the patch.

Comment 19 Neil Horman 2008-09-18 13:19:49 UTC
Scratch the backporting; it's early and I was thinking this was RHEL-5. Davej should get this in the next F-9 & rawhide kernel updates. Closing as NEXTRELEASE.