Description of problem:
With typical TCP tuning parameters allowing socket buffers of up to 16MB, the CPU spends 90-95% of its time in softirq handling and the application gets only 1.1% of the CPU. Contrary to normal TCP tuning guidance, reducing the maximum socket buffer size to the L2 cache size improves application performance dramatically (roughly 15x).

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2962.fc6

How reproducible:
Apply TCP tuning parameters.

Steps to Reproduce:
1.
2.
3.

Actual results:
Dramatically decreased performance.

Expected results:
Increased performance.

Additional info:
Interestingly enough, large tcp_[rw]mem maximums are not always good. An example: an i686 rsync server accepting data performs slowly with tcp_rmem=tcp_wmem="4K 128K 16M" and tcp_moderate_rcvbuf=1 (i.e. receiver autotuning on). Within a minute of rsync starting, the machine spends 90-95% of its time in softirq handling and the rsync application makes very slow progress. Profiling the kernel shows that 50% of the time is spent in skb_copy_bits() and another 30% in tcp_collapse(), which is called by tcp_prune_queue() when socket buffer space needs to be recovered. With tcp_rmem at 16M, this means frequently reading, restructuring, and writing skb data far beyond the machine's cache size (L2=256K in this case, with 128-byte cache lines due to RDRAM memory), and memory accesses are much slower than cache accesses. As a result, the rsync application itself gets only 1.1% of the CPU:

samples  %        image name  app name  symbol name
1753996  49.8357  vmlinux     vmlinux   skb_copy_bits
1042035  29.6070  vmlinux     vmlinux   tcp_collapse
 149239   4.2403  vmlinux     vmlinux   kfree
  93044   2.6436  vmlinux     vmlinux   kmem_cache_free
  45869   1.3033  vmlinux     vmlinux   __alloc_skb
  41836   1.1887  vmlinux     vmlinux   __kmalloc
  39248   1.1151  rsync       rsync     (no symbols)
...
   6282   0.1785  raid456.ko  raid456   copy_data

With tcp_rmem="4K 128K 256K" and tcp_wmem="4K 64K 256K", performance improves to the point where rsync gets 16% of the CPU:

samples  %        image name  app name  symbol name
 140303  16.1519  rsync       rsync     (no symbols)
  85338   9.8243  vmlinux     vmlinux   skb_copy_bits
  43253   4.9794  vmlinux     vmlinux   _raw_spin_unlock
  34675   3.9918  vmlinux     vmlinux   _raw_spin_lock
  34022   3.9167  vmlinux     vmlinux   tcp_collapse
  20596   2.3710  raid456.ko  raid456   copy_data

While skb_copy_bits() and tcp_collapse() are still prominent, at least they no longer dominate rsync and the RAID module, which now make progress at a decent pace, perhaps 15 times faster (probably limited by network and disk speeds). Note that while this test used MTU=1500 over a 1GbE link, the network delay is very low, so the bandwidth-delay product (BDP) is well under 256K in normal circumstances.

Please reconsider the kernel logic that invokes skb_copy_bits() so often and kills performance with large socket buffers.
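As a sanity check on the BDP claim above: on a low-latency 1GbE LAN, the bandwidth-delay product stays well under the 256K cap, so limiting the tcp_rmem/tcp_wmem maximums should not throttle throughput. A minimal sketch in shell, assuming a 2 ms round-trip time (the report only says the delay is "very low"):

```shell
# Bandwidth-delay product check; the RTT value is an assumption, not from the report.
BW_BITS=1000000000   # 1 Gbit/s link
RTT_US=2000          # assumed 2 ms round-trip time, in microseconds
CAP=262144           # 256K socket buffer cap from the report

# BDP in bytes = (bits/s / 8) * RTT in seconds
BDP=$(( BW_BITS / 8 * RTT_US / 1000000 ))
echo "BDP = $BDP bytes (cap = $CAP)"   # BDP = 250000 bytes (cap = 262144)

if [ "$BDP" -lt "$CAP" ]; then
    echo "a 256K buffer is enough to fill the pipe"
fi
```

Even at this assumed RTT the BDP is about 250 KB, so a 256K maximum buffer already covers the pipe; the 16M maximum only adds queue depth that tcp_collapse() then has to churn through.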
Is this any different with the 2.6.22 kernels available for FC6? The latest test kernel is here:
http://people.redhat.com/cebbert/kernels/FC6/kernel-2.6.22.5-49.fc6.i686.rpm
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project, we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed as 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change the bug to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues before this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug, and we are sorry it could not be fixed.