Bug 503124
| Field | Value |
|---|---|
| Summary | No buffer space available with many concurrent sockets over ipsec connection |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 11 |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | low |
| Hardware | All |
| OS | Linux |
| Reporter | Joe Nall <joe> |
| Assignee | Neil Horman <nhorman> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | itamar, kernel-maint, nhorman, paul.moore |
| Doc Type | Bug Fix |
| Last Closed | 2010-06-28 12:44:31 UTC |

Description
Joe Nall
2009-05-29 01:48:11 UTC
Can you provide a sosreport of the system in question? I'd like to see how much memory you have on board, and what your core networking memory sysctls are set to. Thanks!

Created attachment 345910 [details]
sosreport

sosreport from the VM, from the machine ab was run from. It was a clone of the server machine in this test. The problem has been duplicated between an 8GB/quad-core ab client and a 32GB/24-core server, so I don't think it is a physical resource issue.

Thanks for that. Looking at the code, my first guess would be that you're simply not able to allocate buffers under load on this system (although that would seem unlikely). I need to get two systems to set an ipsec tunnel up when I have time, but I tried the test on localhost here and it worked just fine (not that that proves anything). Looking at your sosreport shows you seem to be using a good chunk of memory, but by no means all in your system. Unfortunately, for some reason the skbuff slab caches aren't appearing in /proc/slabinfo, so I can't tell how much you have allocated there (I'll need to look into that further). While I'm getting systems rounded up here, can you tell me if removing the ipsec tunnel on your setup and simply running the benchmark provides the same failing results? Knowing that will help narrow down the amount of code to look through for this problem. Thanks

Same two machines and loopback have no issues without ipsec. See "Expected results:" above.

Updated to reflect that this affects Fedora 10 and 11

XfrmInError 0
XfrmInBufferError 0
XfrmInHdrError 0
XfrmInNoStates 0
XfrmInStateProtoError 0
XfrmInStateModeError 0
XfrmInStateSeqError 207
XfrmInStateExpired 0
XfrmInStateMismatch 0
XfrmInStateInvalid 0
XfrmInTmplMismatch 0
XfrmInNoPols 0
XfrmInPolBlock 0
XfrmInPolError 0
XfrmOutError 0
XfrmOutBundleGenError 1144
XfrmOutBundleCheckError 0
XfrmOutNoStates 6000
XfrmOutStateProtoError 0
XfrmOutStateModeError 0
XfrmOutStateSeqError 0
XfrmOutStateExpired 0
XfrmOutPolBlock 0
XfrmOutPolDead 0
XfrmOutPolError 0

Ok, I think I've found the problem, or at least part of it. The error is likely stemming from the XfrmOutBundleGenError stats you see increasing. Those stats get incremented when the xfrm code goes to allocate a dst_entry for the route cache, but winds up not being able to. In that case the GenError stat gets incremented and we return ENOBUFS.
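
For anyone wanting to watch these counters while reproducing the problem, here is a minimal sketch in Python. It assumes the counters are read from /proc/net/xfrm_stat (exposed by kernels built with CONFIG_XFRM_STATISTICS); the thread does not say where the dump above came from. The helper names `parse_xfrm_stats`, `nonzero` and `delta` are made up for illustration.

```python
from pathlib import Path

# Assumption: the counters live here on the affected kernel.
XFRM_STAT_PATH = Path("/proc/net/xfrm_stat")


def parse_xfrm_stats(text):
    """Parse 'Name value' pairs, whether one per line or all on one line."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating counter names and values")
    return {name: int(value) for name, value in zip(tokens[::2], tokens[1::2])}


def nonzero(stats):
    """Keep only the counters that have fired at least once."""
    return {name: value for name, value in stats.items() if value}


def delta(before, after):
    """Return counters whose value changed between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def read_xfrm_stats(path=XFRM_STAT_PATH):
    """Read and parse the live counters (path is an assumption, see above)."""
    return parse_xfrm_stats(Path(path).read_text())
```

Taking one snapshot with `read_xfrm_stats()` before the benchmark and one after, then passing both to `delta`, should show whether XfrmOutBundleGenError climbs while the "No buffer space available" errors appear.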
There are two reasons this can happen: either we are above 2x the garbage collection threshold, or we simply can't get any more RAM out of the heap. Since this appears to be your only issue with this system, I'm going to assume the former is happening. It would be good to monitor your free memory count when the problem happens though, just to be certain. That said, we need to make this tunable, I think, i.e. we need to let the gc thresh be adjustable for the xfrm code the same way it's adjustable for the route cache. Currently it's hard-coded to 1024 entries. I can start working on this patch soon, but in the interim, would you mind trying the patch below? It's not a final solution, but it will tell us if we're on the right track. It just bumps up the size of the gc threshold to 8192 from 1024. If that solves the problem we'll know we're on the right path.

Created attachment 354568 [details]
patch to increase gc threshold on xfrm code

Here's the patch. This isn't a permanent solution, just something to make sure that we've identified the problem. If this fixes the issue, I'll go ahead and write the larger patch to make xfrm gc limits tunable.

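
To make the failure condition described above concrete, here is a rough Python model of it. This is not the kernel code, only a sketch of the two failure reasons as stated in this thread: more than 2x the gc threshold of entries, or no memory. The function name, the `reclaimable` parameter, and the idea that garbage collection is attempted once the threshold itself is exceeded are assumptions made for illustration.

```python
import errno

GC_THRESH_DEFAULT = 1024  # hard-coded value mentioned in this bug
GC_THRESH_PATCHED = 8192  # value used by the test patch


def alloc_dst_entry(entries, gc_thresh, reclaimable=0, memory_available=True):
    """Model one dst_entry allocation; return (new_entry_count, errno or 0)."""
    if entries > gc_thresh:
        # Assumed: try to reclaim stale entries before deciding to fail.
        entries -= min(reclaimable, entries)
        if entries > 2 * gc_thresh:
            # Reason 1: still above twice the gc threshold.
            return entries, errno.ENOBUFS
    if not memory_available:
        # Reason 2: the allocator cannot supply memory.
        return entries, errno.ENOBUFS
    return entries + 1, 0
```

With 2049 live entries and nothing reclaimable, the model fails at the 1024 threshold but succeeds at 8192, which is the behaviour the test patch is meant to confirm.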
I see the same error when setting up an HE IPv6 tunnel. Using the Linux net-tools works, but using iproute2 gives "No buffer space available".

I patched the Fedora 10 2.6.29 kernel in updates-testing and it looks good so far.

Ok, cool, I'll start working on a more appropriate port for upstream shortly. Thanks!

I've sent the patch upstream, and cc'd you on it. As soon as it's accepted, I'll backport it to F-11:
http://marc.info/?l=linux-netdev&m=124871899201909&w=2

If possible a backport to F10 would be appreciated.

This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.