Bug 499887
Summary: | IPSEC doesn't work with a big SPD/SAD | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Marc Milgram <mmilgram> |
Component: | kernel | Assignee: | Neil Horman <nhorman> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 5.1 | CC: | dzickus, herbert.xu, tao, tgraf |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-10-21 19:57:58 UTC | Type: | --- |
Bug Blocks: | 533192 |
Description
Marc Milgram
2009-05-08 18:35:44 UTC
Looks like you might be right about the git commits. Do you already have this set up to reproduce on a set of systems somewhere, or do I need to set it up myself?

I don't have a setup to test this.

So, I don't have enough hosts to actually validate all the links, but I set up a few hundred SA and SPD entries (one 24-bit subnet's worth of each) and started racoon in the foreground with debug maxed out. I was able to observe racoon read all of the resultant entries produced by the SPDDUMP it issues at startup, so I'm not sure what's going on here. I agree the above commits look like they might improve performance in reading those entries, but I'm hesitant to take them just because something might be a bit slow. That said, I am using a newer kernel that has some fixes for UNIX sockets in it (although I could have sworn setkey uses the PF_KEY address family rather than PF_UNIX; I'll need to check on that). Anywho, can the customer try this with the latest kernel?

I can read the bugzilla; I'm saying I can't reproduce the problem. I loaded 254 SPD rules and 254 associated SA rules on a system, and started racoon in the foreground with full debug. Parsing through the output, I see that racoon reads all 254 entries from the X_SPDDUMP request. (The results are on amd-toonie2-01.rhts.bos.redhat.com:/root/resutls if you want to see; just grep sub: results.) Anywho, it obviously works for me on the system I'm using. Looking at the patches above, I see how they might help, but the first looks like an ABI breaker, so it's out. The second looks doable, but before we take it, I'd really like to see the problem occur consistently, and then cease to occur when we take the patch. My guess is that there is a load aspect to the problem that isn't being considered here.

As an alternative, the second patch should apply pretty cleanly to the RHEL 5 kernel, I think. Can the customer try a test kernel with the second of the above patches included to confirm the fix?
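For anyone who wants to repeat the 254-entry test described above, here is a rough sketch that generates a setkey input file with one SA and one SPD policy per host pair. The comments above don't say which addresses, SPIs, algorithms, or keys were actually loaded, so everything here is an assumption: the two /24 subnets, the SPI range, the choice of `rijndael-cbc`, transport mode, and the throwaway keys are placeholders for a lab box only.

```python
import hashlib
import ipaddress


def build_setkey_config(src_net="10.0.0.0/24", dst_net="10.0.1.0/24"):
    """Return setkey input with one SA and one SPD policy per host pair.

    All values (subnets, SPIs, algorithm, keys) are hypothetical test
    placeholders, not the entries used in the original report.
    """
    src_hosts = list(ipaddress.ip_network(src_net).hosts())
    dst_hosts = list(ipaddress.ip_network(dst_net).hosts())

    # Start from an empty SAD and SPD so the entry counts are predictable.
    lines = ["flush;", "spdflush;"]

    for i, (src, dst) in enumerate(zip(src_hosts, dst_hosts)):
        spi = 0x1000 + i
        # 16-byte (128-bit) throwaway key derived from the SPI. NOT secure.
        key = "0x" + hashlib.md5(str(spi).encode()).hexdigest()
        lines.append(f"add {src} {dst} esp {spi:#x} -E rijndael-cbc {key};")
        lines.append(
            f"spdadd {src} {dst} any -P out ipsec esp/transport//require;"
        )

    return "\n".join(lines) + "\n"
```

Writing the returned string to a file and loading it with `setkey -f <file>` as root should give 254 SAs and 254 policies; racoon can then be started in the foreground with debugging enabled to see how many entries it reads back from the SPD dump.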
The customer is willing to try a test kernel in order to confirm the fix.

Gah, the second patch is pretty nondescript, but it requires the first patch to work properly, which makes the whole thing an ABI breaker. I'm going to try to hack something together to make this work, but I can't promise anything. In the meantime, I expect that the customer can work around this issue (assuming the problem is what we think it is) by setting /proc/sys/net/core/rmem_default and rmem_max to very large numbers. If racoon doesn't explicitly reset those values on any sockets that it opens, that should prevent blocks/drops on the pf_key protocol and avoid this issue. If so, the racoon startup script can be adjusted to ensure that it starts with a sufficiently large buffer space to make the problem avoidable. Please relay that to the customer and let me know how that works out.
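A minimal sketch of that workaround, in case it helps when relaying it: the helper below raises rmem_default and rmem_max if they are currently lower than a requested size and returns the previous values so they can be restored. The function name, the `sysctl_dir` parameter (there so it can be pointed at a scratch directory for a dry run), and any particular buffer size are assumptions; nothing in this bug says how large is large enough.

```python
from pathlib import Path


def raise_rmem(size_bytes, sysctl_dir="/proc/sys/net/core"):
    """Raise rmem_max and rmem_default to at least size_bytes.

    Returns a dict of the values found before any change. Writing under
    /proc/sys requires root. The helper name and parameters are
    hypothetical, for illustration only.
    """
    previous = {}
    for name in ("rmem_max", "rmem_default"):
        path = Path(sysctl_dir) / name
        current = int(path.read_text().strip())
        previous[name] = current
        # Only ever raise a value; leave it alone if it is already larger.
        if current < size_bytes:
            path.write_text(f"{size_bytes}\n")
    return previous
```

Calling something like `raise_rmem(4 * 1024 * 1024)` as root before racoon starts would apply the change; the 4 MiB figure is an arbitrary example. Sockets that are already open keep their old default, so racoon needs to be (re)started afterwards.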