Description of problem:
Running IPSec (OpenSWAN) on the 2.6.26.5-45 kernel, the system dies at random times. No panic, no aiieee. Network is dead and user space is dead. It is possible to reboot the system using Alt-SysRq if that has been enabled. The problem is NOT present in the 2.6.26.3-29 kernel.

Examining the changelog for 2.6.26.6, there appears to have been a serious problem in bottom half handling, introduced by an earlier patch for an IPSec-related recursive lock. From the changelog:

commit fc69b36cd5d05d78c7aa34fd490e8f156be9e5f6
Author: Herbert Xu <herbert.org.au>
Date: Mon Sep 15 11:48:46 2008 -0700

    udp: Fix rcv socket locking

    [ Upstream commit 93821778def10ec1e69aa3ac10adee975dad4ff3 ]

    The previous patch in response to the recursive locking on IPsec reception is broken as it tries to drop the BH socket lock while in user context.

    This patch fixes it by shrinking the section protected by the socket lock to sock_queue_rcv_skb only. The only reason we added the lock is for the accounting which happens in that function.

    Signed-off-by: Herbert Xu <herbert.org.au>
    Signed-off-by: David S. Miller <davem>
    Signed-off-by: Greg Kroah-Hartman <gregkh>

It is uncertain whether this fixes the problem or whether some other problem is lurking in this new lock.

Version-Release number of selected component (if applicable):
2.6.26.3 - Not broken
2.6.26.5 - Broken
2.6.26.6 - Undetermined

How reproducible:
Highly reliable, just time consuming.

Steps to Reproduce:
1. Set up an IPSec connection between two machines
2. Restart OpenSWAN on one of the machines a few times (a dozen or so)
3. Machine eventually freezes

Alternatively, just bring up the connection and wait for a few hours. The system will eventually die on its own, it just takes longer.

Additional info:
I'm getting ready to test a kernel.org 2.6.26.6 kernel. When that's built and tested, I'll add those test results.
Another IPSec related fix in 2.6.26.6 which could possibly account for the problem:

commit b047cf6dfa81ca03b62f2e3ae63793ef5c300158
Author: Herbert Xu <herbert.org.au>
Date: Tue Sep 30 02:03:19 2008 -0700

    ipsec: Fix pskb_expand_head corruption in xfrm_state_check_space

    [ Upstream commit d01dbeb6af7a0848063033f73c3d146fec7451f3 ]

    We're never supposed to shrink the headroom or tailroom. In fact, shrinking the headroom is a fatal action.

    Signed-off-by: Herbert Xu <herbert.org.au>
    Signed-off-by: David S. Miller <davem>
    Signed-off-by: Greg Kroah-Hartman <gregkh>
A 2.6.26.6 based Fedora kernel should be in updates-testing really soon.
That's good. It took over 3 hours to build a stock kernel.org kernel for 2.6.26.6, but it has now been running in my VMware test environment for over an hour, with the IPSec environment restarted every 5 minutes. I won't claim that's definitive, but the 2.5.26.5 kernel would never have lasted this long. Looking good, and looking forward to laying my hands on those kernel rpms.
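A restart test like the one described above could be automated roughly as follows. This is only a minimal sketch, not the script actually used for this report: the function name `restart_loop` is made up for illustration, and the command `service ipsec restart` is an assumption about how OpenSWAN is restarted on the test machine. The 5 minute interval and the dozen or so restarts come from the comments above.

```python
import subprocess
import time


def restart_loop(command, interval_seconds, iterations,
                 run=subprocess.call, sleep=time.sleep):
    """Run `command` `iterations` times, sleeping between runs.

    Returns the list of exit codes, one per run. `run` and `sleep`
    can be swapped out so the loop can be exercised without touching
    the real service.
    """
    exit_codes = []
    for i in range(iterations):
        exit_codes.append(run(command))
        # No need to wait after the final restart.
        if i < iterations - 1:
            sleep(interval_seconds)
    return exit_codes


if __name__ == "__main__":
    # Assumption: OpenSWAN is restarted with "service ipsec restart"
    # and the script is run as root. Adjust for the actual setup.
    codes = restart_loop(["service", "ipsec", "restart"], 300, 12)
    print(codes)
```

If the machine freezes partway through, the script simply never finishes, so it is worth watching it from a console or noting the start time.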
Ignore the typo in the past message... 2.6.26.6 not 2.5.26.5. Duh.
I'm going to assume that 2.6.26.6 fixes the problem.
I think that's a very good assumption. I have not had a single lockup, IPSec related or otherwise, in any of my testbeds running 2.6.26.6, either stock kernel.org kernels or the 2.6.26.6-67 from Koji. I've just built my own 2.6.27-3 kernels for F9 from the Koji srpm and will be testing those next.
kernel-2.6.26.6-46.fc8 has been submitted as an update for Fedora 8. http://admin.fedoraproject.org/updates/kernel-2.6.26.6-46.fc8
kernel-2.6.26.6-71.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/kernel-2.6.26.6-71.fc9
kernel-2.6.26.6-79.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-8929
kernel-2.6.26.6-79.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.