Bug 169600

Summary: SMP kernel crashes when used as LVS router
Product: Red Hat Enterprise Linux 4 Reporter: soul916
Component: kernel Assignee: Thomas Graf <tgraf>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0 CC: davem, ian, jbaron, lhh, pgauthier, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-03-29 18:31:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 181409    

Description soul916 2005-09-30 02:59:32 UTC
Description of problem:
When LVS is used on RHEL4 with an SMP kernel on an SMP machine, the kernel
crashes without any error message.
I use RHEL4 as an LVS router on a Dell 1850 server.
I have tried kernel versions 2.6.9-17.EL and 2.6.9-11.EL; all the SMP kernels
have the problem, while non-SMP kernels do not seem to have this issue.
When an SMP kernel is used on the LVS router, the machine crashes after a few
hours without any error message: the whole system stops responding, the monitor
shows nothing, and the keyboard NumLock LED no longer lights.

Version-Release number of selected component (if applicable):
2.6.9-17.EL and 2.6.9-11.EL

How reproducible:
Configure an LVS router on an SMP machine with an SMP kernel.
Use a script to generate load on the router.
After a few hours the router will crash.

Steps to Reproduce:
1. Configure an LVS router on an SMP machine with an SMP kernel.
2. Write a script to generate load on the router.
3. After a few hours the router will crash.
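
The load script itself is not attached to this report. As a rough illustration only, a load generator of the kind described in step 2 might look like the sketch below. It assumes the virtual service accepts plain TCP connections; the function name generate_load and its parameters are hypothetical and are not taken from the reporter's setup.

```python
import socket
import time


def generate_load(host, port, connections, timeout=2.0, pause=0.0):
    """Open and immediately close `connections` TCP connections to host:port.

    Returns a (succeeded, failed) tuple so a caller can log how the
    virtual service behaves over time.
    """
    succeeded = 0
    failed = 0
    for _ in range(connections):
        try:
            # Each connection is closed as soon as the handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                succeeded += 1
        except OSError:
            # Refused, timed out, unreachable, and similar errors.
            failed += 1
        if pause:
            time.sleep(pause)
    return succeeded, failed
```

Calling, for example, generate_load("192.0.2.10", 80, 1000) in a loop would keep a steady stream of short connections flowing through the router; the address and port here are placeholders for the LVS virtual IP and service port.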
  
Actual results:
The system crashes after a few hours.

Expected results:


Additional info:

Comment 1 Jason Baron 2005-10-07 15:45:50 UTC
OK. It would be helpful if we could get a trace of the crash, if there is
one. Can you hook up a serial console? Is there anything in /var/log/messages?
The relevant script might also be helpful. Thanks.
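
One way to check /var/log/messages for crash traces is to filter it for typical kernel crash markers. The sketch below is only an illustration: the marker list is an assumption (of these, only "Kernel panic" appears in this report), and the function name find_crash_lines is hypothetical.

```python
import re

# Assumed crash markers; adjust to taste. Matching is case-sensitive so that
# ordinary words such as "debug" do not match the BUG marker.
CRASH_PATTERN = re.compile(r"Kernel panic|Oops|\bBUG\b|Call Trace")


def find_crash_lines(lines):
    """Return the lines (without trailing newline) that contain a crash marker."""
    return [line.rstrip("\n") for line in lines if CRASH_PATTERN.search(line)]
```

Since a file object iterates over its lines, it can be passed directly, for example: with open("/var/log/messages") as f: print(find_crash_lines(f)). An empty result does not rule out a crash, because a hard hang may leave nothing in the log.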

Comment 2 soul916 2005-10-12 04:44:47 UTC
Are you sure that Red Hat tested the LVS functionality before releasing RHEL4?
I'm sure that when the system crashes, there is nothing about pulse, kernel, or
nanny in /var/log/messages. I read the log carefully: when the system crashes,
the last log entry is cron executing /usr/bin/mrtg or /usr/lib/sa/sa1.
I found that the crash also occurs under the non-SMP kernel, but the system
stays up longer before crashing than with the SMP kernel.
I tested the 2.6.9-22.EL SMP and non-SMP kernels, and the system crashed with both.
I found that the script is not necessary. You can configure an LVS cluster on
RHEL4, and the system will crash within a few days without any load.
I'm sorry, but I cannot hook up a serial console to the server.

Comment 3 Ralf Sticklies 2005-12-04 09:18:35 UTC
The following shows up on the console:

Kernel panic - not syncing: fs/block_dev.c:396: spin_lock 
fs/block_dev.c:c035d88

See also:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-07/msg00124.html

Probable solution: a new kernel is needed:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-07/msg00131.html

Comment 4 Pascal Gauthier 2006-03-26 22:24:10 UTC
Is there any development on this one? I have the same problem. When I checked
the CentOS bugzilla, I saw that they have added these two patches:


http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9/2.6.9-mm1/broken-out/ipvs-deadlock-fix.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9/2.6.9-mm1/broken-out/cancel_rearming_delayed_work.patch


REF: http://bugs.centos.org/view.php?id=1201

Comment 5 Ian Neubert 2006-03-29 18:25:57 UTC
I am seeing the same crashes on two single CPU i386 boxes running LVS +
keepalived together. They crashed about every other day. I have since
reinstalled both with Fedora Core 4 with kernel 2.6.15-1.1833_FC4, and have had
no problems.

Comment 6 Jason Baron 2006-03-29 18:31:55 UTC
OK, this looks like a dup of 174990, which we have a patch for in the current
RHEL4 kernel. Please find test kernels at: http://people.redhat.com/~jbaron/rhel4/

*** This bug has been marked as a duplicate of 174990 ***

Comment 7 Ian Neubert 2006-03-30 00:59:42 UTC
Jason: just to clarify, is this patch included in 2.6.9-34.EL? Or is it pending
inclusion and currently only in your test kernel?

Unfortunately, I'm not able to access bug 174990 to check myself. Thanks!

Comment 8 Jason Baron 2006-05-05 10:47:04 UTC
The patch is not in -34. It's currently only in my test kernel, but this is the
beta kernel for U4.
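
For anyone checking their own machines, the statement above can be turned into a small check of the running kernel release. This is a sketch under stated assumptions: it treats every 2.6.9-N build with N up to 34 as lacking the patch (the report only confirms this for -34 and describes crashes on -11, -17, and -22), it assumes release strings of the form 2.6.9-N.EL with an optional suffix such as smp, and the function name is_known_unpatched is hypothetical.

```python
import re

# Assumed RHEL4 release format, e.g. "2.6.9-34.EL" or "2.6.9-22.ELsmp".
RELEASE_RE = re.compile(r"^2\.6\.9-(\d+)(?:\.\d+)*\.EL\w*$")

# Highest build number stated in this report as not containing the patch.
LAST_KNOWN_UNPATCHED = 34


def is_known_unpatched(release):
    """Return True if `release` is a 2.6.9-N.EL kernel with N <= 34.

    False means "not known to be unpatched", not "known to be fixed".
    """
    match = RELEASE_RE.match(release)
    if match is None:
        return False
    return int(match.group(1)) <= LAST_KNOWN_UNPATCHED
```

The running kernel's release string can be obtained with platform.release() from the standard library and passed to this function.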

Comment 9 Linda Wang 2006-05-09 21:48:33 UTC
Errata tool cleanup; add to U4 CANFIX list for tracking purposes.