Bug 176939

Summary: Crash while running IPVS
Product: Red Hat Enterprise Linux 4 Reporter: daryl herzmann <akrherz>
Component: kernel Assignee: Thomas Graf <tgraf>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: jbaron, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-02-01 18:06:53 UTC
Attachments:
Description     Flags
netdump log     none
netdump log     none

Description daryl herzmann 2006-01-04 15:48:47 UTC
Description of problem:
I have a Sun Fire x2100 that runs as a redundant LVS Director. The machine locks
up after varying amounts of time while running as the LVS Director. The machine
appears to be stable when not running IPVS.

While running the rhel4u2 kernel, the machine would hard lock without invoking
netdump. I installed the rhel4u3 beta kernel and am now getting a netdump.

Version-Release number of selected component (if applicable):
2.6.9-27.EL #1 Tue Dec 20 19:11:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
After running IPVS for a while, the machine will hard lock.


Additional info:
I will attach the netdump log file. Note that netdump never fully reboots the
machine. The netdump server logs lots of these messages:

Dec 30 23:32:01 server netdump[2472]: Got too many timeouts in handshaking,
ignoring client x.x.x.x 

Dec 30 23:32:04 server netdump[2472]: Got too many timeouts waiting for
SHOW_STATUS for client x.x.x.x, rebooting it

thanks!

Comment 1 daryl herzmann 2006-01-04 15:48:47 UTC
Created attachment 122761
netdump log

Comment 2 daryl herzmann 2006-01-26 15:18:27 UTC
Greetings,

The lockups continue. They seem to be about 7-10 days apart. The netdump
failures continue as well. Perhaps I should file a separate bug about netdump failing?

daryl

Comment 3 Jason Baron 2006-01-26 15:28:07 UTC
Yes, please file a separate bug for the netdump failures. Thanks.

Comment 4 daryl herzmann 2006-01-26 16:17:20 UTC
Thanks. Done:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179016

Comment 5 daryl herzmann 2006-02-13 17:10:15 UTC
Created attachment 124570
netdump log 

Wow, for whatever reason netdump worked this time when the machine crashed! I
have attached what appears to be the full 'log'. The machine also rebooted
properly. Thanks!

Comment 6 daryl herzmann 2006-04-10 15:29:59 UTC
Hi, 

Just an update. This machine hasn't crashed since the upgrade to
2.6.9-34.ELsmp. Uptime is currently 33 days, which is at least 3 times as
long as it ever stayed up previously. I am now seeing messages like this
in dmesg:

eth1: too many iterations (6) in nv_nic_irq.

I'm not sure whether that is related. eth1 uses the 'forcedeth' driver.

thanks,
  daryl

Comment 7 daryl herzmann 2006-04-19 03:52:34 UTC
Oh well, it just crashed again and netdump didn't work :( My luck ran out.

Comment 8 daryl herzmann 2006-06-07 20:30:56 UTC
Another crash and netdump failed again.

Comment 9 Jason Baron 2006-08-18 15:22:16 UTC
hmmh, have you tried U4? There are likely relevant fixes there. 

Comment 10 daryl herzmann 2006-08-18 15:24:42 UTC
Hi Jason,

I reinstalled both machines with EL3 and haven't had a single crash on either IPVS
director. Knock on wood!

daryl

Comment 11 Lon Hohberger 2007-02-01 14:51:28 UTC
Jason,

This looks strikingly related to #220149

Comment 13 daryl herzmann 2007-02-01 14:58:30 UTC
Hi,

Just an update: I continue to use EL3 on both directors without issue. Sorry
that I can't help with this bug anymore.

daryl

Comment 14 Lon Hohberger 2007-02-01 15:08:08 UTC
Daryl, I think this is just a repeat of #167398. If the box runs out of
memory because of IPVS, that can cause pretty serious problems.


Comment 15 Lon Hohberger 2007-02-01 18:06:53 UTC
Closing -> Dup of 167398

*** This bug has been marked as a duplicate of 167398 ***