Bug 176939 - Crash while running IPVS
Summary: Crash while running IPVS
Keywords:
Status: CLOSED DUPLICATE of bug 167398
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Thomas Graf
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-01-04 15:48 UTC by daryl herzmann
Modified: 2014-06-18 08:28 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-02-01 18:06:53 UTC
Target Upstream Version:
Embargoed:


Attachments
netdump log (1.64 KB, text/plain), 2006-01-04 15:48 UTC, daryl herzmann
netdump log (31.49 KB, text/plain), 2006-02-13 17:10 UTC, daryl herzmann

Description daryl herzmann 2006-01-04 15:48:47 UTC
Description of problem:
I have a Sun Fire x2100 that runs as a redundant LVS Director.  The machine
locks up after varying amounts of time while running as the LVS Director.  The
machine appears to be stable when not running IPVS.

While running rhel4u2 kernel, the machine would hard lock and not invoke
netdump.  I installed the rhel4u3 beta kernel and now am getting a netdump.

Version-Release number of selected component (if applicable):
2.6.9-27.EL #1 Tue Dec 20 19:11:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
After running IPVS for a while, the machine will hard lock.


Additional info:
Will attach netdump log file.  Note that netdump never fully reboots the
machine.  The netdump server logs lots of these messages:

Dec 30 23:32:01 server netdump[2472]: Got too many timeouts in handshaking, ignoring client x.x.x.x

Dec 30 23:32:04 server netdump[2472]: Got too many timeouts waiting for SHOW_STATUS for client x.x.x.x, rebooting it

thanks!

Comment 1 daryl herzmann 2006-01-04 15:48:47 UTC
Created attachment 122761
netdump log

Comment 2 daryl herzmann 2006-01-26 15:18:27 UTC
Greetings,

The lockups continue.  They seem to be about 7-10 days apart.  The netdump
failures continue as well.  Perhaps I should file a separate bug on netdump failing?

daryl

Comment 3 Jason Baron 2006-01-26 15:28:07 UTC
Yes, please file a separate bug for the netdump failures. Thanks.

Comment 4 daryl herzmann 2006-01-26 16:17:20 UTC
thanks.  Done

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179016

Comment 5 daryl herzmann 2006-02-13 17:10:15 UTC
Created attachment 124570
netdump log

Wow, for whatever reason netdump worked this time when the machine crashed!  I
have attached what appears to be the full 'log'.  The machine also rebooted
properly.  Thanks!

Comment 6 daryl herzmann 2006-04-10 15:29:59 UTC
Hi, 

Just an update.  This machine hasn't crashed since the upgrade to
2.6.9-34.ELsmp.  Uptime is currently at 33 days, which is at least 3 times as
long as it would ever stay up previously.  I am getting some of these messages
in dmesg now:

eth1: too many iterations (6) in nv_nic_irq.

Not sure if that is related or not.  eth1 is using 'forcedeth'

thanks,
  daryl

Comment 7 daryl herzmann 2006-04-19 03:52:34 UTC
Oh well, just crashed again and netdump didn't work :(  My luck ran out

Comment 8 daryl herzmann 2006-06-07 20:30:56 UTC
Another crash and netdump failed again.

Comment 9 Jason Baron 2006-08-18 15:22:16 UTC
Hmm, have you tried U4? There are likely relevant fixes there.

Comment 10 daryl herzmann 2006-08-18 15:24:42 UTC
Hi Jason,

I reinstalled both machines to EL3 and haven't had a single crash on either IPVS
director. Knock on wood!

daryl

Comment 11 Lon Hohberger 2007-02-01 14:51:28 UTC
Jason,

This looks strikingly related to #220149

Comment 13 daryl herzmann 2007-02-01 14:58:30 UTC
Hi,

Just to update.  I continue to use EL3 on both directors without issue.  Sorry
that I can't help with this bug anymore.

daryl

Comment 14 Lon Hohberger 2007-02-01 15:08:08 UTC
Daryl, I am thinking this is just a repeat of #167398.  If the box runs out of
memory because of IPVS, that can cause pretty serious problems.


Comment 15 Lon Hohberger 2007-02-01 18:06:53 UTC
Closing -> Dup of 167398

*** This bug has been marked as a duplicate of 167398 ***

