Description of problem:
I have a Sun Fire x2100 that runs as a redundant LVS Director. The machine locks up after varying amounts of time while running as the LVS Director. It appears to be stable when not running IPVS. With the rhel4u2 kernel, the machine would hard lock without invoking netdump. I installed the rhel4u3 beta kernel and now get a netdump.
Version-Release number of selected component (if applicable):
2.6.9-27.EL #1 Tue Dec 20 19:11:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
How reproducible:
After running IPVS for a while, the machine will hard lock.
Additional info:
Will attach netdump log file. Note that netdump never fully reboots the machine. The netdump server logs lots of these messages:
Dec 30 23:32:01 server netdump[2472]: Got too many timeouts in handshaking, ignoring client x.x.x.x
Dec 30 23:32:04 server netdump[2472]: Got too many timeouts waiting for SHOW_STATUS for client x.x.x.x, rebooting it
thanks!
Created attachment 122761 [details] netdump log
Greetings, The lockups continue. They seem to be about 7-10 days apart. The netdump failures continue as well. Perhaps I should file a separate bug on netdump failing? daryl
Yes, please file a separate bug for the netdump failures. thanks.
thanks. Done https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179016
Created attachment 124570 [details] netdump log
Wow, for whatever reason netdump worked this time when the machine crashed! I have attached what appears to be the full 'log'. The machine also rebooted properly. Thanks!
Hi, Just an update. This machine hasn't crashed since the upgrade to 2.6.9-34.ELsmp. Uptime is currently at 33 days, which is at least 3 times as long as it ever stayed up previously. I am now getting some of these messages in dmesg: eth1: too many iterations (6) in nv_nic_irq. Not sure whether that is related. eth1 uses the 'forcedeth' driver. thanks, daryl
Oh well, just crashed again and netdump didn't work :( My luck ran out
Another crash and netdump failed again.
Hmm, have you tried U4? There are likely relevant fixes there.
Hi Jason, I reinstalled both machines to EL3 and haven't had a single crash on either IPVS director. Knock on wood! daryl
Jason, This looks strikingly related to #220149
Hi, Just to update. I continue to use EL3 on both directors without issue. Sorry that I can't help with this bug anymore. daryl
Daryl, I am thinking this is just a repeat of #167398. If the box runs out of memory because of IPVS, that can cause pretty serious problems.
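For anyone wanting to check the out-of-memory theory above on a running director, here is a minimal sketch. It assumes IPVS connection entries are allocated from a slab cache named ip_vs_conn and that /proc/slabinfo uses the 2.x column layout (name, active_objs, num_objs, objsize, ...). The function name is made up for illustration, and the byte count is only a rough estimate that ignores slab overhead.

```python
def ipvs_slab_usage(slabinfo_text, cache="ip_vs_conn"):
    """Return (active_objs, num_objs, approx_bytes) for one slab cache.

    slabinfo_text is the full text of /proc/slabinfo. The cache name
    "ip_vs_conn" is an assumption about where IPVS keeps its connection
    entries. Returns None if the cache is not listed.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Data lines look like: name active_objs num_objs objsize ...
        if len(fields) >= 4 and fields[0] == cache:
            active, total, objsize = (int(f) for f in fields[1:4])
            # Rough memory estimate: allocated objects times object size.
            return active, total, total * objsize
    return None


def read_proc_slabinfo(path="/proc/slabinfo"):
    """Read the slabinfo file (usually needs root)."""
    with open(path) as f:
        return f.read()
```

Calling ipvs_slab_usage(read_proc_slabinfo()) from cron every few minutes and logging the result would show whether the connection table keeps growing in the days before a lockup.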
Closing -> Dup of 167398 *** This bug has been marked as a duplicate of 167398 ***