Bug 102961 - rh9, latest kernel, machine lockup during 'up2date' operation.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-08-23 14:42 UTC by Jeff MacDonald
Modified: 2007-04-18 16:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:41:28 UTC
Embargoed:



Description Jeff MacDonald 2003-08-23 14:42:27 UTC
Description of problem:

I have a fresh install of rh9 from official media. after install at the console,
I ran 'up2date -u -f' as root. this started the process of updating the system
with the latest rpms. after approximately 20 minutes, the power LED on the case
of the machine begins to flash as though the APM has kicked in. shortly after
that, right in the middle of up2date, the machine becomes unresponsive: the
keyboard does not respond, and it is unpingable from the LAN. I have to hit the
reset button on the machine.

Version-Release number of selected component (if applicable):
kernel-2.4.20-20.9

How reproducible:


Steps to Reproduce:
1.
2.
3.
    
Actual results:


Expected results:


Additional info:

thinking that APM might be the issue, I rebooted the machine after a crash and
went to bed. when I got up in the morning, everything was fine, however once I
started heavy activity on the machine, the green light on the case begins to
flash, and again the lockup.

I rebooted again and disabled APM features of the BIOS and started the up2date.
again, I had the same lockup problem.

I altered the grub.conf file to pass "apm=off" to the kernel command line, and
tried again. again, after a short time, the machine locks up.
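
One way to confirm that the running kernel actually received the "apm=off" option is to look at /proc/cmdline after booting. The following is only a minimal sketch; the helper name has_kernel_param is hypothetical and not part of any Red Hat tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the kernel parameter $1 appears as a
# whole word on the command line recorded in $2 (default: /proc/cmdline).
has_kernel_param() {
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example use after a reboot:
#   has_kernel_param apm=off && echo "apm=off was passed to the kernel"
```

If the check fails, the edit to grub.conf most likely went onto a boot entry other than the one that was actually booted.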

knowing that I had a short window within which to operate, I rebooted again and
this time ran 'up2date -u -f kernel'. this update was successful, and I rebooted
the machine via the LAN. I thought all was well.

I logged in at the console and again ran 'up2date -u', hoping the updated kernel
would resolve the issue, but alas, it locked up while doing something with
glibc-common. I am not sure if it was 'installing' at the time or not, but
hopefully since I turned on transactions it will be fine.

I have done some testing, and if I do not run 'up2date -u', but rather stay
logged in remotely and hit return at a shell prompt now and again (i.e., machine
quite idle), it is stable. the green light does not flash on the box, and all
seems well. it has been up for over an hour now without an issue.

I suspect APM because the green light on the case flashes. I suspect kernel
because this only happens when the machine is under load (>1.00).
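
To tie the lockups to load more firmly, the 1-minute load average could be sampled to a file every few seconds, so that the last value before a hard reset is still on disk afterwards. This is a rough sketch under assumptions: the function names and the log path are made up for illustration, and the 1.00 threshold is simply the figure mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the 1-minute load average from a
# loadavg-style file (default: /proc/loadavg).
one_minute_load() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Succeed when the load value in $1 is above the threshold in $2.
load_above() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Append one timestamped sample to the log file $1 and flush it to disk,
# so the last sample survives pressing the reset button.
# $2 optionally overrides the loadavg file that is read.
log_load_sample() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(one_minute_load "$2")" >> "$1"
    sync
}

# Example use (assumed log path), started before running up2date:
#   while :; do log_load_sample /var/tmp/load.log; sleep 5; done &
```

After a lockup, the final lines of the log show how high the load was when the machine stopped responding.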

note that when the green light flashes on the box, the keyboard of the console
is not responsive, nor is the network interface, yet the BIOS is responsive
enough to reboot if I hit the reset button.

well, the machine just now rebooted due to a power fluctuation, so the uptime is
a little over an hour without an issue. if I were to run up2date, it would take
only about 20 minutes for the machine to lock up.

Comment 1 Jeff MacDonald 2003-08-24 01:01:44 UTC
I thought I had solved this problem by installing up2date-gnome, since it seemed
to proceed normally for quite a while.. but it locked up eventually :(

it seems that it *always* locks up while running up2date. it seems I can do
anything else I want on the machine without trouble, but if I run up2date, the
machine goes down soon after, usually midway through an install or something,
which is annoying.

the machine is not really doing anything all that amazing right now, as it just
runs a DNS cache for the lan.. so I can reboot it a lot, and it doesn't really
impact public services. 

[..much time passes..]

ok, quark is up and seems stable. 

I basically upgraded the system in a series of transactions. I upgraded glibc
and glibc-common, then proceeded to start a new transaction that would upgrade
everything else. 

it crashed during this 2nd phase, but when I rebooted again and ran up2date via
strace, I had no problems, and it upgraded everything as it was supposed to.
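
The two-phase upgrade described above can be sketched roughly as follows. This assumes up2date accepts package names after -u, in the same way as the 'up2date -u -f kernel' call earlier in this report; the UPDATER variable and the staged_update function are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Command used for each stage; overridable so the sequence can be dry-run.
# Assumption: 'up2date -u <packages>' updates only the named packages.
UPDATER="${UPDATER:-up2date -u}"

staged_update() {
    # Stage 1: the C library packages, in their own transaction.
    $UPDATER glibc glibc-common || return 1
    # Stage 2: everything else, in a second transaction.
    $UPDATER
}

# Dry run that only prints the arguments of each stage:
#   UPDATER=echo; staged_update
```

Keeping the glibc packages in their own transaction means a lockup in the second phase does not leave the C library half installed.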

it would be nice to know how exactly the machine could have been put into sleep
mode (or whatever mode it was in) even though I have APM turned off in the BIOS,
and passed 'apm=off' on the command line.


bottom line is that my machine is stable now (or at least it seems to be), but
I'd still like to know where/why up2date seems to lock up, especially since it
is a total "denial of service".. I had quick access to the reset button,
fortunately, but what if I hadn't?

note that I am changing the component for this bug to 'up2date', when what I
really want is to select both, since I suspect it is some sort of odd interaction.


Comment 2 Jeff MacDonald 2003-08-24 01:03:18 UTC
[~] [9:02pm] [quark] % cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 5
model           : 8
model name      : AMD-K6(tm) 3D processor
stepping        : 12
cpu MHz         : 350.802
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow k6_mtrr
bogomips        : 699.59


Comment 3 Barry K. Nathan 2003-08-24 01:19:50 UTC
What do you mean by the "machine just rebooted due to a power fluctuation"? If
the machine just rebooted out-of-the-blue, it could be some kind of power supply
or motherboard defect that could also be what's causing the up2date lockups.
stracing up2date might have changed timings enough to prevent you from hitting
whatever hardware problem is causing the lockup. (I've seen that kind of thing
with hardware failures before.)

I'm not saying I'm 100% confident it's hardware, but when I've seen machines act
like this, it's usually if not always hardware...

(And if the power is really fluctuating to the point that you can see it in any
nearby lights or the like, and the machine rebooted at the same time, it would
be a good idea to put it on a UPS.)

Comment 4 Jeff MacDonald 2003-08-24 22:57:18 UTC
very simply put, I have a UPS, but I have *way* too much stuff plugged into it,
and I sometimes see "spontaneous" reboots of several machines connected to it. I
do not think that is causing my problem, though.

note that quark has been up and stable for 24 hours now, so I am tempted to
close this bug.


Comment 5 Jeff MacDonald 2003-08-24 22:58:44 UTC
quark is one of my oldest machines, so I am not surprised it has issues. if it
stays up for another 24 hours, I'll close this.


Comment 6 Adrian Likins 2003-09-17 19:01:32 UTC
either way, this would be a kernel bug, not an up2date one, reassigning
to kernel (if not just a hardware issue)

Comment 7 Jeff MacDonald 2003-09-18 01:25:53 UTC
well, I don't know for sure what the heck the problem is, but quark has been
stable for quite some time.. it has rebooted a couple of times, so the uptime is
only three days, but I have not noticed the same symptoms that I opened this
ticket about. I agree that this is a kernel issue, not an up2date one. I feel
strongly that the power fluctuations and "bad hardware" are not the issue,
especially since the machine has been fine for weeks now.. then again, I'm not
exactly pushing quark above 1.00, so who knows?

would it be helpful if I attached dmesg output, or other stuff? 

I suppose I could go out of my way to run some sort of stress on the machine and
see if I can get it to break, but I'd rather not :)
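
For what it's worth, a crude way to push the load up on purpose, without any extra tools, might look like the sketch below. The function name burn_cpu is hypothetical, and this only exercises the CPU and process creation; it does not reproduce the disk activity of an up2date run.

```shell
#!/bin/sh
# Keep the machine busy for roughly $1 seconds (default 60), then return.
# The load comes from the shell loop repeatedly spawning 'date'.
burn_cpu() {
    end=$(( $(date +%s) + ${1:-60} ))
    while [ "$(date +%s)" -lt "$end" ]; do
        :
    done
}

# Example use: about 20 minutes of load, with dmesg saved first
# (the output path is an assumption):
#   dmesg > /var/tmp/dmesg-before.txt; burn_cpu 1200
```

If the light starts flashing during such a run, that would point away from up2date itself.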

Comment 8 Julien Olivier 2003-11-27 17:43:59 UTC
I'm not sure it is related but I have a rather similar problem. If I
let my laptop idle while running GNOME, the hard-drive led will start
blinking, my keyboard won't work anymore and I'll have to reset the
machine... It only happens when I'm using GNOME though. I have tried
to reproduce it by just running TWM and XMMS but it didn't happen. So
I guess that it only happens if the CPU is heavily used (Evolution +
Rhythmbox + GNOME use more CPU than TWM + xmms).

I have reproduced the problem using rhythmbox-xine,
rhythmbox-gstreamer or xmms, and OSS or ALSA.

Comment 9 Bugzilla owner 2004-09-30 15:41:28 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


