Bug 155827 - (x86-64 1258+) spontaneous reboots, deadlocks, panics
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: medium  Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Duplicates: 155844
Depends On:
Blocks: FC4Blocker
Reported: 2005-04-24 07:49 EDT by Sylvain Rouillard
Modified: 2015-01-04 17:19 EST
Doc Type: Bug Fix
Last Closed: 2005-04-28 17:25:29 EDT


Attachments
lspci (1.46 KB, text/plain)
2005-04-24 12:42 EDT, Gene Czarcinski
lsmod and lspci, as requested (3.41 KB, text/plain)
2005-04-24 13:04 EDT, Sylvain Rouillard
lsmod without the proprietary nvidia driver (1.78 KB, text/plain)
2005-04-24 16:27 EDT, Sylvain Rouillard

Description Sylvain Rouillard 2005-04-24 07:49:04 EDT
Description of problem:
The system randomly reboots or freezes (no mouse or keyboard response at all).

Version-Release number of selected component (if applicable):
This seemingly started after the last yum update (23/04). The 1261 kernel seems
affected, and so does 1258. Booting 1253 restores system stability.

How reproducible:
Always, sooner or later

Steps to Reproduce:
1. Boot the system on the 1261 or 1258 version of the kernel
2. Wait, between 5 minutes and an hour
  
Actual results:
The system becomes completely unresponsive or even reboots on its own.

Expected results:
The system stays up and running

Additional info:
I couldn't find what triggers the reboot. It has happened while I was
using various applications, and even while I was away (I found the computer
frozen on the screen saver), so it seems unrelated to user actions.
I should also mention that I used the 1258 kernel the day before and
it was stable all day, although it is now as unstable as 1261.
Nothing has changed on my system (hardware/software) over the last week or so,
except for the yum updates that I do daily. The reboots are so sudden that
they leave nothing in /var/log/messages.
I checked my RAM with memtest and it is clean. Anyway, I doubt this is a hardware
problem (though I have to admit it looks very much like one), since the exact
same thing is happening to 3 other x86_64 users on the fedora-test list.

This is my first bugzilla report, so feel free to ask for any info I may have
forgotten.

Cheers
Comment 1 Arjan van de Ven 2005-04-24 12:10:45 EDT
can you post an lsmod output?
With some luck we can find common ground between the users that way
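
One rough way to look for that common ground, assuming each reporter's lsmod
output has been saved to its own file: list the modules that appear in every
file. The function name and the file names are made up for illustration, and
each file is assumed to hold plain lsmod output with its one-line header.

```shell
#!/bin/bash
# Print the names of modules that are loaded on every machine.
# Usage: common_modules FILE...   (each FILE is saved `lsmod` output)
common_modules() {
    local n=$#
    for f in "$@"; do
        # Skip the "Module  Size  Used by" header and keep the module name.
        awk 'FNR > 1 { print $1 }' "$f" | sort -u
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
    # uniq -c counts how many files mention each module; keep those seen in all.
}
```

Something like `common_modules sylvain.lsmod gene.lsmod` (hypothetical file
names) would then print only the modules both systems have loaded.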
Comment 2 Gene Czarcinski 2005-04-24 12:40:16 EDT
Also see this on 1261 (x86_64) but 1253 OK.
Comment 3 Gene Czarcinski 2005-04-24 12:42:54 EDT
Created attachment 113610 [details]
lspci

ASUS SK8V with Opteron 140
Comment 4 Sylvain Rouillard 2005-04-24 13:04:21 EDT
Created attachment 113613 [details]
lsmod and lspci, as requested
Comment 5 Arjan van de Ven 2005-04-24 16:03:14 EDT
nvidia               4580544  12
Comment 6 Sylvain Rouillard 2005-04-24 16:12:00 EDT
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R200 QM [Radeon
9100] (rev 80)

(from Gene's lspci)
Comment 7 Sylvain Rouillard 2005-04-24 16:27:05 EDT
Created attachment 113617 [details]
lsmod without the proprietary nvidia driver
Comment 8 Sylvain Rouillard 2005-04-24 16:31:00 EDT
I rebooted with 1261 and without the ugly proprietary nvidia beast (nv instead).
See the lsmod I got with this config above (Comment #7). Again, it crashed: one
freeze and one reboot, in no time.

I hope that helps.
Comment 9 Sylvain Rouillard 2005-04-26 11:25:51 EDT
Add today's 1267 to the list of unusable kernels.
Comment 10 John Pearson 2005-04-26 16:45:43 EDT
04/26/2005-03:28:47 PM-EDT 
 
I booted up the 1267 kernel revision about an hour ago.  The machine has 
locked or spontaneously rebooted 5 times. 
 
uname -a 
Linux Katchoo.crooks1722.hab 2.6.11-1.1267_FC4 #1 
Mon Apr 25 19:41:39 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux 
 
 
The only usable thing is this message from the monitor: 
 
Call Trace: <#DF><0> kernel panic - not syncing; Kernel/module.c : 2074: 
sin_lock 
(Kernel module.c: ffffffff80489420) already locked by Kernel/module.c/2039. 
(Not tainted) 
 
-- The message repeats 4 times -- 
 
Call Trace: <#SS>[<000 ... 00001>] <ffffffff8013a015>{panic+133} 
 
--End of screen -- 
 
-Jpearson 
 
Comment 11 Warren Togami 2005-04-27 05:10:52 EDT
*** Bug 155844 has been marked as a duplicate of this bug. ***
Comment 12 Warren Togami 2005-04-27 05:24:02 EDT
Confirmed seeing this on my Athlon64 with MSI motherboard.  1253 and earlier
kernels were unaffected, while 1258 through 1268 are confirmed broken in the
same way.  I am able to reproduce this readily without X running by rebuilding
the perl package; at the end of the build, during check-rpaths (from
fedora-rpmdevtools), it almost always triggers this bug.

8/15 times it deadlocked.  5/15 times it rebooted spontaneously, and twice it
had a kernel panic with traceback.

I have been unable to capture the traceback however because netconsole is not
working, and I can't get a serial console working for some strange reason.

This bug is very critical and maybe should block FC4test3, because an affected
system is unlikely to stay up long enough to finish an install.
Comment 13 Warren Togami 2005-04-27 05:54:36 EDT
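#!/bin/bash
# test.sh: run check-rpaths against the current directory, then re-exec
# this script so the check repeats until the system goes down.
RPM_BUILD_ROOT="$(pwd)" /usr/lib/rpm/check-rpaths
exec ./test.sh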

After the perl build causes the system to crash during check-rpaths, I can
reproduce it readily by going into that directory and running the above script.
It crashes within a minute.  HOWEVER, running this looping script in single user
mode does not cause the system to crash.

Running the script in runlevel 3 does crash the system.  My totally wild guess is
that it takes disk activity plus something else to cause a race condition and
this crash.  My x86_64 system is an "everything" install so Bug 155893 may be
causing constant background disk activity, which would explain the spontaneous
reboots while sitting idle at the gdm screen (and my nearly 1GB /var/log/messages).
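
A variant of the looping script above, as a sketch only: it writes a line to a
log file and syncs before every run, so the last line on disk shows how many
iterations the kernel survived even after a hard freeze or reboot. The function
name, the log path and the iteration limit are assumptions added for
illustration; note too that the extra sync adds disk activity of its own, which
may change how quickly the crash shows up.

```shell
#!/bin/bash
# Run a command repeatedly, logging each iteration to disk before it starts.
# Usage: stress_loop LOGFILE MAX_ITERATIONS COMMAND [ARGS...]
# MAX_ITERATIONS=0 means keep going until the machine goes down.
stress_loop() {
    local log=$1 max=$2 i=0
    shift 2
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        echo "$(date '+%F %T') iteration $i" >> "$log"
        sync                      # flush the log line before the risky command
        "$@" > /dev/null 2>&1     # command output is discarded
    done
}
```

To mimic the original loop, one could run `export RPM_BUILD_ROOT="$(pwd)"`
and then `stress_loop /root/rpaths.log 0 /usr/lib/rpm/check-rpaths` from the
perl build directory.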
Comment 14 Warren Togami 2005-04-27 14:25:30 EDT
Upstream linux-2.6.12-rc3 seems affected by the same issue.
Comment 15 Warren Togami 2005-04-27 17:34:07 EDT
http://people.redhat.com/wtogami/temp/kernel-x86_64/
kernel-2.6.11-1.1275_FC4.x86_64 seems to fix this for me.  Everyone please test
and report back.
Comment 16 Sylvain Rouillard 2005-04-27 19:00:48 EDT
It's been up for an hour now, with no particular problem. If it's still buggy,
then I got damn lucky (I rarely got uptimes of 1h before a crash). I'll leave it
up for the night to be sure tho, and I'll post back when I wake up.

Good job
Comment 17 Sylvain Rouillard 2005-04-28 04:20:40 EDT
It's been up all night (*thumbs up*). So, from my perspective, that is
fixed. Well done!
Comment 18 Gene Czarcinski 2005-04-28 11:36:38 EDT
OK, things are better with 1275 since the system  stays up.  BUT, it is still
hosed since running any 32 bit application results in a Segmentation fault.
Comment 19 Jurgen Kramer 2005-04-28 13:10:51 EDT
1275 seems to be running ok. No more lockups, looks good! (I have lots of other
problems but that is another matter).
Comment 20 Dave Jones 2005-04-28 17:25:29 EDT
The 32 bit segfault bug is 155790


