Bug 155827 - (x86-64 1258+) spontaneous reboots, deadlocks, panics
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 155844
Depends On:
Blocks: FC4Blocker
Reported: 2005-04-24 11:49 UTC by Sylvain Rouillard
Modified: 2015-01-04 22:19 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-28 21:25:29 UTC
Type: ---
Embargoed:


Attachments
lspci (1.46 KB, text/plain)
2005-04-24 16:42 UTC, Gene Czarcinski
lsmod and lspci, as requested (3.41 KB, text/plain)
2005-04-24 17:04 UTC, Sylvain Rouillard
lsmod without the proprietary nvidia driver (1.78 KB, text/plain)
2005-04-24 20:27 UTC, Sylvain Rouillard

Description Sylvain Rouillard 2005-04-24 11:49:04 UTC
Description of problem:
The system randomly reboots or randomly freezes (no mouse or keyboard response
at all).

Version-Release number of selected component (if applicable):
This seemingly started after the last yum update (23/04). The 1261 kernel seems
affected, and so does 1258. Booting 1253 restores system stability.

How reproducible:
Always, sooner or later

Steps to Reproduce:
1. Boot the system on the 1261 or 1258 version of the kernel
2. Wait between 5 minutes and an hour
  
Actual results:
The system becomes completely unresponsive or even reboots on its own.

Expected results:
The system stays up and running

Additional info:
I couldn't find what triggers the reboot. It has happened while I was using
various applications, and even while I was away (I found the computer frozen on
the screen saver). So this seems to be unrelated to user actions.
I should also mention that I was using the 1258 kernel the day before and it
was stable all day, although it is now as unstable as 1261.
Nothing has changed on my system (hardware or software) over the last week or
so, except for the yum updates that I do daily. I should also mention that
these reboots are so sudden that they leave nothing in /var/log/messages.
I checked my RAM with memtest and it is clean. Still, I doubt this is a hardware
problem (though I admit it looks very much like one), since exactly the same
thing happens to 3 other x86_64 users on the fedora-test list.

This is my first Bugzilla report, so feel free to ask for any info I may have
forgotten.

Cheers

Comment 1 Arjan van de Ven 2005-04-24 16:10:45 UTC
Can you post your lsmod output?
With some luck we can find something common to the affected users that way.

Comment 2 Gene Czarcinski 2005-04-24 16:40:16 UTC
I also see this on 1261 (x86_64), but 1253 is OK.

Comment 3 Gene Czarcinski 2005-04-24 16:42:54 UTC
Created attachment 113610
lspci

ASUS SK8V with Opteron 140

Comment 4 Sylvain Rouillard 2005-04-24 17:04:21 UTC
Created attachment 113613
lsmod and lspci, as requested

Comment 5 Arjan van de Ven 2005-04-24 20:03:14 UTC
nvidia               4580544  12

Comment 6 Sylvain Rouillard 2005-04-24 20:12:00 UTC
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R200 QM [Radeon
9100] (rev 80)

(from Gene's lspci)

Comment 7 Sylvain Rouillard 2005-04-24 20:27:05 UTC
Created attachment 113617
lsmod without the proprietary nvidia driver

Comment 8 Sylvain Rouillard 2005-04-24 20:31:00 UTC
I rebooted with 1261 and without the ugly proprietary nvidia beast (nv instead).
See the lsmod I got with this config above (Comment #7). Again, it crashed: one
freeze and one reboot, in no time.

I hope that helps.

Comment 9 Sylvain Rouillard 2005-04-26 15:25:51 UTC
Add today's 1267 to the list of unusable kernels.

Comment 10 John Pearson 2005-04-26 20:45:43 UTC
04/26/2005-03:28:47 PM-EDT 
 
I booted up the 1267 kernel revision about an hour ago. The machine has
locked up or spontaneously rebooted 5 times.
 
uname -a 
Linux Katchoo.crooks1722.hab 2.6.11-1.1267_FC4 #1 
Mon Apr 25 19:41:39 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux 
 
 
The only useful information is this message on the monitor:

Call Trace: <#DF><0> Kernel panic - not syncing: kernel/module.c:2074:
spin_lock
(kernel/module.c: ffffffff80489420) already locked by kernel/module.c/2039.
(Not tainted)
 
-- The message repeats 4 times -- 
 
Call Trace: <#SS>[<000 ... 00001>] <ffffffff8013a015>{panic+133} 
 
--End of screen -- 
 
-Jpearson 
 

Comment 11 Warren Togami 2005-04-27 09:10:52 UTC
*** Bug 155844 has been marked as a duplicate of this bug. ***

Comment 12 Warren Togami 2005-04-27 09:24:02 UTC
Confirmed seeing this on my Athlon64 with MSI motherboard.  1253 and earlier
kernels were unaffected, while 1258 through 1268 are confirmed broken in the
same way. I am able to reproduce this readily without X running by rebuilding
the perl package: at the end, during check-rpaths (from fedora-rpmdevtools), it
almost always triggers this bug.

Out of 15 attempts, it deadlocked 8 times, rebooted spontaneously 5 times, and
twice had a kernel panic with a traceback.

However, I have been unable to capture the traceback because netconsole is not
working, and for some strange reason I can't get a serial console working either.

This bug is critical and should probably block FC4test3, because an affected
system is unlikely to stay up long enough to complete an install.

Comment 13 Warren Togami 2005-04-27 09:54:36 UTC
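#!/bin/bash
# test.sh: run check-rpaths against the build tree in the current directory,
# then exec this same script again so the loop never ends
RPM_BUILD_ROOT=$(pwd) /usr/lib/rpm/check-rpaths
exec ./test.sh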

After the perl build causes the system to crash during check-rpaths, I can
reproduce it readily by going into that directory and running the above script.
It crashes within a minute. HOWEVER, this same looping script does not crash
the system in single-user mode.

Running the script in runlevel 3 does crash the system. My totally wild guess is
that it takes disk activity plus something else to trigger a race condition and
this crash. My x86_64 system is an "everything" install, so Bug 155893 may be
causing constant background disk activity, which would explain the spontaneous
reboots while sitting idle at the gdm screen (and my nearly 1GB /var/log/messages).

Comment 14 Warren Togami 2005-04-27 18:25:30 UTC
Upstream linux-2.6.12-rc3 seems affected by the same issue.

Comment 15 Warren Togami 2005-04-27 21:34:07 UTC
http://people.redhat.com/wtogami/temp/kernel-x86_64/
kernel-2.6.11-1.1275_FC4.x86_64 seems to fix this for me.  Everyone please test
and report back.

Comment 16 Sylvain Rouillard 2005-04-27 23:00:48 UTC
It's been up for an hour now, with no particular problem. If it's still buggy,
then I got damn lucky (I rarely got uptimes of 1h before a crash). I'll leave it
up overnight to be sure, though, and I'll post back when I wake up.

Good job

Comment 17 Sylvain Rouillard 2005-04-28 08:20:40 UTC
It's been up all night (*thumbs up*). So, from my perspective, that is
fixed. Well done!

Comment 18 Gene Czarcinski 2005-04-28 15:36:38 UTC
OK, things are better with 1275 since the system stays up. BUT it is still
hosed: running any 32-bit application results in a segmentation fault.

Comment 19 Jurgen Kramer 2005-04-28 17:10:51 UTC
1275 seems to be running OK. No more lockups, looks good! (I have lots of other
problems, but that is another matter.)

Comment 20 Dave Jones 2005-04-28 21:25:29 UTC
The 32-bit segfault bug is 155790.