Bug 155827

Summary: (x86-64 1258+) spontaneous reboots, deadlocks, panics
Product: [Fedora] Fedora
Reporter: Sylvain Rouillard <rouillardsy>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: rawhide
CC: cimmo, gczarcinski, gtmkramer, lsof, pfrields, pmatilai, twaugh, wtogami
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-28 21:25:29 UTC
Bug Blocks: 136450
Attachments:
Description                                     Flags
lspci                                           none
lsmod and lspci, as requested                   none
lsmod without the proprietary nvidia driver     none

Description Sylvain Rouillard 2005-04-24 11:49:04 UTC
Description of problem:
The system randomly reboots or randomly freezes (no mouse nor keyboard response
at all).

Version-Release number of selected component (if applicable):
This apparently started after the last yum update (23/04). The 1261 kernel seems
affected, and so does 1258. Booting 1253 restores system stability.

How reproducible:
Always, sooner or later

Steps to Reproduce:
1. Boot the system on the 1261 or 1258 version of the kernel
2. Wait, between 5 minutes and an hour
  
Actual results:
The system becomes completely unresponsive or even reboots on its own.

Expected results:
The system stays up and running

Additional info:
I couldn't find what triggers the reboot. It has happened while I was using
various applications, and even while I was away (I found the computer frozen on
the screen saver), so it seems to be unrelated to user actions.
I should also mention that I used the 1258 kernel the day before and it was
stable all day, although it is now as unstable as 1261.
Nothing has changed on my system (hardware or software) over the last week or
so, except for the yum updates that I do daily. The reboots are so sudden that
they leave nothing in /var/log/messages.
I checked my RAM with memtest and it is clean. In any case I doubt this is a
hardware problem (though I admit it looks very much like one), since the exact
same thing is happening to 3 other x86_64 users on the fedora-test list.

This is my first bugzilla report, so feel free to ask for any info I may have
forgotten.

Cheers

Comment 1 Arjan van de Ven 2005-04-24 16:10:45 UTC
Can you post your lsmod output?
With some luck we can find common ground between the users that way.
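
One way to look for that common ground is to intersect the module lists from each reporter's lsmod output. The sketch below assumes every file holds raw lsmod output including its header line; the function name common_modules and the file names in the usage note are made up for illustration.

```shell
#!/bin/bash
# Print the module names that appear in every lsmod dump passed as an argument.
# Assumption: each file is raw `lsmod` output, i.e. one header line followed by
# "name size used_by" rows, with each module listed once per file.
common_modules() {
    awk 'FNR == 1 || NF == 0 { next }   # skip the header line and blank lines
         { seen[$1]++ }                 # count the files that list each module
         END {
             # ARGC - 1 is the number of input files
             for (m in seen) if (seen[m] == ARGC - 1) print m
         }' "$@" | sort
}
```

For example, `common_modules sylvain.lsmod gene.lsmod` would list only the modules loaded on both machines, which narrows the set of drivers worth suspecting.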

Comment 2 Gene Czarcinski 2005-04-24 16:40:16 UTC
I also see this on 1261 (x86_64), but 1253 is OK.

Comment 3 Gene Czarcinski 2005-04-24 16:42:54 UTC
Created attachment 113610 [details]
lspci

ASUS SK8V with Opteron 140

Comment 4 Sylvain Rouillard 2005-04-24 17:04:21 UTC
Created attachment 113613 [details]
lsmod and lspci, as requested

Comment 5 Arjan van de Ven 2005-04-24 20:03:14 UTC
nvidia               4580544  12

Comment 6 Sylvain Rouillard 2005-04-24 20:12:00 UTC
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R200 QM [Radeon
9100] (rev 80)

(from Gene's lspci)

Comment 7 Sylvain Rouillard 2005-04-24 20:27:05 UTC
Created attachment 113617 [details]
lsmod without the proprietary nvidia driver

Comment 8 Sylvain Rouillard 2005-04-24 20:31:00 UTC
I rebooted with 1261 and without the ugly proprietary nvidia beast (nv instead).
See the lsmod I got with this config above (Comment #7). Again, it crashed: one
freeze and one reboot, in no time.

I hope that helps.

Comment 9 Sylvain Rouillard 2005-04-26 15:25:51 UTC
Add today's 1267 to the list of unusable kernels.

Comment 10 John Pearson 2005-04-26 20:45:43 UTC
04/26/2005-03:28:47 PM-EDT 
 
I booted up the 1267 kernel revision about an hour ago.  The machine has 
locked or spontaneously rebooted 5 times. 
 
uname -a 
Linux Katchoo.crooks1722.hab 2.6.11-1.1267_FC4 #1 
Mon Apr 25 19:41:39 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux 
 
 
The only usable thing is this message from the monitor: 
 
Call Trace: <#DF><0> kernel panic - not syncing; Kernel/module.c : 2074: 
spin_lock 
(Kernel module.c: ffffffff80489420) already locked by Kernel/module.c/2039. 
(Not tainted) 
 
-- The message repeats 4 times -- 
 
Call Trace: <#SS>[<000 ... 00001>] <ffffffff8013a015>{panic+133} 
 
--End of screen -- 
 
-Jpearson 
 

Comment 11 Warren Togami 2005-04-27 09:10:52 UTC
*** Bug 155844 has been marked as a duplicate of this bug. ***

Comment 12 Warren Togami 2005-04-27 09:24:02 UTC
Confirmed: I see this on my Athlon64 with an MSI motherboard.  1253 and earlier
kernels were unaffected, while 1258 through 1268 are confirmed broken in the
same way.  I can reproduce this readily without X running by rebuilding the
perl package: check-rpaths (from fedora-rpmdevtools), which runs at the end of
the build, almost always triggers this bug.

8/15 times it deadlocked.  5/15 times it rebooted spontaneously, and twice it
had a kernel panic with traceback.

However, I have been unable to capture the traceback, because netconsole is not
working and I can't get a serial console working for some strange reason.

This bug is very critical and maybe should block FC4test3, because an affected
system is unlikely to survive long enough to install.

Comment 13 Warren Togami 2005-04-27 09:54:36 UTC
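#!/bin/bash
# Saved as test.sh in the perl build directory.
# Run check-rpaths against the current directory, then re-exec to loop forever.
RPM_BUILD_ROOT="$(pwd)" /usr/lib/rpm/check-rpaths
exec ./test.sh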

After the perl build causes the system to crash during check-rpaths, I can
reproduce it readily by going into that directory and running the above script.
It crashes within a minute.  HOWEVER, this looping script does not crash the
system in single user mode.

Running the script in runlevel 3 does crash the system.  My totally wild guess
is that it takes disk activity plus something else to cause a race condition and
this crash.  My x86_64 system is an "everything" install, so Bug 155893 may be
causing constant background disk activity, which would explain the spontaneous
reboots while sitting idle at the gdm screen (and my nearly 1GB
/var/log/messages).
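
Since the resets are too sudden to leave anything in the logs, a variant of the looping script could record each completed pass on disk, so that after a crash it is at least clear how long the loop survived. This is only a sketch: stress_loop, its log path argument and its iteration limit are hypothetical, and calling sync after every pass adds disk activity of its own, which may change the timing.

```shell
#!/bin/bash
# Run a command repeatedly and record every completed pass on disk.
# Usage: stress_loop LOGFILE LIMIT COMMAND [ARGS...]
# LIMIT is the number of passes to run; 0 means loop until the machine dies.
stress_loop() {
    local log=$1 limit=$2 i=0
    shift 2
    while [ "$limit" -eq 0 ] || [ "$i" -lt "$limit" ]; do
        "$@" >/dev/null 2>&1      # exit status of the command is ignored
        i=$((i + 1))
        echo "$i $(date '+%Y-%m-%d %H:%M:%S')" >> "$log"
        sync                      # flush so the count survives a hard reset
    done
}
```

With `export RPM_BUILD_ROOT="$(pwd)"` set first, `stress_loop /root/rpaths.log 0 /usr/lib/rpm/check-rpaths` would mimic the script above, and the last line of the log after the reboot shows the final pass that completed.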

Comment 14 Warren Togami 2005-04-27 18:25:30 UTC
Upstream linux-2.6.12-rc3 seems affected by the same issue.

Comment 15 Warren Togami 2005-04-27 21:34:07 UTC
http://people.redhat.com/wtogami/temp/kernel-x86_64/
kernel-2.6.11-1.1275_FC4.x86_64 seems to fix this for me.  Everyone please test
and report back.
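
For anyone testing, the build numbers reported in this bug can be folded into a quick check of the running kernel. A rough sketch, assuming release strings shaped like the 2.6.11-1.1267_FC4 seen in comment 10; classify_build is a made-up name, and builds that nobody reported on are labeled untested rather than guessed at.

```shell
#!/bin/bash
# Classify a Fedora kernel release string using the build numbers from this bug.
classify_build() {
    local build
    # Pull out the build number, e.g. 1267 from 2.6.11-1.1267_FC4.
    build=$(printf '%s\n' "$1" |
        sed -n 's/^2\.6\.11-1\.\([0-9][0-9]*\)_FC4.*$/\1/p')
    if [ -z "$build" ]; then
        echo unknown          # not a 2.6.11-1.NNNN_FC4 release string
    elif [ "$build" -ge 1258 ] && [ "$build" -le 1268 ]; then
        echo affected         # 1258 through 1268 were reported broken
    elif [ "$build" -le 1253 ]; then
        echo unaffected       # 1253 and earlier were reported stable
    elif [ "$build" -eq 1275 ]; then
        echo fixed            # 1275 is the build offered as a fix
    else
        echo untested         # no report in this bug covers the build
    fi
}
```

Running `classify_build "$(uname -r)"` on the machine under test gives a one-word answer to include when reporting back.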

Comment 16 Sylvain Rouillard 2005-04-27 23:00:48 UTC
It's been up for an hour now, with no particular problem. If it's still buggy,
then I got damn lucky (I rarely got uptimes of 1h before a crash). I'll leave it
up overnight to be sure, though, and I'll post back when I wake up.

Good job

Comment 17 Sylvain Rouillard 2005-04-28 08:20:40 UTC
It's been up all night (*thumbs up*). So, from my perspective, that is
fixed. Well done!

Comment 18 Gene Czarcinski 2005-04-28 15:36:38 UTC
OK, things are better with 1275 since the system stays up.  BUT, it is still
hosed, since running any 32-bit application results in a segmentation fault.

Comment 19 Jurgen Kramer 2005-04-28 17:10:51 UTC
1275 seems to be running OK. No more lockups, looks good! (I have lots of other
problems, but that is another matter.)

Comment 20 Dave Jones 2005-04-28 21:25:29 UTC
The 32-bit segfault is bug 155790.