Bug 164976 - system locks up
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
 
Reported: 2005-08-03 05:10 EDT by Charles C. Van Tilburg
Modified: 2015-01-04 17:21 EST
Doc Type: Bug Fix
Last Closed: 2005-09-30 11:28:59 EDT

Attachments
scimark2 compiled with gcc 2.95.3 -O2 (14.35 KB, application/octet-stream)
2005-08-03 05:12 EDT, Charles C. Van Tilburg
Description Charles C. Van Tilburg 2005-08-03 05:10:22 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
This is a reproducible, complete system lock.  Nothing is logged.  No console
messages are printed.  

It first happened running Doom3, but since, I have found that scimark2, 
compiled with gcc 2.95.3 -O2, will consistently cause this to happen.  

Scimark2 compiled with FC4 gcc32 or stock gcc will *NOT* cause this to 
happen.

If X is running, the first attempt to run a gcc 2.95.3 compiled scimark2 (-O2)
will lock the system.  It looks suspiciously as if this happens as it is
trying to print out its results.  

If X is not running, a shell loop running this binary will produce the lock 
in 3 or 4 repetitions; a single run is *NOT* sufficient.
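
A minimal sketch of the kind of shell loop described above, not the exact loop used in this report. The function name run_repeatedly and the ./scimark2 path are assumptions for illustration. It prints the iteration number before each run and calls sync, so that the last line left on the console shows how far the loop got before a hard lock.

```shell
#!/bin/sh
# Run a command repeatedly, announcing each iteration first.
# $1 = command to run (e.g. ./scimark2, an assumed path)
# $2 = number of repetitions
run_repeatedly() {
    binary=$1
    runs=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        sync            # flush buffers in case the next run locks the machine
        "$binary" || return 1
        i=$((i + 1))
    done
}

# Example (hypothetical binary location):
# run_repeatedly ./scimark2 10
```

Because nothing is logged when the lock occurs, counting the runs on screen is the only record of how many repetitions the system survived.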

It also looks like this is specific to one motherboard, since my other two 
systems are not affected.  The system affected is an ECS [Apollo KT266/A/333]/AMD XP3000, while, for example, the problem does *NOT* affect 
an identical FC4 software installation on a Foxconn [KT400/KT600 AGP]/AMD
XP2600.

The graphics drivers have been eliminated from consideration, since that
kernel module is not loaded at boot to console.

It would be interesting if this involves some kind of locking optimization,
since I also found /usr/include/bits/lowlevellock.h to be missing and filed 
bug #164974 about it.

Grasping at straws, it could also involve some kind of change in the kernel
chipset support... ??

Since I have recently made the switch from FC3 to FC4 and this was definitely
*NOT* happening there (at least a couple of kernel revs back), this is 
probably related to new kernel work.

Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1398_FC4

How reproducible:
Always

Steps to Reproduce:
1. Run scimark2 compiled with gcc 2.95.3 -O2
  

Actual Results:  complete system lockup, no messages logged, no console messages

Expected Results:  benchmark results

Additional info:
Comment 1 Charles C. Van Tilburg 2005-08-03 05:12:12 EDT
Created attachment 117389
scimark2 compiled with gcc 2.95.3 -O2
Comment 2 Dave Jones 2005-08-03 18:29:43 EDT
Things to try..
can you try booting with exec-shield=0   and also try vdso=0

also try echo 0 > /proc/sys/kernel/randomize_va_space
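
When testing combinations of these settings, it helps to confirm which ones are actually in effect before each run. The following is a rough sketch under stated assumptions: the helper names has_boot_param and show_settings are made up for illustration, and the script only reads /proc/cmdline and /proc/sys/kernel/randomize_va_space, it does not change anything.

```shell
#!/bin/sh
# Succeed if boot parameter $1 appears as a whole word in the
# kernel command line stored in file $2.
has_boot_param() {
    tr ' ' '\n' < "$2" | grep -qxF -- "$1"
}

# Print the state of the three settings suggested above.
# $1 = command-line file (default /proc/cmdline)
# $2 = randomize_va_space file (default under /proc/sys/kernel)
show_settings() {
    cmdline=${1:-/proc/cmdline}
    aslr=${2:-/proc/sys/kernel/randomize_va_space}
    for p in exec-shield=0 vdso=0; do
        if has_boot_param "$p" "$cmdline"; then
            echo "$p: present on kernel command line"
        else
            echo "$p: not present"
        fi
    done
    if [ -r "$aslr" ]; then
        echo "randomize_va_space: $(cat "$aslr")"
    else
        echo "randomize_va_space: unavailable"
    fi
}

# Example: show_settings
```

Note that the echo into randomize_va_space requires root and does not persist across a reboot, while the two boot parameters only take effect after rebooting with them on the kernel command line.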
Comment 3 Charles C. Van Tilburg 2005-08-04 07:53:50 EDT
trying the easiest first, the randomize_va_space definitely helped, but did not
solve the problem; it survived three runs with X running and locked up on the
fourth.  This is better than locking up on the first run.

rebooting with only the exec-shield=0 and vdso=0, running at console made things
worse; immediate lock up.  This is down from surviving three or four.

I'm about to try a combination of all three under X...
Comment 4 Charles C. Van Tilburg 2005-08-04 08:01:33 EDT
all three under X result in it surviving the first run, but locking up
on the second (when I started an sh loop).  This is an improvement over 
the original situation, but not much, and certainly not as good as just
doing the randomize_va_space.

if it means anything, SELinux is (and has been) disabled.
Comment 5 Dave Jones 2005-08-04 13:33:54 EDT
what graphics driver are you using ?
Comment 6 Charles C. Van Tilburg 2005-08-04 14:29:13 EDT
I thought I had eliminated the whole graphics system from consideration
by running at the console, right after boot, which doesn't even have
the nvidia kernel module loaded.  

However, I guess I have not tried running at the console, with only
the randomize_va_space set to 0.  I'll try it.

The fact that having X running exacerbates a possible kernel problem 
is not completely unbelievable.

To finally answer the question, the graphics drivers are Nvidia's 
latest; 7667.
Comment 7 Charles C. Van Tilburg 2005-08-04 14:45:59 EDT
Fresh boot, no nvidia kernel module loaded, console, the only
fix applied is the randomize_va_space set to 0.

Runs three times, locks up on the fourth.  Which is probably
exactly what happened at console mode before, without any fix
applied (I wasn't counting as precisely then, as now).

However, setting randomize_va_space to 0 sure did help while 
X was running...
Comment 8 Charles C. Van Tilburg 2005-08-04 14:57:29 EDT
Also... to be clear, my system does *NOT* boot to an xdm or such
login screen... it comes up to console only (run level 3).  I 
have to type startx to get X going after login.  FYI.
Comment 9 Dave Jones 2005-09-30 03:06:52 EDT
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.
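
Before retesting, it is worth confirming that the machine has actually been rebooted into the updated kernel rather than just having the package installed. A small sketch, assuming that uname -r reports the same version-release string as the update named above; the function name check_kernel is hypothetical.

```shell
#!/bin/sh
# Compare the running kernel release with the expected one.
# $1 = expected release string
# $2 = running release (defaults to the output of uname -r)
check_kernel() {
    expected=$1
    running=${2:-$(uname -r)}
    if [ "$running" = "$expected" ]; then
        echo "running $running: matches the update"
    else
        echo "running $running: expected $expected, reboot into the new kernel first"
        return 1
    fi
}

# Example: check_kernel 2.6.13-1.1526_FC4
```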
Comment 10 Charles C. Van Tilburg 2005-09-30 11:28:59 EDT
Survived five runs...
