164976 – system locks up

Summary: system locks up

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	4
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-08-03 09:10 UTC by Charles C. Van Tilburg
Modified:	2015-01-04 22:21 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-09-30 15:28:59 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
scimark2 compiled with gcc 2.95.3 -O2 (14.35 KB, application/octet-stream), 2005-08-03 09:12 UTC, Charles C. Van Tilburg

Description Charles C. Van Tilburg 2005-08-03 09:10:22 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
This is a reproducible, complete system lock. Nothing is logged. No console
messages are printed.

It first happened running Doom3, but since, I have found that scimark2,
compiled with gcc 2.95.3 -O2, will consistently cause this to happen.

Scimark2 compiled with FC4 gcc32 or stock gcc will *NOT* cause this to
happen.

If X is running, the first attempt to run a gcc 2.95.3 compiled scimark2 (-O2)
will lock the system. It looks suspiciously as if this happens just as it is
trying to print out its results.

If X is not running, a shell loop running this binary will produce the lock
in 3 or 4 repetitions; a single run is *NOT* sufficient.
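
A loop along the following lines would match that description. It is only a sketch: the function name run_loop, the binary path ./scimark2 and the log file location are assumptions, not details taken from this report. Each iteration is written to a log and flushed with sync before the run starts, so the number of completed runs can be read back after a hard lock.

```shell
#!/bin/sh
# Run a command repeatedly, recording each iteration on disk before it
# starts, so the count of completed runs survives a hard lock.
# run_loop and its arguments are hypothetical names used for illustration.
run_loop() {
    bin=$1    # program to run, e.g. ./scimark2 (assumed path)
    max=$2    # maximum number of repetitions
    log=$3    # file that receives one line per iteration
    i=1
    while [ "$i" -le "$max" ]; do
        echo "starting run $i" >> "$log"
        sync                          # flush the log before a run that may hang
        "$bin" > /dev/null || return 1
        i=$((i + 1))
    done
    echo "completed $max runs" >> "$log"
}

# Example (assumed paths): run_loop ./scimark2 10 "$HOME/scimark-loop.log"
```

After a reboot, the last "starting run" line in the log shows which repetition was in progress when the system locked.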

It also looks like this is specific to one motherboard, since my other two
systems are not affected. The system affected is an ECS [Apollo
KT266/A/333]/AMD XP3000, while, for example, the problem does *NOT* affect
an identical FC4 software installation on a Foxconn [KT400/KT600 AGP]/AMD
XP2600.

The graphics drivers have been eliminated from consideration, since that
kernel module is not loaded at boot to console.

It would be interesting if this involves some kind of locking optimization,
since I also found /usr/include/bits/lowlevellock.h to be missing and filed
bug #164974 about it.

Grasping at straws, it could also involve some kind of change in the kernel
chipset support... ??

Since I have recently made the switch from FC3 to FC4 and this was definitely
*NOT* happening there (at least a couple of kernel revs back), this is
probably related to new kernel work.

Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1398_FC4

How reproducible:
Always

Steps to Reproduce:
1. Run scimark2 compiled with gcc 2.95.3 -O2

Actual Results: complete system lockup, no messages logged, no console messages

Expected Results: benchmark results

Additional info:

Comment 1 Charles C. Van Tilburg 2005-08-03 09:12:12 UTC

Created attachment 117389
scimark2 compiled with gcc 2.95.3 -O2

Comment 2 Dave Jones 2005-08-03 22:29:43 UTC

Things to try..
can you try booting with exec-shield=0   and also try vdso=0

also try echo 0 > /proc/sys/kernel/randomize_va_space
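
Before retesting, it can help to confirm that the settings above actually took effect. The helper below is a minimal sketch; the name has_boot_param is hypothetical and not part of any tool mentioned in this bug. It checks whether a parameter appears as a whole word on a kernel command line such as the contents of /proc/cmdline.

```shell
#!/bin/sh
# Report whether a boot parameter appears as a whole word on a kernel
# command line. has_boot_param is a hypothetical helper for illustration.
has_boot_param() {
    param=$1      # e.g. exec-shield=0
    cmdline=$2    # e.g. the contents of /proc/cmdline
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Typical use after rebooting with the extra parameters:
#   has_boot_param exec-shield=0 "$(cat /proc/cmdline)" && echo "exec-shield=0 is set"
#   has_boot_param vdso=0 "$(cat /proc/cmdline)" && echo "vdso=0 is set"
# The sysctl can be read back directly:
#   cat /proc/sys/kernel/randomize_va_space
```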

Comment 3 Charles C. Van Tilburg 2005-08-04 11:53:50 UTC

trying the easiest first, the randomize_va_space definitely helped, but did not
solve the problem; it survived three runs with X running and locked up on the
fourth.  This is better than locking up on the first run.

rebooting with only the exec-shield=0 and vdso=0, running at console made things
worse; immediate lock up.  This is down from surviving three or four.

I'm about to try a combination of all three under X...

Comment 4 Charles C. Van Tilburg 2005-08-04 12:01:33 UTC

all three under X result in it surviving the first run, but locking up
on the second (when I started an sh loop).  This is an improvement over 
the original situation, but not much, and certainly not as good as just
doing the randomize_va_space.

if it means anything, SELinux is (and has been) disabled.

Comment 5 Dave Jones 2005-08-04 17:33:54 UTC

what graphics driver are you using ?

Comment 6 Charles C. Van Tilburg 2005-08-04 18:29:13 UTC

I thought I had eliminated the whole graphics system from consideration
by running at the console, right after boot, which doesn't even have
the nvidia kernel module loaded.  

However, I guess I have not tried running at the console, with only
the randomize_va_space set to 0.  I'll try it.

The fact that having X running exacerbates a possible kernel problem 
is not completely unbelievable.

To finally answer the question, the graphics drivers are Nvidia's 
latest; 7667.

Comment 7 Charles C. Van Tilburg 2005-08-04 18:45:59 UTC

Fresh boot, no nvidia kernel module loaded, console, the only
fix applied is the randomize_va_space set to 0.

Runs three times, locks up on the fourth.  Which is probably
exactly what happened at console mode before, without any fix
applied (I wasn't counting as precisely then, as now).

However, setting randomize_va_space to 0 sure did help while 
X was running...

Comment 8 Charles C. Van Tilburg 2005-08-04 18:57:29 UTC

Also... to be clear, my system does *NOT* boot to an xdm or such
login screen... it comes up to console only (run level 3).  I 
have to type startx to get X going after login.  FYI.

Comment 9 Dave Jones 2005-09-30 07:06:52 UTC

Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.

Comment 10 Charles C. Van Tilburg 2005-09-30 15:28:59 UTC

Survived five runs...
