Bug 85341

Summary: summit system restarts under heavy graphic load
Product: Red Hat Enterprise Linux 2.1
Reporter: Ernst-Heinrich Klaas <ernst-heinrich.klaas>
Component: XFree86
Assignee: John Dennis <jdennis>
Status: CLOSED ERRATA
QA Contact: David Lawrence <dkl>
Severity: high
Priority: high
Version: 2.1
CC: mharris, nphilipp
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-03-19 17:01:07 UTC
Attachments:
Description          Flags
Bug85341Info.tgz     none
GrepXFree86.txt      none
lsmod.txt            none
lspci.txt            none
messages             none
XF86Config-4         none
XFree86.0.log        none
messages.txt         none
XF86Config-4.txt     none
XFree86.0.log.txt    none
GrepXFree86.txt      none

Description Ernst-Heinrich Klaas 2003-02-28 11:25:22 UTC
Description of problem:
We have installed Red Hat AS 2.1 on a Fujitsu Siemens Computers PRIMERGY T850 
with 8 CPUs and 16 GB RAM. The PRIMERGY T850 is the same hardware as the 
IBM xSeries 440.
After installing RH AS 2.1 out of the box, we installed kernel 2.4.9-e.12:
rpm -Uvh kernel-doc-2.4.9-e.12.i386.rpm
rpm -Uvh kernel-headers-2.4.9-e.12.i386.rpm
rpm -Uvh kernel-source-2.4.9-e.12.i386.rpm
rpm -ivh kernel-summit-2.4.9-e.12.i686.rpm

Additionally, we added the boot option "notsc".
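
One way to confirm that a boot option such as "notsc" is active on the running kernel is to look for it in /proc/cmdline. The following is a minimal sketch; the helper name has_boot_option and its optional second argument are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# has_boot_option is a hypothetical helper (not from the report).
# It succeeds (exit 0) when the option given as $1 appears as a whole word
# on the kernel command line. $2 optionally names a file to read instead of
# /proc/cmdline, which is only there to make the helper easy to try out.
has_boot_option() {
    cmdline_file="${2:-/proc/cmdline}"
    # Put each option on its own line, then match the whole line literally.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$1"
}

# Example call (commented out so that sourcing this file has no side effects):
# has_boot_option notsc && echo "notsc is active"
```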

The system restarts approximately 58 minutes after starting the
rhr-ddX test. This test is part of the "Red Hat Ready Certification Testing 
Suite"
(rhr-core-1.8-1.noarch.rpm, rhr-auto-1.8-1.noarch.rpm,
rhr-interactive-1.8-1.noarch.rpm).

Version-Release number of selected component (if applicable)
XFree86 4.1.0
savage_drv.o  1.1.20t


How reproducible:
always

Steps to Reproduce:
1. Install all "rhr-xxx-1.8-1.noarch.rpm" RPMs
2. Start "rhr-ddX"
    
Actual results:
System restart

Expected results:
No restart

Additional info:
After the restart we found the following information in the
"event log" and in "/var/log/messages":

Event Log:

X SERVPROC 02/27/03 17:10:45 Resetting system due to an unrecoverable error
X SERVPROC 02/27/03 17:10:45 Upper CEC Machine Check, TW_MCKC: [20][00][00][00][00][00][00][00]
X SERVPROC 02/27/03 17:10:45 Lower CEC Machine Check, TW_MCKC: [20][00][00][00][00][00][00][00]
X SERVPROC 02/27/03 17:10:42 Upper CEC Machine Check, CYC_MCK2: [00][01][00][00][00][08][00][00]
X SERVPROC 02/27/03 17:10:33 Lower CEC Machine Check, CYC_MCK2: [00][00][80][00][00][08][00][00]
(Severities: X=Error)


/var/log/messages:

Feb 27 15:22:23 pdb0362c rhr-Config:  succeeded
Feb 27 15:22:23 pdb0362c rhr-Zresults:  succeeded
Feb 27 15:22:23 pdb0362c rhr-Config:  succeeded
Feb 27 15:22:30 pdb0362c rhr-INFO:  succeeded
Feb 27 15:22:30 pdb0362c last message repeated 4 times
Feb 27 15:22:31 pdb0362c kernel: hda: ATAPI 24X DVD-ROM drive, 512kB Cache, UDMA(33)
Feb 27 15:22:31 pdb0362c kernel: Uniform CD-ROM driver Revision: 3.12
Feb 27 15:31:29 pdb0362c kernel: loop: loaded (max 8 devices)
Feb 27 15:39:12 pdb0362c login(pam_unix)[1211]: session opened for user root by LOGIN(uid=0)
Feb 27 15:39:12 pdb0362c  -- root[1211]: ROOT LOGIN ON tty2
Feb 27 16:14:31 pdb0362c rhr-cdromdata:  succeeded
Feb 27 16:14:35 pdb0362c modprobe: modprobe: Can't locate module char-major-81
Feb 27 16:33:10 pdb0362c modprobe: modprobe: Can't locate module char-major-81
Feb 27 16:52:53 pdb0362c modprobe: modprobe: Can't locate module char-major-81
Feb 27 17:11:08 pdb0362c syslogd 1.4.1: restart.

The Red Hat HCL entry for the identical IBM xSeries 440 includes no hints about
"non-native" drivers (especially for the graphics driver "savage_drv.o").
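
An unexpected reset shows up in the syslog as a "syslogd ...: restart." line that is not preceded by any shutdown messages. The following is a minimal sketch for listing such restart lines together with the last message logged before each one; the helper name show_restarts and the default log path are assumptions for illustration.

```shell
#!/bin/sh
# show_restarts is a hypothetical helper (not from the report).
# For every "syslogd <version>: restart." line in a syslog file it prints the
# line logged immediately before it, then the restart line itself.
# The default path /var/log/messages is an assumption; pass another file as $1.
show_restarts() {
    logfile="${1:-/var/log/messages}"
    awk '
        /syslogd [0-9.]+: restart\./ {
            if (prev != "") print "before: " prev
            print "restart: " $0
        }
        { prev = $0 }   # remember the current line for the next iteration
    ' "$logfile"
}

# Example call (commented out so that sourcing this file has no side effects):
# show_restarts /var/log/messages
```

Run against the excerpt above, this pairs the 16:52:53 modprobe message with the 17:11:08 syslogd restart, which brackets the 17:10:45 reset recorded in the event log.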

Comment 1 Mike A. Harris 2003-02-28 11:34:46 UTC
Can you please attach the /var/log/messages file from when this occurs,
as well as the complete output of:

lspci
lsmod

Also attach the X server log and config file please.

Comment 2 Mike A. Harris 2003-02-28 11:36:39 UTC
Also, what exact RPM release of XFree86 are you using?

rpm -qa | grep XFree86

Thanks in advance.

Comment 3 Ernst-Heinrich Klaas 2003-02-28 14:09:53 UTC
Created attachment 90411 [details]
Bug85341Info.tgz

Attachment contains following files :
GrepXFree86.txt, lsmod.txt, lspci.txt, messages,
XF86Config-4, XFree86.0.log

Comment 4 Mike A. Harris 2003-02-28 14:41:01 UTC
Please make all file attachments separate individual text/plain file
attachments so that they can be easily viewed by clicking on them in the
web browser.

It is a lot of extra effort to download a tarball, decompress it, then
try to keep track of the various files throughout the life of a bug.  They
get lost, and require redownloading each time to view them.  Multiplied
across several bug reports, it quickly becomes very unmanageable.
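
The information requested in comments 1 and 2 can be gathered as separate plain-text files in one pass. The following is a minimal sketch, assuming the usual Red Hat / XFree86 4.x file locations; the helper name collect_bug_info and the output directory are illustrative and not part of the original request.

```shell
#!/bin/sh
# collect_bug_info is a hypothetical helper (not from the report).
# It writes one plain-text file per requested item into the directory $1.
# The source paths below are assumed default locations and may differ.
collect_bug_info() {
    outdir="${1:-./bug-info}"
    mkdir -p "$outdir" || return 1

    # Command output: one file per command. If a command is missing or fails,
    # its error text ends up in the file and the run continues.
    lspci > "$outdir/lspci.txt" 2>&1 || true
    lsmod > "$outdir/lsmod.txt" 2>&1 || true
    rpm -qa 2>&1 | grep XFree86 > "$outdir/GrepXFree86.txt" || true

    # Log and config files: copy each readable file with a .txt suffix.
    for f in /var/log/messages /etc/X11/XF86Config-4 /var/log/XFree86.0.log; do
        if [ -r "$f" ]; then
            cp "$f" "$outdir/$(basename "$f").txt"
        fi
    done
    return 0
}

# Example call (commented out so that sourcing this file has no side effects):
# collect_bug_info ./bug85341
```

Each resulting file can then be attached individually instead of as a tarball.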


Comment 5 Ernst-Heinrich Klaas 2003-02-28 14:50:43 UTC
Created attachment 90413 [details]
GrepXFree86.txt

Comment 6 Ernst-Heinrich Klaas 2003-02-28 14:51:11 UTC
Created attachment 90414 [details]
lsmod.txt

Comment 7 Ernst-Heinrich Klaas 2003-02-28 14:51:59 UTC
Created attachment 90415 [details]
lspci.txt

Comment 8 Ernst-Heinrich Klaas 2003-02-28 14:52:40 UTC
Created attachment 90416 [details]
messages

Comment 9 Ernst-Heinrich Klaas 2003-02-28 14:53:26 UTC
Created attachment 90417 [details]
XF86Config-4

Comment 10 Ernst-Heinrich Klaas 2003-02-28 14:53:56 UTC
Created attachment 90418 [details]
XFree86.0.log

Comment 11 Ernst-Heinrich Klaas 2003-02-28 15:05:47 UTC
Created attachment 90419 [details]
messages.txt

Comment 12 Ernst-Heinrich Klaas 2003-02-28 15:07:48 UTC
Created attachment 90420 [details]
XF86Config-4.txt

Comment 13 Ernst-Heinrich Klaas 2003-02-28 15:08:33 UTC
Created attachment 90421 [details]
XFree86.0.log.txt

Comment 14 Ernst-Heinrich Klaas 2003-02-28 15:18:15 UTC
Created attachment 90422 [details]
GrepXFree86.txt

Some attachments were missing the ".txt" suffix.
Corrected now.
(Sorry for the inconvenience.)

Comment 15 Ernst-Heinrich Klaas 2003-03-05 14:36:24 UTC
The same problem also occurred with XFree86 4.1.0-29.

Comment 16 John Dennis 2003-03-05 16:09:22 UTC
It appears as though there was a machine check. I'm not familiar with this
system. Are you able to get a machine check dump from the BIOS?

It would be good to isolate the problem. Can you replace the graphics card with
something other than an S3, such as an ATI or Nvidia card, and rerun the test? If
it does not fail with another card/driver, then we know this is driver-specific,
which helps us to focus.

The current AS2.1 release of XFree86 is 4.1.0-44. There is no reason for me to
believe the changes between the release 25 you are running and the current 44
release affect this issue, but on the other hand that's quite a few revisions
apart, so it's probably worthwhile using the latest released package.
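
Whether an installed package is behind release 44 can be checked from its version-release string. The following is a minimal sketch; release_older_than is a hypothetical helper that compares only the leading digits of the release field and ignores the version part, so it is only meaningful within the same 4.1.0 version.

```shell
#!/bin/sh
# release_older_than is a hypothetical helper (not from the report).
# $1 is a "version-release" string such as "4.1.0-25", $2 a release number.
# It succeeds (exit 0) when the numeric release in $1 is lower than $2.
release_older_than() {
    release="${1##*-}"                   # text after the last "-"
    release_num="${release%%[!0-9]*}"    # keep the leading digits only
    [ -n "$release_num" ] && [ "$release_num" -lt "$2" ]
}

# Example use with rpm (commented out so that sourcing this file has no side effects):
# installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' XFree86)
# if release_older_than "$installed" 44; then
#     echo "XFree86 $installed is older than release 44"
# fi
```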


Comment 17 Ernst-Heinrich Klaas 2003-03-06 14:35:30 UTC
We have done a few tests now with XFree86 4.1.0-44 and it looks pretty good.
But we have to do some more tests before we can close this bugzilla report.


Comment 18 Ernst-Heinrich Klaas 2003-03-19 12:03:01 UTC
The system works fine with XFree86 4.1.0-44 (download: 
ftp://updates.redhat.com/...) and all screensavers disabled.
(See also: "Servers - LINUX hangs with XScreenSaver v3.33" under
 http://www-1.ibm.com/support/ )

Comment 19 John Dennis 2003-03-19 16:42:15 UTC
Mike - why did you reopen this?

Comment 20 Mike A. Harris 2003-03-19 17:01:07 UTC
I've re-opened this as it was incorrectly closed as resolved in RAWHIDE.

XFree86 4.3.0 is what is in rawhide (internally anyway), and we don't really
know what the problem was specifically.  It definitely hasn't been tested
with rawhide though, so that resolution is invalid.

The problem is considered resolved above with 4.1.0-44, so it should be
resolved as ERRATA.  I'm wondering though if the problem still does occur
with screensavers enabled.  Disabling the screensavers would just be a
workaround in this case.

The future erratum may contain a new Savage driver update which is known
to solve many driver related issues.

Comment 21 Nils Philippsen 2003-03-24 16:22:05 UTC
On my machine (not Advanced Server though, different graphics card) I sometimes
(with older XFree86 versions) had 3D stability issues, so this might explain
why disabling screensavers worked around the problem. Just
a thought.