Bug 112405 (RHEL3-x440-crasher) - RHEL3: x440 crashes under heavy load in X
Summary: RHEL3: x440 crashes under heavy load in X
Keywords:
Status: CLOSED WONTFIX
Alias: RHEL3-x440-crasher
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: XFree86
Version: 3.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-12-19 02:50 UTC by john stultz
Modified: 2007-11-30 22:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-12 21:01:20 UTC
Target Upstream Version:
Embargoed:


Attachments
/var/log/message (113.39 KB, text/plain)
2003-12-19 02:53 UTC, john stultz
XF86Config file (3.11 KB, text/plain)
2003-12-19 02:55 UTC, john stultz
XFree86.0.log file (30.94 KB, text/plain)
2003-12-19 02:56 UTC, john stultz
Script to demo X related hang on the x440 (959 bytes, text/plain)
2004-05-06 05:19 UTC, john stultz
lsmod output (939 bytes, text/plain)
2004-10-22 00:12 UTC, john stultz
/proc/interrupts output (2.39 KB, text/plain)
2004-10-22 00:13 UTC, john stultz
lsmod output (1.17 KB, text/plain)
2004-10-23 00:11 UTC, john stultz
/proc/interrupts output (2.79 KB, text/plain)
2004-10-23 00:13 UTC, john stultz
/var/log/messages from system (91.37 KB, text/plain)
2004-10-23 00:15 UTC, john stultz

Description john stultz 2003-12-19 02:50:06 UTC
NOTE: This seems to be the exact same problem as bug #112028 only 
against RHEL 3.0 Update 1 instead of RHEL 2.1 Update 3. 
 
Description of the problem: 
Running RHEL3.0-AS-Update1-Beta1  
 
When running the "pounder" test suite (formerly known as tools10) to
create a heavy load under X, moving windows around will cause the  
machine to crash. It appears X dies as the console image hangs for a  
second and then the monitor clicks resolutions and goes black. The  
system is not pingable, nor does the keyboard respond.   
  
So far the crash is not seen when the test is left to run on its  
own. It only occurs while I'm moving windows around. So far I have
been unable to reproduce the problem by moving/resizing windows  
around without "pounder" running.   
  
Version-Release number of selected component (if applicable):  
XFree86-4.3.0-44.EL and kernel-smp-2.4.21-6.EL 
 
How reproducible:  
Always  
  
Steps to Reproduce:  
1. Boot smp kernel on x440  
2. Log into X and run the "pounder" script  
3. While stress-test is running, move windows around  
      
  
Actual Results:  System hangs for a second, then the monitor clicks  
to black.  
  
Expected Results:  No problems.   
  
Additional info:  
 
Using the  
	Option "XaaNoSolidFillRect"  
setting in the XF86Config file appears to solve the problem just as 
in bug #112028
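
For reference, a minimal sketch of where this option sits in the
XF86Config file. The Identifier value is an assumption (keep whatever the
existing Device section already uses), and the driver shown is the savage
driver named later in this report:

```
Section "Device"
    # Identifier is hypothetical; keep the value from the existing file
    Identifier  "Videocard0"
    Driver      "savage"
    # Disables XAA accelerated solid rectangle fills
    Option      "XaaNoSolidFillRect"
EndSection
```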

Comment 1 john stultz 2003-12-19 02:53:22 UTC
Created attachment 96632
/var/log/message

Crash took place at Dec 18 10:37:09 in the log file

Comment 2 john stultz 2003-12-19 02:55:26 UTC
Created attachment 96633
XF86Config file

Original config file, with the only additional change of adding:
	Option "XaaNoSolidFillRect"
to avoid the issue.

Comment 3 john stultz 2003-12-19 02:56:21 UTC
Created attachment 96634
XFree86.0.log file

XFree86 log file using 'Option "XaaNoSolidFillRect"'

Comment 4 john stultz 2003-12-19 03:04:15 UTC
I'm curious if this is related to bug #106023. 
 
The symptoms are similar, but they don't line up in some places 
('Option "NoAccel"' did not help in that case). 

Comment 5 john stultz 2004-03-26 22:24:17 UTC
From initial testing, this problem seems to be present in the 
2.4.21-11.ELsmp kernel that came w/ RHEL3-U2-beta1. It seems to take
a bit longer to trigger, but it's definitely still happening.
 
I'm doing further testing w/ 'Option "XaaNoSolidFillRect"' to see if 
that resolves it as well.  

Comment 6 john stultz 2004-04-16 16:42:09 UTC
Reproduced the issue on RHEL3-U2 public beta. Adding 'Option
"XaaNoSolidFillRect"' seemed to make the problem harder to reproduce. 
However in stress tests overnight the system hung (but did not 
reboot, as is normal w/ this issue). I'm trying to work out if the 
hang is related. 

Comment 7 john stultz 2004-04-21 22:18:50 UTC
Using the XaaNoSolidFillRect option, the system hang in comment #6 
could not be reproduced running w/o X. However, running w/ X I was 
able to reproduce the hang. Sysreq-T didn't work. 
 
Trying to see if this is reproducible w/ a single cpu.

Comment 8 john stultz 2004-05-06 03:32:06 UTC
Been working on this further. All of the following relates to problems
seen on an 8way x440 w/ HT enabled.  
 
It seems there are two symptoms (possibly from two causes, I don't 
know).  
 
The first is while running pounder, dragging windows around in X will 
quickly cause the sudden reboot. No panic is displayed, and no 
machine check is logged in the service processor event log.  
 
I reproduced this symptom under RHEL 3.0 gold, Update 1, and Update 2 
beta. It is easily reproducible using the pounder test + dragging
windows. This issue appears to be resolved by using the
XaaNoSolidFillRect option. I'm working to narrow down a smaller test 
case that will still reproduce this issue. 
 
The second issue is the less easily reproduced X related reboot. This 
issue has only been seen when running pounder (in X mode) overnight 
or over the weekend. It usually takes 5-24 hours to trigger. This 
case has very similar symptoms (no panic, no machine checks in the 
event log). The XaaNoSolidFillRect option does not seem to resolve 
this issue. 
 
These two symptoms seem to be related, as they exhibit almost the same
behaviour, but as only the first is apparently resolved w/ the 
XaaNoSolidFillRect option, they may be totally different issues.  
 
Neither of these issues has been seen w/ SLES 8, SLES 9, or RHEL
4alpha2 on the same box. However, RHEL AS 2.1 does exhibit both
symptoms (as noted in bug #112028). 
 
Any ideas or suggestions from the Red Hat team?

Comment 9 john stultz 2004-05-06 05:19:37 UTC
Created attachment 100026
Script to demo X related hang on the x440

Here is a smaller test case that reproduces the problem. Please read the script
header for usage instructions.

Comment 10 john stultz 2004-05-06 05:22:05 UTC
The script can also be run from runlevel 3 to exhibit the problem.

Comment 11 Wendy Hung 2004-08-18 02:03:27 UTC
John has reproduced this on RHEL 3 U2 beta re0810.
Bob, Issuetracker 46728 also addresses this bug as well as RHEL 2.1 
bug 112028.

Comment 16 Mike A. Harris 2004-09-29 18:15:06 UTC
Please upgrade to the latest RHEL3 update packages if there are any
newer kernel or X packages available that haven't yet been updated.

Once the system is fully up to date, please run:

    redhat-config-xfree86 --reconfig

This will restore the X server config to our defaults.  Run the
'pounder' test suite, and if the problem still occurs, please
attach the X server config file and log file, /var/log/messages
and the output of "lsmod" to bugzilla as individual uncompressed
file attachments.

Next step is to try and narrow down if this is video driver
specific, or a generic issue.  That can be done by switching
to the 'vesa' video driver by hand editing the config file and
replacing the driver name.  After this, restart X and rerun
your test suite.  Please report back whether the test suite
runs while the vesa driver is being used.  Also attach the
same 4 files to the bug report as mentioned above for the
savage driver.
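
As an illustration of that edit, the Device section would change roughly
as follows. This is only a sketch: the Identifier value is an assumption,
and the Driver line is the only one that needs to change.

```
Section "Device"
    # Identifier is hypothetical; keep the value from the existing file
    Identifier  "Videocard0"
    # Previously "savage"; "vesa" selects the generic VESA driver
    Driver      "vesa"
EndSection
```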

Also, please attach the output of /proc/interrupts from the
problem system.
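
One possible way to gather everything requested in this comment is a
small shell helper like the one below. It is only a sketch:
collect_x_diag is a hypothetical name, and the file paths are the usual
RHEL3 defaults, assumed rather than confirmed for this system.

```shell
#!/bin/sh
# Sketch: copy the requested diagnostics, uncompressed, into one directory.
# The paths below are assumed RHEL3 defaults; adjust them if they differ.
collect_x_diag() {
    out="$1"
    mkdir -p "$out" || return 1

    # X server config, X server log, and the system log
    for f in /etc/X11/XF86Config /var/log/XFree86.0.log /var/log/messages; do
        if [ -r "$f" ]; then
            cp "$f" "$out/"
        else
            # Record anything that could not be collected
            echo "skipped (not readable): $f" >> "$out/collect.log"
        fi
    done

    # Loaded kernel modules, saved as plain text
    if command -v lsmod >/dev/null 2>&1; then
        lsmod > "$out/lsmod.txt" 2>&1
    fi

    # Interrupt table
    if [ -r /proc/interrupts ]; then
        cat /proc/interrupts > "$out/interrupts.txt"
    fi
    return 0
}
```

Calling it once per driver under test, for example
collect_x_diag /tmp/x440-diag-savage, leaves a directory of plain files
that can be attached individually.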

All of the above information will be helpful in further
diagnosing and narrowing down the problem.

Thanks in advance.

Setting bug to "NEEDINFO" state.

Comment 17 Mike A. Harris 2004-10-07 04:44:48 UTC
The RHEL3 U4 freeze is nearing quickly.  We need the test results
from comment #16 above before Oct 8, 2004 (this Friday) in order
to be able to investigate this further for U4.

Thanks in advance.

Comment 20 john stultz 2004-10-12 23:04:21 UTC
Just an update, I'm currently working on reproducing this as 
described in comment #16. Using the same setup I was able to trigger 
the reboot using the x-hang.sh script, and I'm now running the 
pounder test to try to trigger the hang. 
 
I'm also setting up another box to duplicate the results. 
 

Comment 21 john stultz 2004-10-13 18:52:21 UTC
I was able to reproduce the hang/reboot w/ the x-hang.sh script on a 
different 8way x440 as well. 

Comment 22 john stultz 2004-10-20 21:58:30 UTC
Update: This issue (reset using x-hang.sh) was also reproduced using 
RHEL3 U4 beta1.  

Comment 23 john stultz 2004-10-20 22:48:38 UTC
Using the "vesa" X driver, I was unable to reproduce the reset using 
the x-hang.sh script w/ RHEL3 U4 beta. 

Comment 24 john stultz 2004-10-20 23:19:08 UTC
Additionally, using the "savage" X driver, w/ the 'Option
"XaaNoSolidFillRect"' line added to the config file, I *was* able to
reproduce the x-hang.sh reset, although it took quite a bit longer to 
show up. 

Comment 26 john stultz 2004-10-22 00:12:23 UTC
Created attachment 105619
lsmod output

Here is the requested lsmod output.

Comment 27 john stultz 2004-10-22 00:13:36 UTC
Created attachment 105620
/proc/interrupts output

Requested /proc/interrupts output

Comment 28 john stultz 2004-10-22 00:18:01 UTC
The /var/log/messages file I was about to post from the machine 
currently has a bit of junk in it (we were having hardware problems 
earlier that were fixed, however this problem still remains). Rather
than confuse the issue, I'm going to re-install, reproduce and post
that /var/log/messages file.

Comment 29 john stultz 2004-10-23 00:11:21 UTC
Created attachment 105682
lsmod output

Requested lsmod output from fresh install of RHEL3 U4 beta1.

Comment 30 john stultz 2004-10-23 00:13:02 UTC
Created attachment 105683
/proc/interrupts output

/proc/interrupts output from fresh install of RHEL3 U4 beta1

Comment 31 john stultz 2004-10-23 00:15:31 UTC
Created attachment 105684
/var/log/messages from system 

/var/log/messages file from fresh install of RHEL3 U4 beta1 system, immediately
after x-hang.sh caused reset. The crash happened after Oct 22 17:02:26 in the
logs.

Comment 46 Wendy Hung 2005-05-12 21:01:20 UTC
RH BZ #112028 was closed as WONTFIX.
This is the same bug only for RHEL 3.

