Red Hat Bugzilla – Bug 112405
RHEL3: x440 crashes under heavy load in X
Last modified: 2007-11-30 17:06:59 EST
NOTE: This seems to be the exact same problem as bug #112028 only
against RHEL 3.0 Update 1 instead of RHEL 2.1 Update 3.
Description of the problem:
When running the "pounder" test suite (formerly known as tools10) to
create a heavy load under X, moving windows around will cause the
machine to crash. It appears X dies as the console image hangs for a
second and then the monitor clicks resolutions and goes black. The
system is not pingable, nor does the keyboard respond.
So far the crash is not seen when the test is left to run on its
own. It only occurs while I'm moving windows around. So far I have
been unable to reproduce the problem by moving/resizing windows
around without "pounder" running.
Version-Release number of selected component (if applicable):
XFree86-4.3.0-44.EL and kernel-smp-2.4.21-6.EL
Steps to Reproduce:
1. Boot smp kernel on x440
2. Log into X and run the "pounder" script
3. While stress-test is running, move windows around
Actual Results: System hangs for a second, then the monitor clicks
Expected Results: No problems.
setting in the XF86Config file appears to solve the problem just as
in bug #112028
Created attachment 96632 [details]
Crash took place at Dec 18 10:37:09 in the log file
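To pull the log lines around that timestamp out of a long log, something like the following could be used. This is only a sketch: it assumes the log uses the classic syslog "Mon DD HH:MM:SS" prefix and an English locale, and because syslog lines carry no year, the caller has to supply one. The name lines_near is hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def lines_near(log_lines, when, year, window_seconds=60):
    """Return the log lines stamped within window_seconds of `when`.

    `when` is a syslog-style stamp such as "Dec 18 10:37:09".
    `year` must be supplied by the caller (syslog stamps omit it).
    Lines without a parseable stamp are skipped.
    """
    target = datetime.strptime(f"{year} {when}", STAMP_FORMAT)
    picked = []
    for line in log_lines:
        try:
            # The syslog timestamp occupies the first 15 characters.
            stamp = datetime.strptime(f"{year} {line[:15]}", STAMP_FORMAT)
        except ValueError:
            continue
        if abs((stamp - target).total_seconds()) <= window_seconds:
            picked.append(line)
    return picked
```

Calling lines_near(open(path).read().splitlines(), "Dec 18 10:37:09", year) would return the entries logged within a minute of the crash, where path and year are whatever applies to the attached file.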
Created attachment 96633 [details]
Original config file, with the only additional change of adding:
to avoid the issue.
Created attachment 96634 [details]
XFree86 log file using 'Option "XaaNoSolidFillRect"'
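For anyone wanting to apply the same workaround to several test machines, here is a minimal sketch of adding the option to the Device section of an XF86Config file. It assumes a conventional layout (Section "Device" ... EndSection); the helper name add_device_option is hypothetical and nothing here is taken from the attached config.

```python
import re

DEVICE_START = re.compile(r'^Section\s+"Device"', re.IGNORECASE)
SECTION_END = re.compile(r'^EndSection\b', re.IGNORECASE)


def add_device_option(config_text, option="XaaNoSolidFillRect"):
    """Insert `Option "<option>"` into each Device section that lacks it."""
    option_line = re.compile(r'^Option\s+"%s"' % re.escape(option), re.IGNORECASE)
    out = []
    in_device = False
    has_option = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if DEVICE_START.match(stripped):
            in_device = True
            has_option = False
        elif in_device and option_line.match(stripped):
            has_option = True
        elif in_device and SECTION_END.match(stripped):
            if not has_option:
                # Add the option just before the section closes.
                out.append('    Option      "%s"' % option)
            in_device = False
        out.append(line)
    return "\n".join(out) + "\n"
```

The function works on text only, so the result can be reviewed before it is written back to the config file. X needs a restart for the change to take effect.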
I'm curious if this is related to bug #106023.
The symptoms are similar, but they don't line up in some places
('Option "NoAccel"' did not help in that case).
From initial testing, this problem seems to be present in the
2.4.21-11.ELsmp kernel that came w/ RHEL3-U2-beta1. It seems to take
a bit longer to trigger, but it's definitely still happening.
I'm doing further testing w/ 'Option "XaaNoSolidFillRect"' to see if
that resolves it as well.
Reproduced the issue on the RHEL3-U2 public beta. Adding 'Option
"XaaNoSolidFillRect"' seemed to make the problem harder to reproduce.
However in stress tests overnight the system hung (but did not
reboot, as is normal w/ this issue). I'm trying to work out if the
hang is related.
Using the XaaNoSolidFillRect option, the system hang in comment #6
could not be reproduced running w/o X. However, running w/ X I was
able to reproduce the hang. SysRq-T didn't work.
Trying to see if this is reproducible w/ a single cpu.
Been working on this further. All of the following relates to problems
seen on an 8way x440 w/ HT enabled.
It seems there are two symptoms (possibly from two causes, I don't
The first is while running pounder, dragging windows around in X will
quickly cause the sudden reboot. No panic is displayed, and no
machine check is logged in the service processor event log.
I reproduced this symptom under RHEL 3.0 gold, Update 1, and Update 2
beta. It is easily reproducible using the pounder test + dragging
windows. This issue appears to be resolved by using the
XaaNoSolidFillRect option. I'm working to narrow down a smaller test
case that will still reproduce this issue.
The second issue is the less easily reproduced X related reboot. This
issue has only been seen when running pounder (in X mode) overnight
or over the weekend. It usually takes 5-24 hours to trigger. This
case has very similar symptoms (no panic, no machine checks in the
event log). The XaaNoSolidFillRect option does not seem to resolve
These two symptoms seem to be related, as they exhibit almost the same
behaviour, but as only the first is apparently resolved w/ the
XaaNoSolidFillRect option, they may be totally different issues.
Neither of these issues has been seen w/ SLES 8, SLES 9, or RHEL
4alpha2 on the same box. However, RHEL AS 2.1 does exhibit both
symptoms (as noted in bug #112028).
Any ideas or suggestions from the Red Hat team?
Created attachment 100026 [details]
Script to demo X related hang on the x440
Here is a smaller test case that reproduces the problem. Please read the script
header for usage instructions.
The script can also be run from runlevel 3 to exhibit the problem.
John has reproduced this on RHEL 3 U2 beta re0810.
Bob, Issuetracker 46728 also addresses this bug as well as RHEL 2.1
Please upgrade to the latest RHEL3 update packages if there are any
newer kernel or X packages available that haven't yet been updated.
Once the system is fully up to date, please run:
This will restore the X server config to our defaults. Run the
'pounder' test suite, and if the problem still occurs, please
attach the X server config file and log file, /var/log/messages
and the output of "lsmod" to bugzilla as individual uncompressed
Next step is to try and narrow down if this is video driver
specific, or a generic issue. That can be done by switching
to the 'vesa' video driver by hand editing the config file and
replacing the driver name. After this, restart X and rerun
your test suite. Please report back whether the test suite
runs while the vesa driver is being used. Also attach the
same 4 files to the bug report as mentioned above for the
Also, please attach the output of /proc/interrupts from the
All of the above information will be helpful in further
diagnosing and narrowing down the problem.
Thanks in advance.
Setting bug to "NEEDINFO" state.
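The driver swap described above is a one-line change inside the Device section. A minimal sketch of doing it programmatically, assuming a conventional XF86Config layout (the helper name set_video_driver is hypothetical):

```python
import re

# Matches a Driver line and captures everything up to the quoted driver name.
DRIVER_LINE = re.compile(r'^(\s*Driver\s+)"[^"]*"', re.IGNORECASE)


def set_video_driver(config_text, driver="vesa"):
    """Replace the Driver name inside Section "Device" blocks only.

    InputDevice sections also contain Driver lines (keyboard, mouse);
    those are left untouched.
    """
    out = []
    in_device = False
    for line in config_text.splitlines():
        stripped = line.strip().lower()
        if stripped.startswith("section") and '"device"' in stripped:
            in_device = True
        elif stripped.startswith("endsection"):
            in_device = False
        elif in_device:
            line = DRIVER_LINE.sub(
                lambda m: '%s"%s"' % (m.group(1), driver), line
            )
        out.append(line)
    return "\n".join(out) + "\n"
```

Keeping a copy of the original config makes it easy to switch back to the original driver after the vesa test run.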
The RHEL3 U4 freeze is approaching quickly. We need the test results
from comment #16 above before Oct 8, 2004 (this Friday) in order
to be able to investigate this further for U4.
Thanks in advance.
Just an update, I'm currently working on reproducing this as
described in comment #16. Using the same setup I was able to trigger
the reboot using the x-hang.sh script, and I'm now running the
pounder test to try to trigger the hang.
I'm also setting up another box to duplicate the results.
I was able to reproduce the hang/reboot w/ the x-hang.sh script on a
different 8way x440 as well.
Update: This issue (reset using x-hang.sh) was also reproduced using
RHEL3 U4 beta1.
Using the "vesa" X driver, I was unable to reproduce the reset using
the x-hang.sh script w/ RHEL3 U4 beta.
Additionally, using the "savage" X driver, w/ the 'Option
"XaaNoSolidFillRect"' line added to the config file, I *was* able to
reproduce the x-hang.sh reset, although it took quite a bit longer to
Created attachment 105619 [details]
Here is the requested lsmod output.
Created attachment 105620 [details]
Requested /proc/interrupts output
The /var/log/messages file I was about to post from the machine
currently has a bit of junk in it (we were having hardware problems
earlier that were fixed, however this problem still remains). Rather
than confusing the issue I'm going to re-install, reproduce and post
that /var/log/messages file.
Created attachment 105682 [details]
Requested lsmod output from fresh install of RHEL3 U4 beta1.
Created attachment 105683 [details]
/proc/interrupts output from fresh install of RHEL3 U4 beta1
Created attachment 105684 [details]
/var/log/messages from system
/var/log/messages file from fresh install of RHEL3 U4 beta1 system, immediately
after x-hang.sh caused reset. The crash happened after Oct 22 17:02:26 in the
RH BZ #112028 was closed as WONTFIX.
This is the same bug only for RHEL 3.