NOTE: This appears to be the same problem as bug #112028, but against RHEL 3.0 Update 1 instead of RHEL 2.1 Update 3.

Description of problem:
Running RHEL3.0-AS-Update1-Beta1. When the "pounder" test suite (formerly known as tools10) is run under X to create a heavy load, moving windows around causes the machine to crash. X appears to die: the console image hangs for about a second, then the monitor clicks as it switches resolution and goes black. The system does not respond to ping, and the keyboard is unresponsive.

So far the crash has not been seen when the test is left to run on its own; it only occurs while I am moving windows around. I have also been unable to reproduce the problem by moving or resizing windows without "pounder" running.

Version-Release number of selected component (if applicable):
XFree86-4.3.0-44.EL and kernel-smp-2.4.21-6.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot the SMP kernel on an x440.
2. Log into X and run the "pounder" script.
3. While the stress test is running, move windows around.

Actual Results:
The system hangs for a second, then the monitor clicks and goes black.

Expected Results:
No problems.

Additional info:
Adding Option "XaaNoSolidFillRect" to the XF86Config file appears to solve the problem, just as in bug #112028.
Created attachment 96632: /var/log/messages

The crash took place at Dec 18 10:37:09 in the log file.
Created attachment 96633: XF86Config file

The original config file; the only change is the addition of Option "XaaNoSolidFillRect" to avoid the issue.
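For readers without the attachment, here is a minimal sketch of where the workaround line sits. It assumes a Device section like the one redhat-config-xfree86 generates, using the savage driver (the driver in use on this x440, per later comments). The Identifier value "Videocard0" is a hypothetical placeholder; keep whatever your existing config uses, since the Screen section refers to the Device section by that name.

```
# Sketch only: Identifier value is a hypothetical placeholder;
# keep the name your existing Device section already uses.
Section "Device"
    Identifier  "Videocard0"
    Driver      "savage"
    # Disables XAA acceleration of solid rectangle fills only;
    # other 2D acceleration stays enabled (unlike Option "NoAccel").
    Option      "XaaNoSolidFillRect"
EndSection
```

X has to be restarted for the change to take effect.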
Created attachment 96634: XFree86.0.log file

XFree86 log file from a run with Option "XaaNoSolidFillRect" set.
I wonder whether this is related to bug #106023. The symptoms are similar but don't match in every respect (for example, Option "NoAccel" did not help in that case).
From initial testing, this problem seems to be present in the 2.4.21-11.ELsmp kernel that shipped with RHEL3-U2-beta1. It seems to take a bit longer to trigger, but it's definitely still happening. I'm doing further testing with Option "XaaNoSolidFillRect" to see whether that resolves it as well.
Reproduced the issue on the RHEL3-U2 public beta. Adding Option "XaaNoSolidFillRect" seemed to make the problem harder to reproduce. However, during overnight stress tests the system hung (but did not reboot, as it normally does with this issue). I'm trying to determine whether the hang is related.
With the XaaNoSolidFillRect option set, the system hang described in comment #6 could not be reproduced when running without X. With X running, however, I was able to reproduce the hang. SysRq-T did not work. I'm now trying to see whether this is reproducible with a single CPU.
I've been working on this further. All of the following relates to problems seen on an 8-way x440 with HT enabled. There seem to be two symptoms (possibly from two causes; I don't know).

The first: while pounder is running, dragging windows around in X quickly causes a sudden reboot. No panic is displayed, and no machine check is logged in the service processor event log. I reproduced this symptom under RHEL 3.0 gold, Update 1, and Update 2 beta. It is easily reproducible using the pounder test plus dragging windows. This issue appears to be resolved by using the XaaNoSolidFillRect option. I'm working to narrow it down to a smaller test case that still reproduces the issue.

The second is the less easily reproduced X-related reboot. It has only been seen when running pounder (in X mode) overnight or over a weekend, and it usually takes 5 to 24 hours to trigger. The symptoms are very similar (no panic, no machine checks in the event log), but the XaaNoSolidFillRect option does not seem to resolve it.

The two symptoms seem related, since they exhibit almost the same behaviour, but because only the first is apparently resolved by the XaaNoSolidFillRect option, they may be entirely different issues.

Neither issue has been seen with SLES 8, SLES 9, or RHEL 4 alpha2 on the same box. However, RHEL AS 2.1 does exhibit both symptoms (as noted in bug #112028).

Any ideas or suggestions from the Red Hat team?
Created attachment 100026: Script to demo X-related hang on the x440

Here is a smaller test case that reproduces the problem. Please read the script header for usage instructions.
The script also exhibits the problem when run from runlevel 3.
John has reproduced this on RHEL 3 U2 beta re0810. Bob, Issuetracker 46728 covers this bug as well as RHEL 2.1 bug 112028.
Please upgrade to the latest RHEL3 update packages if any newer kernel or X packages are available that haven't yet been installed. Once the system is fully up to date, please run:

    redhat-config-xfree86 --reconfig

This will restore the X server config to our defaults. Then run the 'pounder' test suite, and if the problem still occurs, please attach the X server config file and log file, /var/log/messages, and the output of "lsmod" to Bugzilla as individual, uncompressed file attachments.

The next step is to narrow down whether this is video-driver specific or a generic issue. To do that, switch to the 'vesa' video driver by hand-editing the config file and replacing the driver name, then restart X and rerun your test suite. Please report back whether the test suite runs without problems while the vesa driver is in use, and attach the same four files mentioned above, as for the savage driver.

Also, please attach the output of /proc/interrupts from the problem system.

All of the above information will help in further diagnosing and narrowing down the problem. Thanks in advance.

Setting bug to "NEEDINFO" state.
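A minimal sketch of the driver swap requested above, assuming the default Device section written by redhat-config-xfree86 --reconfig. The Identifier value "Videocard0" is a hypothetical placeholder. Only the Driver line changes; the Identifier must stay the same because the Screen section refers to the Device section by that name.

```
# Sketch only: Identifier value is a hypothetical placeholder.
Section "Device"
    Identifier  "Videocard0"
    # Original chipset-specific driver, commented out for the test:
    # Driver    "savage"
    # Generic VESA driver, no 2D acceleration:
    Driver      "vesa"
EndSection
```

Because the vesa driver does no 2D acceleration, a clean run with it points toward the savage driver's accelerated paths rather than a generic X or kernel problem.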
The RHEL3 U4 freeze is approaching quickly. We need the test results requested in comment #16 above before Oct 8, 2004 (this Friday) in order to investigate this further for U4. Thanks in advance.
Just an update: I'm currently working on reproducing this as described in comment #16. With the same setup I was able to trigger the reboot using the x-hang.sh script, and I'm now running the pounder test to try to trigger the hang. I'm also setting up a second box to duplicate the results.
I was also able to reproduce the hang/reboot with the x-hang.sh script on a different 8-way x440.
Update: This issue (reset using x-hang.sh) was also reproduced using RHEL3 U4 beta1.
With the "vesa" X driver, I was unable to reproduce the reset with the x-hang.sh script on RHEL3 U4 beta.
Additionally, with the "savage" X driver and the Option "XaaNoSolidFillRect" line added to the config file, I *was* able to reproduce the x-hang.sh reset, although it took quite a bit longer to show up.
Created attachment 105619: lsmod output

Here is the requested lsmod output.
Created attachment 105620: /proc/interrupts output

Requested /proc/interrupts output.
The /var/log/messages file I was about to post from the machine currently has some junk in it (we had hardware problems earlier that have since been fixed, but this problem remains). Rather than confuse the issue, I'm going to re-install, reproduce the problem, and post that /var/log/messages file.
Created attachment 105682: lsmod output

Requested lsmod output from a fresh install of RHEL3 U4 beta1.
Created attachment 105683: /proc/interrupts output

/proc/interrupts output from a fresh install of RHEL3 U4 beta1.
Created attachment 105684: /var/log/messages from system

/var/log/messages file from a fresh install of RHEL3 U4 beta1, captured immediately after x-hang.sh caused a reset. The crash happened after Oct 22 17:02:26 in the logs.
RH BZ #112028 was closed as WONTFIX. This is the same bug, but for RHEL 3.