From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
Reference: 153926
This bug appears to have recurred with the last xorg release:

fonts-xorg-100dpi-6.8.1.1-1.EL.1
fonts-xorg-75dpi-6.8.1.1-1.EL.1
fonts-xorg-base-6.8.1.1-1.EL.1
xorg-x11-6.8.2-1.EL.13.20
xorg-x11-deprecated-libs-6.8.2-1.EL.13.20
xorg-x11-deprecated-libs-devel-6.8.2-1.EL.13.20
xorg-x11-devel-6.8.2-1.EL.13.20
xorg-x11-font-utils-6.8.2-1.EL.13.20
xorg-x11-libs-6.8.2-1.EL.13.20
xorg-x11-Mesa-libGL-6.8.2-1.EL.13.20
xorg-x11-Mesa-libGLU-6.8.2-1.EL.13.20
xorg-x11-tools-6.8.2-1.EL.13.20
xorg-x11-twm-6.8.2-1.EL.13.20
xorg-x11-xauth-6.8.2-1.EL.13.20
xorg-x11-xdm-6.8.2-1.EL.13.20
xorg-x11-xfs-6.8.2-1.EL.13.20

We are unable to boot into runlevel 5. X pegs at 99.9% CPU and will eventually crash the server with E07F0 PROC1 PROC2 IERR. This has occurred on two identical servers that were updated at the same time.

Version-Release number of selected component (if applicable):
xorg-x11-6.8.2-1.EL.13.20

How reproducible:
Always

Steps to Reproduce:
1. Boot normally into runlevel 5.
2. X will not show on the console.
3. A remote SSH session into the box shows CPU at 100%; the crash follows in time.

Actual Results:
E07F0 PROC1 PROC2 IERR

Expected Results:
Normal boot.

Additional info:
Already tried modifying xorg.conf as stated in bug#153926, to no avail.
Setting up new Dell 2850, 2proc, 6G ram, RHEL4. All fine last nite. This AM found server crashed. Dell LCD reporting "E07F0 Proc1 IERR |E07F0 Proc2 IERR". Had to power off to resolve crash. Last thing I'd added to server was "fonts-xorg-truetype-7.8.1.1-1.EL.1.noarch.rpm". I plan to remove this RPM and see if server crashes again.
Same problem here. New Dell 2850, 2proc, 2GB RAM. X worked fine after initial FC4 install. Brought system up-to-date via 'yum'. Display went blank after booting. SSHed in, 'top' showed "xorg" taking up 99% of CPU. System locked up after a number of seconds. Front began showing "E07F0 Proc1 IERR |E07F0 Proc2 IERR" error. Also tried fix from bug#153926. No luck. Running fine in init 3.
Quick fix, although a kludge:

# rm -rf /lib/modules/*/kernel/drivers/char/drm

Then reboot. The issue is with the drm, and the funky BIOS from Dell.
I am livid!!! When you pay this much for a system with (arguably) the best Enterprise Linux OS you don't expect to be dead in the water after an update. Thank you Stonie for the workaround. Dell PowerEdge 2800, maxed out server (about $16K with govt discount). RHEL4 I hadn't updated in a while. After getting the last batch of updates off RHN, the system console would not go past init messages (X wouldn't start). System slowed down to a crawl, with users complaining. Tried a restart (it's hard to drop Windoze habits) but no diff. Next morning I found a dead system with hardware error code "E07F0" and "PROC 1 IERR PROC 2 IERR" on the LCD display with amber background. Followed the Dell troubleshooter and reseated the CPUs. This is no small task: removing the card cage, fan cage, cables and heat sinks. Once it was reassembled, the error went away, but I still couldn't get X to come up and the system was still slow. Now I find out that this was all unnecessary! Until I found this bug report it was still running slow with no X on console and the LCD error display. Instead of deleting the "DRM" drivers, I moved them to a temporary location. I hope Red Hat comes up with a fix for this soon. This is no way to run an "Enterprise". We have a PowerEdge 2600 in another building that suffered no ill effects. It is also running RHEL 4 and has the same updates. However, its display is an "ATI Technologies Inc Rage XL (rev 27)", whereas the affected machine has an "ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]". Anyone know the repercussions of removing the DRM modules?
Stonie's fix worked for me too. I just renamed the drm folder. I don't think removing the DRM modules should matter on a server. Unless for some reason you're running 3D apps. on your server.
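For anyone who prefers the reversible variant mentioned above (moving or renaming the drm directory instead of deleting it), here is a minimal sketch. The function names, the optional modules-root argument and the ".disabled" suffix are illustrative choices, not something taken from this report. It only renames directories; it assumes you run it as root and reboot afterwards, as with the original workaround.

```shell
# Sketch only: rename each kernel's drm module directory so it can be restored.
# The ".disabled" suffix and the optional root argument are illustrative choices.

disable_drm_modules() {
    modules_root="${1:-/lib/modules}"
    for dir in "$modules_root"/*/kernel/drivers/char/drm; do
        # Skip the unexpanded glob and anything that is not a directory.
        [ -d "$dir" ] || continue
        # Do not clobber an earlier backup.
        if [ -e "$dir.disabled" ]; then
            echo "skipped (backup exists): $dir"
            continue
        fi
        mv "$dir" "$dir.disabled" && echo "disabled: $dir"
    done
}

restore_drm_modules() {
    modules_root="${1:-/lib/modules}"
    for dir in "$modules_root"/*/kernel/drivers/char/drm.disabled; do
        [ -d "$dir" ] || continue
        target="${dir%.disabled}"
        # Leave things alone if a drm directory is already back in place.
        if [ -e "$target" ]; then
            echo "skipped (already present): $target"
            continue
        fi
        mv "$dir" "$target" && echo "restored: $target"
    done
}
```

Calling disable_drm_modules with no argument works on /lib/modules; restore_drm_modules undoes it once a fixed package is installed.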
Red Hat Enterprise Linux customers are urged to directly contact Red Hat support services for all support related problems, either by filing an official support ticket at http://www.redhat.com/support or by telephone at 1-888-RED-HAT1 depending on the type of subscription contract. This ensures that any problems reported are given higher priority over issues directly reported in bugzilla, as bugzilla is only a bug tracking tool and is not a support mechanism. If there are any questions about our Red Hat Enterprise Linux support services, please contact Red Hat support via web or phone as per above and they will provide assistance. This issue seems similar to IERR issues reported by one of our partners via our official support system. While it is not clear if the issue reported here is the same issue or is a different problem, the problem will be investigated during the RHEL4 U4 update cycle. There are preliminary patches which we will be making available for testing in the near future. We will update this report once packages are ready for testing. Thanks for your patience.
Stonie's fix has worked for me on my Dell PE2850s w/RHEL4, 4 servers to date, most re-built a few times now w/same results each time. This after Dell had me remove DRAC cards on two and they replaced sub-processor boards and some other parts and we still got same IERR msg. I replaced DRACs, rebuilt servers, did up2date, got IERR, did Stonie's fix and haven't had prob since on any of the servers. All my 'real' servers are PE2850s w/RHEL4 and Oracle 10gR2 or 9.2.0.4. An aside: I have a Compaq DL380 "sandbox" w/CentOS4, SVN & MySQL but can't remember if it had same problem. Willing to rebuild it with *NIX distro of your choice if that might be helpful :)
Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue. Please attach your X server config file (/etc/X11/xorg.conf) and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below. We will review this issue again once you've had a chance to attach this information. Thanks in advance.
Created attachment 126900 [details] log file dated March 10, 2006 will send earlier one as well
Created attachment 126901 [details] log file dated Feb 12, 2006
Created attachment 126902 [details] file dated March 10, 2006 last of requested log files
Created attachment 126906 [details] Xorg log from the time of the failure The problem appeared 5 Jan 2006 when I installed the 2.6.9-22.0.2.ELsmp kernel as part of an update. I did save the xorg.conf from the failure. It is attached as "Xorg.0.log.20060106". I'm also attaching the current log (Xorg.0.log). I have not seen the failure repeat since then. The 2.6.9-34.ELsmp kernel drm modules are enabled but don't appear to be causing any load on the system.
Created attachment 126908 [details] Current Xorg log (asymptomatic) This is the current log using kernel 2.6.9-34.ELsmp. I am unaware of any current problems similar to the one reported in this bug. It appears to have been fixed, at least for our platform, in the updated kernel.
Les: Can you confirm this?
Will advise...
We believe that this problem is fixed with the RHEL-4 U3 release. Please upgrade to this release and, after rebooting into the latest kernel, confirm whether the problem is resolved or still persists. Thanks in advance.
What changes included in RHEL4 U3 release could solve this problem ?
(In reply to comment #22)
> What changes included in RHEL4 U3 release could solve this problem ?

* Mon Jan 9 2006 Mike A. Harris <mharris> 6.8.2-1.EL.13.25
- Updated xorg-x11-6.8.2-ati-radeon-7000-disable-dri.patch with additional changes to resolve regressions in bug (#170008).

HTH
To expand a bit on comment #23, we have reviewed the attached log files, and from comment #12, the log starts:

X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.4.21-25.ELsmp i686 [ELF]
Current Operating System: Linux pegasus 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18 18:39:27 EDT 2005 i686
Build Date: 19 September 2005
Build Host: porky.build.redhat.com

Comment #13's log shows:

X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.6.9-22.18.bz155725.ELsmp i686 [ELF]
Current Operating System: Linux pegasus 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686
Build Date: 11 January 2006
Build Host: hs20-bc1-7.build.redhat.com

The later build date would seem to indicate a newer build of X. We assume that changes that went into the build just prior to that, which were to disable DRI on Radeon 7000 due to various instabilities, are responsible for solving this, and several similar problems reported over time. Since this bug report is claiming a DRI induced lockup on Radeon 7000 hardware, it is reasonably likely that it is the same issue, and that by disabling DRI completely, the problem should no longer occur.

I have made the latest xorg-x11-6.8.2-103.EL build available at my FTP space on people.redhat.com for people experiencing this problem to test. Please test these rpms and update the report as soon as possible to indicate whether the problem is resolved in this build or not. The patch for this problem has been present in our builds for a while now, and seems to solve the problem for all those who have reported it thus far.

The rpms are available at the following temporary location. Please download them as soon as possible even if you can't test them immediately, as I don't have much disk quota and may have to remove them soon to make room for other test packages.
ftp://people.redhat.com/mharris/testing/4E/xorg-x11 If the problem does persist after this update, please attach a new X server log file and config file as individual uncompressed file attachments of type "text/plain" that are readable in the web browser, and we will review the new info and continue our diagnosis. Thanks in advance.
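The header comparison above can be repeated on any saved log. This is a rough sketch: the function name xorg_log_header is made up for illustration, and it assumes the log header uses the same field labels as the excerpts quoted in this report.

```shell
# Sketch only: print the build/OS header fields of an Xorg log so that two
# logs (for example, one from the failure and the current one) can be compared.
# Assumes the field labels match the excerpts quoted in this report.
xorg_log_header() {
    grep -E '^[[:space:]]*(Build Operating System|Current Operating System|Build Date|Build Host):' "$1"
}
```

For example, running xorg_log_header on /var/log/Xorg.0.log and on a saved copy from the time of the failure shows at a glance whether the X build or the running kernel changed between the two.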
Please provide a status update on the results of testing the above packages, and any other comments/feedback. Since we believe this issue may be resolved as per above, this bug will be marked as resolved soon unless we receive feedback that the issue is still present in our latest test packages. Thanks in advance.
Several weeks have gone by and we haven't heard any feedback. Do the test packages provided above resolve this issue?
Sorry folks, it's been a few busy weeks, and I've been out of country. If the kernel fixes in U3 have resolved this issue for other users, then move to close this case, I won't be able to schedule reboots on these servers for a month because they went into production (and don't really need runlevel 5 regardless.) I'll advise only if updates do not work. :-)
Servers that had the problem are fully up2date and have not seen the problem since doing Stonie's fix. Servers were pressed into service last month so I was reluctant to apply the test pkgs.
Ok, closing as fixed in ERRATA. If the problem recurs, please file a new support ticket with Red Hat support services at http://www.redhat.com/support or via phone at 1-888-RED-HAT1. Thanks.
I apologize for not responding sooner. I didn't realize until recently that comment #24 was directed at me (re: comments #12 & 13). I've been out of town several times in the past few months and wasn't closely monitoring my mailing lists and forums. I probably missed the download window where test packages were available "ftp://people.redhat.com/mharris/testing/4E/xorg-x11/6.8.2-1.EL.13.30/686/". According to "http://kbase.redhat.com/faq/FAQ_80_1014.shtm", "686" architecture is appropriate for Xeon processors. I've seen no recurrence of the problem with the latest updated RHN packages. This server's CPU load is normal and the Xorg log has no failure indications. I'm willing to install test updates if you still want them, but it appears that disabling Digital Rights Management clears up the problem. The question remains, however, why DRM should have such a profound impact on hardware devices. DRM is not a big concern here, though, unless it brings down our systems. We don't play movies or music on our servers and we observe all intellectual property licenses.
(In reply to comment #30)
> I'm willing to install test updates if you still want them, but it appears that
> disabling Digital Rights Management clears up the problem.

Unfortunately, DRM has become an overloaded acronym. In the current context, DRM stands for Direct Rendering Manager, which is a kernel module to manage direct rendering contexts. It has nothing to do with Digital Rights Management.

> The question remains, however, why DRM should have such a profound impact on
> hardware devices. DRM is not a big concern here, though, unless it brings down
> our systems. We don't play movies or music on our servers and we observe all
> intellectual property licenses.

The issue here is that the Direct Rendering Manager requires the AGP module, and there are some known kernel interactions with AGP and Radeon 7000/VE cards. And, as Mike described above, there was a fix to properly disable the direct rendering on the Radeon 7000 in RHEL4U3.
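To see whether the Direct Rendering Manager or AGP modules are actually loaded on a given box, something like the following sketch may help. The module names it looks for (drm, radeon, agpgart) are assumptions and may differ by kernel build; the function name and the optional file argument are illustrative only. It reads the standard /proc/modules listing, where the first field of each line is the module name.

```shell
# Sketch only: list loaded modules whose names match the DRM/AGP names assumed
# here (drm, radeon, agpgart). The actual names may differ on your kernel.
loaded_drm_modules() {
    modules_file="${1:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk '$1 == "drm" || $1 == "radeon" || $1 == "agpgart" { print $1 }' "$modules_file"
}
```

No output means none of the assumed module names are currently loaded, which is what you would expect after the workaround and a reboot.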
My bad... Here I was, righteously indignant about how the Digital Rights Gestapo was ruining my life. :) So, what's the difference between changes in the official RHN xorg packages and the test packages? I imagine I'll see them in future package updates. Am I correct in assuming that "DRM" changes _should_ only affect the physical console display? We have "remote" X sessions available to some of our local developers through VNC. However, none of this should affect rendering in their sessions since they're using virtual displays. Our physical console display is almost never used. Even when I connect to run up2date I do so through VNC using the Xorg vnc module.