Bug 170048 - (IERR) E07F0 Hard Crash Dell PE2850
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: xorg-x11
Hardware: i386 Linux
Priority: medium Severity: medium
Assigned To: X/OpenGL Maintenance List
Keywords: Regression
Depends On:
Blocks: 170416
Reported: 2005-10-06 16:16 EDT by Les
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-05-16 12:21:15 EDT

Attachments
log file dated March 10, 2006 (38.00 KB, text/plain)
2006-03-28 08:21 EST, Deborah Flad
log file dated Feb 12, 2006 (38.58 KB, text/plain)
2006-03-28 08:22 EST, Deborah Flad
file dated March 10, 2006 (3.37 KB, text/plain)
2006-03-28 08:25 EST, Deborah Flad
Xorg log from the time of the failure (48.66 KB, text/plain)
2006-03-28 09:28 EST, Calvin Webster
Current Xorg log (asymptomatic) (47.11 KB, text/plain)
2006-03-28 09:32 EST, Calvin Webster

Description Les 2005-10-06 16:16:02 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:

This bug appears to have recurred with the last xorg release:

We are unable to boot into runlevel 5. X pegs the CPU at 99.9% and will eventually crash the server with E07F0 PROC1 PROC2 IERR. This has occurred on two identical servers that were updated at the same time.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Boot normal runlevel 5
2. X will not show on console
3. SSH into the box remotely: CPU usage is at 100%, and a crash will occur in time.

Actual Results:  E07F0 PROC1 PROC2 IERR

Expected Results:  Normal boot. 

Additional info:

Already tried modifying xorg.conf as stated in bug#153926 to no avail.
Comment 1 Deborah Flad 2005-11-04 09:01:36 EST
Setting up new Dell 2850, 2proc, 6G ram, RHEL4. All fine last nite. This AM
found server crashed. Dell LCD reporting "E07F0 Proc1 IERR |E07F0 Proc2 IERR".
Had to power off to resolve crash. Last thing I'd added to server was
"fonts-xorg-truetype-". I plan to remove this RPM and
see if server crashes again.
Comment 2 Andy Clark 2005-12-22 20:40:43 EST
Same problem here.
New Dell 2850, 2proc, 2GB RAM.
X worked fine after initial FC4 install.
Brought system up-to-date via 'yum'.
Display went blank after booting. SSHed in, 'top' showed "xorg" taking up 99% of CPU.
System locked up after a number of seconds.
Front began showing "E07F0 Proc1 IERR |E07F0 Proc2 IERR" error.

Also tried fix from bug#153926. No luck.

Running fine in init 3.
Comment 3 Stonie R. Cooper 2005-12-22 21:35:59 EST
Quick fix, although a kludge:

# rm -rf /lib/modules/*/kernel/drivers/char/drm

Then reboot.  The issue is with the drm, and the funky BIOS from Dell.
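
A less destructive variant of the workaround above, as a minimal sketch: it moves each kernel's drm directory to a backup location instead of deleting it, so the modules can be restored later. The function name stash_drm and the backup path /root/drm-disabled are hypothetical and not part of the original comment.

```shell
# Hypothetical helper: move each kernel's drm directory aside instead of deleting it.
# $1 = modules root (normally /lib/modules)
# $2 = backup directory (an assumed location, pick any path outside the modules tree)
stash_drm() {
    modroot=$1
    backup=$2
    for d in "$modroot"/*/kernel/drivers/char/drm; do
        # Skip the unexpanded glob pattern when no kernel has a drm directory.
        [ -d "$d" ] || continue
        # Derive the kernel version from the first path component under the root.
        kver=${d#"$modroot"/}
        kver=${kver%%/*}
        mkdir -p "$backup/$kver" || return 1
        mv "$d" "$backup/$kver/drm" || return 1
        echo "moved $d -> $backup/$kver/drm"
    done
}

# Example (as root), followed by a reboot:
#   stash_drm /lib/modules /root/drm-disabled
```

Moving the directory back to its original path should undo the change.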
Comment 4 Calvin Webster 2006-01-06 15:56:47 EST
I am livid!!! When you pay this much for a system with (arguably) the best
Enterprise Linux OS you don't expect to be dead in the water after an update.

Thank you Stonie for the workaround.

Dell PowerEdge 2800, maxed out server (about $16K with govt discount). RHEL4

I hadn't updated in a while. After getting the last batch of updates off RHN
system console would not go past init messages (X wouldn't start). System slowed
down to a crawl - users complaining. Tried a restart (it's hard to drop Windoze
habits) but no diff.

Next morning I found dead system with hardware error code "E07F0" and "PROC 1
IERR PROC 2 IERR" on LCD display with amber background. Followed Dell
troubleshooter and reseated CPUs. This is no small task, removing the card
cage, fan cage, cables and heat sinks. Once it was reassembled, the error went
away, but still couldn't get X to come up and system still slow. Now I find out
that this was all unnecessary!

Until I found this bug report it was still running slow with no X on console and
the LCD error display.

Instead of deleting the "DRM" drivers, I moved them to a temporary location.

I hope Red Hat comes up with a fix for this soon. This is no way to run an

We have a PowerEdge 2600 in another building that suffered no ill effects. It is
also running RHEL 4 and has the same updates. However, its display is an "ATI
Technologies Inc Rage XL (rev 27)", whereas the affected machine has an "ATI
Technologies Inc Radeon RV100 QY [Radeon 7000/VE]".

Anyone know the repercussions of removing the DRM modules?
Comment 5 Andy Clark 2006-01-06 17:20:12 EST
Stonie's fix worked for me too. I just renamed the drm folder.

I don't think removing the DRM modules should matter on a server. Unless for
some reason you're running 3D apps on your server.
Comment 6 Mike A. Harris 2006-03-07 23:05:49 EST
Red Hat Enterprise Linux customers are urged to directly contact Red Hat
support services for all support related problems, either by filing an
official support ticket at http://www.redhat.com/support or by telephone
at 1-888-RED-HAT1 depending on the type of subscription contract.

This ensures that any problems reported are given higher priority over
issues directly reported in bugzilla, as bugzilla is only a bug tracking
tool and is not a support mechanism.  If there are any questions about
our Red Hat Enterprise Linux support services, please contact Red Hat
support via web or phone as per above and they will provide assistance.

This issue seems similar to IERR issues reported by one of our partners
via our official support system.  While it is not clear if the issue
reported here is the same issue or is a different problem, the problem
will be investigated during the RHEL4 U4 update cycle.  There are
preliminary patches which we will be making available for testing in
the near future.

We will update this report once packages are ready for testing.

Thanks for your patience.
Comment 7 Deborah Flad 2006-03-08 08:22:58 EST
Stonie's fix has worked for me on my Dell PE2850s w/RHEL4, 4 servers to date, 
most re-built a few times now w/same results each time. This after Dell had me 
remove DRAC cards on two and they replaced sub-processor boards and some other 
parts and we still got same IERR msg. I replaced DRACs, rebuilt servers, did 
up2date, got IERR, did Stonie's fix and haven't had prob since on any of the 
servers. All my 'real' servers are PE2850s w/RHEL4 and Oracle 10gR2 or 

An aside: I have a Compaq DL380 "sandbox" w/CentOS4, SVN & MySQL but can't 
remember if it had same problem. Willing to rebuild it with *NIX distro of your 
choice if that might be helpful :)
Comment 8 Mike A. Harris 2006-03-28 07:42:57 EST
Thanks for the bug report.  We have reviewed the information you
have provided above, and there is some additional information we
require that will be helpful in our diagnosis of this issue.

Please attach your X server config file (/etc/X11/xorg.conf) and X
server log file (/var/log/Xorg.*.log) to the bug report as individual
uncompressed file attachments using the bugzilla file attachment link below.

We will review this issue again once you've had a chance to attach
this information.

Thanks in advance.
Comment 9 Deborah Flad 2006-03-28 08:21:03 EST
Created attachment 126900 [details]
log file dated March 10, 2006 

will send earlier one as well
Comment 10 Deborah Flad 2006-03-28 08:22:49 EST
Created attachment 126901 [details]
log file dated Feb 12, 2006
Comment 11 Deborah Flad 2006-03-28 08:25:13 EST
Created attachment 126902 [details]
file dated March 10, 2006

last of requested log files
Comment 12 Calvin Webster 2006-03-28 09:28:48 EST
Created attachment 126906 [details]
Xorg log from the time of the failure

The problem appeared 5 Jan 2006 when I installed the 2.6.9-22.0.2.ELsmp kernel
as part of an update. I did save the Xorg log from the failure. It is attached
as "Xorg.0.log.20060106". I'm also attaching the current log (Xorg.0.log).

I have not seen the failure repeat since then. The 2.6.9-34.ELsmp kernel drm
modules are enabled but don't appear to be causing any load on the system.
Comment 13 Calvin Webster 2006-03-28 09:32:22 EST
Created attachment 126908 [details]
Current Xorg log (asymptomatic)

This is the current log using kernel 2.6.9-34.ELsmp. I am unaware of any
current problems similar to the one reported in this bug. It appears to have
been fixed, at least for our platform, in the updated kernel.
Comment 14 Mike A. Harris 2006-03-28 12:43:21 EST

Can you confirm this?
Comment 15 Les 2006-03-28 13:52:17 EST
Will advise...
Comment 21 Mike A. Harris 2006-04-24 20:46:17 EDT
We believe that this problem is fixed with the RHEL-4 U3 release.  Please
upgrade to this release and after rebooting into the latest kernel, confirm
whether the problem is resolved or still persists.

Thanks in advance.
Comment 22 Keiichi Mori 2006-04-24 21:04:00 EDT
What changes included in the RHEL4 U3 release could solve this problem?
Comment 23 Mike A. Harris 2006-04-24 21:40:37 EDT
(In reply to comment #22)
> What changes included in the RHEL4 U3 release could solve this problem?

* Mon Jan  9 2006 Mike A. Harris <mharris@redhat.com> 6.8.2-1.EL.13.25
- Updated xorg-x11-6.8.2-ati-radeon-7000-disable-dri.patch with additional
  changes to resolve regressions in bug (#170008).

Comment 24 Mike A. Harris 2006-04-26 17:31:20 EDT
To expand a bit on comment #23, we have reviewed the attached log files,
and from comment #12, the log starts:

X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.4.21-25.ELsmp i686 [ELF] 
Current Operating System: Linux pegasus 2.6.9-22.0.1.ELsmp #1 SMP Tue Oct 18
18:39:27 EDT 2005 i686
Build Date: 19 September 2005
Build Host: porky.build.redhat.com

Comment #13's log shows:

X Window System Version 6.8.2
Release Date: 9 February 2005
X Protocol Version 11, Revision 0, Release 6.8.2
Build Operating System: Linux 2.6.9-22.18.bz155725.ELsmp i686 [ELF] 
Current Operating System: Linux pegasus 2.6.9-34.ELsmp #1 SMP Fri Feb 24
16:54:53 EST 2006 i686
Build Date: 11 January 2006
Build Host: hs20-bc1-7.build.redhat.com

The later build date would seem to indicate a newer build of X.
We assume that changes that went into the build just prior to that,
which were to disable DRI on Radeon 7000 due to various instabilities,
are responsible for solving this, and several similar problems reported
over time.  Since this bug report is claiming a DRI induced lockup on
Radeon 7000 hardware, it is reasonably likely that it is the same issue,
and that by disabling DRI completely, the problem should no longer occur.

I have made the latest xorg-x11-6.8.2-103.EL build available at my
FTP space on people.redhat.com for people experiencing this problem
to test.  Please test these rpms and update the report as soon as
possible to indicate whether the problem is resolved in this build
or not.  The patch for this problem has been present in our builds
for a while now, and seems to solve the problem for all those who
have reported it thus far.

The rpms are available at the following temporary location.  Please
download them as soon as possible even if you can't test them
immediately, as I don't have much disk quota and may have to remove
them soon to make room for other test packages.


If the problem does persist after this update, please attach a new X server
log file and config file as individual uncompressed file attachments of
type "text/plain" that are readable in the web browser, and we will review
the new info and continue our diagnosis.

Thanks in advance.
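
The header comparison described in this comment can be repeated on any machine with a small helper. This is only a sketch: the function name xorg_build_info is hypothetical, and the log paths in the example are the usual X.Org defaults, which may differ on a given system.

```shell
# Hypothetical helper: print the build and kernel header lines of each
# X server log passed as an argument, so two logs can be compared side by side.
xorg_build_info() {
    for log in "$@"; do
        echo "== $log"
        grep -E '^(Build Date|Build Operating System|Current Operating System):' "$log"
    done
}

# Example:
#   xorg_build_info /var/log/Xorg.0.log /var/log/Xorg.0.log.old
```

A later "Build Date" in the current log than in the log from the failure would suggest that a newer X server build is installed.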
Comment 25 Mike A. Harris 2006-05-10 08:26:47 EDT
Please provide a status update on the results of testing the above
packages, and any other comments/feedback.  Since we believe this issue
may be resolved as per above, this bug will be marked as resolved soon
unless we receive feedback that the issue is still present in our latest
test packages.

Thanks in advance.

Comment 26 Mike A. Harris 2006-05-16 09:44:01 EDT
Several weeks have gone by and we haven't heard any feedback.  Do the
test packages provided above resolve this issue?

Comment 27 Les 2006-05-16 10:03:50 EDT
Sorry folks, it's been a few busy weeks, and I've been out of country. If the
kernel fixes in U3 have resolved this issue for other users, then move to close
this case, I won't be able to schedule reboots on these servers for a month
because they went into production (and don't really need runlevel 5 regardless.)
I'll advise only if updates do not work.  :-)
Comment 28 Deborah Flad 2006-05-16 10:14:48 EDT
Servers that had problem are fully up2date and have not seen problem since doing
Stonie's fix. Servers were pressed into service last month so I was reluctant 
to apply the test pkgs.
Comment 29 Mike A. Harris 2006-05-16 12:21:15 EDT
Ok, closing as fixed in ERRATA.  If the problem recurs, please file a new
support ticket with Red Hat support services at http://www.redhat.com/support
or via phone at 1-888-RED-HAT1.

Comment 30 Calvin Webster 2006-05-16 12:23:32 EDT
I apologize for not responding sooner. I didn't realize until recently that
comment #24 was directed at me (re: comments # 12 & 13). I've been out of town
several times in the past few months and wasn't closely monitoring my mailing
lists and forums.

I probably missed the download window where test packages were available
According to "http://kbase.redhat.com/faq/FAQ_80_1014.shtm", "686" architecture
is appropriate for Xeon processors.

I've seen no recurrence of the problem with the latest updated RHN packages.
This server's cpu load is normal and Xorg log has no failure indications.

I'm willing to install test updates if you still want them, but it appears that
disabling Digital Rights Management clears up the problem. 

The question remains, however, why DRM should have such a profound impact on
hardware devices. DRM is not a big concern here, though, unless it brings down
our systems. We don't play movies or music on our servers and we observe all
intellectual property licenses.
Comment 31 Kevin E. Martin 2006-05-16 13:49:26 EDT
(In reply to comment #30)
> I'm willing to install test updates if you still want them, but it appears that
> disabling Digital Rights Management clears up the problem. 

Unfortunately, DRM has become an overloaded acronym.  In the current context,
DRM stands for Direct Rendering Manager, which is a kernel module to manage
direct rendering contexts.  It has nothing to do with Digital Rights Management.

> The question remains, however, why DRM should have such a profound impact on
> hardware devices. DRM is not a big concern here, though, unless it brings down
> our systems. We don't play movies or music on our servers and we observe all
> intellectual property licenses.

The issue here is that the Direct Rendering Manager requires the AGP module, and
there are some known kernel interactions with AGP and Radeon 7000/VE cards. 
And, as Mike described above, there was a fix to properly disable the direct
rendering on the Radeon 7000 in RHEL4U3.
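
To see whether the Direct Rendering Manager and AGP modules are actually loaded on a given machine, something like the following could be used. It is a sketch, not a Red Hat tool: the function name drm_modules_loaded is hypothetical, and the module names it looks for (drm, radeon, agpgart) are assumptions based on the hardware discussed in this report.

```shell
# Hypothetical helper: list DRM/AGP-related modules found in a
# /proc/modules-style file (defaults to /proc/modules).
# The names checked here (drm, radeon, agpgart) are assumptions for this hardware.
drm_modules_loaded() {
    awk '$1 == "drm" || $1 == "radeon" || $1 == "agpgart" { print $1; found = 1 }
         END { exit !found }' "${1:-/proc/modules}"
}

# Example:
#   drm_modules_loaded && echo "DRM/AGP modules are loaded"
```

The function prints each matching module name and returns non-zero when none of them is present.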
Comment 32 Calvin Webster 2006-05-16 14:11:43 EDT
My bad... Here I was, righteously indignant about how the Digital Rights Gestapo
was ruining my life. :) So, what's the difference between changes in the
official RHN xorg packages and the test packages? I imagine I'll see them in
future package updates.

Am I correct in assuming that "DRM" changes _should_ only affect the physical
console display? We have "remote" X sessions available to some of our local
developers through VNC. However, none of this should affect rendering in their
sessions since they're using virtual displays. Our physical console display is
almost never used. Even when I connect to run up2date I do so through VNC using
the Xorg vnc module.
