Description of problem: I just installed the RHEL 5 client, and noticed that sometimes the X resolution is properly set, as I specified, to 1280x1024, but often, upon restart of the X server, it dumbs down the resolution to 800x600. I will attach two Xorg.0.log outputs showing how the VESA VBE DDC read is reported as successful, but in the dumbed-down case no actual data comes in that enables proper configuration of the monitor. This problem DOES NOT occur under the RHEL 4.5 beta, nor does it occur using the third-party fglrx driver.

Version-Release number of selected component (if applicable): X Window System Version 7.1.1

How reproducible: Often. Note that I've got a VGA LCD attached via an adapter cable to the ATI Radeon X1300 Pro card. At first, I thought it might be because the adapter, the display, or the card was flaky in reporting the VESA data. But when I NEVER got failures under RHEL 4.5, I began to suspect something amiss in the VESA DDC read.

Steps to Reproduce:
1. Start the X server

Actual results: 800x600 resolution, and no actual data from the VESA VBE DDC read showing up in Xorg.0.log

Expected results: Proper detection of the monitor through proper data being returned by the VESA VBE DDC read, showing the monitor Manufacturer string and all the other relevant data, resulting ultimately in proper configuration at 1280x1024

Additional info:
Created attachment 152574 [details] Log output showing successful VBE DDC read
Created attachment 152575 [details] Log output showing VBE DDC read with no data

Note that in this log file, the VESA VBE DDC read is declared successful, but the Manufacturer string and all the other relevant data needed for configuration have NOT been obtained. This log file was obtained on the exact same hardware as the Xorg.0.log.good file. My process was:
1. Start RHEL 5. Notice X was configured correctly.
2. Log in. Save the Xorg.0.log file as Xorg.0.log.good
3. Log out. (The X server restarted as per apparent config defaults.) Notice that X was dumbed down to 800x600.
4. Log in. Save the Xorg.0.log file as Xorg.0.log.bad
hmmm, is it possible that the System->Preferences->Screen Resolution menu is set to 800x600, thereby overriding the system configuration on a per-user basis?
Well, the screen resolution problem occurs even when logged in as root. The resolution menu offered by System Preferences, after the X server has decided to dumb itself down to 800x600, offers no higher resolution than 800x600.

When I first installed RHEL 5, I got 800x600, but I ran some tool and specified 1280x1024, and that's when I got to this state of affairs where it sometimes does and sometimes does not work. I regret that I did not take careful note of which tool I ran. It was probably "system-config-display". I am using a Dell E196FP display on the OptiPlex 745. I have now used system-config-display to set that explicitly as the monitor.

Here's the odd thing: When the VESA data transfer is successful, the monitor clearly reports that its optimal resolution setting is 1280x1024x60, and the setup is correct. When the VESA data transfer is unsuccessful, the Xorg.0.log file reports that the 1280x1024x75 resolution is being tried multiple times, but that the 1280x1024x60 resolution is **NEVER** tried. I wonder why this is so. I also wonder why no 1024x768 resolutions are being tried. Even when the ACTUAL monitor I'm using is specified explicitly, no higher resolution than 800x600 is offered when the VESA DDC data transfer fails.

I see two questions to answer here:
1. Why does the VESA DDC transfer sometimes report success when no data is transferred?
2. Why does the X server never try 1024x768 resolutions, nor 1280x1024x60? It tries a WHOLE LOT of other modes, as can be seen in the Xorg.0.log file.

---- Should I also ask why you are asking me about user-level configuration settings, when the Xorg.0.log file already shows that a whole bunch of resolutions never offered in those user-level configuration commands are being tried, and abandoned for reasons that have nothing to do with the user-level configuration settings, and everything to do with the perceived capabilities of the monitor? Or am I completely misreading the Xorg.0.log file here?
I am disappointed that 10 days have gone by and nobody has followed up. I guess nobody cares that the latest update to RHEL BROKE X server configuration. I REALLY would like some help with this. I've just tried RHEL 4.5, and either I've found a way to more consistently specify a broken configuration, or whatever you broke in RHEL 5 you've BACKPORTED to 4.5, because the RHEL 4.5 beta worked great, but the RHEL 4.5 that was released is ALSO BROKEN. Let's get hopping on understanding this problem and fixing it QUICKLY!
I've tested with another monitor, the Dell 2007WFP LCD, via the VGA connector. In this case, the VESA data seems to be correctly fetched by both RHEL 5 and RHEL 4.5, but the monitor VERY CAREFULLY configures itself to CHOP OFF the topmost 30 or so pixels. Dell has no vertical size control, so I get my choice of having the tool bar or the panel chopped away. This is unacceptable, and extremely frustrating. How can I help MIT customers adopt RHEL 4.5 and RHEL 5 when basic X display configuration has been so badly and obviously broken? OK, you folks don't see the test case, so let's get someone back to me QUICKLY so both MIT and Red Hat see the same symptoms and pool our collective understanding.
Created attachment 154575 [details] Sysreport of target system running RHEL 5
In the interests of being helpful I have attached sysreport output of the relevant system. Probably our next step is to decide if we have one bug or two here. The overall symptom is that X is not properly configured. But that could be due to two separate issues:
1. Failure to get consistently good data from the VESA DDC transfer.
2. X chops off the topmost 50 pixels when the exact correct display is specified in the System->Administration->Display tool.
There's something else interesting going on. Yesterday the monitor would configure and chop off the top. Today I can't seem to establish an xorg.conf that will drive the monitor at that size any more. I either get 800x600, or I get a complaint that I'm driving the monitor too hard. I *THINK* it's because the xorg.conf I'm now playing with does not contain explicit resolution settings, and so it's trying to get them from the failed VESA DDC transfer.
RHEL problems will get attention if they are filed via your TAM. Since this works under RHEL4.5 I'll mark this as a regression.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
MIT does not have a TAM. It has something to do with the business model of: since 1860, companies have paid for the privilege of collaborating with MIT. Why does Red Hat insist on charging a premium price for the privilege of getting bugs taken seriously by the very community that helped create Linux in the first place?... But I digress. Inasmuch as this is a basic problem that will affect MANY users of RHEL 5, it seems in Red Hat's best interest to resolve it quickly. The position expressed by "pm-rhel" seems quite wise.
Since I last posted to this bug on 16 May, I've done some more careful testing and I understand a LOT more about this situation.

Bottom line summary: The RHEL 4.5 X server is performing acceptably. The RHEL 5 X server suffers from a problem with the DDC fetch that ALSO affects Ubuntu 7.04, and perhaps SuSE SLED 10.1. I've searched the X.org bug tree and found two relevant bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=6886
https://bugs.freedesktop.org/show_bug.cgi?id=10238
I've subscribed to the latter one, and we'll see if the X.org folks respond.

Detail: I needed to be told how to create a baseline xorg.conf file. Once I did that, I was able to carefully test RHEL 4.5 and RHEL 5. Along the way, I discovered that some of the extremely good performance I was getting under the RHEL 4.5 beta was because I'd installed the ATI proprietary driver but FORGOT. (Oops.)

The detailed behavior I got while testing RHEL 4.5, on the OptiPlex 745 with the ATI Radeon X1300 Pro: up to 1280x1024 works via VESA; up to 1400x1050 works via DVI. If your xorg.conf specifies 1400x1050, the VESA display will be too big for the screen. If your xorg.conf specifies 1600x1200, the VESA display will draw a blank, but the DVI display will know not to use that setting. This seems reasonable, albeit non-ideal, behavior to me.

Creating a baseline xorg.conf file under RHEL 5, I re-ran the tests and determined: The X server will not run AT ALL when connected via DVI. When connected via VESA, the DDC transfer fails, forcing the X server to dumb down to 800x600. If one explicitly provides Modeline directives in the xorg.conf file, the X server can be driven at up to 1280x1024 when connected via the VESA port. Perhaps higher resolutions are possible, but so far I don't have a Modeline for better than that. When connected via DVI, the X server WILL NOT START AT ALL. The monitor complains of being over-driven.
DDC transfers under RHEL 5, with the X server version 7.1.1, always fail, both on the VESA port and on the DVI port. Ubuntu 7.04 seems to suffer the same fate. There is a long-winded bug report about this at: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/89853 It is still unclear to me whether Red Hat, the Ubuntu community, or X.org do or do not understand the root cause of this problem. Perhaps between the four of us we can converge on a useful fix.
wdc and I dug into the X server sources, and produced the attached patch, with interesting results. Issues:
1. The initialization of the EDID buffer memsets only 4 bytes to zero, because it uses the size of the pointer to the structure instead of the size of the structure itself. In our patch we use the constant 128, because that is the size of an EDID block (as described in the EDID documentation we found on the Web).
2. When the EDID transfer fails and gives us an EDID buffer full of zeros, xf86InterpretEDID in interpret_edid.c silently fails and returns NULL. We changed the code to report this error condition.
3. The EDID fetch from the BIOS is DEFINITELY flaky in a time-dependent way. We inserted a sleep(2) into vbeReadEDID in vbe.c, which seems to improve things somewhat, but running Xorg multiple times results in EDID fetches in various states of completion, with the buffer only being filled up to a certain point, followed by zeros. We copied the hex dump code from print_edid.c into vbe.c so that the EDID buffer could be viewed immediately after the BIOS fetch.

Attached is the patch against xorg-x11-server-1.1.1-48.13.0.1, along with Xorg.0.log files from successive runs with this patch showing the EDID buffer in various states of fill.
Created attachment 155549 [details] Patch to debug EDID BIOS fetch
Created attachment 155550 [details] Xorg.0.log run 1 showing full EDID read
Created attachment 155551 [details] Xorg.0.log run 2 showing full EDID read
Created attachment 155552 [details] Xorg.0.log run 3 showing partial EDID read
Created attachment 155553 [details] Diff between Xorg.0.log run 2 and run 3
If you remove our "sleep(2);" from vbe.c the hex dump output from the EDID fetch from the BIOS pretty much always comes up all zeros.
Created attachment 155646 [details] Log of successful DDC read, RHEL 4.5 with debug patch applied. Today I built the X server under RHEL 4.5, applying the relevant portion of the debug patch that performs the hex dump of the EDID fetch. I ran Xorg several times. Always the result is the same: PERFECTLY RELIABLE fetch of the EDID data! I also looked at the differences in the int10 logic that seems to be doing the nuts and bolts of the EDID fetch. Although I might have missed something, I think they are substantially the same. This causes me to conclude that what we have is a KERNEL bug, not an X server bug. Perhaps something is playing fast and loose with the real mode emulation that serves the VBE? Since this problem seems also to affect Ubuntu 7.04 (although I can't get it to consistently fail), we're probably talking about a kernel bug introduced between 2.6.9 and 2.6.18. (The Ubuntu 7.04 Desktop install CD which HAS the problem uses 2.6.20-15.) QUESTION: What further steps should I take to clarify that the fault lies in the kernel and not in X?
Today I did two things:
1. I experimented under Ubuntu 7.04 to try and learn more -- I got partial EDID transfers, but no clue how to control when the transfers were partial and when they were complete.
2. I found a package called "read-edid" that is alleged to use the VM86 code in a stand-alone mode to perform the problematic EDID fetch. See: http://john.fremlin.de/programs/linux/read-edid/
A Debian package was available for Ubuntu. Running the program ALWAYS gets a 100% good EDID fetch. Building the package from source under RHEL 5 and running it ALSO ALWAYS gets a 100% good EDID fetch.

So now the question is, "What is happening to make the stand-alone get-edid successful but the X.org fetch unsuccessful?" Someone suggested that there may be a memory caching issue involved. get-edid is a small program, whereas X is rather large, so that's not so far-fetched an idea. My next task will be to read the get-edid code, and try to understand if it is doing the same thing the X server is doing. ANY insight from anyone else reading this bug report would be MOST welcome.
Created attachment 156806 [details] Run of Xorg 6.8.6 under RH5 -- EDID all zeros
Created attachment 156807 [details] Run of Xorg 6.8.6 under RH5 -- EDID partial transfer

I believe this Xorg.0.log output demonstrates we have a bug that WAS NOT introduced between Xorg 6.8.6 and Xorg 7.1.1. I tried to build Xorg 6.8.6 under RHEL 5 but hit a wall. I tried to install RHEL 4.5's 6.8.6 on RHEL 5 but made a mess. After cleaning up the mess well enough to get 7.1.1 running again, I tried a different tack to get Xorg 6.8.6 running just enough to do the EDID transfer. Since RHEL 4.5 was in another partition, I ran Xorg out of there. Additional arguments were needed. The command line that got me far enough was:

/rhel4/usr/X11R6/bin/Xorg -config /rhel4/etc/X11/xorg.conf -modulepath /rhel4/usr/X11R6/lib/modules/

The first new attachment, Xorg.0.log-rh5-6.1-a, is not sufficient. It only shows all zeros in the EDID transfer, and that could be caused by something else not working as we kludge the Xorg run between major Linux versions. The second new attachment, Xorg.0.log-rh5-6.1-b, IS sufficient, I believe, because it shows a PARTIAL EDID transfer. Xorg could not run far enough to fully start (it couldn't find font "fixed" because of how things are re-organized), but I very strongly believe that it DID run far enough to do an EDID transfer, and to manifest EXACTLY THE SAME bug we are experiencing under 7.1.1 under RHEL 5: a timing-dependent, flaky EDID transfer.
Out of curiosity, does it work reliably when using a xen kernel, or on non-x86? The reason I ask is, vm86 is known to be unreliable when using xen, and is simply unavailable on other arches. So for everything other than bare-metal i386 kernels, we use an x86 real-mode emulator to execute VBE calls. The logs given appear to all be from non-xen machines. I would be thrilled to learn that the emulator is more reliable. There is also an option to force use of the emulator, by saying: Option "Int10Backend" "x86emu" in the ServerLayout section of xorg.conf.
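For clarity, a minimal ServerLayout section with that option might look like this (the Identifier and Screen names here are just placeholders from a typical generated xorg.conf; match them to your existing file):

```
Section "ServerLayout"
        Identifier  "Default Layout"
        Screen      "Screen0" 0 0
        Option      "Int10Backend" "x86emu"
EndSection
```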
It may indeed be that the emulator is more reliable. I've just added that line to the xorg.conf file, and run the X server a couple of times. Previously the EDID buffer would contain a random amount of data with the rest all zeros. This time the EDID buffer was consistently full and the data remained the same across multiple runs. This is good evidence that the problem is in the vm86old code. (We're guessing that the code for auditing has inappropriately messed up the registers, and plan to build a kernel to test that theory in a few days.) The problem here, though, is that people will not be able to run X far enough to put in a fix. What options do you think should be pursued to help people get a default install of RHEL 5 and the other 2.6.18+ kernels to get something that works from the get-go?
Created attachment 158912 [details] Patch to cut out audit call in the int10 emulator. Today we built a kernel with the attached patch that disables the code that called audit_syscall_exit. Although those nasty error messages about freeing multiple audit contexts came back, the EDID transfers were once again 100% successful. (Yes, I was careful to use an xorg.conf file with x86emu disabled. I tested a stock kernel build to confirm I had a good build process, and that the stock kernel tickled the bug.) So it seems that the way audit_syscall_exit is called is trashing the registers and making the EDID transfer flaky. This is probably appropriately classified as a regression and probably needs to be fast-tracked to the original author so he or she can fix up the call. We have a very reproducible test case and test setup to test candidate kernel patches. (We didn't feel we understood things well enough to propose a change ourselves.)
I have a bug open at kernel.org where I asked for help looking at this. I'll mention there that this regression is the root cause. Would it be appropriate for Red Hat to weigh in and lobby for examination of that bug? http://bugzilla.kernel.org/show_bug.cgi?id=8633

Now that we understand the root cause, and have a work-around, what next steps should we take? Ideally the kernel regression will eventually be remedied. Should we consider lobbying freedesktop.org to make the x86emu int10backend the default for x86, in addition to everything else?

There are additional bugs in the X server, once the EDID data is acquired with 100% fidelity:
1. Plugged into the VESA connector, 1400x1050 resolution will configure if requested, but it will chop off the topmost quarter inch and the leftmost inch of pixels. Modern Dell LCDs no longer support the ability to control the vertical or horizontal size, so this is an unpleasant state of affairs.
2. The EDID data provides a detailed modeline for 1680x1050 operation which is ignored.

I guess I should take these up with freedesktop.org. Do people think I should open a Red Hat bugzilla bug on these two issues?

Finally there is the issue that the X server does not properly report the EDID transfer failure. I will take the freedesktop.org bug I have open about this and lobby for my patch to be considered as a remedy. Here too, I wonder if Red Hat weighing in on the bug would be useful? https://bugs.freedesktop.org/show_bug.cgi?id=10238

Mr. Jackson et al., what do you advise as the best way forward?
(In reply to comment #44) > Now that we understand the root cause, and have a work-around, what next steps should we take? > > Ideally the kernel regression will eventually be remedied. > > Should we consider lobbying freedesktop.org to make the x86emu as int10backend the default for x86 > in addition to everything else? We're already doing this for Fedora 7 and later, and I'm certainly telling everyone I can upstream that vm86 is insane. I wish I'd flipped this switch before FC6, so it would have been incorporated in EL5, but the fear that the emulator would prove to be a regression relative to EL4's behaviour was too high. (And justified, it turns out, since several x86emu bugs have been fixed since 5.0.) In the meantime, I'm investigating a way to magically invoke the x86emu backend for DDC transfers if the vm86 method fails. It's slightly hairy due to namespace issues but I think it's doable. (Setting devel ack for 5.1, we should include this if I get it working.) > There are additional bugs in the X server, once the EDID data is acquired with 100% fidelity: > > 1. Plugged into the VESA connector, 1400x1024 resolution will configure if requested, but it will chop > off the topmost quarter inch and the leftmost inch of pixels. Modern Dell LCDs no longer support the > ability to control the vertical or horizontal size so this is an unpleasant state of affairs. > > 2. The EDID data provides a detailed modeline for 1680x1050 operation which is ignored. The X logs in this bz seem to all show the use of the vesa driver. The vesa bios interface is limited in terms of output setup capability. In particular, there are two sets of modes: the set that the monitor reports it can display, and the set that the bios reports it can configure. 
It's literally not possible to ask the BIOS to set up a mode outside its list, so the best we can do with the vesa driver - or any other driver that uses the VESA BIOS mode setting interface - is pick a "good" mode that happens to be in both lists.

So regarding these two issues, assuming they're occurring with the vesa driver: The first sounds like we're either picking a mode that's larger than the monitor - in which case, 5.1 includes a vesa driver update that should address this issue - or that the mode we're selecting is not being programmed properly by the video BIOS, in which case we're just out of luck. The second problem sounds like the 1680x1050 mode is advertised by the monitor but not by the BIOS, in which case we are again out of luck. If my assumptions are incorrect here, I would certainly like to see an X log of the failure case(s).

In general, these limitations mean that although the vesa driver is supported, it's not recommended for regular use, and we strongly prefer that people use native drivers wherever possible. The configuration infrastructure in EL5 should be smart enough to pick the correct native driver when one is available.

> Finally there is the issue that the X server does not properly report the EDID transfer failure. I will take
> the freedesktop.org bug I have open about this and lobby for my patch to be considered as a remedy.
> Here too, I wonder if Red Hat weighing in on the bug would be useful?
> https://bugs.freedesktop.org/show_bug.cgi?id=10238

That looks pretty good; I'll take it up upstream. Thanks!
Invoking x86emu if the DDC fails sounds hairy, scary, and a lot of work. Thanks for putting in the effort to make it right! Indeed the X resolution issues I am having are occurring with the VESA driver. Apparently the X.org ATI driver does not yet know about the R500 chipset that the X1300 and X1400 use. The reverse-engineering effort will, I'm sure, eventually benefit this driver. It will be interesting to test the RHEL 5.1 X server to see which driver it picks. I'll attach Xorg.0.log output showing the 1680x1050 mode that the EDID fetch offers, and how it's not used. I'm still not sure I'm totally up to speed on reading the log output, so I'd be grateful if you'd call my attention to the lines where the BIOS denies support for that mode. Is it in those long, detailed segments? Indeed I see a 1600x1200 go by, and a 1400x1050 go by, but indeed no 1680x1050.
Created attachment 159000 [details] Log of proffered but unused 1680x1050 resolution

See lines 461 and 462:

(II) VESA(0): h_active: 1680 h_sync: 1728 h_sync_end 1760 h_blank_end 1840 h_border: 0
(II) VESA(0): v_active: 1050 v_sync: 1053 v_sync_end 1059 v_blanking: 1080

and line 488:

(II) VESA(0): Modeline "1680x1050" 119.00 1680 1728 1760 1840 1050 1053 1059 1080 -hsync +vsync

Here the VESA transfer offers the mode. Why exactly isn't it being used?
I just had a thought! How will you detect a bad EDID transfer? The kernel bug causes the transfer to OFTEN come up all zeros, but sometimes it gets a partial transfer padded out with zeros. Does the EDID block have a checksum in it that you can compute and test? The current code just looks at the first few bytes for a version number and uses that to decide the transfer was good. If you can't detect a zero-padded partial transfer, then your additional work to use x86emu may be wasted.
(In reply to comment #47)
> Created an attachment (id=159000) [edit]
> Log of proffered but unused 1680x1050 resolution
>
> See lines 461 and 462:
>
> (II) VESA(0): h_active: 1680 h_sync: 1728 h_sync_end 1760 h_blank_end 1840 h_border: 0
> (II) VESA(0): v_active: 1050 v_sync: 1053 v_sync_end 1059 v_blanking: 1080
>
> and line 488:
>
> (II) VESA(0): Modeline "1680x1050" 119.00 1680 1728 1760 1840 1050 1053 1059 1080 -hsync +vsync
>
> Here the VESA transfer offers the mode. Why exactly isn't it being used?

That's the EDID block's mode list. Remember, I can only set modes to things in the intersection of: in the VESA BIOS's mode list, and within the capabilities reported by EDID. So, yeah, 1680x1050 in the monitor, but not in the video BIOS, means no 1680x1050 for you.

(In reply to comment #48)
> How will you detect a bad EDID transfer? The kernel bug causes the transfer to OFTEN come up all zeros,
> but sometimes it gets a partial transfer padded out with zeros. Does the EDID block have a checksum in it
> that you can compute and test? The current code just looks at the first few bytes for a version number
> and uses that to decide the transfer was good.

Yes, there is a checksum. The last byte is set such that a cumulative sum of all bytes in the block, modulo 256, is 0. We do use this to reject bad EDID blocks. See DDC_checksum() in hw/xfree86/ddc/edid.c, and its caller in hw/xfree86/ddc/xf86DDC.c.
I've looked at the code in xf86DDC.c, but there's something that confuses me: How come I never saw a checksum error report in the log? Clearly I was getting bad EDID reads. What determines if the code that's doing the EDID fetch is from hw/xfree86/vbe/vbe.c where it can silently fail (unless you've taken my patch ;-) ) and where no checksum is computed in the readEDID routine, versus the code that's in hw/xfree86/ddc/edid.c? Or are you saying that you plan to add checksum stuff like in ddc/... to vbe/...? ---- Thanks also for the clarification about the BIOS thing.
Andrew: I just installed the X server and VESA driver from the RHEL 5.1 beta. Alas, it does one thing that is admittedly more correct but less desirable to me: Previously, somehow the server would see that the display could handle 1680x1050, and even though no 1400x1050 mode was specifically offered, it would configure that mode. (This got us into trouble when connected to the analog VESA port, but worked just fine on the digital port.) Now, because there is no exact match, the display that used to be 1400x1050 is configured for 1280x1024.

By the same token, that particular monitor offers 1680x1050, but not 1600x1200, so even though the vesa driver is improved and has a 1600x1200 mode, 1680x1050 is not configured because it is not an exact match. Wasn't there partial-match code being worked on? I thought it was already in place.

Somebody is suffering with the latest Ubuntu because their card supports 1280x1024, but their display only supports 1280x800. That run ends up finding no matching modes whatsoever.

I am concerned here that people will have gotten used to running 1400x1050 on these monitors under RHEL 4, but will now get the degraded resolution of 1280x1024 after "upgrading" to RHEL 5.1. I will attach the xorg.conf file and the Xorg.0.log files so that this all can be rigorously documented.
Created attachment 160558 [details] xorg.conf file used for testing RHEL 5.1 beta X server
Created attachment 160559 [details] Log of run of RHEL 5.0 debugging X server. It sets 1400x1050.
Created attachment 160561 [details] Log of run of RHEL 5.1 X server and vesa driver. Configs 1280x1024
Sorry to be a pest here. I expect there are many important issues being worked on as RHEL 5.1 beta testing proceeds. I am concerned that people are going to consider this an improper regression in behavior. If there were a plan of attack for addressing it, I might be able to help do the work.
The patch looks something like: http://people.redhat.com/ajackson/omg-vbe-hax.patch Utterly untested atm; going to try to hit that today.
Although I've not bench checked it carefully, the patch looks plausible. The issue that concerns me is not so much the EDID thing at the moment, but that the VESA update to the X server currently on track for dissemination as part of the RHEL 5.1 update does a worse job than the present one at finding the highest resolution even when the EDID transfer is 100% successful. Andrew, should I open a different bug about that? What do you think is the way I can be most helpful in identifying the root cause and fixing the new regression?
(My name's Adam, btw.)

(In reply to comment #58)
> Although I've not bench checked it carefully, the patch looks plausible.
>
> The issue that concerns me is not so much the EDID thing at the moment, but that the VESA update to the
> X server currently on track for dissemination as part of the RHEL 5.1 update does a worse job than the
> present one at finding the highest resolution even when the EDID transfer is 100% successful.

Yeah, that's intentional. The issue is that you _really_ want to try for strict intersection of modes between the monitor and the video BIOS in this case. There do exist monitors where the EDID list is literally all it can do. Worse, there are monitors where, if (like your example) there's a VBIOS mode between the two largest EDID modes, like so:

       VBIOS        EDID
  A:              1680x1050
  B: 1400x1050
  C:              1280x1024

and you attempt to set mode B, then the monitor will try to sync as though it's mode C and the rest will just be off the screen. Or go blank. Either one is unacceptable.

The other case we ran into was some laptop panels, which give you a mostly-nonconformant EDID block that just contains a mode for the panel size and nothing else, and of course no matching mode in the VBIOS. In that case, strict intersection of mode lists would mean the server just fails to start.

So the new heuristic is: Attempt strict intersection. If doing so produces a non-empty mode list, then use it. Otherwise, revalidate the VBIOS mode list against a range-based model of the EDID properties (using the sync ranges from EDID if available, otherwise synthesizing them from an assumed minimum size of 640x480@60 and a max of whatever the EDID block reports as maximum), in the hope that _something_ will survive validation and work. This seems to be the least wrong thing to do.
Nonconformant panels get a best effort, conformant panels get whatever the best intersection of BIOS and EDID modes is, and we don't go wrong trying to do something the monitor doesn't explicitly claim it's capable of doing. This does mean some setups that used to work at mode B (in the example above) now won't, but they'll still light up; in exchange, some panels that would fail to do the right thing in mode B now do _a_ right thing, even if that happens to be mode C. The vesa driver is intended to be a conservative fallback driver anyway, so the real solution to the mode B scenario is to use a native driver that doesn't use the VBIOS for output setup.
Thanks very much for taking the time to provide a detailed clarification. In light of those details, I'd have to agree that the new behavior is the least wrong thing to do.
After some technical review, I've concluded that the patch in comment #57 is a bad idea. The act of initializing an int10 context on a non-primary card has the side effect of posting the card. This will blow away any state set up by the driver prior to the VBE DDC call, which will almost certainly mean bad rendering at best, and failure to launch or system hang at worst. There's a more invasive change one could do where you'd set up the shadow x86emu context _really_ early, and make sure to use the same maps for both vm86 and x86emu execution, but that seems like a ton of work for very little return. Particularly since we know newer kernels have a working vm86 syscall. Fixing the kernel definitely seems like the right thing here.
Cloned bug #254024 for a kernel fix, and moved the IT issue there. This issue should be documented in a release note for 5.1. Suggested text is something like:
---
On i386 systems running the bare-metal (non-Xen-enabled) kernel, the X server may not be able to reliably retrieve EDID information from the monitor. This may manifest as the driver being unable to use resolutions larger than 800x600. A potential workaround is to use an alternative method to query the monitor, by adding the line:

Option "Int10Backend" "x86emu"

to the ServerLayout section in /etc/X11/xorg.conf .
---
Adam, (First, sorry for getting your name wrong. I've got it right now!) Wow! You've done more rigorous testing than I would have been able to do. And it looks like you've chased down the subtleties really well. I agree that pursuing the kernel fix is the best thing. My friend Chuck nudged the linux-kernel list to raise the visibility of the bug. Andi Kleen picked it up, but started asking further questions that Chuck and I will need to work to answer. (He also wants the bug demonstrated under a stock kernel.org kernel rather than an RHEL kernel.) In fact, it may be a more subtle problem than "just fix the audit calls from inside vm86.c." Are you on linux-kernel? Do you want to chime in there with your insights from your vm86 and x86emu experience?
adding to RHEL5.1 release notes updates: <quote> (x86) When running the bare-metal (non-Virtualized) kernel, the X server may not be able to retrieve EDID information from the monitor. When this occurs, the graphics driver will be unable to display resolutions higher than 800x600. To work around this, use an alternative method to query the monitor. One way of doing this is by adding the following line to the ServerLayout section of /etc/X11/xorg.conf: Option "Int10Backend" "x86emu" </quote> please advise if any revisions are necessary. thanks!
(In reply to comment #64) > To work around this, use an alternative method to query the monitor. One way of > doing this is by adding the following line to the ServerLayout section of > /etc/X11/xorg.conf: Change the second sentence: replace "One way of doing this is by adding" with "Add". There's only one alternative method, so there's no point in saying "one way".
note revised as requested. thanks!
Bug #254024 is for the kernel fix for 5.2. No fix for this is planned in the vesa driver, so this bug is being closed WONTFIX. Please see the kernel bug for the eventual resolution.
adding same release note to "Known Issues" of RHEL5.2. please advise if resolved so we can document as such. thanks!
Nope. Still not resolved. We know the 2.6.20 and later kernels don't have the problem, but have not identified what to back-port to 2.6.18 to get the proper behavior. Actually, there *IS* something that could be done for RHEL 5.2, but I fear it is too late to ask: The current recommended work-around is to modify /etc/X11/xorg.conf: in the "ServerLayout" section, add the line: Option "Int10Backend" "x86emu" Can we make that option the default for X in RHEL 5.2 or later, until a kernel (2.6.20 or newer) comes along that has no buffer corruption problems when fetching from the int10 layer?
Hi, the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at which point no further additions or revisions will be entertained. a mockup of the RHEL5.2 release notes can be viewed at the following link: http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html please use the aforementioned link to verify if your bugzilla is already in the release notes (if it needs to be). each item in the release notes contains a link to its original bug; as such, you can search through the release notes by bug number. Cheers, Don
Don, I'm not sure if your note is a bulk-addition to bunches of bugs. I'm also not sure if this is intended for the people who reported the bug, such as me, or the RH employees working the bug, such as Adam. At any rate, I could not view the Red Hat internal mockup because it's an internal host. I did look at the Beta release notes that are published at: https://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Release_Notes/RELEASE-NOTES-U2-x86-en.html Although this bug ID did not appear in those notes, I CAN confirm that the work-around is still necessary and accurately appears in the Beta release notes. (I hope I'm being helpful rather than annoying here.)
yes, this was a bulk-message to all bugs that are tracked for the release notes. the current version of the RHEL5.2 x86 release notes still contains the following note as per this bug: <quote> (x86) When running the bare-metal (non-Virtualized) kernel, the X server may not be able to retrieve EDID information from the monitor. When this occurs, the graphics driver will be unable to display resolutions higher than 800x600. To work around this, add the following line to the ServerLayout section of /etc/X11/xorg.conf: Option "Int10Backend" "x86emu" </quote> as always, please advise (before April 15) if any further revisions are required. thanks!
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. This Release Note is currently located in the Known Issues section.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Alas, I cannot confirm this is still an issue without a little help. (I still have my test setup, but I may not for much longer.) I can confirm that the EDID transfer is still flaky and corrected with the work-around with the 2.6.18-92.1.10 kernel. However, I expect that 5.3 has a different kernel. Is there an RHEL 5.3 trial kernel somewhere I can grab? The newest kernel in the RHEL 5 beta channel is 2.6.18-92.1.10, but I don't think that's what y'all are going live with for 5.3, is it?
Additional info for others interested in this bug. Here is a streamlined repeat-by:
1. Install RHEL 5.2 from DVD.
2. Change radeon_tp to vesa in /etc/X11/xorg.conf.
3. Change the default run level from 5 to 3 in /etc/inittab.
4. Reboot.
Additionally, if you didn't follow step 3 above, there seems these days to be about an even chance of a successful EDID read when X starts from gdm; I don't know why. If you use "xinit" to start X with just an xterm, there may not be enough stuff in memory to trigger the bug. But if you repeatedly start X with the command "Xorg", or a full session with "startx" logged in as root, the EDID transfer either silently fails or is partial. This is on a Dell GX745 with a Radeon X1300/X1550 series card, chipset 0x7183.
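When cycling the server like this, a quick way to tell whether a given start got a full EDID read is to grep the log for the monitor details (the path and search strings below are the usual Xorg ones; exact log wording can vary by server version, so treat this as a sketch):

```shell
# Show EDID-related lines from the most recent X server log.
# An empty result matches what the attached Xorg.0.log.bad shows:
# the DDC read is declared successful but no monitor data follows.
LOG=${1:-/var/log/Xorg.0.log}
if [ -r "$LOG" ]; then
    grep -i -e 'EDID' -e 'Manufacturer' "$LOG" || echo "no EDID data in $LOG"
else
    echo "cannot read $LOG"
fi
```

Running this after each restart makes the good/bad pattern obvious without saving whole log files by hand.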