Description of problem: I just installed the RHEL 5 client, and noticed that sometimes the X resolution is properly set, as I specified, to 1280x1024, but often, upon restart of the X server, it dumbs down the resolution to 800x600. I will attach two Xorg.0.log outputs showing how the VESA VBE DDC read is reported as successful, but in the dumbed-down case no actual data comes in that enables proper configuration of the monitor. This problem DOES NOT occur under the RHEL 4.5 beta, nor does it occur using the third-party fglrx driver.

Version-Release number of selected component (if applicable): X Window System Version 7.1.1

How reproducible: Often. Note that I've got a VGA LCD attached via an adapter cable to the ATI Radeon X1300 Pro card. At first, I thought it might be because the adapter, the display, or the card was flaky in reporting the VESA data. But when I NEVER got failures under RHEL 4.5, I began to suspect something amiss in the VESA DDC read.

Steps to Reproduce:
1. Start the X server

Actual results: 800x600 resolution, and no actual data from the VESA VBE DDC read showing up in Xorg.0.log

Expected results: Proper detection of the monitor through proper data being returned by the VESA VBE DDC read, showing the monitor Manufacturer string and all the other relevant data, resulting ultimately in proper configuration at 1280x1024

Additional info:
Created attachment 152574 [details] Log output showing successful VBE DDC read
Created attachment 152575 [details] Log output showing VBE DDC read with no data

Note that in this log file, the VESA VBE DDC read is declared successful, but the Manufacturer string and all the other relevant data needed for configuration have NOT been obtained. This log file was obtained on the exact same hardware as the Xorg.0.log.good file. My process was:
1. Start RHEL 5. Notice X was configured correctly.
2. Log in. Save the Xorg.0.log file as Xorg.0.log.good
3. Log out. (The X server restarted as per apparent config defaults.) Notice that X was dumbed down to 800x600.
4. Log in. Save the Xorg.0.log file as Xorg.0.log.bad
hmmm, is it possible that the System->Preferences->Screen Resolution menu is set to 800x600, thereby overriding the system configuration on a per-user basis?
Well, the screen resolution problem occurs even when logged in as root. The resolution menu offered by System Preferences, after the X server has decided to dumb itself down to 800x600, offers no higher resolution than 800x600.

When I first installed RHEL 5, I got 800x600, but I ran some tool and specified 1280x1024, and that's when I got to this state of affairs where it sometimes does and sometimes does not work. I regret that I did not take careful note of which tool I ran. It was probably "system-config-display". I am using a Dell E196FP display on the OptiPlex 745. I have now used system-config-display to set that explicitly as the monitor.

Here's the odd thing: When the VESA data transfer is successful, the monitor clearly reports that its optimal resolution setting is 1280x1024x60, and the setup is correct. When the VESA data transfer is unsuccessful, the Xorg.0.log file reports that the 1280x1024x75 resolution is being tried multiple times, but that the 1280x1024x60 resolution is **NEVER** tried. I wonder why this is so. I also wonder why no 1024x768 resolutions are being tried. Even when the ACTUAL monitor I'm using is specified explicitly, no higher resolution than 800x600 is offered when the VESA DDC data transfer fails.

I see two questions to answer here:
1. Why does the VESA DDC transfer sometimes report success when no data is transferred?
2. Why does the X server never try 1024x768 resolutions, nor 1280x1024x60? It tries a WHOLE LOT of other modes, as can be seen in the Xorg.0.log file.

---- Should I also ask why you are asking me about user-level configuration settings, when the Xorg.0.log file already shows that a whole bunch of resolutions never offered in those user-level configuration commands are being tried, and abandoned for reasons that have nothing to do with the user-level configuration settings, and everything to do with the perceived capabilities of the monitor? Or am I completely misreading the Xorg.0.log file here?
I am disappointed that 10 days have gone by and nobody has followed up. I guess nobody cares that the latest update to RHEL BROKE X server configuration. I REALLY would like some help with this. I've just tried RHEL 4.5, and either I've found a way to more consistently specify a broken configuration, or whatever you broke in RHEL 5 you've BACKPORTED to 4.5, because the RHEL 4.5 beta worked great, but the RHEL 4.5 that was released is ALSO BROKEN. Let's get hopping on understanding this problem and fixing it QUICKLY!
I've tested with another monitor, the Dell 2007WFP LCD, via the VGA connector. In this case, the VESA data seems to be correctly fetched by both RHEL 5 and RHEL 4.5, but the monitor VERY CAREFULLY configures itself to CHOP OFF the topmost 30 or so pixels. Dell has no vertical size control, so I get my choice of having the tool bar or the panel chopped away. This is unacceptable, and extremely frustrating. How can I help MIT customers adopt RHEL 4.5 and RHEL 5 when basic X display configuration has been so badly and obviously broken? OK, you folks don't see the test case, so let's get someone back to me QUICKLY so both MIT and Red Hat see the same symptoms and pool our collective understanding.
Created attachment 154575 [details] Sysreport of target system running RHEL 5
In the interests of being helpful I have attached sysreport output of the relevant system. Probably our next step is to decide if we have one bug or two here. The overall symptom is that X is not properly configured. But that could be due to two separate issues:
1. Failure to get consistently good data from the VESA DDC transfer.
2. X chops off the topmost 50 pixels when the exact correct display is specified in the System->Administration->Display tool.
There's something else interesting going on. Yesterday the monitor would configure and chop off the top. Today I can't seem to establish an xorg.conf that will drive the monitor at that size any more. I either get 800x600, or I get a complaint that I'm driving the monitor too hard. I *THINK* it's because the xorg.conf I'm now playing with does not contain explicit resolution settings, and so it's trying to get them from the failed VESA DDC transfer.
RHEL problems will get attention if they are filed via your TAM. Since this works under RHEL4.5 I'll mark this as a regression.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
MIT does not have a TAM. It has something to do with the business model of: since 1860, companies have paid for the privilege of collaborating with MIT. Why does Red Hat insist on charging a premium price for the privilege of getting bugs taken seriously by the very community that helped create Linux in the first place?... But I digress. Inasmuch as this is a basic problem that will affect MANY users of RHEL 5, it seems in Red Hat's best interest to resolve it quickly. The position expressed by "pm-rhel" seems quite wise.
Since I last posted to this bug on 16 May, I've done some more careful testing and I understand a LOT more about this situation.

Bottom line summary: The RHEL 4.5 X server is performing acceptably. The RHEL 5 X server suffers from a problem with the DDC fetch that ALSO affects Ubuntu 7.04, and perhaps SuSE SLED 10.1. I've searched the X.org bug tree and found two relevant bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=6886
https://bugs.freedesktop.org/show_bug.cgi?id=10238
I've subscribed to the latter one, and we'll see if the X.org folks respond.

Detail: I needed to be told how to create a baseline xorg.conf file. Once I did that, I was able to carefully test RHEL 4.5 and RHEL 5. Along the way, I discovered that some of the extremely good performance I was getting under the RHEL 4.5 beta was because I'd installed the ATI proprietary driver but FORGOT. (Oops.)

The detailed behavior I got while testing RHEL 4.5, on the OptiPlex 745 with the ATI Radeon X1300 Pro: up to 1280x1024 works via VESA; up to 1400x1050 works via DVI. If your xorg.conf specifies 1400x1050, the VESA display will be too big for the screen. If your xorg.conf specifies 1600x1200, the VESA display will draw a blank, but the DVI display will know not to use that setting. This seems reasonable, albeit non-ideal, behavior to me.

Creating a baseline xorg.conf file under RHEL 5, I re-ran the tests and determined: The X server will not run AT ALL when connected via DVI. When connected via VESA, the DDC transfer fails, forcing the X server to dumb down to 800x600. If one explicitly provides Modeline directives in the xorg.conf file, the X server can be driven at up to 1280x1024 when connected via the VESA port. Perhaps higher resolutions are possible, but so far I don't have a Modeline for better than that. When connected via DVI, the X server WILL NOT START AT ALL. The monitor complains of being over-driven.
DDC transfers under RHEL 5, with the X server version 7.1.1, always fail, both on the VESA port and on the DVI port. Ubuntu 7.04 seems to suffer the same fate. There is a long-winded bug report about this at: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/89853 It is still unclear to me whether Red Hat, the Ubuntu community, or X.org do or do not understand the root cause of this problem. Perhaps between the four of us we can converge on a useful fix.
wdc and I dug into the X server sources, and produced the attached patch, with interesting results. Issues:
1. The initialization of the EDID buffer memsets only 4 bytes to zero, because it uses the size of the pointer to the structure instead of the size of the structure itself. In our patch we use the constant 128, because that is the size of an EDID block (as described in the EDID documentation we found on the Web).
2. When the EDID transfer fails and gives us an EDID buffer full of zeros, xf86InterpretEDID in interpret_edid.c silently fails and returns NULL. We changed the code to report this error condition.
3. The EDID fetch from the BIOS is DEFINITELY flaky in a time-dependent way. We inserted a sleep(2) into vbeReadEDID in vbe.c, which seems to improve things somewhat, but running Xorg multiple times results in EDID fetches in various states of completion, with the buffer only being filled up to a certain point, followed by zeros. We copied the hex dump code from print_edid.c into vbe.c so that the EDID buffer could be viewed immediately after the BIOS fetch.

Attached is the patch against xorg-x11-server-1.1.1-48.13.0.1, along with Xorg.0.log files from successive runs with this patch showing the EDID buffer in various states of fill.
Created attachment 155549 [details] Patch to debug EDID BIOS fetch
Created attachment 155550 [details] Xorg.0.log run 1 showing full EDID read
Created attachment 155551 [details] Xorg.0.log run 2 showing full EDID read
Created attachment 155552 [details] Xorg.0.log run 3 showing partial EDID read
Created attachment 155553 [details] Diff between Xorg.0.log run 2 and run 3
If you remove our "sleep(2);" from vbe.c the hex dump output from the EDID fetch from the BIOS pretty much always comes up all zeros.
Created attachment 155646 [details] Log of successful DDC read, RHEL 4.5 with debug patch applied. Today I built the X server under RHEL 4.5, applying the relevant portion of the debug patch that performs the hex dump of the EDID fetch. I ran Xorg several times. Always the result is the same: PERFECTLY RELIABLE fetch of the EDID data! I also looked at the differences in the int10 logic that seems to be doing the nuts and bolts of the EDID fetch. Although I might have missed something, I think they are substantially the same. This causes me to conclude that what we have is a KERNEL bug, not an X server bug. Perhaps something is playing fast and loose with the real mode emulation that serves the VBE? Since this problem seems also to affect Ubuntu 7.04 (although I can't get it to consistently fail), we're probably talking about a kernel bug introduced between 2.6.9 and 2.6.18. (The Ubuntu 7.04 Desktop install CD which HAS the problem uses 2.6.20-15.) QUESTION: What further steps should I take to clarify that the fault lies in the kernel and not in X?
Today I did two things:
1. I experimented under Ubuntu 7.04 to try and learn more -- I got partial EDID transfers, but no clue how to control when the transfers were partial and when they were complete.
2. I found a package called "read-edid" that is alleged to use the VM86 code in a stand-alone mode to perform the problematic EDID fetch. See: http://john.fremlin.de/programs/linux/read-edid/
A Debian package was available for Ubuntu. Running the program ALWAYS gets a 100% good EDID fetch. Building the package from source under RHEL 5 and running it ALSO ALWAYS gets a 100% good EDID fetch.

So now the question is, "What is happening to make the stand-alone get-edid successful but the X.org fetch unsuccessful?" Someone suggested that there may be a memory caching issue involved. get-edid is a small program, whereas X is rather large, so that's not so far-fetched an idea. My next task will be to read the get-edid code, and try to understand if it is doing the same thing the X server is doing. ANY insight from anyone else reading this bug report would be MOST welcome.
Created attachment 156806 [details] Run of Xorg 6.8.6 under RH5 -- EDID all zeros
Created attachment 156807 [details] Run of Xorg 6.8.6 under RH5 -- EDID partial transfer

I believe this Xorg.0.log output demonstrates we have a bug that WAS NOT introduced between Xorg 6.8.6 and Xorg 7.1.1. I tried to build Xorg 6.8.6 under RHEL 5 but hit a wall. I tried to install RHEL 4.5's 6.8.6 on RHEL 5 but made a mess. After cleaning up the mess well enough to get 7.1.1 running again, I tried a different tack to get Xorg 6.8.6 running just enough to do the EDID transfer. Since RHEL 4.5 was in another partition, I ran Xorg out of there. Additional arguments were needed. The command line that got me far enough was:

/rhel4/usr/X11R6/bin/Xorg -config /rhel4/etc/X11/xorg.conf -modulepath /rhel4/usr/X11R6/lib/modules/

The first new attachment, Xorg.0.log-rh5-6.1-a, is not sufficient. It only shows all zeros in the EDID transfer, and that could be caused by something else not working as we kludge the Xorg run between major Linux versions. The second new attachment, Xorg.0.log-rh5-6.1-b, IS sufficient, I believe, because it shows a PARTIAL EDID transfer. Xorg could not run far enough to fully start (it couldn't find font "fixed" because of how things are re-organized), but I very strongly believe that it DID run far enough to do an EDID transfer, and to manifest EXACTLY THE SAME bug we are experiencing under 7.1.1 under RHEL 5: a timing-dependent, flaky EDID transfer.
Out of curiosity, does it work reliably when using a xen kernel, or on non-x86? The reason I ask is, vm86 is known to be unreliable when using xen, and is simply unavailable on other arches. So for everything other than bare-metal i386 kernels, we use an x86 real-mode emulator to execute VBE calls. The logs given appear to all be from non-xen machines. I would be thrilled to learn that the emulator is more reliable. There is also an option to force use of the emulator, by saying: Option "Int10Backend" "x86emu" in the ServerLayout section of xorg.conf.
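For clarity, a minimal ServerLayout section with that option might look like this (the Identifier and Screen names here are just placeholders from a typical generated xorg.conf; match them to your existing file):

```
Section "ServerLayout"
        Identifier  "Default Layout"
        Screen      "Screen0" 0 0
        Option      "Int10Backend" "x86emu"
EndSection
```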
It may indeed be that the emulator is more reliable. I've just added that line to the xorg.conf file, and run the X server a couple of times. Previously the EDID buffer would contain a random amount of data with the rest all zeros. This time the EDID buffer was consistently full and the data remained the same across multiple runs. This is good evidence that the problem is in the vm86old code. (We're guessing that the code for auditing has inappropriately messed up the registers, and plan to build a kernel to test that theory in a few days.) The problem here, though, is that people will not be able to run X far enough to put in a fix. What options do you think should be pursued to help people get a default install of RHEL 5 and the other 2.6.18+ kernels to get something that works from the get-go?
Created attachment 158912 [details] Patch to cut out audit call in the int10 emulator. Today we built a kernel with the attached patch that disables the code that called audit_syscall_exit. Although those nasty error messages about freeing multiple audit contexts came back, the EDID transfers were once again 100% successful. (Yes, I was careful to use an xorg.conf file with x86emu disabled. I tested a stock kernel build to confirm I had a good build process, and that the stock kernel tickled the bug.) So it seems that the way audit_syscall_exit is called is trashing the registers and making the EDID transfer flaky. This is probably appropriately classified as a regression and probably needs to be fast-tracked to the original author so he or she can fix up the call. We have a very reproducible test case and test setup to test candidate kernel patches. (We didn't feel we understood things well enough to propose a change ourselves.)
I have a bug open at kernel.org where I asked for help looking at this. I'll mention there that this regression is the root cause. Would it be appropriate for Red Hat to weigh in and lobby for examination of that bug? http://bugzilla.kernel.org/show_bug.cgi?id=8633

Now that we understand the root cause, and have a work-around, what next steps should we take? Ideally the kernel regression will eventually be remedied. Should we consider lobbying freedesktop.org to make the x86emu int10backend the default for x86, in addition to everything else?

There are additional bugs in the X server, once the EDID data is acquired with 100% fidelity:
1. Plugged into the VESA connector, 1400x1050 resolution will configure if requested, but it will chop off the topmost quarter inch and the leftmost inch of pixels. Modern Dell LCDs no longer support the ability to control the vertical or horizontal size, so this is an unpleasant state of affairs.
2. The EDID data provides a detailed modeline for 1680x1050 operation which is ignored.

I guess I should take these up with freedesktop.org. Do people think I should open a Red Hat bugzilla bug on these two issues?

Finally there is the issue that the X server does not properly report the EDID transfer failure. I will take the freedesktop.org bug I have open about this and lobby for my patch to be considered as a remedy. Here too, I wonder if Red Hat weighing in on the bug would be useful? https://bugs.freedesktop.org/show_bug.cgi?id=10238

Mr. Jackson et al., what do you advise as the best way forward?
(In reply to comment #44) > Now that we understand the root cause, and have a work-around, what next steps should we take? > > Ideally the kernel regression will eventually be remedied. > > Should we consider lobbying freedesktop.org to make the x86emu as int10backend the default for x86 > in addition to everything else? We're already doing this for Fedora 7 and later, and I'm certainly telling everyone I can upstream that vm86 is insane. I wish I'd flipped this switch before FC6, so it would have been incorporated in EL5, but the fear that the emulator would prove to be a regression relative to EL4's behaviour was too high. (And justified, it turns out, since several x86emu bugs have been fixed since 5.0.) In the meantime, I'm investigating a way to magically invoke the x86emu backend for DDC transfers if the vm86 method fails. It's slightly hairy due to namespace issues but I think it's doable. (Setting devel ack for 5.1, we should include this if I get it working.) > There are additional bugs in the X server, once the EDID data is acquired with 100% fidelity: > > 1. Plugged into the VESA connector, 1400x1024 resolution will configure if requested, but it will chop > off the topmost quarter inch and the leftmost inch of pixels. Modern Dell LCDs no longer support the > ability to control the vertical or horizontal size so this is an unpleasant state of affairs. > > 2. The EDID data provides a detailed modeline for 1680x1050 operation which is ignored. The X logs in this bz seem to all show the use of the vesa driver. The vesa bios interface is limited in terms of output setup capability. In particular, there are two sets of modes: the set that the monitor reports it can display, and the set that the bios reports it can configure. 
It's literally not possible to ask the BIOS to set up a mode outside its list, so the best we can do with the vesa driver - or any other driver that uses the VESA BIOS mode setting interface - is pick a "good" mode that happens to be in both lists.

So regarding these two issues, assuming they're occurring with the vesa driver: The first sounds like we're either picking a mode that's larger than the monitor - in which case, 5.1 includes a vesa driver update that should address this issue - or that the mode we're selecting is not being programmed properly by the video BIOS, in which case we're just out of luck. The second problem sounds like the 1680x1050 mode is advertised by the monitor but not by the BIOS, in which case we are again out of luck. If my assumptions are incorrect here, I would certainly like to see an X log of the failure case(s).

In general, these limitations mean that although the vesa driver is supported, it's not recommended for regular use, and we strongly prefer that people use native drivers wherever possible. The configuration infrastructure in EL5 should be smart enough to pick the correct native driver when one is available.

> Finally there is the issue that the X server does not properly report the EDID transfer failure. I will take
> the freedesktop.org bug I have open about this and lobby for my patch to be considered as a remedy.
> Here too, I wonder if Red Hat weighing in on the bug would be useful?
> https://bugs.freedesktop.org/show_bug.cgi?id=10238

That looks pretty good; I'll take it up upstream. Thanks!
Invoking x86emu if the DDC fails sounds hairy, scary, and a lot of work. Thanks for putting in the effort to make it right! Indeed the X resolution issues I am having are occurring with the VESA driver. Apparently the X.org ATI driver does not yet know about the R500 chipset that the X1300 and X1400 use. The reverse-engineering effort will, I'm sure, eventually benefit this driver. It will be interesting to test the RHEL 5.1 X server to see which driver it picks. I'll attach Xorg.0.log output showing the 1680x1050 mode that the EDID fetch offers, and how it's not used. I'm still not sure I'm totally up to speed on reading the log output, so I'd be grateful if you'd call my attention to the lines where the BIOS denies support for that mode. Is it in those long, detailed segments? Indeed I see a 1600x1200 go by, and a 1400x1050 go by, but indeed no 1680x1050.
Created attachment 159000 [details] Log of proffered but unused 1680x1050 resolution

See lines 461 and 462:

(II) VESA(0): h_active: 1680 h_sync: 1728 h_sync_end 1760 h_blank_end 1840 h_border: 0
(II) VESA(0): v_active: 1050 v_sync: 1053 v_sync_end 1059 v_blanking: 1080

and line 488:

(II) VESA(0): Modeline "1680x1050" 119.00 1680 1728 1760 1840 1050 1053 1059 1080 -hsync +vsync

Here the VESA transfer offers the mode. Why exactly isn't it being used?
I just had a thought! How will you detect a bad EDID transfer? The kernel bug causes the transfer to OFTEN come up all zeros, but sometimes it gets a partial transfer padded out with zeros. Does the EDID block have a checksum in it that you can compute and test? The current code just looks at the first few bytes for a version number and uses that to decide the transfer was good. If you can't detect a zero-padded partial transfer, then your additional work to use x86emu may be wasted.
(In reply to comment #47)
> Created an attachment (id=159000) [edit]
> Log of proffered but unused 1680x1050 resolution
>
> See lines 461 and 462:
>
> (II) VESA(0): h_active: 1680 h_sync: 1728 h_sync_end 1760 h_blank_end 1840 h_border: 0
> (II) VESA(0): v_active: 1050 v_sync: 1053 v_sync_end 1059 v_blanking: 1080
>
> and line 488:
>
> (II) VESA(0): Modeline "1680x1050" 119.00 1680 1728 1760 1840 1050 1053 1059 1080 -hsync +vsync
>
> Here the VESA transfer offers the mode. Why exactly isn't it being used?

That's the EDID block's mode list. Remember, I can only set modes to things in the intersection of: in the VESA BIOS's mode list, and within the capabilities reported by EDID. So, yeah, 1680x1050 in the monitor, but not in the video BIOS, means no 1680x1050 for you.

(In reply to comment #48)
> How will you detect a bad EDID transfer? The kernel bug causes the transfer to OFTEN come up all zeros,
> but sometimes it gets a partial transfer padded out with zeros. Does the EDID block have a checksum in it
> that you can compute and test? The current code just looks at the first few bytes for a version number
> and uses that to decide the transfer was good.

Yes, there is a checksum. The last byte is set such that a cumulative sum of all bytes in the block, modulo 256, is 0. We do use this to reject bad EDID blocks. See DDC_checksum() in hw/xfree86/ddc/edid.c, and its caller in hw/xfree86/ddc/xf86DDC.c.
I've looked at the code in xf86DDC.c, but there's something that confuses me: How come I never saw a checksum error report in the log? Clearly I was getting bad EDID reads. What determines if the code that's doing the EDID fetch is from hw/xfree86/vbe/vbe.c where it can silently fail (unless you've taken my patch ;-) ) and where no checksum is computed in the readEDID routine, versus the code that's in hw/xfree86/ddc/edid.c? Or are you saying that you plan to add checksum stuff like in ddc/... to vbe/...? ---- Thanks also for the clarification about the BIOS thing.
Andrew: I just installed the X server and VESA driver from the RHEL 5.1 beta. Alas, it does one thing that is admittedly more correct but less desirable to me: Previously, somehow the server would see that the display could handle 1680x1050, and even though no 1400x1050 mode was specifically offered, it would configure that mode. (This got us into trouble when connected to the analog VESA port, but worked just fine on the digital port.) Now, because there is no exact match, the display that used to be 1400x1050 is configured for 1280x1024.

By the same token, that particular monitor offers 1680x1050, but not 1600x1200, so even though the vesa driver is improved and has a 1600x1200 mode, 1680x1050 is not configured because it is not an exact match. Wasn't there partial-match code being worked on? I thought it was already in place.

Somebody is suffering with the latest Ubuntu because their card supports 1280x1024, but their display only supports 1280x800. That run ends up finding no matching modes whatsoever.

I am concerned here that people will have gotten used to running 1400x1050 on these monitors under RHEL 4, but will now get the degraded resolution of 1280x1024 after "upgrading" to RHEL 5.1. I will attach the xorg.conf file and the Xorg.0.log files so that this all can be rigorously documented.
Created attachment 160558 [details] xorg.conf file used for testing RHEL 5.1 beta X server
Created attachment 160559 [details] Log of run of RHEL 5.0 debugging X server. It sets 1400x1050.
Created attachment 160561 [details] Log of run of RHEL 5.1 X server and vesa driver. Configs 1280x1024
Sorry to be a pest here. I expect there are many important issues being worked on as RHEL 5.1 beta testing proceeds. I am concerned that people are going to consider this an improper regression in behavior. If there were a plan of attack for addressing it, I might be able to help do the work.
The patch looks something like: http://people.redhat.com/ajackson/omg-vbe-hax.patch Utterly untested atm; going to try to hit that today.
Although I've not bench checked it carefully, the patch looks plausible. The issue that concerns me is not so much the EDID thing at the moment, but that the VESA update to the X server currently on track for dissemination as part of the RHEL 5.1 update does a worse job than the present one at finding the highest resolution even when the EDID transfer is 100% successful. Andrew, should I open a different bug about that? What do you think is the way I can be most helpful in identifying the root cause and fixing the new regression?
(My name's Adam, btw.)

(In reply to comment #58)
> Although I've not bench checked it carefully, the patch looks plausible.
>
> The issue that concerns me is not so much the EDID thing at the moment, but that the VESA update to the
> X server currently on track for dissemination as part of the RHEL 5.1 update does a worse job than the
> present one at finding the highest resolution even when the EDID transfer is 100% successful.

Yeah, that's intentional. The issue is that you _really_ want to try for strict intersection of modes between the monitor and the video BIOS in this case. There do exist monitors where the EDID list is literally all it can do. Worse, there are monitors where, if (like your example) there's a VBIOS mode between the two largest EDID modes, like so:

       VBIOS        EDID
  A:              1680x1050
  B: 1400x1050
  C:              1280x1024

and you attempt to set mode B, then the monitor will try to sync as though it's mode C and the rest will just be off the screen. Or go blank. Either one is unacceptable.

The other case we ran into was some laptop panels, which give you a mostly-nonconformant EDID block that just contains a mode for the panel size and nothing else, and of course no matching mode in the VBIOS. In that case, strict intersection of mode lists would mean the server just fails to start.

So the new heuristic is: Attempt strict intersection. If doing so produces a non-empty mode list, then use it. Otherwise, revalidate the VBIOS mode list against a range-based model of the EDID properties (using the sync ranges from EDID if available, otherwise synthesizing them from an assumed minimum size of 640x480@60 and a max of whatever the EDID block reports as maximum), in the hope that _something_ will survive validation and work. This seems to be the least wrong thing to do.
Nonconformant panels get a best effort, conformant panels get whatever the best intersection of BIOS and EDID modes is, and we don't go wrong trying to do something the monitor doesn't explicitly claim it's capable of doing. This does mean some setups that used to work at mode B (in the example above) now won't, but they'll still light up; in exchange, some panels that would fail to do the right thing in mode B now do _a_ right thing, even if that happens to be mode C. The vesa driver is intended to be a conservative fallback driver anyway, so the real solution to the mode B scenario is to use a native driver that doesn't use the VBIOS for output setup.
Thanks very much for taking the time to provide a detailed clarification. In light of those details, I'd have to agree that the new behavior is the least wrong thing to do.
After some technical review, I've concluded that the patch in comment #57 is a bad idea. The act of initializing an int10 context on a non-primary card has the side effect of posting the card. This will blow away any state set up by the driver prior to the VBE DDC call, which will almost certainly mean bad rendering at best, and failure to launch or system hang at worst. There's a more invasive change one could do where you'd set up the shadow x86emu context _really_ early, and make sure to use the same maps for both vm86 and x86emu execution, but that seems like a ton of work for very little return. Particularly since we know newer kernels have a working vm86 syscall. Fixing the kernel definitely seems like the right thing here.
Cloned bug #254024 for a kernel fix, and moved the IT issue there. This issue should be documented in a release note for 5.1. Suggested text is something like:
---
On i386 systems running the bare-metal (non-Xen-enabled) kernel, the X server may not be able to reliably retrieve EDID information from the monitor. This may manifest as the driver being unable to use resolutions larger than 800x600. A potential workaround is to use an alternative method to query the monitor, by adding the line:

Option "Int10Backend" "x86emu"

to the ServerLayout section in /etc/X11/xorg.conf .
---
Adam, (First, sorry for getting your name wrong. I've got it right now!) Wow! You've done more rigorous testing than I would have been able to do. And it looks like you've chased down the subtleties really well. I agree that pursuing the kernel fix is the best thing. My friend Chuck nudged the linux-kernel list to raise the visibility of the bug. Andi Kleen picked it up, but started asking further questions that Chuck and I will need to work to answer. (He also wants the bug demonstrated under a stock kernel.org kernel rather than an RHEL kernel.) In fact, it may be a more subtle problem than "just fix the audit calls from inside vm86.c." Are you on linux-kernel? Do you want to chime in there with your insights from your vm86 and x86emu experience?
adding to RHEL5.1 release notes updates: <quote> (x86) When running the bare-metal (non-Virtualized) kernel, the X server may not be able to retrieve EDID information from the monitor. When this occurs, the graphics driver will be unable to display resolutions higher than 800x600. To work around this, use an alternative method to query the monitor. One way of doing this is by adding the following line to the ServerLayout section of /etc/X11/xorg.conf: Option "Int10Backend" "x86emu" </quote> please advise if any revisions are necessary. thanks!
(In reply to comment #64) > To work around this, use an alternative method to query the monitor. One way of > doing this is by adding the following line to the ServerLayout section of > /etc/X11/xorg.conf: Change the second sentence: replace "One way of doing this is by adding" with "Add". There's only one alternative method, so there's no point in saying "one way".
note revised as requested. thanks!
Bug #254024 is for the kernel fix for 5.2. No fix for this is planned in the vesa driver, so this bug is being closed WONTFIX. Please see the kernel bug for the eventual resolution.
adding same release note to "Known Issues" of RHEL5.2. please advise if resolved so we can document as such. thanks!
Nope. Still not resolved. We know the 2.6.20 and later kernels don't have the problem, but have not identified what to back-port to 2.6.18 to get the proper behavior. Actually, there *IS* something that could be done for RHEL 5.2, but I fear it is too late to ask: The current recommended work-around is to modify /etc/X11/xorg.conf: in the "ServerLayout" section, add the line: Option "Int10Backend" "x86emu" Can we make that option the default for X in RHEL 5.2 or later, until a kernel (2.6.20 or newer) comes along that has no buffer corruption problems when fetching from the int10 layer?
Hi, the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at which point no further additions or revisions will be entertained. a mockup of the RHEL5.2 release notes can be viewed at the following link: http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html please use the aforementioned link to verify if your bugzilla is already in the release notes (if it needs to be). each item in the release notes contains a link to its original bug; as such, you can search through the release notes by bug number. Cheers, Don
Don, I'm not sure if your note is a bulk-addition to bunches of bugs. I'm also not sure if this is intended for the people who reported the bug, such as me, or the RH employees working the bug, such as Adam. At any rate, I could not view the Red Hat internal mockup because it's an internal host. I did look at the Beta release notes that are published at: https://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Release_Notes/RELEASE-NOTES-U2-x86-en.html Although this bug ID did not appear in those notes, I CAN confirm that the work-around is still necessary and accurately appears in the Beta release notes. (I hope I'm being helpful rather than annoying here.)
yes, this was a bulk-message to all bugs that are tracked for the release notes. the current version of the RHEL5.2 x86 release notes still contains the following note as per this bug: <quote> (x86) When running the bare-metal (non-Virtualized) kernel, the X server may not be able to retrieve EDID information from the monitor. When this occurs, the graphics driver will be unable to display resolutions higher than 800x600. To work around this, add the following line to the ServerLayout section of /etc/X11/xorg.conf: Option "Int10Backend" "x86emu" </quote> as always, please advise (before April 15) if any further revisions are required. thanks!
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes. This Release Note is currently located in the Known Issues section.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Alas, I cannot confirm this is still an issue without a little help. (I still have my test setup, but I may not for much longer.) I can confirm that the EDID transfer is still flaky and corrected with the work-around with the 2.6.18-92.1.10 kernel. However, I expect that 5.3 has a different kernel. Is there an RHEL 5.3 trial kernel somewhere I can grab? The newest kernel in the RHEL 5 beta channel is 2.6.18-92.1.10, but I don't think that's what y'all are going live with for 5.3, is it?
Additional info for others interested in this bug. Here is a streamlined repeat-by:
1. Install RHEL 5.2 from DVD.
2. Change radeon_tp to vesa in /etc/X11/xorg.conf.
3. Change the default run level from 5 to 3 in /etc/inittab.
4. Reboot.
Additionally, if you didn't follow step 3 above, there seems these days to be about an even chance of a successful EDID read when X starts from gdm; I don't know why. If you use "xinit" to start X with just an xterm, there may not be enough stuff in memory to trigger the bug. But if you repeatedly start X with the command "Xorg", or a full session with "startx" logged in as root, the EDID transfer either silently fails or is partial. This is on a Dell GX745 with a Radeon X1300/X1550 series card, chipset 0x7183.
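When cycling the server like this, a quick way to tell whether a given start got a full EDID read is to grep the log for the monitor details (the path and search strings below are the usual Xorg ones; exact log wording can vary by server version, so treat this as a sketch):

```shell
# Show EDID-related lines from the most recent X server log.
# An empty result matches what the attached Xorg.0.log.bad shows:
# the DDC read is declared successful but no monitor data follows.
LOG=${1:-/var/log/Xorg.0.log}
if [ -r "$LOG" ]; then
    grep -i -e 'EDID' -e 'Manufacturer' "$LOG" || echo "no EDID data in $LOG"
else
    echo "cannot read $LOG"
fi
```

Running this after each restart makes the good/bad pattern obvious without saving whole log files by hand.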