Bug 591718

Summary: Screen corruption (and lockup) booting on EFI system,
Product: Red Hat Enterprise Linux 6 Reporter: Andrew Cathrow <acathrow>
Component: kernel Assignee: Steve Best <sbest>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 CC: arozansk, borgan, ddumas, garyhade, lcm, nobody+PNT0273897, peterm, pjones, rbalakri, snagar, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-08-19 10:00:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                         Flags
Output of dmidecode                                                 none
Output of lspci                                                     none
serial console output showing successful boot and Anaconda startup  none

Description Andrew Cathrow 2010-05-12 21:46:13 UTC
Tested with RHEL6 Beta#1

IBM x3550 M2 (Full hardware details attached in lspci and dmidecode)

Booting from the install DVD (or boot.iso) brings up the initial grub screen. Hitting Enter starts the boot, but it then locks up with screen corruption (green blocks).

Using nomodeset or xdriver=vesa makes no difference.

The machine can boot and install correctly in legacy BIOS mode.

Peter Jones had a look at this to see where the problem was:

<snip>
I'm not at all sure what's going wrong here, but to me it looks
like a kernel issue - the screen corruption starts between initrd
unpacking and the audit subsystem starting up. The normal log looks
something like:

Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 29540k freed <-- this is the last thing we see.
audit: initializing netlink socket (disabled)

On this machine, we never see past Freeing initrd memory, but the code
that's actually *in* the initrd (i.e. the installer) doesn't actually
start execution until much later, so it's unlikely the problem is
there.

So this is most likely a kernel bug, I think.
</snip>
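
The comparison Peter describes (how far a captured console log gets relative to a normal one) can be sketched as below. This is only an illustrative helper, not a tool used in this report; the function name and the in-memory log lists are assumptions.

```python
def last_line_reached(expected, captured):
    """Return the last line of `expected` that also appears, in order, in `captured`.

    Returns None if not even the first expected line was seen.
    """
    last = None
    pos = 0
    for line in expected:
        try:
            # Search only after the previous match so ordering is respected.
            pos = captured.index(line, pos) + 1
        except ValueError:
            break
        last = line
    return last


# The normal log quoted above.
EXPECTED = [
    "Trying to unpack rootfs image as initramfs...",
    "Freeing initrd memory: 29540k freed",
    "audit: initializing netlink socket (disabled)",
]

# On the affected machine the log stops after the second line.
captured = EXPECTED[:2]
print(last_line_reached(EXPECTED, captured))
```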

This machine is in Westford, I can provide physical access.

Comment 1 Andrew Cathrow 2010-05-12 21:46:47 UTC
Created attachment 413577 [details]
Output of dmidecode

Comment 2 Andrew Cathrow 2010-05-12 21:47:09 UTC
Created attachment 413578 [details]
Output of lspci

Comment 4 RHEL Program Management 2010-05-12 23:18:52 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 IBM Bug Proxy 2010-05-24 21:20:46 UTC
------- Comment From edpollar.com 2010-05-24 17:07 EDT-------
reverse mirror of RHBZ  591718  - Screen corruption (and lockup) booting on EFI system

Comment 7 Gary Hade 2010-05-26 18:21:20 UTC
This issue also reproduces on the x3650 M2 that has the same
Matrox VGA controller.  On that system with the serial console
enabled in the BIOS and "console=tty0 console=ttyS0,115200"
added to the kernel command line, the kernel successfully boots
after which the initial installation screens are displayed in
text form on the serial console.  After responding to each of
the initial screens, Anaconda appears to successfully start.
Nothing is displayed on the VGA console while this is happening.
I will attach the serial console output.
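
For anyone scripting this workaround, a minimal sketch of appending those two console arguments to an existing kernel command line follows. The helper name and the sample command line are hypothetical; only the two console= arguments come from this comment.

```python
def add_serial_console(cmdline, port="ttyS0", baud=115200):
    """Append console=tty0 and console=<port>,<baud> unless already present."""
    args = cmdline.split()
    for arg in ("console=tty0", f"console={port},{baud}"):
        if arg not in args:
            args.append(arg)
    return " ".join(args)


# Hypothetical starting command line, for illustration only.
print(add_serial_console("ro root=/dev/sda1"))
```

The serial entry is kept last on purpose: when several console= arguments are given, the kernel uses the last one for /dev/console.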

I also tried with only "text" (no console= options) appended
to the kernel command line and saw nothing displayed on VGA 
console and only the following displayed on the serial console.

Comment 8 Gary Hade 2010-05-26 18:24:07 UTC
Created attachment 416967 [details]
serial console output showing successful boot and Anaconda startup

Comment 9 Gary Hade 2010-05-26 18:39:22 UTC
(In reply to comment #7)
...
> I also tried with only "text" (no console= options) appended
> to the kernel command line and saw nothing displayed on VGA 
> console and only the following displayed on the serial console.

Oops, I didn't include the output.  The following was displayed.

 Trying to allocate 923 pages for VMLINUZ
[Linux-EFI, setup=0x1026, size=0x39b000]
   [Initrd, addr=0x77a03000, size=0x1c96b26]
�
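
As a sanity check on the loader output, the page count and the reported kernel size appear consistent with each other. The sketch below assumes 4 KiB pages (the usual x86_64 page size) and that the loader rounds the size up to whole pages; neither is stated in the output itself.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64


def pages_needed(size_bytes, page_size=PAGE_SIZE):
    """Number of whole pages needed to hold size_bytes (rounded up)."""
    return (size_bytes + page_size - 1) // page_size


kernel_size = 0x39B000  # "size=0x39b000" from the Linux-EFI line
print(pages_needed(kernel_size))  # 923, matching "Trying to allocate 923 pages"
```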

Comment 10 Gary Hade 2010-05-27 00:01:20 UTC
Andrew, Could you please go into the F1 setup and check the
state of the 'Force Legacy Video on Boot' option under
System Settings->Legacy Support?  If it is set to "Enable"
please change it to "Disable" and then initiate an install
and check to see if the screen corruption issue still exists.

Comment 11 Andrew Cathrow 2010-06-01 13:04:58 UTC
(In reply to comment #10)
> Andrew, Could you please go into the F1 setup and check the
> state of the 'Force Legacy Video on Boot' option under
> System Settings->Legacy Support?  If it is set to "Enable"
> please change it to "Disable" and then initiate an install
> and check to see if the screen corruption issue still exists.    

Gary,
Back in the office now. I just checked the option, and it was set to the default of "Enable". After changing it to "Disable", I was able to boot normally and reach Anaconda.

Comment 12 Gary Hade 2010-06-01 23:06:42 UTC
Thanks, Andrew.  I saw the same result on the x3650 M2.

We know of an earlier video corruption issue originally
discovered on x3650 M2 that had symptoms that appear to
be very similar to this issue, including the improvement
when 'Force Legacy Video on Boot' is changed from "Enable"
to "Disable".  Unfortunately, the history for that
previous issue indicates that it was resolved with the
below two changes that are already included in the
RHEL6 xorg-x11-drv-mga.

I hope the Red Hat Xorg experts have some ideas on what
later change may have caused this to regress.

commit 19c44d537e982fcf0fe2dc9f3273ac6166302510
Author: Yannick Heneault <yheneaul>
Date:   Tue Apr 21 10:00:24 2009 -0400

    Fixed bad vga access in memory count routine.

commit 2388c4d512554258bce2b78c8f8aa1151b161c3e
Author: Yannick Heneault <yheneaul>
Date:   Tue Apr 21 09:51:34 2009 -0400

    Force pitch of 1024 for G200SE Pilot1 when edid is used as modeline.

Comment 13 IBM Bug Proxy 2010-06-02 00:30:50 UTC
------- Comment From garyhade.com 2010-06-01 20:23 EDT-------
I was just reminded that the changes I mentioned in the last comment are
probably related to a different issue that appeared later in the install, after
X actually started.  We are seeing the video corruption at the very beginning
of the install so X isn't even in the picture yet.

Stand by while we do some more research.

Comment 15 IBM Bug Proxy 2010-07-06 21:31:08 UTC
------- Comment From garyhade.com 2010-07-06 17:28 EDT-------
Red Hat,
We are pursuing this as a possible firmware issue.  We have not
definitively confirmed that it really is a firmware issue, which is
why we are keeping this bug open.

Comment 18 IBM Bug Proxy 2010-08-02 15:12:11 UTC
------- Comment From garyhade.com 2010-08-02 11:05 EDT-------
Red Hat,
Just to let you know, we are still in discussions with the BIOS
team concerning this issue.

Comment 19 IBM Bug Proxy 2010-08-10 18:41:17 UTC
------- Comment From mknutson.com 2010-08-10 14:37 EDT-------
Upgrading Priority - Looks like an issue in OS code and not firmware.  Affects nearly every system being shipped today.  All customers will see this if installing from CD/DVD.  Summary comment to follow.

Comment 20 Siddharth Nagar 2010-08-11 14:56:01 UTC
IBM, please be advised that this issue risks not getting resolved for RHEL 6.0. We need immediate action on this.

Comment 21 Peter Jones 2010-08-12 17:24:27 UTC
Could we get some details on what the firmware team investigated and discovered?

Comment 22 IBM Bug Proxy 2010-08-12 22:51:01 UTC
------- Comment From lcm.com 2010-08-12 18:43 EDT-------
(In reply to comment #31)
> Could we get some details on what the firmware team investigated and
> discovered?

I've asked and am awaiting a response. I'll include the information here as soon as I get it.

Comment 23 Denise Dumas 2010-08-18 16:52:51 UTC
Please realize that this problem is about to get booted out of RHEL6 for lack of information.

Comment 24 IBM Bug Proxy 2010-08-19 05:51:45 UTC
------- Comment From lcm.com 2010-08-19 01:40 EDT-------
This is a firmware issue. With 'Force legacy video' mode enabled during a native UEFI boot, the firmware must disconnect the native UEFI (GOP) driver and attach a thunk driver that translates the GOP protocol to an int10 interface. A bug in this code caused stale frame buffer data (bogus address/size) to be passed to the loader/kernel, resulting in the blank screen during boot.

This only occurs if the video mode is switched midstream. If legacy int10 is used during the entire boot process, or native UEFI video is used during the entire boot process (i.e., 'Force legacy video' is disabled), then no corruption occurs.
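
The three cases described in this comment can be written out as a small decision function. This is only a restatement of the explanation in code, not firmware logic; the function names and return strings are made up for illustration.

```python
def video_path(uefi_boot, force_legacy_video):
    """Describe which video interface is in use over the course of the boot."""
    if not uefi_boot:
        # Legacy BIOS boot: int10 is used from start to finish.
        return "int10 throughout"
    if force_legacy_video:
        # Native UEFI boot with the option enabled: firmware swaps the GOP
        # driver for an int10 thunk driver partway through.
        return "switched midstream (GOP -> int10 thunk)"
    return "GOP throughout"


def corruption_expected(uefi_boot, force_legacy_video):
    """Per this comment, only the midstream switch triggers the bug."""
    return video_path(uefi_boot, force_legacy_video).startswith("switched")


for uefi in (False, True):
    for force in (False, True):
        print(uefi, force, video_path(uefi, force), corruption_expected(uefi, force))
```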

Comment 25 Steve Best 2010-08-19 10:00:40 UTC
Closing as not a bug; this is a firmware issue.

Comment 26 Denise Dumas 2010-08-19 18:15:18 UTC
Steve, do you want to propose a release note for 6.0? And maybe point people to where to find firmware updates?

Comment 27 Peter Jones 2010-08-23 14:44:22 UTC
At the very least, we want to tell our customers not to enable "force legacy video mode" in the firmware; we want the GOP driver at all times if the user is booting via UEFI.
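
To tell whether this advice applies to an already-installed system, one common check is whether the kernel exposes the EFI directory in sysfs. The sketch below assumes the usual Linux location /sys/firmware/efi; the function name is hypothetical, and the path is a parameter so the check can be exercised against other directories.

```python
import os


def booted_via_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the EFI sysfs directory exists, i.e. the kernel was booted via UEFI."""
    return os.path.isdir(efi_dir)


if booted_via_uefi():
    print("UEFI boot: leave 'Force Legacy Video on Boot' disabled")
else:
    print("Legacy BIOS boot (or EFI sysfs directory not present)")
```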

Comment 28 Denise Dumas 2010-08-23 21:35:02 UTC
OK, how is this for a relnote? 

Systems such as the IBM x3650 M2 and similar, using the Matrox VGA controller, may encounter installation problems in which the screen is unreadable when booted using UEFI.

This is a firmware issue. Do not enable "force legacy
video mode" in the firmware; use the GOP driver at all times if the user is
booting via UEFI.

Alternatively, enable the serial console in the BIOS and add "console=tty0 console=ttyS0,115200" to the kernel command line. 

Updated firmware can be found at (sbeal???)

Comment 29 IBM Bug Proxy 2010-08-31 18:51:11 UTC
------- Comment From stanselk.com 2010-08-31 14:46 EDT-------
A couple of changes below:

Systems using an Integrated Management Module (IMM), using the Matrox VGA controller, may encounter installation problems in which the screen is unreadable when booted using UEFI.

This is a firmware issue. Do not enable "force legacy video mode" in the firmware; use the GOP driver at all times if the user is booting via UEFI.


A future firmware release will correct this issue and will be available at ibm.com.