Bug 1335830

Summary: Kexecing RHEL7 into RHEL6 fails with CIRRUS video type (KVM/QEMU)
Product: Red Hat Enterprise Linux 7
Reporter: Lukas Zapletal <lzap>
Component: kexec-tools
Assignee: Pingfan Liu <piliu>
Status: CLOSED WONTFIX
QA Contact: Emma Wu <xiawu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: chayang, coli, dsirrine, hachen, juzhang, knoel, kraxel, lilu, lmiksik, lsurette, lzap, michal.skrivanek, michen, qzhao, rbalakri, Rhev-m-bugs, ruyang, srevivo, virt-maint, ykaul
Target Milestone: pre-dev-freeze
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-15 06:58:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1334477, 1394638, 1473055
Attachments:
Description             Flags
stdvga fix              none
partial cirrus support  none

Description Lukas Zapletal 2016-05-13 10:30:57 UTC
Hello,

1) In RHEV 3.6+ create new VM with VNC screen (do not use SPICE - it works)
2) Install or run RHEL 7.0 from an image
3) Download the Anaconda initramfs and kernel from a RHEL 6.0 kickstart repository on the guest (it must be RHEL version 6.x, not 7.x)
4) Install kexec-tools
5) Run: kexec  -d --force ./initrd.img ./vmlinuz

The system freezes.

Expected behavior:

You see Anaconda initializing network devices and then either trying to download the kickstart or showing the welcome screen (depending on the kernel command line options).

Satellite 6 uses kexec to provision systems on PXE/DHCP-less networks.

Comment 2 Michal Skrivanek 2016-05-14 07:30:13 UTC
So the only difference between the working and the non-working setup is VNC vs SPICE?
It is worth trying VNC on QXL (in the UI), the SPICE+VNC setting, or plain VNC with vga instead of cirrus (vga became the default in 4.0; otherwise use a simple vdsm hook to replace cirrus with vga on VM start).

Comment 3 Lukas Zapletal 2016-05-17 09:16:57 UTC
Reproduced with new VM with defaults from RHEV:

OS: RHEL7
Video Type: CIRRUS
Graphics Protocol: VNC

Commands:

curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/initrd.img -o initrd.img
curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/vmlinuz -o vmlinuz
kexec --debug --initrd initrd.img vmlinuz

Switching Video Type to QXL solved the problem. It looks like it does not matter whether Graphics Protocol is VNC or SPICE; the problem is the Video Type.

Tried the kexec --reset-vga option without any luck.

Comment 6 Lukas Zapletal 2016-05-17 10:44:55 UTC
UPDATED REPRODUCER:

1) Create new VM with CIRRUS video type
2) Install or run RHEL 7.x from an image
3) Download the Anaconda initramfs and kernel from a RHEL 6.0 kickstart repository on the guest (it must be RHEL version 6.x, not 7.x)
4) Install kexec-tools
5) Run: kexec  -d --force ./initrd.img ./vmlinuz

Here is the snippet to run:

curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/initrd.img -o initrd.img
curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/vmlinuz -o vmlinuz
kexec --debug --initrd initrd.img vmlinuz

So far we have only identified CIRRUS as problematic. QEMU offers more video devices; please test them all during QA: "vga", "cirrus", "vmvga", "xen", "vbox", "qxl" or "virtio". Thanks.
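
A minimal sketch of how such a test sweep could be driven, assuming the guests are defined through libvirt and that the names above are used as the model type of the domain XML <video> element. The helper name video_xml is hypothetical; the script only prints the XML fragment for each model and does not define or start any VM.

```shell
#!/bin/sh
# Video model names as listed in this comment.
VIDEO_MODELS="vga cirrus vmvga xen vbox qxl virtio"

# Print a libvirt <video> element for the given model type.
# (Hypothetical helper; it only formats text.)
video_xml() {
    printf "<video>\n  <model type='%s'/>\n</video>\n" "$1"
}

# Emit one fragment per model so each can be pasted into a test domain.
for model in $VIDEO_MODELS; do
    echo "== $model =="
    video_xml "$model"
done
```

Each fragment would replace the <video> element of an otherwise identical guest before repeating the kexec commands above.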

Comment 7 Michal Skrivanek 2016-05-17 10:54:36 UTC
We're switching to "vga" in 4.0. Can you confirm (e.g. on your own non-RHEV setup) that "vga" works and "cirrus" doesn't?

Comment 9 Lukas Zapletal 2016-05-17 14:25:14 UTC
Additional testing on RHEL 7.1 (no updates applied), kexecing a RHEL 6.8 kernel.

QXL: PASS
Cirrus: FAIL (all black, console does not respond at all)
VGA: FAIL (console does not respond after "Starting new kernel" message)
VMVGA: PASS
XEN: N/A

Comment 10 Gerd Hoffmann 2016-05-17 18:59:13 UTC
My testing shows that only the vga console is not functional, for all three vga cards (qxl, stdvga, cirrus).  qxl shows something, but the text mode font is screwed up so it is unreadable.  The system is doing fine though: the serial console works, and I expect a fully automatic install works too even though you can't watch it on the vga console.

The fundamental problem with the vga console is that RHEL-7 has kernel mode setting drivers for the qemu emulated cards (bochs-drm.ko for stdvga, cirrus.ko and qxl.ko), whereas RHEL-6 depends on the vgabios to handle the card.  kexec seems to be able to handle the vgabios handover from one kernel to the next, but apparently only in case both kernels are using the vgabios.

When forcing RHEL-7 into vgabios mode by blacklisting the kms driver module the vga console works fine in the RHEL-6 kernel after kexec.

There is nothing we can do in qemu to fix that.  The possible options I see are:

(1) remove the qemu kms drivers from whatever image satellite uses for kexec.

(2) maybe a quirk can be added to kexec to handle that case.

Reassigning to kexec-tools for comments on (2).
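
For illustration, a minimal sketch of the blacklisting experiment described in this comment, assuming the kms drivers are kept from auto-loading through the kernel command line. This is not necessarily how the test above was done. The helper name blacklist_param is hypothetical; modprobe.blacklist= is the parameter documented in modprobe(8). Module names are written with underscores, the form lsmod reports.

```shell
#!/bin/sh
# Join module names into a single modprobe.blacklist= kernel parameter.
# (Hypothetical helper; it only builds the string.)
blacklist_param() {
    list=""
    for mod in "$@"; do
        # Append with a comma separator, except before the first name.
        list="${list:+$list,}$mod"
    done
    printf 'modprobe.blacklist=%s\n' "$list"
}

# The three qemu kms drivers named in this comment.
blacklist_param bochs_drm cirrus qxl
```

The resulting string would be appended to the RHEL-7 kernel command line before booting the system that later runs kexec.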

Comment 11 Dave Young 2016-05-18 01:58:57 UTC
Gerd, I think your analysis makes sense to me, but I do not have any clue how to add a quirk to kexec. If the first kernel uses kms then kexec will always fail unless the 2nd kernel also has a kms driver so that it can be reinitialized. I believe kexec can do nothing.

Could you clarify a bit about (2) in your mind?

Comment 12 Dave Young 2016-05-18 02:00:56 UTC
Here is a bug for the kms/kexec issue
https://bugzilla.redhat.com/show_bug.cgi?id=1279013

Comment 13 Dave Young 2016-05-18 02:02:16 UTC
Correcting myself in comment #11: "kexec will always fail" means the graphics in the kexec'd kernel will not work...

Comment 14 Gerd Hoffmann 2016-05-18 07:10:58 UTC
(In reply to Dave Young from comment #11)
> Gerd, I think your analysis makes sense to me, but I do not have any clue
> how to add a quirk to kexec. If the first kernel uses kms then kexec will
> always fail unless the 2nd kernel also has a kms driver so that it can be
> reinitialized.

Yes, kexec rhel7 -> rhel7 works fine for that reason.

> I believe kexec can do nothing.

> Could you clarify a bit about (2) in your mind?

There is --reset-vga.  Maybe create a variant of that which does not only reset the vga, but also initializes the vga to 80x25 text mode?

After skimming over bug 1279013 I suspect (1) is the better way though, and *all* kms drivers not only the qemu ones should be removed.

Comment 15 Dave Young 2016-05-18 08:18:31 UTC
(In reply to Gerd Hoffmann from comment #14)
> There is --reset-vga.  Maybe create a variant of that which does not only
> reset the vga, but also initializes the vga to 80x25 text mode?

I'm not sure --reset-vga helps. I remember testing it with an nvidia card before, and it just hung. I think it helps only in very limited cases, but I will do more testing.

> 
> After skimming over bug 1279013 I suspect (1) is the better way though, and
> *all* kms drivers not only the qemu ones should be removed.

Yes, for this bug, if kms drivers can be excluded or blacklisted it will be the best approach.

Thanks
Dave

Comment 16 Gerd Hoffmann 2016-05-18 12:55:40 UTC
(In reply to Dave Young from comment #15)
> (In reply to Gerd Hoffmann from comment #14)
> > There is --reset-vga.  Maybe create a variant of that which does not only
> > reset the vga, but also initializes the vga to 80x25 text mode?
> 
> I'm not sure --reset-vga helps. I remember testing it with an nvidia card
> before, and it just hung. I think it helps only in very limited cases, but
> I will do more testing.

--reset-vga works on the qemu vga cards.  The reset state is *not* vga text mode though, so the vga console still doesn't work.  Switching the vga into text mode should be easy though, at least for the qemu vga cards, which act like classic standard vga cards from the early 90s when it comes to text mode.

Physical hardware is a different story.  On a modern gpu a lot more than programming a bunch of registers with a hard-coded sequence must be done:  scan the outputs, figure out where a display is connected, configure the scanouts accordingly, set up the laptop panel, ...

I suspect getting real hardware to work without running the vgabios is next to impossible, and re-initializing the gpu using vgabios as part of the kexec sequence sounds scary to me.

Comment 17 Dave Young 2016-05-20 01:47:03 UTC
> --reset-vga works on the qemu vga cards.  The reset state is *not* vga text
> mode though, so the vga console still doesn't work.  Switching the vga into
> text mode should be easy though, at least for the qemu vga cards, which act
> like classic standard vga cards from the early 90s when it comes to text mode.
> 

I will try it if I find time, and search for how to do it. If you have some links I can refer to, that would also be appreciated.

> Physical hardware is a different story.  On a modern gpu a lot more than
> programming a bunch of registers with a hard-coded sequence must be done:
> scan the outputs, figure out where a display is connected, configure the
> scanouts accordingly, set up the laptop panel, ...
> 
> I suspect getting real hardware to work without running the vgabios is next
> to impossible, and re-initializing the gpu using vgabios as part of the
> kexec sequence sounds scary to me.

Yes, totally agree. That is also the reason why we have not gotten it to work for a long time.

Thanks
Dave

Comment 18 Gerd Hoffmann 2016-05-20 06:13:51 UTC
(In reply to Dave Young from comment #17)
> > --reset-vga works on the qemu vga cards.  The reset state is *not* vga text
> > mode though, so the vga console still doesn't work.  Switching the vga into
> > text mode should be easy though, at least for the qemu vga cards, which act
> > like classic standard vga cards from the early 90s when it comes to text mode.
> > 
> 
> I will try it if I find time, and search for how to do it. If you have some
> links I can refer to, that would also be appreciated.

vgabios source code used by qemu is here:

https://code.coreboot.org/p/seabios/source/tree/master/vgasrc/

Check stdvga_set_mode() in stdvgamodes.c

Comment 19 Dave Young 2016-05-20 06:43:38 UTC
Gerd, thank you, will have a look.

Comment 20 Lukas Zapletal 2016-05-23 14:50:33 UTC
For the record, I tried --reset-vga with Cirrus and it did not help.

I think blacklisting the drivers on the discovery image is an easy task, which I can implement if that provides a better user experience. Can you tell me exactly what I should blacklist? I suppose we won't miss any important functionality, since the console is only used for a simple TUI that shows discovery status (performance is not a concern).

I can also add --reset-vga to the command line just in case. I can also force the simple 80x25 text mode if that helps (when I was testing this, I remember I was not in this mode).

Or is there a way to force RHEL to boot into some kind of super-generic (framebuffer perhaps) driver that works everywhere? This could be a win for us, and we could perhaps also drop all the video hardware drivers from the image (size matters a lot here). Performance does not really matter; we only need it to work in both bare metal and virtualization environments (all of them).

For the record, I filed the tickets upstream under:

http://projects.theforeman.org/issues/15144
http://projects.theforeman.org/issues/15145

Thanks for help!

Comment 21 Gerd Hoffmann 2016-05-24 05:50:56 UTC
> I think blacklisting the drivers on the discovery image is an easy task,
> which I can implement if that provides a better user experience. Can you
> tell me exactly what I should blacklist?

qemu drm drivers are: bochs-drm.ko, cirrus.ko, qxl.ko, virtio-gpu.ko

Given this is a problem on real hardware too (see bug 1279013), I'd suggest blacklisting everything below drivers/gpu/drm.

> I can also add --reset-vga to the command line just in case. I can also
> force the simple 80x25 text mode if that helps (when I was testing this,
> I remember I was not in this mode).

Probably not helpful.

> Or is there a way to force RHEL to boot into some kind of super-generic
> (framebuffer perhaps) driver that works everywhere? This could be a win for
> us, and we could perhaps also drop all the video hardware drivers from the
> image (size matters a lot here).

Dropping all drm drivers should do the trick.  The system should then continue to run in vga textmode.  Or, when running on UEFI, continue to use the firmware framebuffer (efifb), which (as far as I know) kexec can hand over from one kernel to the next.
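
The suggestion to blacklist everything below drivers/gpu/drm could be sketched roughly as follows, assuming the modules live under /lib/modules/<version>/kernel/drivers/gpu/drm and that the output is saved as a file in /etc/modprobe.d/. The function name drm_blacklist is hypothetical. The script only prints "blacklist" lines and changes nothing on the system.

```shell
#!/bin/sh
# Emit one "blacklist <module>" line per kernel module found below a directory.
# (Hypothetical helper; prints nothing if the directory does not exist.)
drm_blacklist() {
    find "$1" -type f -name '*.ko*' 2>/dev/null |
        # Strip the directory part and the .ko / .ko.xz suffix,
        # and use underscores as lsmod does.
        sed -e 's|.*/||' -e 's|\.ko.*$||' -e 's|-|_|g' |
        sort -u |
        sed -e 's|^|blacklist |'
}

# Assumed location of the drm modules for the running kernel.
drm_dir="/lib/modules/$(uname -r)/kernel/drivers/gpu/drm"
drm_blacklist "$drm_dir"
```

Redirecting the output into a .conf file under /etc/modprobe.d/ in the discovery image is one way to apply it; removing the module files from the image, as suggested above, would also save space.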

Comment 22 Lukas Zapletal 2016-05-24 08:46:25 UTC
Thank you very much, that indeed fixes the issue on Cirrus when kexecing RHEL 6.

Comment 27 Gerd Hoffmann 2017-03-14 16:08:22 UTC
Created attachment 1263010 [details]
stdvga fix

Finally found some time to look at this.  The patch gets stdvga going.

stdvga reset method works for qxl-vga too.

stdvga reset works partly for virtio-vga.  It manages to successfully reset the vga emulation, but doesn't switch back from virtio mode to vga compat mode.  That'll happen when the linux kernel virtio-pci driver resets all virtio devices, at which point the kernel messages start to appear on the vga text console.

Doing the virtio reset in purgatory requires a pci mmio bar write, and it doesn't look like purgatory has the infrastructure to do that easily ...

Comment 28 Gerd Hoffmann 2017-03-14 16:10:49 UTC
Created attachment 1263011 [details]
partial cirrus support

The patch gets the cirrus back to text mode.  Memory access still seems to be in some weird mode though, and boot messages appear somewhat scrambled.  But I'm tired for today ...

Comment 29 Dave Young 2017-03-23 09:11:10 UTC
Gerd, thanks for the fix. std-vga works for me except when using vga=788: after the kexec reboot the window keeps the text mode size and the kernel does not change to the 788 framebuffer. But I think that is acceptable, considering that very few users use this.

Rethinking this, the most important part is the real hardware case, so it may not be worth more effort on emulated cards. For the cirrus, maybe we can just leave it as is; one can still use the workaround.

For real hardware, if it is not possible or very hard, then we have to give up. Bhupesh is taking this bug, and I hope Bhupesh can do some investigation to see if we can do something for real hardware as well. Or we can just fix the qemu std-vga only.

Comment 32 Dave Young 2017-08-15 06:58:01 UTC
After rethinking this, we prefer not to add a fix that covers only the cirrus reset. The workaround mentioned before can be used instead.