Bug 1335830
| Summary: | Kexecing RHEL7 into RHEL6 fails with CIRRUS video type (KVM/QEMU) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Lukas Zapletal <lzap> |
| Component: | kexec-tools | Assignee: | Pingfan Liu <piliu> |
| Status: | CLOSED WONTFIX | QA Contact: | Emma Wu <xiawu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | chayang, coli, dsirrine, hachen, juzhang, knoel, kraxel, lilu, lmiksik, lsurette, lzap, michal.skrivanek, michen, qzhao, rbalakri, Rhev-m-bugs, ruyang, srevivo, virt-maint, ykaul |
| Target Milestone: | pre-dev-freeze | | |
| Target Release: | 7.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Doc Type: | Bug Fix | Story Points: | --- |
| Last Closed: | 2017-08-15 06:58:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | Category: | --- |
| oVirt Team: | Virt | Cloudforms Team: | --- |
| Bug Blocks: | 1334477, 1394638, 1473055 | | |
Description
Lukas Zapletal
2016-05-13 10:30:57 UTC
So the only difference between the working and non-working setups is VNC vs SPICE? It is worth trying VNC on QXL (in the UI), the SPICE+VNC setting, or plain VNC with vga instead of cirrus (the default changes in 4.0; alternatively, use a simple vdsm hook to replace cirrus with vga on VM start).

Reproduced with a new VM using the RHEV defaults: OS: RHEL7, Video Type: CIRRUS, Graphics Protocol: VNC. Commands:

    curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/initrd.img -o initrd.img
    curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/vmlinuz -o vmlinuz
    kexec --debug --initrd initrd.img vmlinuz

Switching the Video Type to QXL solved the problem. It looks like it does not matter whether the Graphics Protocol is VNC or SPICE; the problem is the Video Type. I tried the --reset-vga kexec option without any luck.

UPDATED REPRODUCER:

1) Create a new VM with the CIRRUS video type
2) Install or run RHEL 7.x from an image
3) On the guest, download the Anaconda initrd and kernel from the RHEL 6.0 kickstart repository (it must be RHEL version 6.x, not 7.x)
4) Install kexec-tools
5) Run: kexec -d --force --initrd ./initrd.img ./vmlinuz

Here is the snippet to run:

    curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/initrd.img -o initrd.img
    curl http://download.englab.brq.redhat.com/pub/rhel/released/RHEL-6/6.8/Server/x86_64/os/images/pxeboot/vmlinuz -o vmlinuz
    kexec --debug --initrd initrd.img vmlinuz

We only identified CIRRUS as the problematic one. QEMU offers more drivers; please test them all during QA: "vga", "cirrus", "vmvga", "xen", "vbox", "qxl" and "virtio". Thanks.

We're switching to "vga" in 4.0. Can you confirm (e.g. on your own non-RHEV setup) that "vga" works and "cirrus" doesn't?

Additional testing on RHEL 7.1 (no updates applied), kexecing the RHEL 6.8 kernel:
- QXL: PASS
- Cirrus: FAIL (all black, console does not respond at all)
- VGA: FAIL (console does not respond after the "Starting new kernel" message)
- VMVGA: PASS
- XEN: N/A

My testing shows that only the vga console is not functional, for all three vga cards (qxl, stdvga, cirrus). qxl shows something, but the text mode font is garbled, so it is unreadable. The system is doing fine though: the serial console works, and I expect a fully automatic install works too, even though you can't watch it on the vga console.

The fundamental problem with the vga console is that RHEL-7 has kernel mode setting (KMS) drivers for the qemu emulated cards (bochs-drm.ko for stdvga, cirrus.ko and qxl.ko), whereas RHEL-6 depends on the vgabios to handle the card. kexec seems to be able to handle the vgabios handover from one kernel to the next, but apparently only when both kernels are using the vgabios. When RHEL-7 is forced into vgabios mode by blacklisting the KMS driver module, the vga console works fine in the RHEL-6 kernel after kexec.

There is nothing we can do in qemu to fix that. The possible options I see are: (1) remove the qemu KMS drivers from whatever image Satellite uses for kexec, or (2) maybe add a quirk to kexec to handle that case. Reassigning to kexec-tools for comments on (2).

Gerd, I think your analysis makes sense, but I do not have any clue how to add a quirk to kexec. If the first kernel uses KMS, then kexec will always fail unless the 2nd kernel also has a KMS driver so that the card can be reinitialized. I believe kexec can do nothing. Could you clarify a bit what you have in mind for (2)?

Here is a bug for the KMS/kexec issue: https://bugzilla.redhat.com/show_bug.cgi?id=1279013

Correcting myself in comment #11: "kexec will always fail" means that graphics in the kexec'd kernel will not work...

(In reply to Dave Young from comment #11)
> Gerd, I think your analysis makes sense, but I do not have any clue how
> to add a quirk to kexec. If the first kernel uses KMS, then kexec will always
> fail unless the 2nd kernel also has a KMS driver so that the card can be
> reinitialized.

Yes, kexec rhel7 -> rhel7 works fine for that reason.

> I believe kexec can do nothing.
> Could you clarify a bit what you have in mind for (2)?

There is --reset-vga. Maybe create a variant of it which not only resets the vga, but also initializes the vga to 80x25 text mode?

After skimming over bug 1279013 I suspect (1) is the better way though, and *all* KMS drivers, not only the qemu ones, should be removed.

(In reply to Gerd Hoffmann from comment #14)
> There is --reset-vga. Maybe create a variant of it which not only
> resets the vga, but also initializes the vga to 80x25 text mode?

I'm not sure --reset-vga helps. I remember testing it with an nvidia card before, and it just hung. I think it may help only in very limited cases, but I will do more testing.

> After skimming over bug 1279013 I suspect (1) is the better way though, and
> *all* KMS drivers, not only the qemu ones, should be removed.

Yes, for this bug, if the KMS drivers can be excluded or blacklisted, that will be the best approach.

Thanks,
Dave

(In reply to Dave Young from comment #15)
> (In reply to Gerd Hoffmann from comment #14)
> > There is --reset-vga. Maybe create a variant of it which not only
> > resets the vga, but also initializes the vga to 80x25 text mode?
>
> I'm not sure --reset-vga helps. I remember testing it with an nvidia card
> before, and it just hung. I think it may help only in very limited cases,
> but I will do more testing.

--reset-vga works on the qemu vga cards. The reset state is *not* vga text mode though, so the vga console still doesn't work. Switching the vga into text mode should be easy, at least for the qemu vga cards, which act like classic standard vga cards from the early '90s when it comes to text mode.

Physical hardware is a different story. On a modern GPU, a lot more than programming a bunch of registers with a hard-coded sequence must be done: scan the outputs, figure out where a display is connected, configure the scanouts accordingly, set up the laptop panel, ...

I suspect getting real hardware to work without running the vgabios is next to impossible, and re-initializing the GPU using the vgabios as part of the kexec sequence sounds scary to me.

> --reset-vga works on the qemu vga cards. The reset state is *not* vga text
> mode though, so the vga console still doesn't work. Switching the vga into
> text mode should be easy, at least for the qemu vga cards, which act like
> classic standard vga cards from the early '90s when it comes to text mode.

I will try if I find time for it, and google how to do it. If you have some links I can refer to, those would also be appreciated.

> Physical hardware is a different story. On a modern GPU, a lot more than
> programming a bunch of registers with a hard-coded sequence must be done:
> scan the outputs, figure out where a display is connected, configure the
> scanouts accordingly, set up the laptop panel, ...
>
> I suspect getting real hardware to work without running the vgabios is next
> to impossible, and re-initializing the GPU using the vgabios as part of the
> kexec sequence sounds scary to me.

Yes, totally agree. That is also the reason why we have not gotten it to work for a long time.

Thanks,
Dave

(In reply to Dave Young from comment #17)
> > --reset-vga works on the qemu vga cards. The reset state is *not* vga text
> > mode though, so the vga console still doesn't work. Switching the vga into
> > text mode should be easy, at least for the qemu vga cards, which act like
> > classic standard vga cards from the early '90s when it comes to text mode.
>
> I will try if I find time for it, and google how to do it. If you have some
> links I can refer to, those would also be appreciated.

The vgabios source code used by qemu is here: https://code.coreboot.org/p/seabios/source/tree/master/vgasrc/
Check stdvga_set_mode() in stdvgamodes.c.

Gerd, thank you, will have a look.
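To make the "hard-coded register sequence" for the qemu cards concrete, here is a minimal Python sketch of switching a standard VGA into 80x25 color text mode (BIOS mode 03h). It is not the SeaBIOS `stdvga_set_mode()` code or any kexec patch; the register values are the commonly published mode 03h values, and `PortIO` is a hypothetical stand-in that records port accesses instead of issuing real `outb()`/`inb()` instructions. Real code would also have to reload the text font into plane 2, which this sketch leaves out.

```python
# Commonly published register values for VGA mode 03h (80x25, 16-color text).
MISC = 0x67  # color I/O addresses (0x3Dx), 28 MHz dot clock, RAM enabled
SEQ = [0x03, 0x00, 0x03, 0x00, 0x02]
CRTC = [
    0x5F, 0x4F, 0x50, 0x82, 0x55, 0x81, 0xBF, 0x1F,
    0x00, 0x4F, 0x0D, 0x0E, 0x00, 0x00, 0x00, 0x50,
    0x9C, 0x0E, 0x8F, 0x28, 0x1F, 0x96, 0xB9, 0xA3,
    0xFF,
]
GC = [0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x0E, 0x00, 0xFF]
AC = [
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x14, 0x07,
    0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
    0x0C, 0x00, 0x0F, 0x08, 0x00,
]


class PortIO:
    """Hypothetical port I/O stand-in: records ("out"/"in", port, value) tuples."""

    def __init__(self):
        self.ops = []

    def outb(self, port, value):
        self.ops.append(("out", port, value & 0xFF))

    def inb(self, port):
        self.ops.append(("in", port, None))
        return 0


def set_text_mode_03h(io):
    io.outb(0x3C2, MISC)                  # Miscellaneous Output register
    for index, value in enumerate(SEQ):   # Sequencer: index 0x3C4, data 0x3C5
        io.outb(0x3C4, index)
        io.outb(0x3C5, value)
    # CR11 bit 7 write-protects CR00-CR07, so clear it before loading the CRTC.
    crtc = list(CRTC)
    crtc[0x11] &= 0x7F
    io.outb(0x3D4, 0x11)
    io.outb(0x3D5, crtc[0x11])
    for index, value in enumerate(crtc):  # CRTC: index 0x3D4, data 0x3D5
        io.outb(0x3D4, index)
        io.outb(0x3D5, value)
    for index, value in enumerate(GC):    # Graphics Controller: index 0x3CE, data 0x3CF
        io.outb(0x3CE, index)
        io.outb(0x3CF, value)
    for index, value in enumerate(AC):    # Attribute Controller: index and data share 0x3C0
        io.inb(0x3DA)                     # reading Input Status 1 resets the index/data flip-flop
        io.outb(0x3C0, index)             # PAS bit (0x20) clear: display blanked while loading
        io.outb(0x3C0, value)
    io.inb(0x3DA)
    io.outb(0x3C0, 0x20)                  # set PAS to re-enable video output
```

Calling `set_text_mode_03h(PortIO())` records 146 port operations, starting with the Miscellaneous Output write; an actual implementation would replace `PortIO` with real port access.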
For the record, I tried --reset-vga with Cirrus and it did not help.

I think blacklisting the driver on the discovery image is an easy task which I can implement if that provides a better user experience. Can you tell me exactly what I should blacklist? I suppose we won't miss any important functionality; the console is only used for a simple TUI to show the discovery status (no performance interest or similar).

I can also add --reset-vga to the command line just in case. I can also force the simple 80x25 text mode if that helps (when I was testing this, I remember I was not in this mode).

Or is there a way to force RHEL to boot into some kind of super-generic (framebuffer perhaps) driver that works everywhere? This could be a win for us, and we could perhaps also drop all the video hardware drivers from the image (size matters a lot here). Performance does not really matter here; we only need it to work on both bare metal and virtualization environments (all of them).

For the record, I filed the tickets upstream under:
http://projects.theforeman.org/issues/15144
http://projects.theforeman.org/issues/15145

Thanks for the help!

> I think blacklisting the driver on the discovery image is an easy task which I
> can implement if that provides a better user experience. Can you tell
> me exactly what I should blacklist?

The qemu DRM drivers are: bochs-drm.ko, cirrus.ko, qxl.ko, virtio-gpu.ko. Given this is a problem on real hardware too (see bug 1279013), I'd suggest blacklisting everything below drivers/gpu/drm.

> I can also add --reset-vga to the command line just in case. I can also
> force the simple 80x25 text mode if that helps (when I was testing this,
> I remember I was not in this mode).

Probably not helpful.

> Or is there a way to force RHEL to boot into some kind of super-generic
> (framebuffer perhaps) driver that works everywhere? This could be a win for us,
> and we could perhaps also drop all the video hardware drivers from the image
> (size matters a lot here).

Dropping all DRM drivers should do the trick. The system should then continue to run in vga text mode or, when running on UEFI, continue to use the firmware framebuffer (efifb), which (as far as I know) kexec can hand over from one kernel to the next.

Thank you very much, that indeed fixes the issue on Cirrus when kexecing RHEL 6.

Created attachment 1263010: stdvga fix
Finally found some time to look at this. The patch gets stdvga going.
The stdvga reset method works for qxl-vga too.
The stdvga reset works only partly for virtio-vga. It successfully resets the vga emulation, but doesn't switch back from virtio mode to vga compat mode. That only happens when the Linux kernel's virtio-pci driver resets all virtio devices, at which point the kernel messages start to appear on the vga text console.
Doing the virtio reset in purgatory requires a PCI MMIO BAR write, and it doesn't look like purgatory has the infrastructure to do that easily ...
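For a device using the modern (virtio 1.0) PCI transport, the BAR write mentioned above amounts to writing 0 to the `device_status` field (offset 20) of the common configuration structure, which a driver locates through a vendor-specific PCI capability (capability ID 0x09, `cfg_type` 1). The Python sketch below shows that lookup and write on in-memory byte buffers standing in for PCI config space and the BAR. `find_common_cfg` and `reset_virtio` are hypothetical helpers, not kexec-tools or purgatory code; a real implementation would also have to access the BAR at its physical address, which is the part purgatory has no easy infrastructure for.

```python
import struct

PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10          # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34          # pointer to the first capability
PCI_CAP_ID_VNDR = 0x09              # vendor-specific capability
VIRTIO_PCI_CAP_COMMON_CFG = 1
COMMON_CFG_DEVICE_STATUS = 20       # offset of device_status in the common config


def find_common_cfg(cfg: bytes):
    """Walk the capability list; return (bar, offset) of the virtio common config."""
    status = struct.unpack_from("<H", cfg, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while ptr and ptr not in seen:  # guard against a looping capability list
        seen.add(ptr)
        cap_id, cap_next = cfg[ptr], cfg[ptr + 1]
        # virtio_pci_cap: vndr, next, len, cfg_type, bar, padding[3], offset, length
        if cap_id == PCI_CAP_ID_VNDR and cfg[ptr + 3] == VIRTIO_PCI_CAP_COMMON_CFG:
            bar = cfg[ptr + 4]
            offset = struct.unpack_from("<I", cfg, ptr + 8)[0]
            return bar, offset
        ptr = cap_next & 0xFC
    return None


def reset_virtio(cfg: bytes, bars: dict) -> bool:
    """Write 0 to device_status; bars maps a BAR index to a bytearray (MMIO stand-in)."""
    loc = find_common_cfg(cfg)
    if loc is None:
        return False
    bar, offset = loc
    mmio = bars[bar]
    mmio[offset + COMMON_CFG_DEVICE_STATUS] = 0
    # The spec requires reading device_status back as 0 before reinitializing.
    return mmio[offset + COMMON_CFG_DEVICE_STATUS] == 0
```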
Created attachment 1263011: partial cirrus support
The patch gets the cirrus back to text mode. Memory access still seems to be in some weird mode though, so boot messages appear somewhat scrambled. But I'm too tired to continue today ...
Gerd, thanks for the fix. std-vga works for me except when using vga=788: after the kexec reboot the window keeps the text mode size and the kernel does not switch to the 788 framebuffer mode. But I think that is acceptable, considering very few users use this.

Rethinking this, the most important part is the real hardware case, so it may not be worth more effort on emulated cards. For cirrus, maybe we can just leave it as is; one can still use the workaround. For real hardware, if it is impossible or very hard, then we have to give up. Bhupesh is taking this bug; I hope Bhupesh can do some investigation to see if we can do something for real hardware as well. Or we can just fix the qemu std-vga only.

Having rethought this, we prefer not to add a fix just for the cirrus reset. We can use the workaround mentioned before.
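The workaround the thread settles on (keeping the first kernel off the KMS drivers so it stays on the vgabios text console) could be checked and applied roughly as in this Python sketch. `loaded_qemu_kms` and `blacklist_args` are hypothetical helpers. The sketch assumes module names as they appear in /proc/modules, where dashes in .ko file names become underscores (bochs-drm.ko loads as bochs_drm), and uses two existing kernel command-line options: `modprobe.blacklist=` (honored by modprobe) and `rd.driver.blacklist=` (honored by dracut inside the initramfs). Only the four QEMU drivers named in the thread are listed; per Gerd's suggestion, real hardware would need everything under drivers/gpu/drm.

```python
# Module files named in the thread as the QEMU DRM/KMS drivers.
QEMU_KMS_MODULES = ("bochs-drm.ko", "cirrus.ko", "qxl.ko", "virtio-gpu.ko")


def module_name(ko_file: str) -> str:
    """Map a .ko file name to the module name the kernel reports (dashes -> underscores)."""
    if ko_file.endswith(".ko"):
        ko_file = ko_file[: -len(".ko")]
    return ko_file.replace("-", "_")


def loaded_qemu_kms(proc_modules_text: str) -> list:
    """Return the QEMU KMS modules listed in text formatted like /proc/modules."""
    wanted = {module_name(m) for m in QEMU_KMS_MODULES}
    loaded = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] in wanted:
            loaded.append(fields[0])
    return loaded


def blacklist_args(ko_files=QEMU_KMS_MODULES) -> str:
    """Build kernel command-line options that keep these modules from loading."""
    names = ",".join(module_name(m) for m in ko_files)
    return f"modprobe.blacklist={names} rd.driver.blacklist={names}"
```

On the discovery image, the contents of /proc/modules could be passed to `loaded_qemu_kms()` before kexecing; an empty result suggests none of these four modules is loaded (built-in drivers would not show up there). The string from `blacklist_args()` would go on the command line of the first kernel.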