Bug 2002907

Summary: Unexpectedly failed when managedsave the guest which has qxl video device
Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
qemu-kvm sub component: Graphics
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 8.6
Target Milestone: rc
Hardware: x86_64
OS: All
Fixed In Version: qemu-kvm-6.1.0-2.module+el8.6.0+12815+0d4739c1
Last Closed: 2022-05-10 13:20:33 UTC
Type: Bug
Keywords: Regression
Reporter: Meina Li <meili>
Assignee: Gerd Hoffmann <kraxel>
QA Contact: Guo, Zhiyi <zhguo>
CC: coli, ddepaula, jferlan, jinzhao, jmaloy, juzhang, smitterl, virt-maint, weizhan, yafu, yfu, zhguo
Flags: zhguo: needinfo-
Attachments:
Description Flags
libvirtd.log none

Description Meina Li 2021-09-10 02:44:33 UTC
Created attachment 1821935
libvirtd.log

Description of problem:
managedsave, dump, and snapshot-create with an XML file all fail with "unexpectedly failed" on a guest that has a qxl video device.

Version-Release number of selected component (if applicable):
libvirt-7.6.0-2.module+el8.6.0+12490+ec3e565c.x86_64
qemu-kvm-6.1.0-1.module+el8.6.0+12535+4e2af250.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a running guest with qxl video:
# virsh dumpxml avocado-vt-vm1 | grep /video -B4
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
2. Managedsave the guest.
# virsh managedsave avocado-vt-vm1
error: Failed to save domain 'avocado-vt-vm1' state
error: operation failed: domain save job: unexpectedly failed
3. Dump the guest.
# virsh dump avocado-vt-vm1 /tmp/test
error: Failed to core dump domain 'avocado-vt-vm1' to /tmp/test
error: operation failed: domain core dump job: unexpectedly failed
4. Create a snapshot from an XML file.
# virsh snapshot-create avocado-vt-vm1 snapshot.xml
error: operation failed: snapshot job: unexpectedly failed
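
The three failing operations above can be scripted for a regression check. The following is a minimal Python sketch, assuming `virsh` is on the PATH and reusing the domain name and file paths from this report. The helper names `build_commands` and `run_checks` are hypothetical and not part of libvirt.

```python
import subprocess

DOMAIN = "avocado-vt-vm1"  # domain name taken from this report


def build_commands(domain, dump_path="/tmp/test", snapshot_xml="snapshot.xml"):
    """Return the three virsh invocations from the steps above as argv lists."""
    return [
        ["virsh", "managedsave", domain],
        ["virsh", "dump", domain, dump_path],
        ["virsh", "snapshot-create", domain, snapshot_xml],
    ]


def run_checks(domain=DOMAIN):
    """Run each command; map the subcommand to True if it hit the reported error."""
    results = {}
    for argv in build_commands(domain):
        proc = subprocess.run(argv, capture_output=True, text=True)
        # The error text is copied from the virsh output in this report.
        results[argv[1]] = "unexpectedly failed" in (proc.stdout + proc.stderr)
    return results
```

On an affected build, `run_checks()` would be expected to report True for all three subcommands. On a fixed build, note that a successful managedsave stops the guest, so it has to be started again before the later commands can be exercised.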

Actual results:
Managedsave of the guest fails.

Expected results:
Managedsave of the guest succeeds.

Additional info:
1) Cannot reproduce with libvirt-7.6.0-2.module+el8.6.0+12490+ec3e565c.x86_64 and qemu-kvm-6.0.0-29.module+el8.6.0+12490+ec3e565c.x86_64.
2) QXL video is still used in RHV.
3) Log excerpt (the full log is in the attachment):
2021-09-10 02:31:49.227+0000: 141267: debug : qemuMonitorJSONIOProcessEvent:206 : handle MIGRATION handler=0x7fded83dd450 data=0x7fdea410a110
2021-09-10 02:31:49.227+0000: 141267: debug : qemuMonitorEmitMigrationStatus:1400 : mon=0x7fdf0c069300, status=failed
2021-09-10 02:31:49.227+0000: 141267: debug : qemuProcessHandleMigrationStatus:1584 : Migration of domain 0x7fdeb44df030 avocado-vt-vm1 changed state to failed
2021-09-10 02:31:49.227+0000: 87202: debug : qemuDomainObjBeginJobInternal:845 : Starting job: job=async nested agentJob=none asyncJob=none (vm=0x7fdeb44df030 name=avocado-vt-vm1, current job=none agentJob=none async=save)
2021-09-10 02:31:49.227+0000: 87202: debug : qemuDomainObjBeginJobInternal:892 : Started job: async nested (async=save vm=0x7fdeb44df030 name=avocado-vt-vm1)
2021-09-10 02:31:49.227+0000: 87202: debug : qemuDomainObjEnterMonitorInternal:5988 : Entering monitor (mon=0x7fdf0c069300 vm=0x7fdeb44df030 name=avocado-vt-vm1)
2021-09-10 02:31:49.227+0000: 87202: debug : qemuMonitorGetMigrationStats:2419 : mon:0x7fdf0c069300 vm:0x7fdeb44df030 fd:56
2021-09-10 02:31:49.227+0000: 87202: info : qemuMonitorSend:960 : QEMU_MONITOR_SEND_MSG: mon=0x7fdf0c069300 msg={"execute":"query-migrate","id":"libvirt-410"} fd=-1
2021-09-10 02:31:49.227+0000: 141267: info : qemuMonitorIOWrite:438 : QEMU_MONITOR_IO_WRITE: mon=0x7fdf0c069300 buf={"execute":"query-migrate","id":"libvirt-410"} len=48 ret=48 errno=0
2021-09-10 02:31:49.228+0000: 141267: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"return": {"status": "failed"}, "id": "libvirt-410"}]
2021-09-10 02:31:49.228+0000: 141267: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fdf0c069300 reply={"return": {"status": "failed"}, "id": "libvirt-410"}
2021-09-10 02:31:49.228+0000: 87202: debug : qemuDomainObjExitMonitorInternal:6013 : Exited monitor (mon=0x7fdf0c069300 vm=0x7fdeb44df030 name=avocado-vt-vm1)
2021-09-10 02:31:49.228+0000: 87202: debug : qemuDomainObjEndJob:1145 : Stopping job: async nested (async=save vm=0x7fdeb44df030 name=avocado-vt-vm1)
2021-09-10 02:31:49.228+0000: 87202: error : qemuMigrationJobCheckStatus:1744 : operation failed: domain save job: unexpectedly failed
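
To find the relevant lines in the full attached log, the migration state changes can be filtered out of the debug output. The following is a minimal Python sketch that assumes the line layout shown in the excerpt above. The regular expression and the helper name `migration_states` are illustrative assumptions, not part of libvirt.

```python
import re

# Matches lines such as:
# "<date> <time>: <tid>: debug : qemuProcessHandleMigrationStatus:1584 : "
# "Migration of domain 0x... avocado-vt-vm1 changed state to failed"
STATE_RE = re.compile(
    r"^(?P<ts>\S+ \S+): \d+: debug : qemuProcessHandleMigrationStatus:\d+ : "
    r"Migration of domain \S+ (?P<domain>\S+) changed state to (?P<state>\w+)"
)


def migration_states(lines):
    """Yield (timestamp, domain, state) for each migration state change line."""
    for line in lines:
        match = STATE_RE.match(line)
        if match:
            yield match.group("ts"), match.group("domain"), match.group("state")
```

Passing an open file object for libvirtd.log to `migration_states` and keeping only entries whose state is `failed` should point at the moment the save job broke.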

Comment 1 Guo, Zhiyi 2021-09-10 07:42:34 UTC
A simpler way to reproduce this issue:
# /usr/libexec/qemu-kvm -device qxl-vga -vnc :0 -monitor stdio
QEMU 6.1.0 monitor - type 'help' for more information
(qemu) migrate "exec:cat > mig"
qemu-kvm: pre-save failed: qxl

Comment 2 Guo, Zhiyi 2021-09-10 08:49:41 UTC
This issue also reproduces on upstream QEMU v6.1.0 and no longer reproduces after reverting this commit:
commit 39b8a183e2f399d19f3ab6a3db44c7c74774dabd
Author: Gerd Hoffmann <kraxel>
Date:   Wed Jul 21 11:33:46 2021 +0200

    qxl: remove assert in qxl_pre_save.
    
    Since commit 551dbd0846d2 ("migration: check pre_save return in
    vmstate_save_state") the pre_save hook can fail.  So lets finally
    use that to drop the guest-triggerable assert in qxl_pre_save().
    
    Signed-off-by: Gerd Hoffmann <kraxel>
    Reviewed-by: Marc-André Lureau <marcandre.lureau>
    Message-Id: <20210721093347.338536-2-kraxel>

Assigning to Gerd.
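
To illustrate the mechanism the commit message describes, where a pre_save hook reports failure through its return value and the save is aborted, here is a toy Python model. It is not QEMU's code: `Device` and `save_state` are hypothetical names, and the message format is copied from the reproducer output in comment 1.

```python
class Device:
    """Toy stand-in for a device that has a migration pre_save hook."""

    def __init__(self, name, pre_save):
        self.name = name
        self.pre_save = pre_save  # callable: returns 0 on success, nonzero on failure


def save_state(devices):
    """Call each pre_save hook in order; stop at the first failure and report it."""
    for dev in devices:
        ret = dev.pre_save()
        if ret != 0:
            # A failing hook aborts the whole save instead of asserting.
            return ret, f"pre-save failed: {dev.name}"
    return 0, ""
```

In this model, a hook that returns nonzero for a healthy device makes every save fail, which matches the symptom seen here: each operation that goes through the migration path (managedsave, dump, snapshot) is rejected.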

Comment 4 John Ferlan 2021-09-14 22:43:40 UTC
Bulk update: Move RHEL8 bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 5 Gerd Hoffmann 2021-09-17 10:13:18 UTC
Moving back to RHEL-8.  It's a regression there, and RHEL-9 is not affected due to spice being dropped.
Ahem, well, tried but bugzilla declares the subcomponent invalid and doesn't let me do that.  John?

Comment 6 Gerd Hoffmann 2021-09-17 10:14:12 UTC
(In reply to Gerd Hoffmann from comment #3)
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20210910094203.3582378-1-kraxel/

upstream commit eb94846280df3f1e2a91b6179fc05f9890b7e384

Comment 7 John Ferlan 2021-09-17 11:03:10 UTC
(In reply to Gerd Hoffmann from comment #5)
> Moving back to RHEL-8.  It's a regression there, and RHEL-9 is not affected
> due to spice being dropped.
> Ahem, well, tried but bugzilla declares the subcomponent invalid and doesn't
> let me do that.  John?

Yeah, it's a "bug" in the recent UI changes. The workaround I found was to use a search, or to display multiple bugs, and then "choose" one to "edit".

I'll take care of that.

Comment 10 Danilo de Paula 2021-09-30 17:51:29 UTC
QA: could you grant QA_ACK and provide ITM, please?

Comment 12 Yanan Fu 2021-10-08 05:32:53 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 13 Guo, Zhiyi 2021-10-08 05:54:23 UTC
Tested against qemu-kvm-6.1.0-2.module+el8.6.0+12815+0d4739c1.x86_64

The simple reproducer no longer triggers the issue.
# /usr/libexec/qemu-kvm -device qxl-vga -vnc :0 -monitor stdio
QEMU 6.1.0 monitor - type 'help' for more information
(qemu) migrate "exec:cat > mig"
(qemu)

Also tested live migration of RHEL 8.6 and Windows 10 VMs with a qxl-vga device; migration works without errors.

Comment 16 Guo, Zhiyi 2021-10-13 06:32:54 UTC
Verified per comment 13

Comment 18 errata-xmlrpc 2022-05-10 13:20:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759