Bug 1824363

Summary: QEMU core dump when taking a snapshot with the same node as both node and overlay, where that node is not in a snapshot chain
Product: Red Hat Enterprise Linux 9
Reporter: aihua liang <aliang>
Component: qemu-kvm
Assignee: Kevin Wolf <kwolf>
qemu-kvm sub component: Block Jobs
QA Contact: aihua liang <aliang>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: low
CC: coli, jinzhao, juzhang, kwolf, mrezanin, ngu, qzhang, virt-maint
Version: 9.0
Keywords: EasyFix, Reopened, Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-6.2.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-17 12:23:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description aihua liang 2020-04-16 02:31:27 UTC
Description of problem:
 QEMU dumps core when blockdev-snapshot is run with the same node given as both node and overlay, where that node is not part of an existing snapshot chain.

Version-Release number of selected component (if applicable):
 kernel version: 4.18.0-175.el8.x86_64
 qemu-kvm version: qemu-kvm-4.2.0-18.module+el8.2.0+6278+dfae3426

How reproducible:
100%

Steps to Reproduce:
 1. Start the guest with the following QEMU command line:
    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 4096  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3,iothread=iothread0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait

 2. Create a snapshot target in advance
     {'execute':'blockdev-create','arguments':{'options':{'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
     {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
     {'execute':'blockdev-create','arguments':{'options':{'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/data.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
     {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
     {'execute':'job-dismiss','arguments':{'id':'job1'}}
     {'execute':'job-dismiss','arguments':{'id':'job2'}}

 3. Take a snapshot from sn1 to sn1
     {"execute":"blockdev-snapshot","arguments":{"node":"sn1","overlay":"sn1"}}
Ncat: Connection reset by peer.
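
For reference, the QMP sequence from steps 2 and 3 can be assembled programmatically. The following is a minimal Python sketch and not part of the original report: the helper name build_repro_commands is hypothetical, and it only constructs the JSON messages with the same arguments as above. It does not talk to a live QEMU instance; actually sending the messages would also require the qmp_capabilities handshake on the QMP socket (tcp:0:3000 in step 1).

```python
import json

SIZE = 21474836480  # 20 GiB, the same value used in step 2


def build_repro_commands(image="/root/sn1", backing="/home/data.qcow2"):
    """Return the QMP commands from steps 2 and 3 as a list of dicts.

    Hypothetical helper for illustration; the argument values are copied
    from the report.
    """
    return [
        # Step 2: create the raw file and expose it as a protocol node.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "file", "filename": image,
                                   "size": SIZE},
                       "job-id": "job1"}},
        {"execute": "blockdev-add",
         "arguments": {"driver": "file", "node-name": "drive_sn1",
                       "filename": image}},
        # Step 2: format it as qcow2 and open it without a backing node.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "qcow2", "file": "drive_sn1",
                                   "size": SIZE, "backing-file": backing,
                                   "backing-fmt": "qcow2"},
                       "job-id": "job2"}},
        {"execute": "blockdev-add",
         "arguments": {"driver": "qcow2", "node-name": "sn1",
                       "file": "drive_sn1", "backing": None}},
        {"execute": "job-dismiss", "arguments": {"id": "job1"}},
        {"execute": "job-dismiss", "arguments": {"id": "job2"}},
        # Step 3: the invalid request, with sn1 as both node and overlay.
        {"execute": "blockdev-snapshot",
         "arguments": {"node": "sn1", "overlay": "sn1"}},
    ]


if __name__ == "__main__":
    for cmd in build_repro_commands():
        print(json.dumps(cmd))
```

Python's None is serialized as JSON null, which matches the 'backing':null argument in step 2.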

Actual results:
 After step 3, QEMU dumps core with the following output:
    (qemu) qemu-kvm: block.c:2416: bdrv_replace_child_noperm: Assertion `new_bs->quiesce_counter <= new_bs_quiesce_counter' failed.
    bug.txt: line 42: 18428 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine q35 -nodefaults ...

 gdb info:
  (gdb) bt
#0  0x00007f7916b5870f in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f7916b42b25 in __GI_abort () at abort.c:79
#2  0x00007f7916b429f9 in __assert_fail_base
    (fmt=0x7f7916ca8c28 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=0x5577e0b96700 "new_bs->quiesce_counter <=
new_bs_quiesce_counter", file=0x5577e0af7630 "block.c", line=2416,
function=<optimized out>) at assert.c:92
#3  0x00007f7916b50cc6 in __GI___assert_fail
    (assertion=assertion@entry=0x5577e0b96700 "new_bs->quiesce_counter
<= new_bs_quiesce_counter", file=file@entry=0x5577e0af7630 "block.c",
line=line@entry=2416, function=function@entry=0x5577e0b98040
<__PRETTY_FUNCTION__.31656> "bdrv_replace_child_noperm") at
assert.c:101
#4  0x00005577e0945017 in bdrv_replace_child_noperm
    (child=child@entry=0x5577e2d5df70, new_bs=new_bs@entry=0x5577e31e56f0)
    at block.c:2416
#5  0x00005577e0949f33 in bdrv_replace_child
    (child=child@entry=0x5577e2d5df70, new_bs=new_bs@entry=0x5577e31e56f0)
    at block.c:2453
#6  0x00005577e094aec8 in bdrv_root_attach_child
    (child_bs=child_bs@entry=0x5577e31e56f0,
child_name=child_name@entry=0x5577e0b9cd52 "backing",
child_role=child_role@entry=0x5577e1194080 <child_backing>,
ctx=<optimized out>, perm=<optimized out>, shared_perm=<optimized out>,
opaque=0x5577e31e56f0, errp=0x7ffd1b263a50) at block.c:2557
#7  0x00005577e094b0a5 in bdrv_attach_child
    (parent_bs=parent_bs@entry=0x5577e31e56f0,
child_bs=child_bs@entry=0x5577e31e56f0,
child_name=child_name@entry=0x5577e0b9cd52 "backing",
child_role=child_role@entry=0x5577e1194080 <child_backing>,
errp=errp@entry=0x7ffd1b263a50) at block.c:6058
#8  0x00005577e094b1f5 in bdrv_set_backing_hd
    (bs=bs@entry=0x5577e31e56f0,
backing_hd=backing_hd@entry=0x5577e31e56f0,
errp=errp@entry=0x7ffd1b263a50) at block.c:2709
#9  0x00005577e094b4aa in bdrv_append
    (bs_new=0x5577e31e56f0, bs_top=0x5577e31e56f0,
errp=errp@entry=0x7ffd1b263ab0)
    at block.c:4402
#10 0x00005577e07e8b70 in external_snapshot_prepare
    (common=0x5577e32a3940, errp=0x7ffd1b263b38) at blockdev.c:1683
#11 0x00005577e07ebd02 in qmp_transaction
    (dev_list=dev_list@entry=0x7ffd1b263bc0,
has_props=has_props@entry=false, props=0x5577e2c5f950,
props@entry=0x0, errp=errp@entry=0x7ffd1b263bf8) at blockdev.c:2470
#12 0x00005577e07ebfa5 in blockdev_do_action
    (errp=<optimized out>, action=0x7ffd1b263bb0) at blockdev.c:1118
#13 0x00005577e07ebfa5 in qmp_blockdev_snapshot
    (node=<optimized out>, overlay=<optimized out>,
errp=errp@entry=0x7ffd1b263bf8)
    at blockdev.c:1160
#14 0x00005577e0905a08 in qmp_marshal_blockdev_snapshot
    (args=<optimized out>, ret=<optimized out>, errp=0x7ffd1b263c68)
    at qapi/qapi-commands-block-core.c:343
#15 0x00005577e09c9f9c in do_qmp_dispatch
    (errp=0x7ffd1b263c60, allow_oob=<optimized out>,
request=<optimized out>, cmds=0x5577e12b5d80 <qmp_commands>) at
qapi/qmp-dispatch.c:132
#16 0x00005577e09c9f9c in qmp_dispatch
    (cmds=0x5577e12b5d80 <qmp_commands>, request=<optimized out>,
allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#17 0x00005577e08e7ed1 in monitor_qmp_dispatch
    (mon=0x5577e251f2f0, req=<optimized out>) at monitor/qmp.c:145
#18 0x00005577e08e856a in monitor_qmp_bh_dispatcher (data=<optimized out>)
    at monitor/qmp.c:234
#19 0x00005577e0a11996 in aio_bh_call (bh=0x5577e241da20) at util/async.c:117
#20 0x00005577e0a11996 in aio_bh_poll (ctx=ctx@entry=0x5577e241c5d0)
    at util/async.c:117
#21 0x00005577e0a14d84 in aio_dispatch (ctx=0x5577e241c5d0) at
util/aio-posix.c:459
#22 0x00005577e0a11872 in aio_ctx_dispatch
    (source=<optimized out>, callback=<optimized out>,
user_data=<optimized out>)
    at util/async.c:260
#23 0x00007f791b3e067d in g_main_dispatch (context=0x5577e24aa110) at
gmain.c:3176
#24 0x00007f791b3e067d in g_main_context_dispatch
    (context=context@entry=0x5577e24aa110) at gmain.c:3829
#25 0x00005577e0a13e38 in glib_pollfds_poll () at util/main-loop.c:219
#26 0x00005577e0a13e38 in os_host_main_loop_wait (timeout=<optimized out>)
    at util/main-loop.c:242
#27 0x00005577e0a13e38 in main_loop_wait (nonblocking=<optimized out>)
    at util/main-loop.c:518
#28 0x00005577e07f60b1 in main_loop () at vl.c:1828
#29 0x00005577e06a1ff2 in main
    (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
at vl.c:4504
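
Frames #7 to #9 show the same pointer (0x5577e31e56f0) passed as both bs_new and bs_top to bdrv_append, and then as both bs and backing_hd to bdrv_set_backing_hd. In other words, the node is being attached as its own backing child. The following Python sketch is a toy model of that situation and is not QEMU's code: the Node class and the would_create_cycle and set_backing helpers are hypothetical names, used only to illustrate the kind of check that could reject such a request before the graph is modified.

```python
class Node:
    """Toy block node with an optional backing child (illustrative only)."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing


def would_create_cycle(parent, child):
    """Return True if making `child` the backing of `parent` closes a loop.

    Walks the backing chain starting at `child`. If `parent` is reachable,
    attaching would make `parent` (directly or indirectly) its own backing.
    """
    node = child
    seen = set()
    while node is not None and id(node) not in seen:
        if node is parent:
            return True
        seen.add(id(node))
        node = node.backing
    return False


def set_backing(parent, child):
    """Attach `child` as the backing of `parent`, refusing cyclic requests."""
    if would_create_cycle(parent, child):
        raise ValueError(
            f"attaching {child.name!r} below {parent.name!r} "
            "would create a cycle"
        )
    parent.backing = child
```

With this model, set_backing(sn1, sn1) raises an error instead of modifying the graph, which is the behaviour the "Expected results" section asks for.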

Expected results:
 After step 3, the snapshot fails with an appropriate error message.

Comment 2 aihua liang 2020-04-16 02:38:22 UTC
Since this is a negative test and it can't be triggered through libvirt, setting its priority to "low".

Comment 3 aihua liang 2020-09-11 09:25:25 UTC
Tested on qemu-kvm-5.1.0-5.module+el8.3.0+7975+b80d25f1; still hit this issue.

Comment 5 John Ferlan 2021-09-08 21:49:55 UTC
Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 7 RHEL Program Management 2021-10-16 07:27:06 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 8 aihua liang 2021-10-18 08:28:37 UTC
Tested on qemu-kvm-6.1.0-5.el9; still hit this core dump issue.

Hi, Kevin

 Do we plan to fix it? If yes, I will reopen it.

Thanks,
Aliang

Comment 9 Kevin Wolf 2021-10-18 13:01:01 UTC
Oh, this one didn't even have an assignee.

Yes, I'm reopening it. I'll fix it upstream and then we'll get it from the 6.2 rebase in time for 9.0-GA.

Comment 11 aihua liang 2021-12-17 06:39:46 UTC
Tested with qemu-kvm-6.2.0-1.el9; the issue no longer occurs.

Test Steps:
 1. Start the guest with the following QEMU command line:
    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem  \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
    -chardev socket,wait=off,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20211215-212014-u83qUkY3,server=on  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,wait=off,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20211215-212014-u83qUkY3,server=on  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=ida8F4GE \
    -chardev socket,wait=off,id=chardev_serial0,path=/tmp/serial-serial0-20211215-212014-u83qUkY3,server=on \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20211215-212014-u83qUkY3,path=/tmp/seabios-20211215-212014-u83qUkY3,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20211215-212014-u83qUkY3,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    --object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6,iothread=iothread1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-net-pci,mac=9a:37:88:01:97:b6,id=idvRVbq8,netdev=idSXhwTw,bus=pcie-root-port-3,addr=0x0  \
    -netdev tap,id=idSXhwTw,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -monitor stdio

 2. Create the target node
    {'execute':'blockdev-create','arguments':{'options':{'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
{"timestamp": {"seconds": 1639722920, "microseconds": 864008}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
{"timestamp": {"seconds": 1639722920, "microseconds": 864055}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
{"return": {}}
{"timestamp": {"seconds": 1639722921, "microseconds": 762531}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
{"timestamp": {"seconds": 1639722921, "microseconds": 762575}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
{"timestamp": {"seconds": 1639722921, "microseconds": 762596}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
    {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
{"return": {}}
    {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/data.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
{"timestamp": {"seconds": 1639722937, "microseconds": 619983}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
{"timestamp": {"seconds": 1639722937, "microseconds": 620029}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
{"return": {}}
{"timestamp": {"seconds": 1639722937, "microseconds": 621572}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
{"timestamp": {"seconds": 1639722937, "microseconds": 621602}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
{"timestamp": {"seconds": 1639722937, "microseconds": 621622}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
    {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
{"return": {}}
    {'execute':'job-dismiss','arguments':{'id':'job1'}}
    {'execute':'job-dismiss','arguments':{'id':'job2'}}
{"timestamp": {"seconds": 1639722951, "microseconds": 844576}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
{"return": {}}
{"timestamp": {"seconds": 1639722951, "microseconds": 844937}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
{"return": {}}

 3. Take a snapshot from sn1 to sn1
   {"execute":"blockdev-snapshot","arguments":{"node":"sn1","overlay":"sn1"}}


Test Result:
  In step 3, the snapshot failed with the following error:
{"error": {"class": "GenericError", "desc": "Making 'sn1' a backing child of 'sn1' would create a cycle"}}
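
The monitor output in steps 2 and 3 mixes three kinds of messages: asynchronous events (JOB_STATUS_CHANGE), success replies ({"return": {}}), and error replies. A script driving these steps has to tell them apart, for example to wait until a job is "concluded" before dismissing it, and to treat the GenericError in step 3 as the expected result. The Python sketch below assumes one JSON object per line; classify and job_concluded are hypothetical helper names and not part of any QEMU tooling.

```python
import json


def classify(line):
    """Return ('event' | 'error' | 'return', payload) for one QMP line."""
    msg = json.loads(line)
    if "event" in msg:
        return "event", msg
    if "error" in msg:
        return "error", msg["error"]
    if "return" in msg:
        return "return", msg["return"]
    raise ValueError("not a QMP event or reply: " + line)


def job_concluded(lines, job_id):
    """Return True if a JOB_STATUS_CHANGE to 'concluded' is seen for job_id."""
    for line in lines:
        kind, msg = classify(line)
        if (kind == "event"
                and msg["event"] == "JOB_STATUS_CHANGE"
                and msg["data"]["id"] == job_id
                and msg["data"]["status"] == "concluded"):
            return True
    return False
```

Applied to the transcript above, job_concluded would report job1 and job2 as concluded before the job-dismiss commands, and classify would tag the step 3 reply as an error with class GenericError.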

Comment 12 Yanan Fu 2021-12-20 12:44:29 UTC
QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' because the gating/tier1 tests pass.

Comment 16 aihua liang 2021-12-23 06:06:09 UTC
Per comment 11 and comment 12, setting the bug's status to "VERIFIED".

Comment 18 errata-xmlrpc 2022-05-17 12:23:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307