Created attachment 1678938 [details]
gdb_debug_info_ppc64le-04152020

Description of problem:
Boot up a guest, then immediately create and add disk images with blockdev-create/blockdev-add from a script while the guest is still booting. QEMU then aborts (core dumped):

[root@ibm-p8-garrison-05 ngu]# sh vm1.sh
QEMU 4.2.92 monitor - type 'help' for more information
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-5.0.0-rc2/block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
vm1.sh: line 26: 241948 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine pseries,max-cpu-compat=power8 -nodefaults -device VGA,bus=pci.0,addr=0x2 -m 1024 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/var/tmp/avocado_1 -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=chardev_serial0,server,nowait,path=/var/tmp/avocado_2 -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/ngu/rhel820-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device virtio-net-pci,mac=9a:fa:14:5e:3d:13,id=idLviSMj,netdev=idlrY22o,bus=pci.0,addr=0x5 -netdev tap,id=idlrY22o,vhost=on -vnc :0 -rtc base=utc,clock=host -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -monitor stdio

Version-Release number of selected component (if applicable):
Host kernel: kernel-4.18.0-193.6.el8.ppc64le
Qemu: qemu-kvm-5.0.0-0.scrmod+el8.2.0+6253+83a14d38.200408.ppc64le
Guest kernel: kernel-4.18.0-193.el8.ppc64le

How reproducible:
4/5

Steps to Reproduce:
1. Boot up a guest with the following command:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pseries,max-cpu-compat=power8 \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 1024 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'host' \
    -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/var/tmp/avocado_1 \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=chardev_serial0,server,nowait,path=/var/tmp/avocado_2 \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/ngu/rhel820-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:fa:14:5e:3d:13,id=idLviSMj,netdev=idlrY22o,bus=pci.0,addr=0x5 \
    -netdev tap,id=idlrY22o,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio

2. Immediately run the following script to create the disk images and hot plug them:
#!/bin/bash
for i in {1..4}
do
    date
    echo "####create image sn$i####"
    # Create the backing file, dismiss the job, then add the file node.
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'options': {'driver': 'file', 'filename': '/home/ngu/sn$i.qcow2', 'size': 21474836480}, 'job-id': 'file_sn$i'}}" | nc -U /var/tmp/avocado_1
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'job-dismiss', 'arguments': {'id': 'file_sn$i'}}" | nc -U /var/tmp/avocado_1
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'blockdev-add', 'arguments': {'node-name': 'file_sn$i', 'driver': 'file', 'filename': '/home/ngu/sn$i.qcow2', 'aio': 'threads'}}" | nc -U /var/tmp/avocado_1
    # Format the file node as qcow2, dismiss the job, then add the qcow2 node.
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'blockdev-create', 'arguments': {'options': {'driver': 'qcow2', 'file': 'file_sn$i', 'size': 21474836480}, 'job-id': 'drive_sn$i'}}" | nc -U /var/tmp/avocado_1
    sleep 1
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'job-dismiss', 'arguments': {'id': 'drive_sn$i'}}" | nc -U /var/tmp/avocado_1
    echo -e "{'execute':'qmp_capabilities'} {'execute': 'blockdev-add', 'arguments': {'node-name': 'drive_sn$i', 'driver': 'qcow2', 'file': 'file_sn$i'}}" | nc -U /var/tmp/avocado_1
done

Actual results:
QEMU dumps core as shown in the description above.

Expected results:
The guest runs normally and boots up without any problem.

Additional info:
Also hit the bug on the latest RHELAV 8.2 qemu: qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950
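The reproducer script repeats the same QMP preamble and JSON by hand for every command. As a minimal sketch only (the function name qmp_create_file_cmds is hypothetical and not part of the original script), the first blockdev-create step could be generated by a helper that prints the command stream with standard double-quoted JSON:

```shell
#!/bin/bash
# Hypothetical helper, not from the original report: print the QMP command
# stream (capabilities negotiation plus one blockdev-create job) for a
# 'file' driver image. Arguments: image path, size in bytes, job id.
qmp_create_file_cmds() {
    local path=$1 size=$2 job=$3
    # QMP requires qmp_capabilities before any other command on a new connection.
    printf '%s\n' '{"execute": "qmp_capabilities"}'
    printf '{"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "%s", "size": %s}, "job-id": "%s"}}\n' \
        "$path" "$size" "$job"
}
```

With the socket path from the command line above, the output would be piped to the monitor in the same way as in the script, for example: qmp_create_file_cmds /home/ngu/sn1.qcow2 21474836480 file_sn1 | nc -U /var/tmp/avocado_1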
Hi Aihua, please help to try to reproduce the bug on x86. If it reproduces there, we can change the Hardware to 'All' and you can take the bug. Thanks in advance.

I found the bug when running the following case in avocado: blockdev_commit_reboot
This is the log: http://10.0.136.47/ngu/04142020-blockdevcommitboot/results.html
If the disk images are created and added successfully, subsequently creating snapshots on them can cause the guest to fail to boot and drop into the grub rescue prompt. I hit the problem 16/30 times in the avocado test http://10.0.136.47/ngu/04142020-blockdevcommitboot/results.html; please check 'serial-serial0-avocado-vt-vm1.log':

2020-04-13 10:01:21: Trying to load: from: /pci@800000020000000/scsi@4/disk@100000000000000 ...
2020-04-13 10:01:21: Successfully loaded
2020-04-13 10:01:42: SCSI-DISK: Failed to get disk capacity!
2020-04-13 10:02:02: SCSI-DISK: Failed to get disk capacity!
2020-04-13 10:02:02: error: ../../grub-core/kern/disk.c:258:no such partition.
2020-04-13 10:02:02: Entering rescue mode...
2020-04-13 10:02:02: grub rescue>
2020-04-13 10:05:44:
2020-04-13 10:05:44: grub rescue>
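The failed-boot pattern in the serial log can be checked mechanically across many test runs. This is a rough sketch, assuming the log has been copied locally; the function name boot_failed_in_log is hypothetical, and the two match strings are taken verbatim from the log excerpt in this comment:

```shell
#!/bin/bash
# Hypothetical helper, not part of avocado: succeed (exit 0) if the given
# serial log contains either failure signature quoted in this comment.
boot_failed_in_log() {
    local log=$1
    # -F: match fixed strings, -q: exit status only, no output.
    grep -q -F \
        -e 'SCSI-DISK: Failed to get disk capacity!' \
        -e 'grub rescue>' \
        -- "$log"
}
```

For example: boot_failed_in_log serial-serial0-avocado-vt-vm1.log && echo "guest fell into grub rescue"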
Hi ngu,

I ran the blockdev_commit_reboot test via automation with virtio_blk 120 times and did not hit this issue.
(In reply to aihua liang from comment #3)
> Hi,ngu
>
> Run blockdev_commit_reboot test by auto with virtio_blk for 120 times,
> don't hit this issue.

Sorry, that info was wrong. Corrected:
I ran the blockdev_commit_reboot test via automation with virtio_scsi 120 times on x86_64 and did not hit this issue.
(In reply to aihua liang from comment #4)
> (In reply to aihua liang from comment #3)
> > Hi,ngu
> >
> > Run blockdev_commit_reboot test by auto with virtio_blk for 120 times,
> > don't hit this issue.
>
> Sorry, the wrong info,correct it as:
> Run blockdev_commit_reboot test by auto with virtio_scsi for 120 times on
> x86_64, don't hit this issue.

It's very odd. Set the hardware to ppc64le temporarily.
Is this a regression from the qemu-4.2 based qemu-kvm in RHEL-AV-8.2?
Tested on qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950 on x86; still did not hit this issue.
(In reply to David Gibson from comment #6)
> Is this a regression from the qemu-4.2 based qemu-kvm in RHEL-AV-8.2?

I could reproduce it on qemu-4.1, although the reproduction rate is much lower: I reproduced it only 1 out of 20 times using the steps in the bug description.

[root@ibm-p9wr-04 ngu]# sh vm1.sh
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) qemu-kvm: block/block-backend.c:1862: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.
vm1.sh: line 26: 248157 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine pseries,max-cpu-compat=power8 -nodefaults -device VGA,bus=pci.0,addr=0x2 -m 1024 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/var/tmp/avocado_1 -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=chardev_serial0,server,nowait,path=/var/tmp/avocado_2 -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/ngu/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device virtio-net-pci,mac=9a:fa:14:5e:3d:13,id=idLviSMj,netdev=idlrY22o,bus=pci.0,addr=0x5 -netdev tap,id=idlrY22o,vhost=on -vnc :0 -rtc base=utc,clock=host -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -monitor stdio
[root@ibm-p9wr-04 ngu]#

Host kernel: kernel-4.18.0-193.8.el8.ppc64le
Qemu: qemu-kvm-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.ppc64le
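Since the reproduction rate differs so much between versions (4/5 versus 1/20), it may help to measure it the same way each time. The following is only a sketch under one assumption: that a single attempt is wrapped in a command which exits non-zero when QEMU aborts. The function name repro_rate and the wrapper ./run_once.sh mentioned below are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a command N times and print "failures/N".
# Any non-zero exit status (including 134 from SIGABRT) counts as a failure.
repro_rate() {
    local runs=$1
    shift
    local failures=0 i
    for ((i = 0; i < runs; i++)); do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures/$runs"
}
```

For example, repro_rate 20 ./run_once.sh would print a figure directly comparable to the rates quoted in this bug.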
Created attachment 1680407 [details]
gdb_debug_info_ppc64le-qemu4.1-04212020

I could reproduce the bug in qemu4.1 multiple times with a faster script.
I'm not sure, but this might be related to a qemu/SLOF version mismatch. Can you retest with Mirek's test SLOF package from http://batcave.lab.eng.brq.redhat.com/repos/test/SLOF-8.3-5.3/
(In reply to David Gibson from comment #10)
> I'm not sure, but this might be related to a qemu/SLOF version mismatch.
> Can you retest with Mirek's test SLOF package from
> http://batcave.lab.eng.brq.redhat.com/repos/test/SLOF-8.3-5.3/

I don't think so. When I tested qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e.ppc64le and qemu-kvm-4.1.0-23.module+el8.1.1+6238+f5d69f68.3.ppc64le, I also updated SLOF to the corresponding version in the same virt module.

I also tried qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+cee4f348 with Mirek's SLOF, and the bug could still be reproduced.
ngu, sorry I misread and didn't see that you had also reproduced with qemu-4.2. From the traces there isn't anything obviously ppc related about this. Can you reproduce it on x86?
(In reply to David Gibson from comment #12)
> ngu, sorry I misread and didn't see that you had also reproduced with
> qemu-4.2.
>
> From the traces there isn't anything obviously ppc related about this. Can
> you reproduce it on x86?

David, no, I failed to reproduce it on x86 after several tries; it's really weird. Per comment #7, the x86 feature owner Aihua also failed to reproduce it there, and I also tried on her machine with my local script, again without success.
I could reproduce it with upstream QEMU on POWER. It seems there's a race between the QMP commands and a blk_drain_all() triggered by the guest. Maybe x86 doesn't cause the same race to happen?

I've worked in this area on a similar issue in the past (f45280cbf66d "block: fix QEMU crash with scsi-hd and drive_del"), so I guess I can have a look.
Posted a patch upstream. Now in maintainer tree: https://repo.or.cz/qemu/kevin.git/commit/5463edf7b4fdc27d2b2d745d7f8c9fddb495d140
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137