Bug 1888344 - VM with IO threads enabled crashes randomly during snapshot operations
Summary: VM with IO threads enabled crashes randomly during snapshot operations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Sergio Lopez
QA Contact: aihua liang
URL:
Whiteboard:
Duplicates: 1967895
Depends On:
Blocks: 1897025 1973828 1985906
 
Reported: 2020-10-14 16:00 UTC by Anitha Udgiri
Modified: 2023-12-15 19:47 UTC
CC List: 12 users

Fixed In Version: qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1973828
Environment:
Last Closed: 2021-11-16 07:51:01 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata    ID: RHBA-2021:4684    Last Updated: 2021-11-16 07:51:22 UTC

Comment 3 aihua liang 2020-10-15 12:12:23 UTC
Tested on qemu-kvm-4.2.0-29.module+el8.2.1+7712+3c3fe332.2 20 times; could not reproduce this issue.

Comment 4 aihua liang 2020-10-15 12:18:25 UTC
Details:
Created a snapshot chain of length 11 with iothread enabled via the auto test; could not hit this issue.
...
-object iothread,id=iothread0 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel821-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -blockdev node-name=file_sn10,driver=file,aio=threads,filename=/var/lib/avocado/data/avocado-vt/sn10.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_sn10,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_sn10 \
    -device scsi-hd,id=sn10,drive=drive_sn10,write-cache=on \
...
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_data1', 'overlay': 'drive_sn0'}, 'id': 'wuOhlHNJ'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn0', 'overlay': 'drive_sn1'}, 'id': 'PgAFMAuP'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn1', 'overlay': 'drive_sn2'}, 'id': 'uxVo0BLz'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn2', 'overlay': 'drive_sn3'}, 'id': 'euSdVYL4'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn3', 'overlay': 'drive_sn4'}, 'id': 'nCzebCYL'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn4', 'overlay': 'drive_sn5'}, 'id': 'VIifEExP'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn5', 'overlay': 'drive_sn6'}, 'id': 'BsvGDV6o'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn6', 'overlay': 'drive_sn7'}, 'id': 'iwMUuaNx'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn7', 'overlay': 'drive_sn8'}, 'id': '5rZQu3wq'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn8', 'overlay': 'drive_sn9'}, 'id': 'sXpwrBBw'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn9', 'overlay': 'drive_sn10'}, 'id': 'pf0xpTfc'}
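
The chain above is mechanical, so it can be generated rather than typed. A minimal sketch (assuming the drive_sn0..drive_sn10 overlay nodes are already attached, as in the command line above; the optional QMP 'id' fields are omitted):

    #!/bin/sh
    # Emit the blockdev-snapshot commands for the 11-deep chain shown above.
    # Each iteration installs drive_snN as the new overlay of the current top.
    prev=drive_data1
    for i in $(seq 0 10); do
        printf '{"execute": "blockdev-snapshot", "arguments": {"node": "%s", "overlay": "drive_sn%s"}}\n' "$prev" "$i"
        prev="drive_sn$i"
    done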

(01/20) repeat1.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (168.90 s)
 (02/20) repeat2.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.83 s)
 (03/20) repeat3.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (165.35 s)
 (04/20) repeat4.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.45 s)
 (05/20) repeat5.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.74 s)
 (06/20) repeat6.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (166.24 s)
 (07/20) repeat7.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.87 s)
 (08/20) repeat8.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.18 s)
 (09/20) repeat9.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (162.99 s)
 (10/20) repeat10.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (165.92 s)
 (11/20) repeat11.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (168.26 s)
 (12/20) repeat12.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (167.72 s)
 (13/20) repeat13.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (170.33 s)
 (14/20) repeat14.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (169.58 s)
 (15/20) repeat15.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.56 s)
 (16/20) repeat16.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.00 s)
 (17/20) repeat17.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (170.65 s)
 (18/20) repeat18.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.43 s)
 (19/20) repeat19.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (167.46 s)
 (20/20) repeat20.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.03 s)

Comment 5 John Ferlan 2020-10-15 16:14:32 UTC
Fixing the product to RHEL Advanced Virtualization

Comment 6 Marina Kalinin 2020-11-09 22:27:35 UTC
To clarify, we are talking about RHV snapshot operations, not RHEL-KVM snapshots.
In one sentence: RHV takes the qcow2 image, creates another one, and links the two under the same VM disk.
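
For reference, the linking described here is a qcow2 external snapshot: a new overlay image is created with the existing image as its backing file, so new writes land in the overlay while old data stays in the base. A minimal qemu-img sketch (hypothetical file names):

    # base.qcow2 keeps the data written so far; subsequent writes go to
    # overlay.qcow2, and the two are linked via the backing-file reference.
    qemu-img create -f qcow2 overlay.qcow2 -b base.qcow2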

Comment 7 Marina Kalinin 2020-11-09 22:30:16 UTC
Sergio, is there anything else we need from the customer to help reproduce this, or perhaps something we could deploy at the customer's site? Or do you have enough information from the coredump?

Comment 8 CongLi 2020-11-10 02:29:09 UTC
(In reply to Marina Kalinin from comment #6)
> To clarify, we are talking about RHV snapshot operations, not RHEL-KVM
> snapshots.
> In one sentence: RHV takes the qcow2 image, creates another one, and links
> the two under the same VM disk.

Hi Niny,

Could you please have a try?

Thanks.

Comment 17 aihua liang 2021-05-10 11:35:06 UTC
Tried to reproduce it again; still failed to reproduce it.

Test env:
 kernel version: 4.18.0-193.el8.x86_64
 qemu-kvm version: qemu-kvm-4.2.0-29.module+el8.2.1+7712+3c3fe332.2
 Guest: RHEL7.6
 Machine type: pc
 Drive format: virtio_blk
 Data plane: enabled

Test Steps:
 1. Start guest with qemu cmd:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc-i440fx-rhel7.6.0  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 30720  \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server',+kvm_pv_unhalt \
    -chardev socket,server=on,id=qmp_id_qmpmonitor1,wait=off,path=/tmp/monitor-qmpmonitor1-20210510-045626-P2QqGJ5E  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server=on,id=qmp_id_catch_monitor,wait=off,path=/tmp/monitor-catch_monitor-20210510-045626-P2QqGJ5E  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idcKGxjw \
    -chardev socket,server=on,id=chardev_serial0,wait=off,path=/tmp/serial-serial0-20210510-045626-P2QqGJ5E \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20210510-045626-P2QqGJ5E,path=/tmp/seabios-20210510-045626-P2QqGJ5E,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20210510-045626-P2QqGJ5E,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4,iothread=iothread0,scsi=off \
    -device virtio-net-pci,mac=9a:0a:a8:f0:c5:08,id=id2V3RTk,netdev=id0s6sUn,bus=pci.0,addr=0x6  \
    -netdev tap,id=id0s6sUn,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait

  2. Create a snapshot node, and take the snapshot.
      #create snapshot node
     {'execute':'qmp_capabilities' }
     {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
     {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
     {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
     {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
     {'execute':'job-dismiss','arguments':{'id':'job1'}}
     {'execute':'job-dismiss','arguments':{'id':'job2'}}
     {'execute':'blockdev-snapshot','arguments':{'node':'drive_image1','overlay':'sn1'}}

  3. Create target snapshot image and backing image.
     #qemu-img create -f qcow2 mirror.qcow2 20G
     #qemu-img create -f qcow2 mirror_sn.qcow2 -b mirror.qcow2 20G

  4. DD file in guest.
     #dd if=/dev/urandom of=test bs=1M count=1000
 
  5. Add target snapshot mirror node and do mirror with sync=top.
     {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_mirror_sn','filename':'/home/mirror_sn.qcow2'}}
     {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'mirror_sn','file':'drive_mirror_sn','backing':null}}
     {"execute": "blockdev-mirror", "arguments": {"sync": "top", "device": "sn1","target":"mirror_sn","job-id": "j1"}}
     {"timestamp": {"seconds": 1620645242, "microseconds": 859775}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1620645242, "microseconds": 859855}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1620645242, "microseconds": 884363}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1620645242, "microseconds": 884430}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21757952, "offset": 21757952, "speed": 0, "type": "mirror"}}

 6. Convert backing image from src to target.
    #qemu-img convert /home/kvm_autotest_root/images/rhel76-64-virtio.qcow2 /home/mirror.qcow2 -O qcow2

 7. Do snapshot in dst.
    {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_mirror','filename':'/home/mirror.qcow2'}}
    {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'mirror','file':'drive_mirror'}}
    {"execute":"blockdev-snapshot","arguments":{"node":"mirror","overlay":"mirror_sn"}}

 8. Complete mirror job
    {"execute":"job-complete","arguments":{"id":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139017}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139062}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139161}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 22085632, "offset": 22085632, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139201}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139230}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

 9. Check block info
   (qemu)info block
mirror_sn: json:{"backing": {"driver": "qcow2", "file": {"driver": "file", "filename": "/home/mirror.qcow2"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/home/mirror_sn.qcow2"}} (qcow2)
    Attached to:      /machine/peripheral/image1/virtio-backend
    Cache mode:       writeback
    Backing file:     /home/mirror.qcow2 (chain depth: 1)

 10. Quit vm
    (qemu)system_powerdown

After step 10, the VM shut down successfully.

Repeated steps 1 through 10 ten times; could not reproduce this issue.
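
For reference, the QMP portion of that loop can be scripted against the monitor opened by '-qmp tcp:0:3000' in step 1. A minimal sketch, assuming 'nc' is available on the host (QMP requires the qmp_capabilities handshake before any other command; the sleeps are a crude stand-in for waiting on each reply):

    # Feed the step-2 commands to the QMP TCP socket; replies go to stdout.
    {
        echo '{"execute": "qmp_capabilities"}'; sleep 1
        echo '{"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "/root/sn1", "size": 21474836480}, "job-id": "job1"}}'; sleep 1
        echo '{"execute": "blockdev-add", "arguments": {"driver": "file", "node-name": "drive_sn1", "filename": "/root/sn1"}}'; sleep 1
        # ...remaining step-2 commands, in the same order as listed above...
        echo '{"execute": "blockdev-snapshot", "arguments": {"node": "drive_image1", "overlay": "sn1"}}'
    } | nc localhost 3000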


Hi Sergio,
 Can you help check whether the reproduction scenario is correct? Thanks.

And hi Anitha,
 Do we have any updates from the customer now? Is this case still active? Thanks. I noticed the issue in the customer portal is closed.

Comment 19 aihua liang 2021-06-08 07:34:50 UTC
*** Bug 1967895 has been marked as a duplicate of this bug. ***

Comment 26 aihua liang 2021-06-28 14:51:36 UTC
Ran the regression tests on qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09; all tests pass.

Logs link:
  1. virtio_blk+iothread+storage_vm_migration_blockdev:
       http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=5505620
       
  2. virtio_scsi+iothread+storage_vm_migration_blockdev:
       http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=5500885

Comment 27 aihua liang 2021-06-28 14:52:34 UTC
Per comment 26, setting the bug's status to "Verified".

Comment 30 errata-xmlrpc 2021-11-16 07:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684

