Bug 1888344

Summary: VM with IO threads enabled crashes randomly during snapshot operations
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm
qemu-kvm sub component: Block Jobs
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: TestOnly, Triaged, ZStream
Target Milestone: rc
Target Release: ---
Flags: pm-rhel: mirror+
Reporter: Anitha Udgiri <audgiri>
Assignee: Sergio Lopez <slopezpa>
QA Contact: aihua liang <aliang>
Docs Contact:
CC: aliang, coli, jinzhao, juzhang, mjankula, mkalinin, ngu, qzhang, rhodain, slopezpa, virt-maint, ymankad
Whiteboard:
Fixed In Version: qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1973828 (view as bug list)
Environment:
Last Closed: 2021-11-16 07:51:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1897025, 1973828, 1985906

Comment 3 aihua liang 2020-10-15 12:12:23 UTC
Tested on qemu-kvm-4.2.0-29.module+el8.2.1+7712+3c3fe332.2 20 times; could not reproduce this issue.

Comment 4 aihua liang 2020-10-15 12:18:25 UTC
Details:
Created a snapshot chain of length 11 with iothread enabled via the auto test; the issue was not hit.
...
-object iothread,id=iothread0 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel821-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -blockdev node-name=file_sn10,driver=file,aio=threads,filename=/var/lib/avocado/data/avocado-vt/sn10.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_sn10,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_sn10 \
    -device scsi-hd,id=sn10,drive=drive_sn10,write-cache=on \
...
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_data1', 'overlay': 'drive_sn0'}, 'id': 'wuOhlHNJ'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn0', 'overlay': 'drive_sn1'}, 'id': 'PgAFMAuP'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn1', 'overlay': 'drive_sn2'}, 'id': 'uxVo0BLz'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn2', 'overlay': 'drive_sn3'}, 'id': 'euSdVYL4'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn3', 'overlay': 'drive_sn4'}, 'id': 'nCzebCYL'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn4', 'overlay': 'drive_sn5'}, 'id': 'VIifEExP'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn5', 'overlay': 'drive_sn6'}, 'id': 'BsvGDV6o'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn6', 'overlay': 'drive_sn7'}, 'id': 'iwMUuaNx'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn7', 'overlay': 'drive_sn8'}, 'id': '5rZQu3wq'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn8', 'overlay': 'drive_sn9'}, 'id': 'sXpwrBBw'}
{'execute': 'blockdev-snapshot', 'arguments': {'node': 'drive_sn9', 'overlay': 'drive_sn10'}, 'id': 'pf0xpTfc'}
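
For reference, one minimal way to replay such a blockdev-snapshot sequence by hand is to pipe the JSON commands into the VM's QMP monitor socket. The sketch below is only illustrative: it assumes a QMP monitor exposed on a hypothetical UNIX socket /tmp/qmp.sock (the monitor definition is not shown in the truncated command line above) and uses socat.

    # hypothetical QMP socket path; adjust to the -qmp/-chardev actually used by the VM
    QMP_SOCK=/tmp/qmp.sock
    qmp() {
        # QMP requires a qmp_capabilities handshake before any other command on a connection
        printf '{"execute":"qmp_capabilities"}\n%s\n' "$1" | socat - UNIX-CONNECT:$QMP_SOCK
    }
    qmp '{"execute":"blockdev-snapshot","arguments":{"node":"drive_sn8","overlay":"drive_sn9"}}'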

(01/20) repeat1.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (168.90 s)
 (02/20) repeat2.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.83 s)
 (03/20) repeat3.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (165.35 s)
 (04/20) repeat4.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.45 s)
 (05/20) repeat5.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.74 s)
 (06/20) repeat6.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (166.24 s)
 (07/20) repeat7.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.87 s)
 (08/20) repeat8.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.18 s)
 (09/20) repeat9.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (162.99 s)
 (10/20) repeat10.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (165.92 s)
 (11/20) repeat11.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (168.26 s)
 (12/20) repeat12.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (167.72 s)
 (13/20) repeat13.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (170.33 s)
 (14/20) repeat14.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (169.58 s)
 (15/20) repeat15.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (171.56 s)
 (16/20) repeat16.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.00 s)
 (17/20) repeat17.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (170.65 s)
 (18/20) repeat18.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (173.43 s)
 (19/20) repeat19.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (167.46 s)
 (20/20) repeat20.Host_RHEL.m8.u2.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.2.1.x86_64.io-github-autotest-qemu.blockdev_snapshot_chains.q35: PASS (164.03 s)

Comment 5 John Ferlan 2020-10-15 16:14:32 UTC
Fixing the product to RHEL Advanced Virtualization

Comment 6 Marina Kalinin 2020-11-09 22:27:35 UTC
To clarify, we are talking about RHV snapshot operations, not RHEL-KVM snapshots.
In one sentence: it is when RHV takes the existing qcow2 image, creates a new overlay image, and links the two under the same VM disk.
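
For illustration, the overlay that RHV creates in this operation corresponds to an external qcow2 snapshot. A minimal equivalent with plain qemu-img (file names here are made up for the example, not taken from the customer environment) would be:

    # create a new qcow2 overlay whose backing file is the current disk image;
    # the VM then writes to the overlay while the base image stays untouched
    qemu-img create -f qcow2 -b vmdisk-base.qcow2 -F qcow2 vmdisk-overlay.qcow2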

Comment 7 Marina Kalinin 2020-11-09 22:30:16 UTC
Sergio, is there anything else we need from the customer to help reproduce this, or perhaps something to deploy at the customer's site? Or do you have enough information from the coredump?

Comment 8 CongLi 2020-11-10 02:29:09 UTC
(In reply to Marina Kalinin from comment #6)
> To clarify, we are talking about RHV snapshot operations, not RHEL-KVM
> snapshots.
> In one sentence: it is when RHV takes the existing qcow2 image, creates a
> new overlay image, and links the two under the same VM disk.

Hi Niny,

Could you please give it a try?

Thanks.

Comment 17 aihua liang 2021-05-10 11:35:06 UTC
Tried to reproduce it again; still unable to reproduce it.

Test env:
 kernel version:4.18.0-193.el8.x86_64
 qemu-kvm version:qemu-kvm-4.2.0-29.module+el8.2.1+7712+3c3fe332.2
 Guest: RHEL7.6
 Machine type: pc
 Driveformat: virtio_blk
 Data plane: enable

Test Steps:
 1. Start guest with qemu cmd:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine pc-i440fx-rhel7.6.0  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 30720  \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server',+kvm_pv_unhalt \
    -chardev socket,server=on,id=qmp_id_qmpmonitor1,wait=off,path=/tmp/monitor-qmpmonitor1-20210510-045626-P2QqGJ5E  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server=on,id=qmp_id_catch_monitor,wait=off,path=/tmp/monitor-catch_monitor-20210510-045626-P2QqGJ5E  \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idcKGxjw \
    -chardev socket,server=on,id=chardev_serial0,wait=off,path=/tmp/serial-serial0-20210510-045626-P2QqGJ5E \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20210510-045626-P2QqGJ5E,path=/tmp/seabios-20210510-045626-P2QqGJ5E,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20210510-045626-P2QqGJ5E,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -object iothread,id=iothread0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4,iothread=iothread0,scsi=off \
    -device virtio-net-pci,mac=9a:0a:a8:f0:c5:08,id=id2V3RTk,netdev=id0s6sUn,bus=pci.0,addr=0x6  \
    -netdev tap,id=id0s6sUn,vhost=on  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

  2. Create a snapshot node, then take a snapshot onto it.
      #create snapshot node
     {'execute':'qmp_capabilities' }
     {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn1','size':21474836480},'job-id':'job1'}}
     {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn1','filename':'/root/sn1'}}
     {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn1','size':21474836480,'backing-file':'/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2','backing-fmt':'qcow2'},'job-id':'job2'}}
     {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn1','file':'drive_sn1','backing':null}}
     {'execute':'job-dismiss','arguments':{'id':'job1'}}
     {'execute':'job-dismiss','arguments':{'id':'job2'}}
     {'execute':'blockdev-snapshot','arguments':{'node':'drive_image1','overlay':'sn1'}}

  3. Create target snapshot image and backing image.
     #qemu-img create -f qcow2 mirror.qcow2 20G
     #qemu-img create -f qcow2 mirror_sn.qcow2 -b mirror.qcow2 20G

  4. DD file in guest.
     #dd if=/dev/urandom of=test bs=1M count=1000
 
  5. Add target snapshot mirror node and do mirror with sync=top.
     {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_mirror_sn','filename':'/home/mirror_sn.qcow2'}}
     {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'mirror_sn','file':'drive_mirror_sn','backing':null}}
     {"execute": "blockdev-mirror", "arguments": {"sync": "top", "device": "sn1","target":"mirror_sn","job-id": "j1"}}
     {"timestamp": {"seconds": 1620645242, "microseconds": 859775}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1620645242, "microseconds": 859855}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1620645242, "microseconds": 884363}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1620645242, "microseconds": 884430}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21757952, "offset": 21757952, "speed": 0, "type": "mirror"}}

 6. Convert backing image from src to target.
    #qemu-img convert /home/kvm_autotest_root/images/rhel76-64-virtio.qcow2 /home/mirror.qcow2 -O qcow2

 7. Take the snapshot on the dst (the mirror target).
    {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_mirror','filename':'/home/mirror.qcow2'}}
    {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'mirror','file':'drive_mirror'}}
    {"execute":"blockdev-snapshot","arguments":{"node":"mirror","overlay":"mirror_sn"}}

 8. Complete mirror job
    {"execute":"job-complete","arguments":{"id":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139017}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139062}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139161}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 22085632, "offset": 22085632, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139201}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1620645305, "microseconds": 139230}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

 9. Check block info
   (qemu)info block
mirror_sn: json:{"backing": {"driver": "qcow2", "file": {"driver": "file", "filename": "/home/mirror.qcow2"}}, "driver": "qcow2", "file": {"driver": "file", "filename": "/home/mirror_sn.qcow2"}} (qcow2)
    Attached to:      /machine/peripheral/image1/virtio-backend
    Cache mode:       writeback
    Backing file:     /home/mirror.qcow2 (chain depth: 1)

 10. Quit vm
    (qemu)system_powerdown

After step 10, the VM shut down successfully.

Repeated steps 1 through 10 ten times; could not reproduce this issue.
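
As a side note, one way to sanity-check the image relationship after such a run is to inspect the backing chain of the mirror overlay with qemu-img; a minimal sketch using the paths from the steps above:

    # run this after the VM has been shut down, or add -U to bypass the image lock
    qemu-img info --backing-chain /home/mirror_sn.qcow2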


Hi, Sergio
 Can you help check whether the reproduction scenario is correct? Thanks.

And hi, Anitha,
 Do we have any updates from the customer now? Is this case still active? Thanks. I noticed that the issue in the customer portal is closed.

Comment 19 aihua liang 2021-06-08 07:34:50 UTC
*** Bug 1967895 has been marked as a duplicate of this bug. ***

Comment 26 aihua liang 2021-06-28 14:51:36 UTC
Run regression test on qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09, all test pass.

Logs link:
  1. virtio_blk+iothread+storage_vm_migration_blockdev:
       http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=5505620
       
  2. virtio_scsi+iothread+storage_vm_migration_blockdev:
       http://virtqetools.lab.eng.pek2.redhat.com/kvm_autotest_job_log/?jobid=5500885

Comment 27 aihua liang 2021-06-28 14:52:34 UTC
Per comment 26, setting the bug's status to "Verified".

Comment 30 errata-xmlrpc 2021-11-16 07:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684
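
As a practical note, whether a given host already carries a build containing the fix (the "Fixed In Version" above, or a later build) can be checked with a plain rpm query; comparing the reported NVR against the fixed-in version is left to the reader:

    # show the installed qemu-kvm build on the host
    rpm -q qemu-kvm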