Bug 1609561
| Field | Value |
|---|---|
| Summary | qemu-kvm: Failed to lock byte and crashes while migration |
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.5 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | medium |
| Priority | medium |
| Reporter | Polina <pagranat> |
| Assignee | Jeff Cody <jcody> |
| QA Contact | CongLi <coli> |
| CC | chayang, coli, hhan, juzhang, michal.skrivanek, michen, mzamazal, ngu, pkrempa, pvilayat, qzhang, tburke, virt-maint, xianwang, yuhuang |
| Target Milestone | rc |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2018-08-15 17:43:08 UTC |
| Bug Blocks | 1481022 |
| Attachments | libvirt_qemu_vdsm_engine_logs.tar.gz (attachment 1471356) |
The VM has only one disk with a backing file (the second drive is an empty CD-ROM):

```
-drive if=none,id=drive-ua-18beca16-e731-4caa-b05d-f6f758930198,readonly=on,werror=report,rerror=report
-device ide-cd,bus=ide.1,unit=0,drive=drive-ua-18beca16-e731-4caa-b05d-f6f758930198,id=ua-18beca16-e731-4caa-b05d-f6f758930198
-drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_compute-ge-he-4-volume1/e0dc227a-3fc9-4454-adbd-e6d502167951/images/bb538f7d-8a8d-4aff-9086-b14366d13050/95b8ad11-5204-4057-ab5c-ccc0a51c5428,format=qcow2,if=none,id=drive-ua-bb538f7d-8a8d-4aff-9086-b14366d13050,serial=bb538f7d-8a8d-4aff-9086-b14366d13050,cache=none,werror=stop,rerror=stop,aio=threads
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-bb538f7d-8a8d-4aff-9086-b14366d13050,id=ua-bb538f7d-8a8d-4aff-9086-b14366d13050,bootindex=1
```

Since the image is accessed through a local path, the host appears to be using the gluster volume via the glusterfs FUSE driver. QEMU then logs:

```
2018-07-29 11:25:44.598+0000: initiating migration
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:642:
2018-07-29T11:25:47.638863Z qemu-kvm: Failed to lock byte 100
```

and aborts. Libvirt then treats the VM as crashed:

```
2018-07-29 11:25:47.887+0000: shutting down, reason=crashed
```

Reassigning to qemu.
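For context, "byte 100" is the first of the permission bytes QEMU locks on the image file as part of its image-locking scheme. Below is a minimal sketch (illustrative only, not QEMU source; the F_OFD_SETLK-based scheme is assumed from upstream) of the kind of fcntl() byte-range lock that raw_apply_lock_bytes() applies. On a healthy filesystem the call returns immediately; on a blocked glusterfs FUSE mount it can fail or stall, which QEMU reports as "Failed to lock byte 100".

```c
/*
 * Minimal sketch of QEMU-style image locking: take a shared OFD
 * byte-range lock on a single "permission" byte (offset 100), similar
 * to the locks raw_apply_lock_bytes() applies in block/file-posix.c.
 * Illustrative only; not taken from this report or from QEMU.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int lock_byte(int fd, off_t byte)
{
    struct flock fl = {
        .l_type   = F_RDLCK,    /* shared lock on a single byte */
        .l_whence = SEEK_SET,
        .l_start  = byte,
        .l_len    = 1,
    };
#ifdef F_OFD_SETLK
    int cmd = F_OFD_SETLK;      /* open-file-description lock (Linux) */
#else
    int cmd = F_SETLK;          /* POSIX record-lock fallback */
#endif
    if (fcntl(fd, cmd, &fl) == -1) {
        fprintf(stderr, "Failed to lock byte %jd: %s\n",
                (intmax_t)byte, strerror(errno));
        return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    /* argv[1]: any file on the filesystem under test, e.g. an image on
     * the glusterfs FUSE mount used in the reproduction steps. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image-path>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    int ret = lock_byte(fd, 100);
    close(fd);
    return ret ? 1 : 0;
}
```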
Not reproduced in libvirt.

Version:
```
libvirt-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
glusterfs-3.12.2-15.el7rhgs.x86_64
```

Steps:
1. Prepare a glusterfs server.
2. Mount the glusterfs volume at /mnt on both the migration source and destination hosts.
3. Prepare a VM with the following XML:

```xml
...
<iothreads>1</iothreads>
...
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads' iothread='1'/>
  <source file='/mnt/A.qcow2'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
  <driver error_policy='report'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
  <alias name='ide0-1-0'/>
  <address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>
...
```

4. Use iptables to block the gluster connection:

```
# iptables -I INPUT -s $GLUSTER_SERVER -j DROP
```

5. Try to migrate the VM. According to the following vdsm log, the VM is migrated with --live --auto-converge:

```
migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false'
```

```
# virsh migrate fuse qemu+ssh://XX.XX.XX.XX/system --verbose --live --auto-converge
error: internal error: qemu unexpectedly closed the monitor: 2018-07-31T05:04:03.953420Z qemu-kvm: load of migration failed: Input/output error
```

Checked the qemu log; no 'crash' found.

The qemu log on the destination:

```
2018-07-31T05:04:03.621566Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
2018-07-31T05:04:03.953420Z qemu-kvm: load of migration failed: Input/output error
2018-07-31 05:04:03.974+0000: shutting down, reason=failed
```

The qemu log on the source:

```
2018-07-31 05:04:05.246+0000: initiating migration
```

Maybe you can try to block it only in the middle of migration?

Tested this bug on:
qemu-kvm-rhev-2.12.0-9.el7.x86_64

Steps:
1. Run a VM with a gluster-backed disk:

```
-drive id=system_disk,if=none,snapshot=off,aio=threads,cache=writeback,format=qcow2,file=/mnt/gluster/rhel76-64-virtio-scsi.qcow2 \
-device virtio-blk-pci,id=system,drive=system_disk,bus=pci.0 \
```

2. Block the gluster storage:

```
# iptables -I INPUT -s $gluster_server_ip_addr -j DROP
```

3. Migrate the VM from the source to the destination and check the migration status on the source host.
4. Quit the source qemu.

Results:
1. After step 3, migration fails due to a connection timeout:

```
(qemu) qemu-kvm: Failed to connect socket: Connection timed out
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off late-block-activate: off
Migration status: failed (Failed to connect socket: Connection timed out)
total time: 0 milliseconds
```

2. After step 4, quitting qemu on the source makes it dump core:

```
(qemu) q
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:680:
qemu-kvm: Failed to lock byte 100
qemu.sh: line 21:  2893 Aborted (core dumped) /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox off -machine pc -nodefaults -vga std -device virtio-net-pci,mac=9a:50:51:52:53:54,id=idQaGkcH,vectors=4,netdev=id1nw46d,bus=pci.0,addr=0x5 -netdev tap,id=id1nw46d,vhost=on -m 8192 -smp 8,cores=4,threads=1,sockets=2 -cpu host,+kvm_pv_unhalt -vnc :0 -rtc base=utc,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -monitor stdio -qmp tcp:localhost:4444,server,nowait -drive id=system_disk,if=none,snapshot=off,aio=threads,cache=writeback,format=qcow2,file=/mnt/gluster/rhel76-64-virtio-scsi.qcow2 -device virtio-blk-pci,id=system,drive=system_disk,bus=pci.0
```

Dump info:

```
(gdb) bt
#0  0x00007f92ab708207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f92ab7098f8 in __GI_abort () at abort.c:90
#2  0x00005653b6739b44 in error_handle_fatal (errp=<optimized out>, err=0x5653b93fb440) at util/error.c:38
#3  0x00005653b6739c1d in error_setv (errp=0x5653b73bacc0 <error_abort>, src=0x5653b68b0d8f "block/file-posix.c", line=680, func=0x5653b68b1730 <__func__.26520> "raw_apply_lock_bytes", err_class=ERROR_CLASS_GENERIC_ERROR, fmt=<optimized out>, ap=0x7ffe66b7fdb0, suffix=0x0) at util/error.c:71
#4  0x00005653b6739dae in error_setg_internal (errp=errp@entry=0x5653b73bacc0 <error_abort>, src=src@entry=0x5653b68b0d8f "block/file-posix.c", line=<optimized out>, func=<optimized out>, fmt=<optimized out>) at util/error.c:95
#5  0x00005653b66a449d in raw_apply_lock_bytes (fd=19, perm_lock_bits=11, shared_perm_lock_bits=18446744073709551594, unlock=unlock@entry=false, errp=errp@entry=0x5653b73bacc0 <error_abort>) at block/file-posix.c:686
#6  0x00005653b66a461b in raw_handle_perm_lock (bs=<optimized out>, op=RAW_PL_PREPARE, new_perm=11, new_shared=21, errp=0x5653b73bacc0 <error_abort>) at block/file-posix.c:775
#7  0x00005653b665b82f in bdrv_check_perm (bs=bs@entry=0x5653b9291400, q=0x0, q@entry=0x28c6efc60b1c3800, cumulative_perms=11, cumulative_shared_perms=21, ignore_children=ignore_children@entry=0x5653b9260320 = {...}, errp=errp@entry=0x5653b73bacc0 <error_abort>) at block.c:1697
#8  0x00005653b665b68b in bdrv_check_update_perm (bs=0x5653b9291400, q=0x28c6efc60b1c3800, q@entry=0x0, new_used_perm=new_used_perm@entry=11, new_shared_perm=new_shared_perm@entry=21, ignore_children=ignore_children@entry=0x5653b9260320 = {...}, errp=errp@entry=0x5653b73bacc0 <error_abort>) at block.c:1883
#9  0x00005653b665b91c in bdrv_check_perm (errp=0x5653b73bacc0 <error_abort>, ignore_children=0x5653b9260320 = {...}, shared=21, perm=11, q=0x0, c=0x5653b9187400) at block.c:1896
#10 0x00005653b665b91c in bdrv_check_perm (bs=bs@entry=0x5653b928e000, q=q@entry=0x0, cumulative_perms=cumulative_perms@entry=0, cumulative_shared_perms=cumulative_shared_perms@entry=31, ignore_children=ignore_children@entry=0x0, errp=0x5653b73bacc0 <error_abort>) at block.c:1713
#11 0x00005653b665bd5b in bdrv_replace_child (child=child@entry=0x5653b9187360, new_bs=new_bs@entry=0x0) at block.c:2079
#12 0x00005653b665bdf7 in bdrv_detach_child (child=0x5653b9187360) at block.c:2153
#13 0x00005653b665f249 in bdrv_root_unref_child (child=<optimized out>) at block.c:2164
#14 0x00005653b66a0c01 in blk_remove_bs (blk=blk@entry=0x5653b928c000) at block/block-backend.c:776
#15 0x00005653b66a0c5b in blk_remove_all_bs () at block/block-backend.c:474
#16 0x00005653b665c4ff in bdrv_close_all () at block.c:3380
#17 0x00005653b63dae6b in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
```
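The backtrace shows why this surfaces as an abort rather than an ordinary error: raw_apply_lock_bytes() is reached from the bdrv_close_all() shutdown path with errp pointing at error_abort (frames #2 to #5 and #16), and errors reported against that sentinel are treated as fatal. The following is a simplified, self-contained stand-in for that convention (an assumption-level sketch, not QEMU's actual util/error.c):

```c
/*
 * Simplified stand-in for QEMU's &error_abort convention (illustrative
 * only, not the real util/error.c).  When a callee reports an error
 * through an Error **errp that points at the error_abort sentinel, the
 * error is considered "cannot happen" and the process abort()s, which
 * is what turns the lock failure into a core dump.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Error {
    const char *msg;
} Error;

/* Sentinel; callers pass &error_abort when failure is not expected. */
static Error *error_abort;

static void error_setg(Error **errp, const char *msg)
{
    if (errp == &error_abort) {
        /* Mirrors error_handle_fatal(): print and dump core. */
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp) {
        *errp = malloc(sizeof(**errp));
        (*errp)->msg = msg;
    }
}

/* Stand-in for raw_apply_lock_bytes(): pretend the fcntl() lock failed. */
static void apply_lock_bytes(Error **errp)
{
    error_setg(errp, "Failed to lock byte 100");
}

int main(void)
{
    /* The shutdown path in the backtrace passes &error_abort, so a lock
     * failure on the unreachable gluster mount aborts the process
     * instead of being propagated as a recoverable error. */
    apply_lock_bytes(&error_abort);
    return 0;
}
```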
(In reply to Michal Skrivanek from comment #4)
> maybe you can try to block it only in the middle of migration?

Since CongLi has reproduced this on qemu-kvm-rhev in comment 5, cancelling the needinfo.

This appears to be a duplicate of BZ #1600599.

*** This bug has been marked as a duplicate of bug 1600599 ***
Created attachment 1471356 [details]
libvirt_qemu_vdsm_engine_logs.tar.gz

Description of problem:

Version-Release number of selected component (if applicable):
rhv-release-4.2.5-5-001.noarch
libvirt-3.9.0-14.el7_5.6.x86_64
vdsm-4.20.35-1.el7ev.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Run a VM associated with a gluster disk.
2. Block the gluster storage (three volumes), e.g.:

```
iptables -I INPUT -s gluster01.scl.lab.tlv.redhat.com -j DROP
iptables -I INPUT -s gluster02.scl.lab.tlv.redhat.com -j DROP
iptables -I INPUT -s gluster03.scl.lab.tlv.redhat.com -j DROP
```

3. After a while the host becomes Non-Operational; at this stage the engine tries to migrate the VM.

Actual results:
The migration process crashes (please see the attached qemu.log).

Expected results:
The VM must be reported as I/O-error paused while storage is blocked.

Additional info:
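As an aside for anyone investigating this class of failure, the sketch below (an illustration under the same assumptions as the earlier lock sketch; not part of the original report or of QEMU) queries whether some other open file description already holds a lock on QEMU's permission byte 100 of an image file, which can help distinguish a genuine lock conflict from a hung FUSE mount:

```c
/*
 * Probe the lock state of byte 100 of an image file using F_OFD_GETLK
 * (falling back to F_GETLK).  Illustrative diagnostic only.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image-path>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Ask the kernel whether an exclusive lock on byte 100 would
     * conflict with a lock held elsewhere. */
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 100,        /* QEMU's first "permission" byte */
        .l_len    = 1,
    };
#ifdef F_OFD_GETLK
    int cmd = F_OFD_GETLK;
#else
    int cmd = F_GETLK;
#endif
    if (fcntl(fd, cmd, &fl) == -1) {
        /* On a hung FUSE mount this call itself may fail or stall. */
        fprintf(stderr, "fcntl: %s\n", strerror(errno));
        close(fd);
        return 1;
    }
    if (fl.l_type == F_UNLCK) {
        printf("byte 100 is not locked\n");
    } else {
        /* l_pid is -1 for OFD locks; a PID is reported for POSIX locks. */
        printf("byte 100 is locked (pid %ld)\n", (long)fl.l_pid);
    }
    close(fd);
    return 0;
}
```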