Bug 1603104

Summary: QEMU aborted (core dumped) with 'qemu-kvm: Failed to lock byte 100' when a remote NFS or GlusterFS volume is stopped during a block mirror (or block commit/stream) operation

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.6
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Gu Nini <ngu>
Assignee: Hanna Czenczek <hreitz>
QA Contact: aihua liang <aliang>
CC: areis, chayang, coli, jraju, juzhang, kwolf, nchandek, ngu, qzhang, rhodain, smykhail, victorlee0321, virt-maint, xianwang, yafu, yhong
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.12.0-27.el7
Doc Type: If docs needed, set a value
Clones: 1981328 (view as bug list)
Bug Blocks: 1649160, 1981328
Type: Bug
Last Closed: 2019-08-22 09:18:46 UTC
Could you test this scratch build?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17262939

It has this patch from upstream and resolves the problem in BZ 1584982:

https://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg03750.html

---

Created attachment 1469886 [details]
gdb debug info_all_threads-with_fzheng201807201416_qemu_branch-07232018

(In reply to Fam Zheng from comment #2)
> Could you test this scratch build?
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17262939
>
> It has this patch from upstream and resolves the problem in BZ 1584982:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg03750.html

I have run the test and can still reproduce the bug:

# rpm -qa | grep qemu
qemu-kvm-tools-rhev-2.12.0-7.el7.fzheng201807201416.x86_64
qemu-img-rhev-2.12.0-7.el7.fzheng201807201416.x86_64
qemu-kvm-rhev-2.12.0-7.el7.fzheng201807201416.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-rhev-debuginfo-2.12.0-7.el7.fzheng201807201416.x86_64
qemu-kvm-common-rhev-2.12.0-7.el7.fzheng201807201416.x86_64

# ./vm22.sh rhel76.qcow2 hd1
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) Formatting '/mnt/mirror2', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16
(qemu) (qemu) Unexpected error in raw_check_lock_bytes() at block/file-posix.c:702:
qemu-kvm: Failed to get "consistent read" lock
./vm22.sh: line 29: 10849 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' ......

---

Indeed, there is one more place in the code that could be "optimized" away similarly to avoid this error. I'll write another version of the patch.

---

The bug is also reproducible with an NFS server.

===============================================================

Description of problem:
QEMU core dumped on a forced quit after the NFS server was stopped.
Version-Release number of selected component (if applicable):
Host: 3.10.0-928.el7.x86_64
Guest: 3.10.0-928.el7.x86_64
Qemu: qemu-kvm-rhev-2.12.0-8.el7

How reproducible: 100%

Steps to Reproduce:
1. Set up and configure an NFS server on localhost, e.g.:
[root@dhcp-8-111 ~]# vim /etc/exports
/var/tmp/mnt_dir/ *(rw,sync,no_root_squash)
[root@dhcp-8-111 ~]# mkdir -p /var/tmp/mnt_dir/
[root@dhcp-8-111 ~]# qemu-img create -f qcow2 /var/tmp/mnt_dir/nfs_corrupt.qcow2 10G
Formatting '/var/tmp/mnt_dir/nfs_corrupt.qcow2', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@dhcp-8-111 ~]# systemctl restart nfs.service

2. Mount the NFS export on localhost, e.g.:
[root@dhcp-8-111 ~]# mount -o rw,soft,timeo=30,retrans=1,vers=3 10.66.8.111:/var/tmp/mnt_dir/ /var/tmp/mnt_dir/

3. Boot a guest with the NFS image file above, e.g.:
[root@dhcp-8-111 cmds]# cat guest.sh
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -device pvpanic,ioport=0x505,id=idLNrmW5 \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20180727-191800-6KpwjpqB,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20180727-191800-6KpwjpqB,path=/var/tmp/seabios-20180727-191800-6KpwjpqB,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180727-191800-6KpwjpqB,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel76-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -drive id=drive_stg,if=none,werror=stop,snapshot=off,aio=threads,cache=none,format=qcow2,file=/var/tmp/mnt_dir/nfs_corrupt.qcow2 \
    -device virtio-blk-pci,id=stg,drive=drive_stg,bus=pci.0,addr=0xa \
    -device virtio-net-pci,mac=9a:62:63:64:65:66,id=idKCHf6S,vectors=4,netdev=idcsqaeC,bus=pci.0,addr=0x5 \
    -netdev tap,id=idcsqaeC,vhost=on \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio

4. Check the NFS image disk in the guest, e.g.:
[root@dhcp-8-137 ~]# lsblk
NAME                       MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                        252:0    0  20G  0 disk
├─vda1                     252:1    0   1G  0 part /boot
└─vda2                     252:2    0  19G  0 part
  ├─rhel_dhcp--8--137-root 253:0    0  17G  0 lvm  /
  └─rhel_dhcp--8--137-swap 253:1    0   2G  0 lvm  [SWAP]
vdb                        252:16   0  10G  0 disk

5. Stop the NFS server, e.g.:
[root@dhcp-8-111 ~]# systemctl stop nfs.service

6. Format the device in the guest, e.g.:
[root@dhcp-8-137 ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb      isize=512    agcount=4, agsize=655360 blks
         =              sectsz=512   attr=2, projid32bit=1
         =              crc=1        finobt=0, sparse=0
data     =              bsize=4096   blocks=2621440, imaxpct=25
         =              sunit=0      swidth=0 blks
naming   =version 2     bsize=4096   ascii-ci=0 ftype=1
log      =internal log  bsize=4096   blocks=2560, version=2
         =              sectsz=512   sunit=0 blks, lazy-count=1
realtime =none          extsz=4096   blocks=0, rtextents=0
...
The guest hangs here since the NFS server is stopped.

7. Check the QEMU status via HMP, e.g.:
(qemu) info status
VM status: paused (io-error)
(qemu) info status
VM status: paused (io-error)

8. Force quit QEMU.
e.g.:
(qemu) info status
VM status: paused (io-error)
(qemu) q
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:680:
qemu-kvm: Failed to lock byte 100
guest.sh: line 27: 2641 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' ... (same command line as guest.sh above)
[root@dhcp-8-111 cmds]#

Actual results:
QEMU core dumped.

Expected results:
QEMU should quit normally.

Additional info:
gdb info:
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:680:
qemu-kvm: Failed to lock byte 100

Program received signal SIGABRT, Aborted.
0x00007fffdef5c207 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fffdef5c207 in raise () at /lib64/libc.so.6
#1  0x00007fffdef5d8f8 in abort () at /lib64/libc.so.6
#2  0x0000555555ba2814 in error_handle_fatal ()
#3  0x0000555555ba28ed in error_setv ()
#4  0x0000555555ba2a7e in error_setg_internal ()
#5  0x0000555555b0d16d in raw_apply_lock_bytes ()
#6  0x0000555555b0d2eb in raw_handle_perm_lock ()
#7  0x0000555555ac44ff in bdrv_check_perm ()
#8  0x0000555555ac45ec in bdrv_check_perm ()
#9  0x0000555555ac4a2b in bdrv_replace_child ()
#10 0x0000555555ac4ac7 in bdrv_detach_child ()
#11 0x0000555555ac7f19 in bdrv_root_unref_child ()
#12 0x0000555555b098d1 in blk_remove_bs ()
#13 0x0000555555b0992b in blk_remove_all_bs ()
#14 0x0000555555ac51cf in bdrv_close_all ()
#15 0x0000555555840b0b in main ()

===============================================================

Thanks.

---

The main fix should probably have been backported to qemu-kvm-rhev-2.12.0-23.el7 for BZ 1551486. There is another similar crash, though; see reproducer (2) in BZ 1694148 comment 2 (for which commit 696aaaed579ac5bf5fa336216909b46d3d8f07a8 needs to be backported).

Max

---

Fix included in qemu-kvm-rhev-2.12.0-27.el7

---

Verified on qemu-kvm-rhev-2.12.0-27.el7.x86_64; the issue has been fixed, so setting the bug's status to "Verified".
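The abort in the backtrace comes from QEMU's advisory byte-range image locking in block/file-posix.c. As a rough illustration only (this is not QEMU's code), the sketch below takes and releases a single-byte lock at offset 100 with fcntl locking, the same primitive whose failure on a dead soft-mounted NFS share produced "Failed to lock byte 100"; the offset and helper names here are illustrative assumptions.

```python
import fcntl
import tempfile

# Hedged sketch: QEMU's raw-posix driver takes advisory byte-range locks
# at well-known offsets on the image file; "byte 100" in the error above
# is one such offset.  When the image sits on a soft-mounted NFS share
# whose server is down, the fcntl() call fails, and qemu-kvm-rhev 2.12
# treated that failure as fatal even during teardown -- hence the abort.

def lock_byte(fd, offset, exclusive=False):
    """Take a non-blocking lock on one byte; raises OSError on failure."""
    op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    fcntl.lockf(fd, op | fcntl.LOCK_NB, 1, offset)

def unlock_byte(fd, offset):
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset)

with tempfile.NamedTemporaryFile() as img:
    lock_byte(img.fileno(), 100)     # analogous to "lock byte 100"
    unlock_byte(img.fileno(), 100)
    print("byte-100 lock/unlock on a healthy filesystem succeeded")
```

On a healthy filesystem both calls succeed; the bug was about what happens when they cannot.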
Test steps:
1. Mount the gluster volume:
mount.glusterfs intel-5405-32-2.englab.nay.redhat.com:/aliang /mnt/aliang

2. Start the guest with:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190423-215834-BzwOjODj,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190423-215834-BzwOjODj,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idapUGH0 \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190423-215834-BzwOjODj,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20190423-215834-BzwOjODj,path=/var/tmp/seabios-20190423-215834-BzwOjODj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190423-215834-BzwOjODj,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel77-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4,iothread=iothread0 \
    -drive id=drive_data1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/test.img \
    -device virtio-blk-pci,id=data1,drive=drive_data1,bus=pci.0,addr=0x6,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:84:85:86:87:88,id=idc38p8G,vectors=4,netdev=idFM5N3v,bus=pci.0,addr=0x5 \
    -netdev tap,id=idFM5N3v,vhost=on \
    -m 2048 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait

3. Format the data disk and write 900M to it:
(guest)# mkfs.ext4 /dev/vdb
(guest)# dd if=/dev/urandom of=/dev/vdb bs=1M count=900 oflag=direct

4. Mirror to the target:
{ "execute": "drive-mirror", "arguments": { "device": "drive_data1", "target": "/mnt/aliang/mirror", "format": "qcow2", "mode": "absolute-paths", "sync": "full","speed": 100}}
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_data1", "auto-dismiss": true, "busy": false, "len": 34668544, "offset": 720896, "status": "running", "paused": false, "speed": 100, "ready": false, "type": "mirror"}]}

5. Stop the target volume and check the block job status:
(gluster server)# gluster volume stop aliang
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_data1", "auto-dismiss": true, "busy": false, "len": 34668544, "offset": 720896, "status": "running", "paused": false, "speed": 100, "ready": false, "type": "mirror"}]}

6. Start the volume again:
(gluster server)# gluster volume start aliang

7. Wait a few minutes, then check the block job status:
{"execute":"query-block-jobs"}
{"return": [{"auto-finalize": true, "io-status": "ok", "device": "drive_data1", "auto-dismiss": true, "busy": false, "len": 34668544, "offset": 720896, "status": "running", "paused": false, "speed": 100, "ready": false, "type": "mirror"}]}

8. Set the block job speed to unlimited:
{ "execute": "block-job-set-speed", "arguments": { "device": "drive_data1","speed":0}}
{"return": {}}
{"timestamp": {"seconds": 1556452597, "microseconds": 599458}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "drive_data1"}}
{"timestamp": {"seconds": 1556452597, "microseconds": 599580}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 34668544, "offset": 34668544, "speed": 0, "type": "mirror"}}

9. Complete the mirror job:
{ "execute": "block-job-complete", "arguments": { "device": "drive_data1"}}
{"return": {}}
{"timestamp": {"seconds": 1556452635, "microseconds": 827927}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "drive_data1"}}
{"timestamp": {"seconds": 1556452635, "microseconds": 827973}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "drive_data1"}}
{"timestamp": {"seconds": 1556452635, "microseconds": 861664}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data1", "len": 34668544, "offset": 34668544, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1556452635, "microseconds": 861748}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "drive_data1"}}
{"timestamp": {"seconds": 1556452635, "microseconds": 861800}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "drive_data1"}}

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553

---

I fixed it with: mount -t nfs -o nolock nfs_address mount_point
Created attachment 1459915 [details]
gdb info for drive mirror on 07192018

Description of problem:
Mount a GlusterFS storage, start a guest on a local filesystem with a data disk (also on the local filesystem), then block-mirror the data disk to the GlusterFS storage. If the GlusterFS volume is stopped during the drive-mirror process, the QEMU process core dumps.

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-918.el7.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.12.0-7.el7.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Mount a GlusterFS storage:
# mount -t glusterfs xx.xx.xx.xx:/gv1 /home/kvm_autotest_root/images
2. Start a guest on the local filesystem with a data disk drive_image2.
3. In the guest, mkfs/mount the data disk and write some data.
4. Block-mirror drive_image2 to the GlusterFS storage:
{ "execute": "drive-mirror", "arguments": { "device": "drive_image2", "target": "/home/kvm_autotest_root/images/mirror2", "format": "qcow2", "mode": "absolute-paths", "sync": "full","speed": 100}}
5. Stop the GlusterFS volume on the GlusterFS server:
# gluster volume stop gv1

Actual results:
The qemu-kvm guest core dumped:
# ./vm22.sh rhel76.qcow2 hd1
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) (qemu) Formatting '/home/kvm_autotest_root/images/mirror2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:649:
qemu-kvm: Failed to lock byte 100
./vm22.sh: line 29: 1193 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' ......

Expected results:
No core dump.

Additional info:
block-commit and block-stream have the same problem.