Bug 1404137
Summary: 'block-job-cancel' can not cancel a "block-stream" job normally

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Yanan Fu <yfu> |
| Assignee | John Snow <jsnow> |
| QA Contact | Qianqian Zhu <qizhu> |
| CC | chayang, jsnow, juzhang, knoel, michen, mrezanin, qizhu, virt-bugs, virt-maint, yfu |
| Target Milestone | rc |
| Target Release | --- |
| Fixed In Version | qemu-kvm-rhev-2.9.0-1.el7 |
| Doc Type | If docs needed, set a value |
| Bug Blocks | 1456088 (view as bug list) |
| Last Closed | 2017-08-01 23:39:45 UTC |
| Type | Bug |
| Regression | --- |
Description
Yanan Fu, 2016-12-13 07:34:25 UTC

Created attachment 1231173 [details]: debug log for this bug
Hit another issue in the acceptance test, maybe for the same reason.

qemu version: qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
kernel version: kernel-3.10.0-514.6.1.el7.x86_64

2017-01-05 06:18:19: {"execute": "query-block", "id": "VE5G6oWn"}
2017-01-05 06:18:19: {"return": [{"io-status": "ok", "device": "drive_image1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 21474836480, "filename": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 3972464640, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block105", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": true, "writeback": true}, "file": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "encryption_key_missing": false}, "type": "unknown"}], "id": "VE5G6oWn"}
2017-01-05 06:18:19: {"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive_image1", "snapshot-file": "/usr/share/avocado/data/avocado-vt/images/sn1", "mode": "absolute-paths", "format": "qcow2"}, "id": "tQoC4edJ"}
2017-01-05 06:18:19: {"return": {}, "id": "tQoC4edJ"}
2017-01-05 06:18:19: {"execute": "query-block", "id": "vQ1kHOeN"}
2017-01-05 06:18:19: {"return": [{"io-status": "ok", "device": "drive_image1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"backing-image": {"virtual-size": 21474836480, "filename": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 3972464640, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "backing-filename-format": "qcow2", "virtual-size": 21474836480, "filename": "/usr/share/avocado/data/avocado-vt/images/sn1", "cluster-size": 65536, "format": "qcow2", "actual-size": 200704, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "full-backing-filename": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "backing-filename": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block818", "backing_file_depth": 1, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "backing_file": "/usr/share/avocado/data/avocado-vt/images/rhel73-64-virtio-scsi.qcow2", "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": true, "writeback": true}, "file": "/usr/share/avocado/data/avocado-vt/images/sn1", "encryption_key_missing": false}, "type": "unknown"}], "id": "vQ1kHOeN"}
2017-01-05 06:18:19: {"execute": "block-stream", "arguments": {"device": "drive_image1", "speed": 0}, "id": "ZZ0YQrX1"}
2017-01-05 06:18:19: {"return": {}, "id": "ZZ0YQrX1"}
2017-01-05 06:18:19: {"execute": "query-block-jobs", "id": "J5hfm3xT"}
2017-01-05 06:18:19: {"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 21474836480, "offset": 0, "paused": false, "speed": 0, "ready": false, "type": "stream"}], "id": "J5hfm3xT"}
2017-01-05 06:18:19: {"execute": "block-job-set-speed", "arguments": {"device": "drive_image1", "speed": 10485760}, "id": "LxsGdjL5"}
2017-01-05 06:20:47: {"timestamp": {"seconds": 1483615247, "microseconds": 872427}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}}
2017-01-05 06:20:47: {"id": "LxsGdjL5", "error": {"class": "DeviceNotActive", "desc": "No active block job on device 'drive_image1'"}}
2017-01-05 06:21:11: {"timestamp": {"seconds": 1483615268, "microseconds": 522056}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "id7PgQyi", "path": "/machine/peripheral/id7PgQyi/virtio-backend"}}

We execute "block-stream", then execute "block-job-set-speed" (06:18:19), but there is no return. After 2 minutes, we get the "BLOCK_JOB_COMPLETED" event (06:20:47), and only then the result of "block-job-set-speed":

06:20:47: {"id": "LxsGdjL5", "error": {"class": "DeviceNotActive", "desc": "No active block job on device 'drive_image1'"}}

With the same qemu and kernel version as in comment 3, another failed scenario:

2017-01-05 03:16:28: {"execute": "query-block-jobs", "id": "luZRSqrk"}
2017-01-05 03:16:28: {"return": [{"io-status": "ok", "device": "drive_image1", "busy": true, "len": 32212254720, "offset": 562036736, "paused": false, "speed": 0, "ready": false, "type": "stream"}], "id": "luZRSqrk"}
2017-01-05 03:16:29: {"execute": "query-block-jobs", "id": "NbdasH0s"}
2017-01-05 03:32:29: {"execute": "quit", "id": "KxPD7TaU"}
2017-01-05 03:33:08: {"timestamp": {"seconds": 1483605188, "microseconds": 983567}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 32212254720, "offset": 32212254720, "speed": 0, "type": "stream"}}
2017-01-05 03:33:08: {"return": [], "id": "NbdasH0s"}
2017-01-05 03:33:08: {"return": {}, "id": "KxPD7TaU"}

Here "query-block-jobs" (03:16:29) has no return; after several minutes, we get the "BLOCK_JOB_COMPLETED" event (03:33:08), and only then the return for "query-block-jobs".

Hi jsnow,
Could you please have a look at this issue? We have hit it more than once. Thanks!

OK, will investigate. Trying to reproduce now; do you have any tricks for reproducing? Should I let the stream run for quite some time first, or try to cancel it immediately? What kind of storage media is the VM image on, and what kind of storage media is the snapshot on? (Is the stream operation slow, or fast?)
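The diagnosis above relies on correlating timestamps by hand: a command is sent, and its reply only shows up after the BLOCK_JOB_COMPLETED event arrives. As a sketch of how a harness might flag such stalls mechanically from a monitor log in this format (illustrative only; `find_stalled_commands` is a hypothetical helper, not part of avocado or QEMU):

```python
import json
import re
from datetime import datetime

# Matches log lines of the form "2017-01-05 06:18:19: {json...}"
LINE_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (\{.*\})$')

def find_stalled_commands(log_lines, threshold_s=10):
    """Return (command, sent, replied) triples for QMP commands whose
    reply arrived more than threshold_s seconds after they were sent."""
    pending = {}   # QMP id -> (command name, time sent)
    stalled = []
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
        msg = json.loads(m.group(2))
        if 'execute' in msg:
            pending[msg.get('id')] = (msg['execute'], ts)
        elif ('return' in msg or 'error' in msg) and msg.get('id') in pending:
            cmd, sent = pending.pop(msg['id'])
            if (ts - sent).total_seconds() > threshold_s:
                stalled.append((cmd, sent, ts))
    return stalled
```

Run against the log above, this would flag the "block-job-set-speed" command sent at 06:18:19 but only answered at 06:20:47; asynchronous events carry no "id" and are skipped.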
None of the logs I can still get to appear to include a command line I could use to reproduce this. If you still have the CLI, it might be helpful for trying to recreate the conditions necessary for this softlock. I'll contact Yash Mankad in the meantime for some advice on replicating avocado setups for automating this test myself. Thanks! --js

Hi John,
We can hit this issue from time to time. Please refer to the CLI below; it comes from the avocado test, case name "block_stream.simple_test.start_stream".

CLI:
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga qxl \
    -device pci-bridge,id=pci_bridge,bus=pci.0,addr=03,chassis_nr=1 \
    -device intel-hda,bus=pci.0,addr=04 \
    -device hda-duplex \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/avocado_oegWUL/monitor-qmp1-20170216-115906-eZezw2uY,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_oegWUL/monitor-catch_monitor-20170216-115906-eZezw2uY,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idMyOH7w \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_oegWUL/serial-serial0-20170216-115906-eZezw2uY,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=05 \
    -chardev socket,path=/var/tmp/avocado_oegWUL/virtio_port-vs-20170216-115906-eZezw2uY,nowait,id=idTf6FMg,server \
    -device virtserialport,id=idlnaE32,name=vs,bus=virtio_serial_pci0.0,chardev=idTf6FMg \
    -object rng-random,filename=/dev/random,id=passthrough-EzXGP1It \
    -device virtio-rng-pci,id=virtio-rng-pci-H621ETef,rng=passthrough-EzXGP1It,bus=pci.0,addr=06 \
    -chardev socket,id=seabioslog_id_20170216-115906-eZezw2uY,path=/var/tmp/avocado_oegWUL/seabios-20170216-115906-eZezw2uY,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20170216-115906-eZezw2uY,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -device nec-usb-xhci,id=usb2,bus=pci.0,addr=07 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=08 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win10-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:20:21:22:23:24,id=idyQ6zAf,vectors=4,netdev=idX3NVY3,bus=pci.0,addr=09 \
    -netdev tap,id=idX3NVY3,vhost=on,vhostfd=21,fd=20 \
    -m 8192 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'Opteron_G4',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1 \
    -device usb-tablet,id=usb-tablet1,bus=usb2.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -no-hpet \
    -enable-kvm \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0a \
    -monitor stdio

Boot the VM and do block_stream:

11:59:12 DEBUG| (monitor qmp1) Sending command 'blockdev-snapshot-sync'
11:59:12 DEBUG| Send command: {'execute': 'blockdev-snapshot-sync', 'arguments': {'device': u'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'id': 'ua7vPoev'}
11:59:12 INFO | [qemu output] Formatting '/home/kvm_autotest_root/images/sn1', fmt=qcow2 size=32212254720 backing_file=/home/kvm_autotest_root/images/win10-64-virtio-scsi.qcow2 backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
11:59:12 DEBUG| (monitor qmp1) Sending command 'block-stream'
11:59:12 DEBUG| Send command: {'execute': 'block-stream', 'arguments': {'device': u'drive_image1', 'speed': 0}, 'id': 'rSubpE9K'}
11:59:12 DEBUG| (monitor qmp1) Sending command 'query-block-jobs'
11:59:12 DEBUG| Send command: {'execute': 'query-block-jobs', 'id': 'AhH7RKBc'}

If we keep executing "query-block-jobs", we can hit this issue: QMP has no return, the "BLOCK_JOB_COMPLETED" event arrives after 15 minutes, and only then do we get a null return for "query-block-jobs".

2017-02-16 12:00:02: {"execute": "query-block-jobs", "id": "iQMNcLKk"}
2017-02-16 12:15:28: {"timestamp": {"seconds": 1487265310, "microseconds": 506756}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 32212254720, "offset": 32212254720, "speed": 0, "type": "stream"}}
2017-02-16 12:15:28: {"return": [], "id": "iQMNcLKk"}
2017-02-16 12:15:28: {"timestamp": {"seconds": 1487265320, "microseconds": 707396}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "idyQ6zAf", "path": "/machine/peripheral/idyQ6zAf/virtio-backend"}}

(In reply to John Snow from comment #6)
> Trying to reproduce now, do you have any tricks for reproducing? Should I
> let the stream run for quite some time first, or try to cancel it
> immediately?
>
> What kind of storage media is the VM image on, and what kind of storage
> media is the snapshot on? (Is the stream operation slow, or fast?)

Hi John,
This issue is not only related to block job cancel. Per the description and comments 3 and 4, there are three scenarios:

1. "block-job-cancel" has no return; we get "BLOCK_JOB_COMPLETED" in the end, then an error for "block-job-cancel". Both cancelling the job immediately and letting it run for quite some time first can trigger it.
2. "block-job-set-speed" has no return; we get "BLOCK_JOB_COMPLETED" in the end, then an error for "block-job-set-speed".
3. "query-block-jobs" has no return; we get "BLOCK_JOB_COMPLETED" in the end, then an error for "query-block-jobs".

This issue was reproduced in the avocado test (case name block_stream.simple_test.start_stream can reproduce it); the storage media is a local qcow2 image, and the snapshot is a local qcow2 image too. The qemu command line is provided in comment 8; hope it is useful. Thanks!

This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1367369 and I may ask you to re-test once the fix for that issue goes into the tree.

Can we re-test under the rhv 2.9.0 branch, please? I'd like to follow up on comment #10.

(In reply to John Snow from comment #11)
> Can we re-test under rhv 2.9.0 branch, please? I'd like to follow-up on
> comment #10.

Hi John,
Test version: qemu-kvm-rhev-2.9.0-1.el7. I ran the test with avocado, case name "drive_mirror.simple_test..on_localfs.2qcow2", repeated 10 times, 60 cases in total, and all pass. Job link: http://10.66.4.244/kvm_autotest_job_log/?jobid=1835106

So I think it is OK with the rhv 2.9 branch!

(In reply to Yanan Fu from comment #12)
> Test version: qemu-kvm-rhev-2.9.0-1.el7, run test with avocado, case name
> "drive_mirror.simple_test..on_localfs.2qcow2", and repeat 10 times, total 60
> cases, all pass, job link:
> http://10.66.4.244/kvm_autotest_job_log/?jobid=1835106
>
> So, i think , it is ok with rhv 2.9 branch !

AWESOME! Thanks, Yanan! :)

Mirek, I'm moving this to POST to indicate that this was fixed by the 2.9.0 rebase.

Per comment 12, moving to VERIFIED.

Hi John,
I hit this issue in the acceptance test for the latest rhev7.3.z build. Version: qemu-kvm-rhev-2.6.0-28.el7_3.10.x86_64. This issue was verified after the rebase to the rhv 2.9 branch and may be related to bz 1367369 per comment 10. Do we need to clone and fix it on the 7.3.z branch?
Best wishes,
Yanan Fu

Yes, if the issue is being observed in 7.3.z still, please fork this BZ, assign me directly, and I will prepare a backport for you. I'll get started on that now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392