Red Hat Bugzilla – Bug 1264255
When hot-unplug a device which is doing block-commit, guest and qemu will hang until the commit finished, and call trace appears in guest
Last modified: 2017-08-01 23:24:58 EDT
Description of problem: Boot guest with a second disk, then do live snapshot (length of chain >=3). During block-commit from snapshot2 to snapshot1, hot-unplug this device, guest and qemu will hang until block-commit finished. And there are call trace in guest. Version-Release number of selected component (if applicable): Host:rhel7.2 Kernel:3.10.0-316.el7.x86_64 qemu-kvm-rhev:qemu-kvm-rhev-2.3.0-23.el7.x86_64 Guest:rhel6.7z Kernel:2.6.32-573.7.1.el6.x86_64 How reproducible: 100% Steps to Reproduce: 1. boot guest with a data disk # /usr/libexec/qemu-kvm -name rhel6.7z -machine pc-i440fx-rhel7.1.0,accel=kvm \ -cpu SandyBridge -m 4G,slots=256,maxmem=40G -numa node \ -smp 4,sockets=2,cores=2,threads=1 \ -uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b61 \ -netdev tap,id=hostnet0 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \ -vnc 0.0.0.0:1 \ -monitor stdio \ -qmp tcp:0:5555,server,nowait \ -serial unix:/tmp/monitor,server,nowait \ -drive file=/mnt/disk1/rhel6.7_virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0 \ -drive file=/mnt/disk2/data1.qcow2,format=qcow2,if=none,id=drive-virtio-blk1,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1 \ 2. do live snapshot to this data disk { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot1.qcow2", "format": "qcow2", "mode": "absolute-paths" } } {"return": {}} { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot2.qcow2", "format": "qcow2", "mode": "absolute-paths" } } 3. during block-commit, hot-unplug this data disk, guest and qemu both hang. { "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk4/snapshot1.qcow2", "top": "/mnt/disk4/snapshot2.qcow2", "speed": 0 } } {"return": {}} {"execute":"device_del","arguments":{"id":"virtio-blk1"}} {"return": {}} 4. the device will be deleted until block-commit finished. Meanwhile, qemu and guest become responsive again. {"timestamp": {"seconds": 1442543658, "microseconds": 293415}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}} {"timestamp": {"seconds": 1442543658, "microseconds": 293520}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}} {"timestamp": {"seconds": 1442543659, "microseconds": 39327}, "event": "BLOCK_JOB_READY", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}} {"timestamp": {"seconds": 1442543659, "microseconds": 39415}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}} 5. call trace appear in guest. # dmesg BUG: soft lockup - CPU#3 stuck for 67s! [udevd:567] Clocksource tsc unstable (delta = -137436967906 ns) Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] CPU 3 Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Pid: 567, comm: udevd Not tainted 2.6.32-573.7.1.el6.x86_64 #1 Red Hat KVM RIP: 0010:[<ffffffff8105435a>] [<ffffffff8105435a>] flush_tlb_others_ipi+0x11a/0x130 RSP: 0000:ffff88013719bb38 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88013719bb78 RCX: 0000000000000004 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff81e3e8d8 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000000 R10: ffff8801370a8100 R11: 0000000000000246 R12: ffff88013719bb28 R13: ffffffff8100bc0e R14: 0000000100000001 R15: ffff880137223400 FS: 00007f7c25b3b7a0(0000) GS:ffff880028380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7c275b25c8 CR3: 00000001369d0000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process udevd (pid: 567, threadinfo ffff880137198000, task ffff88013726eab0) Stack: 00007f7c275b25c8 ffffffff81e3e8c0 ffff88013726eab0 ffff8801370a8100 <d> 00007f7c275b25c8 ffff8801370a83c8 0000000000000008 ffffea000440ace8 <d> ffff88013719bba8 ffffffff810543e6 ffff88013719bba8 ffff8801370a8100 Call Trace: [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0 [<ffffffff8153bef5>] ? page_fault+0x25/0x30 Code: 7d f8 c9 c3 0f 1f 44 00 00 48 8b 05 69 22 bc 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 20 87 bb 00 4c 89 e7 e8 58 ab 24 00 85 c0 74 ec eb 90 66 Call Trace: [<ffffffff81054368>] ? flush_tlb_others_ipi+0x128/0x130 [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0 [<ffffffff8153bef5>] ? page_fault+0x25/0x30 Disabling lock debugging due to kernel taint virtio-pci 0000:00:05.0: PCI INT A disabled Actual results: 1. guest and qemu both hang until the block-commit is completed. 2. there are call trace in the guest Expected results: 1. guest and qemu should always be responsive 2. the block-commit should be canceled immediately 3. no call trace in the guest Additional info:
More info: 1. With above steps, block-commit to base file(at the bottom of snapshot chain) also hit the same issue. .... { "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1"} } .... 2. During block-stream, then hot-unplug the device, guest and qemu will hang until stream finished. ....(Steps like above) { "execute": "block-stream", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk2/data1.qcow2", "speed": 1000000000,"backing-file":"/mnt/disk2/data1.qcow2"} } {"return": {}} {"execute":"device_del","arguments":{"id":"virtio-blk1"}} {"return": {}} {"timestamp": {"seconds": 1442546589, "microseconds": 477854}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 5368709120, "offset": 5368709120, "speed": 1000000000, "type": "stream"}} {"timestamp": {"seconds": 1442546589, "microseconds": 526804}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}} {"timestamp": {"seconds": 1442546590, "microseconds": 423785}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}} check dmesg in guest: # dmesg Clocksource tsc unstable (delta = -120257117794 ns) virtio-pci 0000:00:05.0: PCI INT A disabled
On device deletion, the running block job will be cancelled, so this behavior no longer exists.
Reproduced on: qemu-kvm-rhev-2.3.0-22.el7.x86_64 kernel-3.10.0-640.el7.x86_64 Steps: 1. Launch guest with secondary disk: /usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,bootindex=0 -drive file=/home/test,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 2.Live snapshot: { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn1", "format": "qcow2", "mode": "absolute-paths" } } { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn2", "format": "qcow2", "mode": "absolute-paths" } } 3. dd files inside guest to differ sn2 from sn1: dd if=/dev/urandom of=/dev/vdb2 bs=1M count=2000 4. Block commit: { "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } } 5. Hot unplug device: {"execute":"device_del","arguments":{"id":"virtio-blk1"}} Results: Both qemu and guest hang, until block commit finishes. Verified on: qemu-kvm-rhev-2.9.0-1.el7.x86_64.rpm kernel-3.10.0-640.el7.x86_64 Steps same as above. Results: Both qemu and guest works well and block job is cancelled after hot unplug device: { "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } } {"return": {}} {"execute":"device_del","arguments":{"id":"virtio-blk1"}} {"return": {}} {"timestamp": {"seconds": 1493807017, "microseconds": 459789}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}} {"timestamp": {"seconds": 1493807017, "microseconds": 459871}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}} {"timestamp": {"seconds": 1493807017, "microseconds": 706040}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive-virtio-blk1", "len": 1048576000, "offset": 134217728, "speed": 0, "type": "commit"}} Moving to VERIFIED per above result.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392