Bug 1264255
| Summary: | When hot-unplug a device which is doing block-commit, guest and qemu will hang until the commit finished, and call trace appears in guest | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang> |
| Component: | qemu-kvm-rhev | Assignee: | Jeff Cody <jcody> |
| Status: | CLOSED ERRATA | QA Contact: | Qianqian Zhu <qizhu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.2 | CC: | chayang, huding, juzhang, knoel, michen, mrezanin, virt-maint, xfu |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-01 23:29:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
More info:
1. With above steps, block-commit to base file(at the bottom of snapshot chain) also hit the same issue.
....
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1"} }
....
2. During block-stream, then hot-unplug the device, guest and qemu will hang until stream finished.
....(Steps like above)
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk2/data1.qcow2", "speed": 1000000000,"backing-file":"/mnt/disk2/data1.qcow2"} }
{"return": {}}
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
{"return": {}}
{"timestamp": {"seconds": 1442546589, "microseconds": 477854}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 5368709120, "offset": 5368709120, "speed": 1000000000, "type": "stream"}}
{"timestamp": {"seconds": 1442546589, "microseconds": 526804}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}}
{"timestamp": {"seconds": 1442546590, "microseconds": 423785}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}}
check dmesg in guest:
# dmesg
Clocksource tsc unstable (delta = -120257117794 ns)
virtio-pci 0000:00:05.0: PCI INT A disabled
On device deletion, the running block job will be cancelled, so this behavior no longer exists. Reproduced on:
qemu-kvm-rhev-2.3.0-22.el7.x86_64
kernel-3.10.0-640.el7.x86_64
Steps:
1. Launch guest with secondary disk:
/usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,bootindex=0 -drive file=/home/test,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,bus=pci.0,addr=0x14,serial="QEMU-DISK2" -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15
2.Live snapshot:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn1", "format": "qcow2", "mode": "absolute-paths" } }
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn2", "format": "qcow2", "mode": "absolute-paths" } }
3. dd files inside guest to differ sn2 from sn1:
dd if=/dev/urandom of=/dev/vdb2 bs=1M count=2000
4. Block commit:
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } }
5. Hot unplug device:
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
Results:
Both qemu and guest hang, until block commit finishes.
Verified on:
qemu-kvm-rhev-2.9.0-1.el7.x86_64.rpm
kernel-3.10.0-640.el7.x86_64
Steps same as above.
Results:
Both qemu and guest works well and block job is cancelled after hot unplug device:
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } }
{"return": {}}
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
{"return": {}}
{"timestamp": {"seconds": 1493807017, "microseconds": 459789}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}}
{"timestamp": {"seconds": 1493807017, "microseconds": 459871}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}}
{"timestamp": {"seconds": 1493807017, "microseconds": 706040}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive-virtio-blk1", "len": 1048576000, "offset": 134217728, "speed": 0, "type": "commit"}}
Moving to VERIFIED per above result.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392 |
Description of problem: Boot guest with a second disk, then do live snapshot (length of chain >=3). During block-commit from snapshot2 to snapshot1, hot-unplug this device, guest and qemu will hang until block-commit finished. And there are call trace in guest. Version-Release number of selected component (if applicable): Host:rhel7.2 Kernel:3.10.0-316.el7.x86_64 qemu-kvm-rhev:qemu-kvm-rhev-2.3.0-23.el7.x86_64 Guest:rhel6.7z Kernel:2.6.32-573.7.1.el6.x86_64 How reproducible: 100% Steps to Reproduce: 1. boot guest with a data disk # /usr/libexec/qemu-kvm -name rhel6.7z -machine pc-i440fx-rhel7.1.0,accel=kvm \ -cpu SandyBridge -m 4G,slots=256,maxmem=40G -numa node \ -smp 4,sockets=2,cores=2,threads=1 \ -uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b61 \ -netdev tap,id=hostnet0 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \ -vnc 0.0.0.0:1 \ -monitor stdio \ -qmp tcp:0:5555,server,nowait \ -serial unix:/tmp/monitor,server,nowait \ -drive file=/mnt/disk1/rhel6.7_virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0 \ -drive file=/mnt/disk2/data1.qcow2,format=qcow2,if=none,id=drive-virtio-blk1,werror=stop,rerror=stop \ -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1 \ 2. do live snapshot to this data disk { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot1.qcow2", "format": "qcow2", "mode": "absolute-paths" } } {"return": {}} { "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot2.qcow2", "format": "qcow2", "mode": "absolute-paths" } } 3. during block-commit, hot-unplug this data disk, guest and qemu both hang. { "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk4/snapshot1.qcow2", "top": "/mnt/disk4/snapshot2.qcow2", "speed": 0 } } {"return": {}} {"execute":"device_del","arguments":{"id":"virtio-blk1"}} {"return": {}} 4. the device will be deleted until block-commit finished. Meanwhile, qemu and guest become responsive again. {"timestamp": {"seconds": 1442543658, "microseconds": 293415}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}} {"timestamp": {"seconds": 1442543658, "microseconds": 293520}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}} {"timestamp": {"seconds": 1442543659, "microseconds": 39327}, "event": "BLOCK_JOB_READY", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}} {"timestamp": {"seconds": 1442543659, "microseconds": 39415}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}} 5. call trace appear in guest. # dmesg BUG: soft lockup - CPU#3 stuck for 67s! [udevd:567] Clocksource tsc unstable (delta = -137436967906 ns) Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] CPU 3 Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Pid: 567, comm: udevd Not tainted 2.6.32-573.7.1.el6.x86_64 #1 Red Hat KVM RIP: 0010:[<ffffffff8105435a>] [<ffffffff8105435a>] flush_tlb_others_ipi+0x11a/0x130 RSP: 0000:ffff88013719bb38 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88013719bb78 RCX: 0000000000000004 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff81e3e8d8 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000000 R10: ffff8801370a8100 R11: 0000000000000246 R12: ffff88013719bb28 R13: ffffffff8100bc0e R14: 0000000100000001 R15: ffff880137223400 FS: 00007f7c25b3b7a0(0000) GS:ffff880028380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7c275b25c8 CR3: 00000001369d0000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process udevd (pid: 567, threadinfo ffff880137198000, task ffff88013726eab0) Stack: 00007f7c275b25c8 ffffffff81e3e8c0 ffff88013726eab0 ffff8801370a8100 <d> 00007f7c275b25c8 ffff8801370a83c8 0000000000000008 ffffea000440ace8 <d> ffff88013719bba8 ffffffff810543e6 ffff88013719bba8 ffff8801370a8100 Call Trace: [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0 [<ffffffff8153bef5>] ? page_fault+0x25/0x30 Code: 7d f8 c9 c3 0f 1f 44 00 00 48 8b 05 69 22 bc 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 20 87 bb 00 4c 89 e7 e8 58 ab 24 00 85 c0 74 ec eb 90 66 Call Trace: [<ffffffff81054368>] ? flush_tlb_others_ipi+0x128/0x130 [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0 [<ffffffff8153bef5>] ? page_fault+0x25/0x30 Disabling lock debugging due to kernel taint virtio-pci 0000:00:05.0: PCI INT A disabled Actual results: 1. guest and qemu both hang until the block-commit is completed. 2. there are call trace in the guest Expected results: 1. guest and qemu should always be responsive 2. the block-commit should be canceled immediately 3. no call trace in the guest Additional info: