Bug 1264255 - When hot-unplug a device which is doing block-commit, guest and qemu will hang until the commit finished, and call trace appears in guest
When hot-unplug a device which is doing block-commit, guest and qemu will han...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev (Show other bugs)
7.2
x86_64 Linux
medium Severity medium
: rc
: ---
Assigned To: Jeff Cody
Qianqian Zhu
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-09-17 22:52 EDT by Pei Zhang
Modified: 2017-08-01 23:24 EDT (History)
8 users (show)

See Also:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 19:29:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Pei Zhang 2015-09-17 22:52:57 EDT
Description of problem:
Boot guest with a second disk, then do live snapshot (length of chain >=3).  During block-commit from snapshot2 to snapshot1, hot-unplug this device, guest and qemu will hang until block-commit finished. And there are call trace in guest.

Version-Release number of selected component (if applicable):
Host:rhel7.2
Kernel:3.10.0-316.el7.x86_64
qemu-kvm-rhev:qemu-kvm-rhev-2.3.0-23.el7.x86_64

Guest:rhel6.7z
Kernel:2.6.32-573.7.1.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. boot guest with a data disk
# /usr/libexec/qemu-kvm -name rhel6.7z -machine pc-i440fx-rhel7.1.0,accel=kvm \
-cpu SandyBridge -m 4G,slots=256,maxmem=40G -numa node \
-smp 4,sockets=2,cores=2,threads=1 \
-uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b61 \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \
-vnc 0.0.0.0:1 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-serial unix:/tmp/monitor,server,nowait \
-drive file=/mnt/disk1/rhel6.7_virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0 \
-drive file=/mnt/disk2/data1.qcow2,format=qcow2,if=none,id=drive-virtio-blk1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1 \

2. do live snapshot to this data disk
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot1.qcow2", "format": "qcow2", "mode": "absolute-paths" } }
{"return": {}}
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/mnt/disk4/snapshot2.qcow2", "format": "qcow2", "mode": "absolute-paths" } }

3. during block-commit, hot-unplug this data disk, guest and qemu both hang.
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk4/snapshot1.qcow2", "top": "/mnt/disk4/snapshot2.qcow2", "speed": 0 } }
{"return": {}}
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
{"return": {}}


4. the device will be deleted until block-commit finished. Meanwhile, qemu and guest become responsive again. 
{"timestamp": {"seconds": 1442543658, "microseconds": 293415}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}}
{"timestamp": {"seconds": 1442543658, "microseconds": 293520}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}}
{"timestamp": {"seconds": 1442543659, "microseconds": 39327}, "event": "BLOCK_JOB_READY", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}}
{"timestamp": {"seconds": 1442543659, "microseconds": 39415}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "", "len": 4067885056, "offset": 4067885056, "speed": 0, "type": "commit"}}

5. call trace appear in guest.
# dmesg
BUG: soft lockup - CPU#3 stuck for 67s! [udevd:567]
Clocksource tsc unstable (delta = -137436967906 ns)
Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
CPU 3 
Modules linked in: ext2 autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 567, comm: udevd Not tainted 2.6.32-573.7.1.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffff8105435a>]  [<ffffffff8105435a>] flush_tlb_others_ipi+0x11a/0x130
RSP: 0000:ffff88013719bb38  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88013719bb78 RCX: 0000000000000004
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff81e3e8d8
RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000000
R10: ffff8801370a8100 R11: 0000000000000246 R12: ffff88013719bb28
R13: ffffffff8100bc0e R14: 0000000100000001 R15: ffff880137223400
FS:  00007f7c25b3b7a0(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7c275b25c8 CR3: 00000001369d0000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 567, threadinfo ffff880137198000, task ffff88013726eab0)
Stack:
 00007f7c275b25c8 ffffffff81e3e8c0 ffff88013726eab0 ffff8801370a8100
<d> 00007f7c275b25c8 ffff8801370a83c8 0000000000000008 ffffea000440ace8
<d> ffff88013719bba8 ffffffff810543e6 ffff88013719bba8 ffff8801370a8100
Call Trace:
 [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90
 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0
 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70
 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920
 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20
 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70
 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0
 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8153bef5>] ? page_fault+0x25/0x30
Code: 7d f8 c9 c3 0f 1f 44 00 00 48 8b 05 69 22 bc 00 41 8d b5 f0 00 00 00 4c 89 e7 ff 90 e0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 20 87 bb 00 4c 89 e7 e8 58 ab 24 00 85 c0 74 ec eb 90 66 
Call Trace:
 [<ffffffff81054368>] ? flush_tlb_others_ipi+0x128/0x130
 [<ffffffff810543e6>] ? native_flush_tlb_others+0x76/0x90
 [<ffffffff8105450e>] ? flush_tlb_page+0x5e/0xb0
 [<ffffffff810534dd>] ? ptep_set_access_flags+0x6d/0x70
 [<ffffffff811519e3>] ? do_wp_page+0x493/0x920
 [<ffffffff8115266d>] ? handle_pte_fault+0x2cd/0xb20
 [<ffffffff81456004>] ? move_addr_to_kernel+0x64/0x70
 [<ffffffff81153159>] ? handle_mm_fault+0x299/0x3d0
 [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
 [<ffffffff8123e90b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8153eb4e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8153bef5>] ? page_fault+0x25/0x30
Disabling lock debugging due to kernel taint
virtio-pci 0000:00:05.0: PCI INT A disabled


Actual results:
1. guest and qemu both hang until the block-commit is completed.
2. there are call trace in the guest


Expected results:
1. guest and qemu should always be responsive
2. the block-commit should be canceled immediately
3. no call trace in the guest

Additional info:
Comment 2 Pei Zhang 2015-09-17 23:28:26 EDT
More info:
1. With above steps, block-commit to base file(at the bottom of snapshot chain) also hit the same issue. 

....
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1"} }
....

2. During block-stream, then hot-unplug the device, guest and qemu will hang until stream finished.

....(Steps like above)
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-blk1","base": "/mnt/disk2/data1.qcow2", "speed": 1000000000,"backing-file":"/mnt/disk2/data1.qcow2"} }
{"return": {}}
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
{"return": {}}
{"timestamp": {"seconds": 1442546589, "microseconds": 477854}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-blk1", "len": 5368709120, "offset": 5368709120, "speed": 1000000000, "type": "stream"}}
{"timestamp": {"seconds": 1442546589, "microseconds": 526804}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}}
{"timestamp": {"seconds": 1442546590, "microseconds": 423785}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}}

check dmesg in guest:
# dmesg
Clocksource tsc unstable (delta = -120257117794 ns)
virtio-pci 0000:00:05.0: PCI INT A disabled
Comment 5 Jeff Cody 2017-04-19 15:15:29 EDT
On device deletion, the running block job will be cancelled, so this behavior no longer exists.
Comment 8 Qianqian Zhu 2017-05-03 06:25:42 EDT
Reproduced on:
qemu-kvm-rhev-2.3.0-22.el7.x86_64
kernel-3.10.0-640.el7.x86_64

Steps:
1. Launch guest with secondary disk:
/usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2  -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,bootindex=0 -drive file=/home/test,format=qcow2,if=none,cache=none,aio=native,id=drive-virtio-blk1,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-blk1,id=virtio-blk1,bus=pci.0,addr=0x14,serial="QEMU-DISK2"  -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15

2.Live snapshot:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn1", "format": "qcow2", "mode": "absolute-paths" } }
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-blk1", "snapshot-file": "/home/data-sn2", "format": "qcow2", "mode": "absolute-paths" } }

3. dd files inside guest to differ sn2 from sn1:
dd if=/dev/urandom of=/dev/vdb2 bs=1M count=2000

4. Block commit:
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } }

5. Hot unplug device:
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}

Results:
Both qemu and guest hang, until block commit finishes.

Verified on:
qemu-kvm-rhev-2.9.0-1.el7.x86_64.rpm
kernel-3.10.0-640.el7.x86_64

Steps same as above.

Results:
Both qemu and guest works well and block job is cancelled after hot unplug device:
{ "execute": "block-commit", "arguments": { "device": "drive-virtio-blk1","base":"/home/data-sn1", "top": "/home/data-sn2", "speed": 0 } }
{"return": {}}
{"execute":"device_del","arguments":{"id":"virtio-blk1"}}
{"return": {}}
{"timestamp": {"seconds": 1493807017, "microseconds": 459789}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-blk1/virtio-backend"}}
{"timestamp": {"seconds": 1493807017, "microseconds": 459871}, "event": "DEVICE_DELETED", "data": {"device": "virtio-blk1", "path": "/machine/peripheral/virtio-blk1"}}
{"timestamp": {"seconds": 1493807017, "microseconds": 706040}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "drive-virtio-blk1", "len": 1048576000, "offset": 134217728, "speed": 0, "type": "commit"}}

Moving to VERIFIED per above result.
Comment 10 errata-xmlrpc 2017-08-01 19:29:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 11 errata-xmlrpc 2017-08-01 21:07:21 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 12 errata-xmlrpc 2017-08-01 21:59:20 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 13 errata-xmlrpc 2017-08-01 22:40:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 14 errata-xmlrpc 2017-08-01 23:04:50 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
Comment 15 errata-xmlrpc 2017-08-01 23:24:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Note You need to log in before you can comment on or make changes to this bug.