Bug 1050834

Summary: lockdep warning in flush_work() when hotunplugging a virtio-scsi disk (scsi-block + iscsi://)
Product: Red Hat Enterprise Linux 7 Reporter: Sibiao Luo <sluo>
Component: kernelAssignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0CC: areis, chayang, hhuang, juli, juzhang, knoel, michen, pbonzini, qzhang, rbalakri, sluo, theo, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.10.0-180.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1066623 (view as bug list) Environment:
Last Closed: 2015-03-05 11:38:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1066623    

Description Sibiao Luo 2014-01-09 07:56:16 UTC
Description of problem:
boot up a guest with a passthrough libiscsi disk, then do hotunplug it that will cause guest call trace.

Version-Release number of selected component (if applicable):
host info:
3.10.0-66.el7.x86_64.debug
qemu-kvm-rhev-1.5.3-31.el7.x86_64
seabios-1.7.2.2-7.el7.x86_64
guest info:
3.10.0-66.el7.x86_64.debug

How reproducible:
3/3

Steps to Reproduce:
1.boot up a guest with a passthrough libiscsi disk.
e.g:...-drive file=iscsi://10.66.90.100:3260/iqn.2001-05.com.equallogic:0-8a0906-4fb1f7d03-455f49b421252a57-s2-sluo-270305-2/0,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -iscsi id=iqn1 -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7 -device scsi-block,drive=drive-data-disk,bus=scsi1.0,id=data-disk
2.do hotunplug it.
{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "device_del", "arguments": { "id":"data-disk" } }
{"timestamp": {"seconds": 1389253742, "microseconds": 499167}, "event": "DEVICE_DELETED", "data": {"device": "data-disk", "path": "/machine/peripheral/data-disk"}}
{"return": {}}
{ "execute": "device_del", "arguments": { "id":"scsi1" } }
{"return": {}}
{"timestamp": {"seconds": 1389253746, "microseconds": 668262}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/scsi1/virtio-backend"}}
{"timestamp": {"seconds": 1389253746, "microseconds": 671996}, "event": "DEVICE_DELETED", "data": {"device": "scsi1", "path": "/machine/peripheral/scsi1"}}

Actual results:
after step 2, guest call trace, i will paste the call trace log later.

Expected results:
it should no any call trace when do hot-unplug.

Additional info:
# /usr/libexec/qemu-kvm -M pc -S -cpu SandyBridge -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port2 -drive file=iscsi://10.66.90.100:3260/iqn.2001-05.com.equallogic:0-8a0906-4c41f7d03-453f49b421052a57-s2-sluo-270305-1/0,if=none,id=drive-system-disk,cache=none,aio=native,werror=stop,rerror=stop -iscsi id=iqn0 -device virtio-blk-pci,vectors=0,bus=pci.0,addr=0x4,scsi=off,drive=drive-system-disk,id=system-disk,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=00:01:02:03:40:21,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:5555,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :2 -spice disable-ticketing,port=5932 -monitor stdio -drive file=iscsi://10.66.90.100:3260/iqn.2001-05.com.equallogic:0-8a0906-4fb1f7d03-455f49b421252a57-s2-sluo-270305-2/0,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -iscsi id=iqn1 -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x7 -device scsi-block,drive=drive-data-disk,bus=scsi1.0,id=data-disk

Comment 1 Sibiao Luo 2014-01-09 07:57:19 UTC
guest] # dmesg 
[  121.635565] sd 2:0:0:0: alua: Detached
[  125.741393] INFO: trying to register non-static key.
[  125.742022] the code is fine but needs lockdep annotation.
[  125.742022] turning off the locking correctness validator.
[  125.742022] CPU: 0 PID: 43 Comm: kworker/0:1 Not tainted 3.10.0-66.el7.x86_64.debug #1
[  125.742022] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[  125.742022] Workqueue: kacpi_hotplug _handle_hotplug_event_func
[  125.742022]  ffff880118408000 ffff880118403820 ffffffff816771c3 ffff880118403830
[  125.742022]  ffffffff81670c54 ffff8801184038a0 ffffffff810e1f65 00000000001d6240
[  125.742022]  0000000000000000 ffff880118403878 0000000000000003 ffff880100000000
[  125.742022] Call Trace:
[  125.742022]  [<ffffffff816771c3>] dump_stack+0x19/0x1b
[  125.742022]  [<ffffffff81670c54>] register_lock_class.part.25+0x38/0x3c
[  125.742022]  [<ffffffff810e1f65>] __lock_acquire+0xf65/0x1260
[  125.742022]  [<ffffffff810e2a52>] lock_acquire+0xa2/0x1f0
[  125.742022]  [<ffffffff8108c035>] ? flush_work+0x5/0x2e0
[  125.742022]  [<ffffffff8108c07e>] flush_work+0x4e/0x2e0
[  125.742022]  [<ffffffff8108c035>] ? flush_work+0x5/0x2e0
[  125.742022]  [<ffffffff8104e49f>] ? kvm_clock_read+0x2f/0x50
[  125.742022]  [<ffffffff810e09b9>] ? mark_held_locks+0xb9/0x140
[  125.742022]  [<ffffffff8108f242>] ? __cancel_work_timer+0x82/0x130
[  125.742022]  [<ffffffff8108f255>] __cancel_work_timer+0x95/0x130
[  125.742022]  [<ffffffff8108f300>] cancel_work_sync+0x10/0x20
[  125.742022]  [<ffffffffa00a949d>] virtscsi_remove+0x4d/0x90 [virtio_scsi]
[  125.742022]  [<ffffffffa0023093>] virtio_dev_remove+0x23/0x80 [virtio]
[  125.742022]  [<ffffffff81408fff>] __device_release_driver+0x7f/0xf0
[  125.742022]  [<ffffffff81409095>] device_release_driver+0x25/0x40
[  125.742022]  [<ffffffff814087fc>] bus_remove_device+0x11c/0x1a0
[  125.742022]  [<ffffffff81405032>] device_del+0x142/0x1e0
[  125.742022]  [<ffffffff814050ee>] device_unregister+0x1e/0x60
[  125.742022]  [<ffffffffa00234b6>] unregister_virtio_device+0x16/0x30 [virtio]
[  125.742022]  [<ffffffffa00b256b>] virtio_pci_remove+0x2b/0x70 [virtio_pci]
[  125.742022]  [<ffffffff813424ab>] pci_device_remove+0x3b/0xb0
[  125.742022]  [<ffffffff81408fff>] __device_release_driver+0x7f/0xf0
[  125.742022]  [<ffffffff81409095>] device_release_driver+0x25/0x40
[  125.742022]  [<ffffffff814087fc>] bus_remove_device+0x11c/0x1a0
[  125.742022]  [<ffffffff81405032>] device_del+0x142/0x1e0
[  125.742022]  [<ffffffff8133bf44>] pci_stop_bus_device+0x94/0xa0
[  125.742022]  [<ffffffff8133c032>] pci_stop_and_remove_bus_device+0x12/0x20
[  125.742022]  [<ffffffff813584ae>] acpiphp_disable_slot+0x8e/0x140
[  125.742022]  [<ffffffff81394e7b>] ? acpi_get_name+0x43/0xae
[  125.742022]  [<ffffffff81358ac0>] hotplug_event_func+0x70/0x1a0
[  125.742022]  [<ffffffff81358c17>] _handle_hotplug_event_func+0x27/0x60
[  125.742022]  [<ffffffff8108e2f1>] process_one_work+0x211/0x6e0
[  125.742022]  [<ffffffff8108e285>] ? process_one_work+0x1a5/0x6e0
[  125.742022]  [<ffffffff8108e8db>] worker_thread+0x11b/0x3a0
[  125.742022]  [<ffffffff8108e7c0>] ? process_one_work+0x6e0/0x6e0
[  125.742022]  [<ffffffff81096efd>] kthread+0xed/0x100
[  125.742022]  [<ffffffff81096e10>] ? insert_kthread_work+0x80/0x80
[  125.742022]  [<ffffffff8168a82c>] ret_from_fork+0x7c/0xb0
[  125.742022]  [<ffffffff81096e10>] ? insert_kthread_work+0x80/0x80
[  125.798732] ACPI: Device does not support D3cold

Comment 2 Sibiao Luo 2014-01-09 07:59:32 UTC
If use virtio-blk interface with the same test that will not reproduce this issue.
e,g:...-drive file=iscsi://10.66.90.100:3260/iqn.2001-05.com.equallogic:0-8a0906-4fb1f7d03-455f49b421252a57-s2-sluo-270305-2/0,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -iscsi id=iqn1 -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-data-disk,id=data-disk

{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "device_del", "arguments": { "id":"data-disk" } }
{"return": {}}
{"timestamp": {"seconds": 1389254288, "microseconds": 334673}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/data-disk/virtio-backend"}}
{"timestamp": {"seconds": 1389254288, "microseconds": 339527}, "event": "DEVICE_DELETED", "data": {"device": "data-disk", "path": "/machine/peripheral/data-disk"}}

guest ] # dmesg 
[   64.087481] ACPI: Device does not support D3cold

Best Regards,
sluo

Comment 3 Paolo Bonzini 2014-02-18 16:29:39 UTC
This only happens with kernel-debug, and the problem is in kernel/workqueue.c.  Moving to 7.1 and CCing Tejun.

Comment 4 Tejun Heo 2014-02-18 17:11:05 UTC
Are the work items initialized? Looks like lockdep_map hasn't been initialized yet.

Comment 5 Paolo Bonzini 2014-02-18 17:36:38 UTC
Hmm, probably not.

Comment 6 Tejun Heo 2014-02-18 17:38:25 UTC
Yeah, calling workqueue interface on uninitialized work items isn't a good idea even if they're zeroed. It's not failing catastrophically only through happy accidents.

Comment 7 Paolo Bonzini 2014-02-19 11:06:28 UTC
Sibiao, please retest on a RHEL7 guest with the kernel at

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=7077592

Remember to use kernel-debug.

Comment 8 Paolo Bonzini 2014-02-20 13:23:49 UTC
(adding needinfo, see comment 7)

Comment 13 Jarod Wilson 2014-10-06 13:41:01 UTC
Patch(es) available on kernel-3.10.0-180.el7

Comment 16 Sibiao Luo 2014-11-13 07:53:09 UTC
Verify this issue on kernel-3.10.0-183.el7.x86_64 with the same testing.

host info:
# uname -r && rpm -q qemu-kvm
3.10.0-183.el7.x86_64
qemu-kvm-1.5.3-78.el7.x86_64
guest info:
3.10.0-183.el7.x86_64

Results:
Both guest/host and qemu work well without any crash after hot-unplug with the pass-through libiscsi block device.

Base on above, this issue has been fixed correctly, move this issue to VERIFIED status.

Best Regards,
sluo

Comment 18 errata-xmlrpc 2015-03-05 11:38:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html