Description of problem:
When trying to hot-unplug a busy rng device from a Linux guest, the device can't be removed; it still can't be removed even after the device is freed.
Version-Release number of selected component (if applicable):
guest kernel: 3.10.0-123.el7
Steps to Reproduce:
1. launch guest with one rng device
qemu-kvm ... -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,id=device-rng0,rng=rng0
2. read random data from guest rng device
guest) # dd if=/dev/hwrng of=/dev/null &
3. try to hot-unplug device from monitor
(qemu) device_del device-rng0
4. check if device still exists
(qemu) info pci
5. kill dd process in guest
6. repeat step 3, 4
7. repeat step 2

Actual results:
step 4: busy rng device can't be hot-unplugged
step 6: unused device can't be hot-unplugged, and we can't read data from this device in the guest

Expected results:
device can be hot-unplugged in step 6
With a RHEL6 guest, when we hot-remove the device from the QEMU monitor, the dd process in the guest exits and the device can be hot-removed from QEMU.
So I'm moving this bug to the RHEL7 kernel component.
Likely a dup of bug 1081431. Note that I can't reproduce this bug upstream, or in my RHEL7.0 VM. Also, from the other bug, RHEL6 rmmod does succeed, but the guest panics after some time passes.
(In reply to Amit Shah from comment #3)
> Likely a dup of bug 1081431. Note that I can't reproduce this bug upstream,
> or in my RHEL7.0 VM. Also, from the other bug, RHEL6 rmmod does succeed,
> but the guest panics after some time passes.
I can't reproduce this bug with the latest upstream kernel.
I can reproduce it with kernel-3.10.0-123.el7.x86_64 and kernel-3.10.0-140.el7.x86_64.
*** This bug has been marked as a duplicate of bug 1081431 ***
It's not a duplicate of bug 1081431.
Posted a fix to Upstream:
When we try to hot-remove a busy virtio-rng device from the QEMU monitor,
the device can't be hot-removed, because the virtio-rng driver hangs
waiting for the have_data completion.
This patch fixes the hang by completing the have_data completion before
unregistering the virtio-rng device.
I found _another_ hot-unplug issue that occurs only in the RHEL7 kernel (upstream + Amit's 4 patches + PATCH  works well).
Hot-removing a busy device fails. After killing the reading process (dd), the device still can't be hot-removed. This differs from the hot-unplug issue I fixed in comment #6 with PATCH .
I tried backporting multiple-device support plus the core/rng fixes (Amit's 4 patches + my patch) to the RHEL7 kernel.
Then those two files (drivers/char/hw_random/core.c, drivers/char/hw_random/virtio-rng.c) are _almost completely the same_ as upstream, but this hot-unplug issue still exists. Strange ;/
At the same time, I also found another bug, Bug 1127062.
 [PATCH] virtio-rng: complete have_data completion in removing device
Posted a 2nd version to upstream:
[PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device
1. Hot-unplug virtio-rng0; the dd process exits with an error:
"dd: error reading ‘/dev/hwrng’: No such device"
virtio-rng0 disappears from 'info pci'
2. Re-read with dd, then hot-unplug virtio-rng1; the dd process exits
with the same error, and virtio-rng1 disappears
Test result with 3.10.0-145.el7.x86_64 & 3.10.0-161.el7.x86_64:
(this problem doesn't exist in the latest upstream; core.c and virtio-rng.c are the same as the internal ones, which means we have an internal bug in some other part)
Start the guest and directly hot-unplug the rng device; after waiting a few minutes, we get this kernel message:
[ 360.634054] INFO: task kworker/0:4:598 blocked for more than 120 seconds.
[ 360.636207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.638569] kworker/0:4 D ffff88007fc14600 0 598 2 0x00000080
[ 360.640817] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 360.642530] ffff88007b8f9a80 0000000000000046 ffff88007b8f9fd8 0000000000014600
[ 360.644658] ffff88007b8f9fd8 0000000000014600 ffff88007b8f0000 ffffffff81991d60
[ 360.646440] ffffffff81991d64 ffff88007b8f0000 00000000ffffffff ffffffff81991d68
[ 360.648233] Call Trace:
[ 360.648804] [<ffffffff815ef479>] schedule_preempt_disabled+0x29/0x70
[ 360.650286] [<ffffffff815ed1a5>] __mutex_lock_slowpath+0xc5/0x1c0
[ 360.651717] [<ffffffff815ec60f>] mutex_lock+0x1f/0x2f
[ 360.652905] [<ffffffff813b0549>] hwrng_unregister+0x19/0x110
[ 360.654236] [<ffffffffa02630b1>] remove_common+0x41/0x70 [virtio_rng]
[ 360.655743] [<ffffffffa026310e>] virtrng_remove+0xe/0x10 [virtio_rng]
[ 360.657245] [<ffffffffa0022093>] virtio_dev_remove+0x23/0x80 [virtio]
[ 360.658750] [<ffffffff813c770f>] __device_release_driver+0x7f/0xf0
[ 360.660181] [<ffffffff813c77a3>] device_release_driver+0x23/0x30
[ 360.661570] [<ffffffff813c6f18>] bus_remove_device+0x108/0x180
[ 360.662951] [<ffffffff813c3475>] device_del+0x135/0x1d0
[ 360.664193] [<ffffffff813c352e>] device_unregister+0x1e/0x60
[ 360.665517] [<ffffffffa00224b6>] unregister_virtio_device+0x16/0x30 [virtio]
[ 360.668633] [<ffffffffa00b456b>] virtio_pci_remove+0x2b/0x70 [virtio_pci]
[ 360.671687] [<ffffffff812fcbfb>] pci_device_remove+0x3b/0xb0
[ 360.674482] [<ffffffff813c770f>] __device_release_driver+0x7f/0xf0
[ 360.677398] [<ffffffff813c77a3>] device_release_driver+0x23/0x30
[ 360.680253] [<ffffffff812f5cd4>] pci_stop_bus_device+0x94/0xa0
[ 360.683016] [<ffffffff812f5dc2>] pci_stop_and_remove_bus_device+0x12/0x20
[ 360.685993] [<ffffffff81313046>] disable_slot+0x76/0xd0
[ 360.688631] [<ffffffff81313e23>] acpiphp_disable_and_eject_slot+0x23/0xa0
[ 360.691589] [<ffffffff81313f4b>] hotplug_event+0xab/0x260
[ 360.694253] [<ffffffff8131412a>] hotplug_event_work+0x2a/0x60
[ 360.696979] [<ffffffff813317d3>] acpi_hotplug_work_fn+0x1c/0x27
[ 360.699724] [<ffffffff81089a1b>] process_one_work+0x17b/0x460
[ 360.702437] [<ffffffff8108a7eb>] worker_thread+0x11b/0x400
[ 360.705092] [<ffffffff8108a6d0>] ? rescuer_thread+0x400/0x400
[ 360.707791] [<ffffffff81091bbf>] kthread+0xcf/0xe0
[ 360.710342] [<ffffffff81091af0>] ? kthread_create_on_node+0x140/0x140
[ 360.713198] [<ffffffff815f90ec>] ret_from_fork+0x7c/0xb0
[ 360.715812] [<ffffffff81091af0>] ? kthread_create_on_node+0x140/0x140
I backported two patches to the internal kernel, but the hang still exists.
[PATCH] virtio-rng: skip reading when we start to remove the device
[PATCH] virtio-rng: fix stuck of hot-unplugging busy device
The hung-task message doesn't occur upstream, even before my two patches are applied.
*** Bug 1146437 has been marked as a duplicate of this bug. ***
Bug was fixed in Upstream:
[PATCH 1/2] virtio-rng: fix stuck of hot-unplugging busy device
[PATCH 2/2] virtio-rng: skip reading when we start to remove the device
[PATCH 1/6] hw_random: place mutex around read functions and buffers.
[PATCH 2/6] hw_random: move some code out mutex_lock for avoiding underlying deadlock
[PATCH 3/6] hw_random: use reference counts on each struct hwrng.
[PATCH 4/6] hw_random: fix unregister race.
[PATCH 5/6] hw_random: don't double-check old_rng.
[PATCH 6/6] hw_random: don't init list element we're about to add to list.
[PATCH 1/5] hwrng: core - Use struct completion for cleanup_done
[PATCH 2/5] hwrng: core - Fix current_rng init/cleanup race yet again
[PATCH 3/5] hwrng: core - Do not register device opportunistically
[PATCH 4/5] hwrng: core - Drop current rng in set_current_rng
[PATCH 5/5] hwrng: core - Move hwrng_init call into set_current_rng
(In reply to Amos Kong from comment #16)
> Bug was fixed in Upstream:
> [PATCH 1/2] virtio-rng: fix stuck of hot-unplugging busy device
> [PATCH 2/2] virtio-rng: skip reading when we start to remove the device
> [PATCH 1/6] hw_random: place mutex around read functions and buffers.
> [PATCH 2/6] hw_random: move some code out mutex_lock for avoiding underlying
> [PATCH 3/6] hw_random: use reference counts on each struct hwrng.
> [PATCH 4/6] hw_random: fix unregister race.
> [PATCH 5/6] hw_random: don't double-check old_rng.
> [PATCH 6/6] hw_random: don't init list element we're about to add to list.
> [PATCH 1/5] hwrng: core - Use struct completion for cleanup_done
> [PATCH 2/5] hwrng: core - Fix current_rng init/cleanup race yet again
> [PATCH 3/5] hwrng: core - Do not register device opportunistically
> [PATCH 4/5] hwrng: core - Drop current rng in set_current_rng
> [PATCH 5/5] hwrng: core - Move hwrng_init call into set_current_rng
The commit list here is the same as in bug 1127062. Closing as a duplicate.
*** This bug has been marked as a duplicate of bug 1127062 ***