Description of problem:
libvirt hangs when trying to attach an invalid disk device to a guest.

Version-Release number of selected component (if applicable):
libvirt-1.2.8-12.el7.x86_64
qemu-kvm-rhev-2.1.2-17.el7.x86_64
kernel-3.10.0-220.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM:
#virsh start vm1
2. Once vm1 has booted, attach an invalid disk device to it, for example:
#virsh attach-disk vm1 /dev/tty7 vdb

Actual results:
libvirt hangs after the following output, and nothing can be done until libvirtd is killed and restarted.
[root@lento xml]# virsh attach-disk vm1 /dev/tty7 vdb
2015-01-09 09:00:28.138+0000: 16758: info : libvirt version: 1.2.8, package: 12.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-01-07-07:41:03, x86-019.build.eng.bos.redhat.com)
2015-01-09 09:00:28.138+0000: 16758: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f688eddc470 after 6 keepalive messages in 35 seconds
2015-01-09 09:00:28.138+0000: 16757: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f688eddc470 after 6 keepalive messages in 35 seconds
error: Failed to attach disk
error: internal error: received hangup / error event on socket

Expected results:
libvirt resumes after the error messages.
The following upstream series fixes the problem with locking the domains while gathering the domain list. With this change it should be possible to destroy the VM that is stuck by attaching a disk that blocks on reading.

commit 85d8ede9eb1da870e553f43dea215606ec47d190
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 16:41:20 2015 +0200

    qemu: Convert qemuConnectGetAllDomainStats to use new helpers

    Use the new domain list collection helpers to avoid going through
    virDomainPtrs.

    This additionally implements filter capability when called through the
    api that accepts domain list filters.

commit 83726a14d294c8cacf1e0decf5e55f84fba1c1c8
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 16:15:53 2015 +0200

    conf: Add helper to convert list of virDomains to a list of virDomainObjs

    Add virDomainObjListConvert that will take a list of virDomains, apply
    filters and return a list of virDomainObjs.

commit cbe7bbf722a4c5b276238d4cc50c2ac5d407f800
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 14:11:09 2015 +0200

    conf: Refactor domain list collection critical section

    Until now the virDomainListAllDomains API would lock the domain list and
    then every single domain object to access and filter it. This would
    potentially allow a unresponsive VM to block the whole daemon if a
    *listAllDomains call would get stuck.

    To avoid this problem this patch collects a list of referenced domain
    objects first from the list and then unlocks it right away. The
    expensive operation requiring locking of the domain object is executed
    after the list lock is dropped. While a single blocked domain will still
    lock up a listAllDomains call, the domain list won't be held locked and
    thus other APIs won't be blocked.

    Additionally this patch also fixes the lookup code, where we'd ignore
    the vm->removing flag and thus potentially return domain objects that
    would be deleted very soon so calling any API wouldn't make sense.

    As other clients also could benefit from operating on a list of domain
    objects rather than the public domain descriptors a new intermediate
    API - virDomainObjListCollect - is introduced by this patch.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1181074

commit 7906d5fbbba1229f10777f581f91a30063d48847
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 15:25:34 2015 +0200

    conf: Rename virDomainObjListFilter type to virDomainObjListACLFilter

    The passed function is meant to filter domains according to ACL match.

commit 684675c33b1a7c9322e2d6d5db8ba87b4d81bac4
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 13:18:37 2015 +0200

    conf: Extract code to filter domain list into a separate function

    Separate the code to simplify future refactors.

commit a5e89ae16e8098e9b77a98517f2d10a5d60e7737
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 11:54:58 2015 +0200

    util: Make the virDomainListFree helper more universal

    Extend it to a universal helper used for clearing lists of any objects.
    Note that the argument type is specifically void * to allow implicit
    typecasting.

    Additionally add a helper that works on non-NULL terminated arrays once
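The collect-then-filter pattern described in commit cbe7bbf7 can be sketched roughly as below. This is a minimal, hypothetical C model, not libvirt's code: DomObj, DomList, dom_list_collect(), dom_list_filter_active() and dom_list_release() are invented names standing in for virDomainObj, virDomainObjList and the real collection helpers, and plain pthread mutexes plus a C11 atomic counter stand in for libvirt's object locking and reference counting. It shows why the list lock is no longer held while domains are inspected, and why a domain whose own lock is held by a stuck API still blocks the filtering step.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for virDomainObj and
 * virDomainObjList; these are NOT libvirt's real types. */
typedef struct {
    pthread_mutex_t lock; /* per-domain lock; in this model the hung attach-disk holds it */
    atomic_int refs;      /* reference count keeping the object alive without its lock */
    bool active;
    bool removing;        /* object is about to be deleted; never return it */
} DomObj;

typedef struct {
    pthread_mutex_t lock; /* protects only the doms array */
    DomObj **doms;
    size_t ndoms;
} DomList;

static void dom_unref(DomObj *vm)
{
    int prev = atomic_fetch_sub(&vm->refs, 1);
    /* sketch only: a real implementation would free the object at zero */
    assert(prev > 0);
}

/* Phase 1: hold the list lock only long enough to reference every entry.
 * No per-domain lock is taken, so a stuck domain cannot stall this step. */
static DomObj **dom_list_collect(DomList *list, size_t *nvms)
{
    *nvms = 0;
    pthread_mutex_lock(&list->lock);
    DomObj **vms = calloc(list->ndoms + 1, sizeof(*vms));
    if (vms) {
        for (size_t i = 0; i < list->ndoms; i++) {
            atomic_fetch_add(&list->doms[i]->refs, 1);
            vms[(*nvms)++] = list->doms[i];
        }
    }
    pthread_mutex_unlock(&list->lock);
    return vms;
}

/* Phase 2: runs after the list lock is dropped. Each domain is locked on
 * its own to check the filters (including the removing flag); a domain
 * whose lock is held elsewhere still blocks this call, but no longer
 * blocks other users of the list. Returns the number of domains kept. */
static size_t dom_list_filter_active(DomObj **vms, size_t nvms)
{
    size_t kept = 0;
    for (size_t i = 0; i < nvms; i++) {
        DomObj *vm = vms[i];
        pthread_mutex_lock(&vm->lock);
        bool keep = vm->active && !vm->removing;
        pthread_mutex_unlock(&vm->lock);
        if (keep)
            vms[kept++] = vm;
        else
            dom_unref(vm); /* drop the reference taken in phase 1 */
    }
    return kept;
}

/* Release the references still held on the filtered result. */
static void dom_list_release(DomObj **vms, size_t nvms)
{
    for (size_t i = 0; i < nvms; i++)
        dom_unref(vms[i]);
    free(vms);
}
```

In this sketch a caller runs dom_list_collect(), then dom_list_filter_active() on the returned array, and finally dom_list_release() with the filtered count (build with -pthread). If another thread holds one domain's lock, collection still returns immediately and the list lock is free for other callers, but dom_list_filter_active() waits on that domain, which is consistent with the remark later in this thread that individual VMs still have to be locked for filtering.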
Hi Peter, I retested this issue with the following packages:
libvirt-1.2.17-2.el7.x86_64
qemu-kvm-rhev-2.3.0-12.el7.x86_64
3.10.0-297.el7.x86_64 (kernel)

The behaviour is a little different, but it still fails. Now, when I run:
#virsh attach-disk vm1 /dev/tty7 vdb
it has been hanging for 2 hours with no output in the terminal. The libvirt debug log is attached; please take a look.
Well, the patches that I've posted don't actually fix the hang of the attach process itself. That happens because you've attached to the VM what is basically a pipe with nothing to read, and libvirt is trying to read from it. That is expected, and qemu would also get stuck once it opened the device.

The patches described above fix a different problem: while virsh attach-disk was stuck, a different virsh process was not able to list the domains or kill the one that got stuck. With the patches above you should be able to "virsh list" and "virsh destroy vm1" in a second instance.
(In reply to Peter Krempa from comment #4)
> With the patches above you should be able to
> "virsh list" and "virsh destroy vm1" in a second instance.

While one terminal was stuck in virsh attach-disk, I opened a second terminal and ran virsh list, and it still hangs:

[root@localhost log]# time virsh list
 Id    Name                           State
----------------------------------------------------
 2     vm1                            running

real    0m0.013s
user    0m0.005s
sys     0m0.003s
[root@localhost log]# time virsh list    <=== hangs here
Hm, indeed. While with the patches above the domain list access no longer gets stuck, the user-facing behavior doesn't change, since the individual VMs still have to be locked for filtering.
This bug is going to be addressed in the next major release.
This bug was closed as DEFERRED as a result of bug triage. Please reopen it if you disagree, and provide justification for why this bug should get enough priority. The most important information would be the impact on customers or layered products. Please indicate the requested target release.