Bug 1181074 - libvirt will hang when trying to attach an invalid disk device to a guest.
Summary: libvirt will hang when trying to attach an invalid disk device to a guest.
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.1
Assignee: Virtualization Maintenance
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks: 1401400
 
Reported: 2015-01-12 10:36 UTC by yisun
Modified: 2020-02-11 13:02 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-11 13:02:27 UTC
Type: Bug
Target Upstream Version:



Description yisun 2015-01-12 10:36:21 UTC
Description of problem:
 libvirt will hang when trying to attach an invalid disk device to a guest.


Version-Release number of selected component (if applicable):
 libvirt-1.2.8-12.el7.x86_64
 qemu-kvm-rhev-2.1.2-17.el7.x86_64
 kernel-3.10.0-220.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
 1. start a vm:
   #virsh start vm1
 2. when vm1 has booted, attach an invalid disk device to it, for example:
   #virsh attach-disk vm1 /dev/tty7 vdb

 Actual result:
 libvirt hangs after the following output; nothing can be done until
 libvirtd is killed and restarted.
 [root@lento xml]# virsh attach-disk vm1 /dev/tty7 vdb
 2015-01-09 09:00:28.138+0000: 16758: info : libvirt version: 1.2.8, package:
 12.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>,
 2015-01-07-07:41:03, x86-019.build.eng.bos.redhat.com)
 2015-01-09 09:00:28.138+0000: 16758: warning : virKeepAliveTimerInternal:143
 : No response from client 0x7f688eddc470 after 6 keepalive messages in 35
 seconds
 2015-01-09 09:00:28.138+0000: 16757: warning : virKeepAliveTimerInternal:143
 : No response from client 0x7f688eddc470 after 6 keepalive messages in 35
 seconds
 error: Failed to attach disk
 error: internal error: received hangup / error event on socket


 Expected result:
 libvirt remains responsive after reporting the error messages.
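
For reference, the attach operation can also be driven through the public libvirt API. Below is a minimal sketch of a rough equivalent of the virsh command above (the disk XML, connection URI and error handling are illustrative, not taken from the report); it blocks in virDomainAttachDevice() the same way virsh does:

/* Rough equivalent of: virsh attach-disk vm1 /dev/tty7 vdb
 * Build with: gcc attach.c -o attach -lvirt */
#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

int main(void)
{
    /* Illustrative disk XML; /dev/tty7 is not a usable disk backend,
     * which is what makes the attach block on reading. */
    const char *disk_xml =
        "<disk type='block' device='disk'>"
        "  <driver name='qemu' type='raw'/>"
        "  <source dev='/dev/tty7'/>"
        "  <target dev='vdb' bus='virtio'/>"
        "</disk>";

    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to libvirtd\n");
        return EXIT_FAILURE;
    }

    virDomainPtr dom = virDomainLookupByName(conn, "vm1");
    if (!dom) {
        fprintf(stderr, "domain vm1 not found\n");
        virConnectClose(conn);
        return EXIT_FAILURE;
    }

    /* This call hangs just like the virsh command does. */
    if (virDomainAttachDevice(dom, disk_xml) < 0)
        fprintf(stderr, "failed to attach disk\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return EXIT_SUCCESS;
}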

Comment 1 Peter Krempa 2015-05-11 06:55:20 UTC
The following upstream series fixes the problem of locking the domains while gathering the domain list. With this change it should be possible to destroy a VM that got stuck after attaching a disk that blocks on reading. (A rough sketch of the resulting locking pattern follows the commit list below.)

commit 85d8ede9eb1da870e553f43dea215606ec47d190
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 16:41:20 2015 +0200

    qemu: Convert qemuConnectGetAllDomainStats to use new helpers
    
    Use the new domain list collection helpers to avoid going through
    virDomainPtrs.
    
    This additionally implements filter capability when called through the
    api that accepts domain list filters.

commit 83726a14d294c8cacf1e0decf5e55f84fba1c1c8
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 16:15:53 2015 +0200

    conf: Add helper to convert list of virDomains to a list of virDomainObjs
    
    Add virDomainObjListConvert that will take a list of virDomains, apply
    filters and return a list of virDomainObjs.

commit cbe7bbf722a4c5b276238d4cc50c2ac5d407f800
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 14:11:09 2015 +0200

    conf: Refactor domain list collection critical section
    
    Until now the virDomainListAllDomains API would lock the domain list and
    then every single domain object to access and filter it. This would
    potentially allow an unresponsive VM to block the whole daemon if a
    *listAllDomains call would get stuck.
    
    To avoid this problem this patch collects a list of referenced domain
    objects first from the list and then unlocks it right away. The
    expensive operation requiring locking of the domain object is executed
    after the list lock is dropped. While a single blocked domain will still
    lock up a listAllDomains call, the domain list won't be held locked and
    thus other APIs won't be blocked.
    
    Additionally this patch also fixes the lookup code, where we'd ignore
    the vm->removing flag and thus potentially return domain objects that
    would be deleted very soon so calling any API wouldn't make sense.
    
    As other clients also could benefit from operating on a list of domain
    objects rather than the public domain descriptors a new intermediate
    API - virDomainObjListCollect - is introduced by this patch.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1181074

commit 7906d5fbbba1229f10777f581f91a30063d48847
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 15:25:34 2015 +0200

    conf: Rename virDomainObjListFilter type to virDomainObjListACLFilter
    
    The passed function is meant to filter domains according to ACL match.

commit 684675c33b1a7c9322e2d6d5db8ba87b4d81bac4
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 13:18:37 2015 +0200

    conf: Extract code to filter domain list into a separate function
    
    Separate the code to simplify future refactors.

commit a5e89ae16e8098e9b77a98517f2d10a5d60e7737
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 29 11:54:58 2015 +0200

    util: Make the virDomainListFree helper more universal
    
    Extend it to a universal helper used for clearing lists of any objects.
    Note that the argument type is specifically void * to allow implicit
    typecasting.
    
    Additionally add a helper that works on non-NULL terminated arrays once
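
For illustration only, the core idea of the "Refactor domain list collection critical section" commit above can be sketched as follows. The types and function names are hypothetical stand-ins, not libvirt's internal API; the point is just that the list mutex is released before any per-domain (potentially blocking) work starts:

/* Sketch of the "collect references, then process" pattern. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct dom_obj {
    pthread_mutex_t lock;   /* per-domain lock */
    atomic_int refs;        /* keeps the object alive after the list unlock */
};

struct dom_list {
    pthread_mutex_t lock;   /* protects doms/ndoms only */
    struct dom_obj **doms;
    size_t ndoms;
};

/* Phase 1: hold the list lock just long enough to copy referenced pointers. */
size_t collect_domains(struct dom_list *list, struct dom_obj ***out)
{
    size_t n = 0;
    struct dom_obj **copy;

    pthread_mutex_lock(&list->lock);
    copy = calloc(list->ndoms ? list->ndoms : 1, sizeof(*copy));
    if (copy) {
        for (size_t i = 0; i < list->ndoms; i++) {
            copy[n] = list->doms[i];
            atomic_fetch_add(&copy[n]->refs, 1);
            n++;
        }
    }
    pthread_mutex_unlock(&list->lock);  /* other APIs can use the list again */

    *out = copy;
    return n;
}

/* Phase 2: expensive or blocking work locks one domain at a time, so a stuck
 * domain can no longer wedge every caller that needs the whole list. */
void process_domains(struct dom_obj **doms, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_lock(&doms[i]->lock);
        /* ... gather stats / apply filters; may block on this one domain ... */
        pthread_mutex_unlock(&doms[i]->lock);
        atomic_fetch_sub(&doms[i]->refs, 1);
    }
    free(doms);
}

A single blocked domain still blocks its own caller in phase 2, but the list lock is only held during phase 1, which matches the behaviour described in the commit message.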

Comment 3 yisun 2015-07-22 05:44:26 UTC
Hi Peter, 
I tried this issue with the following packages:
libvirt-1.2.17-2.el7.x86_64
qemu-kvm-rhev-2.3.0-12.el7.x86_64
3.10.0-297.el7.x86_64

The behaviour is a little different, but it still fails.

Now, when I do: #virsh attach-disk vm1 /dev/tty7 vdb
It hangs there for 2 hours with no output in the terminal. The libvirt debug log is attached, please have a check.

Comment 4 Peter Krempa 2015-07-22 05:51:30 UTC
Well, the patches that I've posted don't actually fix the hang of the attach process. That happens because you've attached to the VM what is basically a pipe with nothing to read from it, and libvirt is trying to read it. That is expected to happen, and qemu would also get stuck once it opened the device.

The patches described above fix the problem that, while virsh attach-disk was stuck, a different virsh process was not able to list the domains or kill the one that got stuck. With the patches applied you should be able to run "virsh list" and "virsh destroy vm1" in a second instance.
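
For completeness, the second client doesn't have to be virsh; the following is a minimal sketch of a rough "virsh list" equivalent using the public libvirt API (connection URI and error handling are illustrative), which can be used to check whether the daemon stays responsive while the attach is stuck:

/* Rough equivalent of "virsh list".
 * Build with: gcc list.c -o list -lvirt */
#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to libvirtd\n");
        return EXIT_FAILURE;
    }

    virDomainPtr *doms = NULL;
    int n = virConnectListAllDomains(conn, &doms,
                                     VIR_CONNECT_LIST_DOMAINS_ACTIVE);
    if (n < 0) {
        fprintf(stderr, "listing domains failed\n");
        virConnectClose(conn);
        return EXIT_FAILURE;
    }

    /* Print the name of every running domain. */
    for (int i = 0; i < n; i++) {
        printf("%s\n", virDomainGetName(doms[i]));
        virDomainFree(doms[i]);
    }
    free(doms);

    virConnectClose(conn);
    return EXIT_SUCCESS;
}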

Comment 5 yisun 2015-07-22 05:59:23 UTC
(In reply to Peter Krempa from comment #4)
> Well, the patches that I've posted don't actually fix the hang of the
> attach process. That happens because you've attached to the VM what is
> basically a pipe with nothing to read from it, and libvirt is trying to
> read it. That is expected to happen, and qemu would also get stuck once
> it opened the device.
> 
> The patches described above fix the problem that, while virsh attach-disk
> was stuck, a different virsh process was not able to list the domains or
> kill the one that got stuck. With the patches applied you should be able
> to run "virsh list" and "virsh destroy vm1" in a second instance.

When one terminal was stuck by virsh attach-disk, I opened a second terminal to run virsh list, and it still got stuck:

[root@localhost log]# time virsh list
 Id    Name                           State
----------------------------------------------------
 2     vm1                            running


real	0m0.013s
user	0m0.005s
sys	0m0.003s
[root@localhost log]# time virsh list
<=== hangs here.

Comment 6 Peter Krempa 2015-07-23 09:46:32 UTC
Hm, indeed,
while with the patches above the domain list access doesn't get stuck, the user-facing behavior doesn't change since the individual VMs still have to be locked for filtering.

Comment 9 Jaroslav Suchanek 2019-04-24 12:26:42 UTC
This bug is going to be addressed in the next major release.

Comment 10 Jaroslav Suchanek 2020-02-11 13:02:27 UTC
This bug was closed deferred as a result of bug triage.

Please reopen if you disagree and provide justification for why this bug should
get enough priority. Most important would be information about the impact on a
customer or a layered product. Please indicate the requested target release.

