Bug 574017
| Summary: | RFE: "virsh list" lists all guests in state "running", when the guests are paused on storage error | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Dan Yasny <dyasny> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.5 | CC: | acathrow, berrange, cww, dallan, jdenemar, llim, mzhan, tburke, virt-maint, xen-maint |
| Target Milestone: | rc | Keywords: | FutureFeature, Triaged |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.8.2-12.el5 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 22:55:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 554476 | | |
Description
Dan Yasny 2010-03-16 12:18:30 UTC
libvirt does not set any disk error policy when launching QEMU, so it should be using the default policy, which is to report errors to the guest. The guest should not be pausing at all. Has the KVM default policy been changed somewhere to pause instead? libvirt in RHEL5 cannot handle a scenario where the guest pauses, because it has no way to receive an event notification of this. How did you verify that the guest really is paused, as opposed to the guest OS /appearing/ to be paused by virtue of the kernel being stuck handling disk I/O errors?

According to Gleb, the default is to stop on enospc in rhel5.5 and upstream. Assuming we stopped, why does virsh list show the VMs as running?

> According to Gleb, the default is to stop on enospc in rhel5.5 and upstream.

Current upstream is not relevant to this discussion. The RHEL5 behaviour is what's important, and this is a deviation from upstream behaviour at the time of this version of QEMU.

> Assuming we stopped, why does virsh list show the VMs as running?

This is because libvirt has no way of knowing that QEMU stopped. The RHEL5 vintage QEMU had no event notification mechanism upstream. The events patches are a custom RHEL addition for VDSM, which libvirt does not support.

Fixed in libvirt-0.8.2-1.el5

This one isn't actually fixed, because RHEL5 QEMU doesn't support QMP / events. We would need to wire up the text monitor events to make this work.

Ah, I got confused by the "Upstream081" label.

Fixed in libvirt-0.8.2-12.el5
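The event wiring described in the comments above is what lets a libvirt client learn that QEMU paused a guest, so that virsh list can report it as paused. The following is only a rough sketch, not taken from this bug: it shows how a client consumes I/O error and lifecycle events through libvirt-python. virEventRegisterDefaultImpl and the constant names here come from newer libvirt-python releases and are assumptions; the 0.8.2-era event-test.py example used in the verification below implements its own event loop instead.

import libvirt

def io_error_cb(conn, dom, srcpath, devalias, action, opaque):
    # action == 1 (VIR_DOMAIN_EVENT_IO_ERROR_PAUSE) means the guest was paused on the error
    print("IO error: domain %s, disk %s (%s), action %d" % (dom.name(), srcpath, devalias, action))

def lifecycle_cb(conn, dom, event, detail, opaque):
    # Suspended/Resumed transitions show up here, as in the event-test.py output below
    print("Lifecycle: domain %s, event %d, detail %d" % (dom.name(), event, detail))

libvirt.virEventRegisterDefaultImpl()   # assumption: present only in newer libvirt-python
conn = libvirt.open("qemu:///system")
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR, io_error_cb, None)
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
while True:
    libvirt.virEventRunDefaultImpl()    # dispatch pending events

The event-test.py output quoted in the verification below is the stock libvirt-python example doing essentially the same thing.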
Verified. Passed in the environment below:
RHEL5.6-Server-x86_64-KVM
kernel-2.6.18-232.el5
kvm-qemu-img-83-207.el5
libvirt-0.8.2-12.el5
Detailed steps:
1. Create NFS storage and check the size:
# mount -t nfs 10.66.93.186:/var/lib/libvirt/images/ /var/lib/libvirt/migrate
# df -h /var/lib/libvirt/migrate/
Filesystem Size Used Avail Use% Mounted on
10.66.93.186:/var/lib/libvirt/images/
29G 16G 12G 57% /var/lib/libvirt/migrate
2. Using virt-manager, create 2 guests (test1, test2) on the NFS storage and 1 guest (rhel55) on local storage. Make sure the total size of test1 and test2 is larger than the available space, e.g. test1: 6G and test2: 10G against the 12G available above, and do not allocate the entire virtual disk for these 2 guests, so their images can keep growing until the storage runs out of space.
3. On the host, also create a file on the NFS storage so that space can be released later:
# dd if=/dev/zero of=/var/lib/libvirt/migrate/data.img bs=1024 count=1024000
4. After all the guests have finished installation, check on the host:
# df -h /var/lib/libvirt/migrate/
Filesystem Size Used Avail Use% Mounted on
10.66.93.186:/var/lib/libvirt/images/
29G 22G 5.7G 80% /var/lib/libvirt/migrate
# virsh list --all
Id Name State
----------------------------------
3 rhel55 running
7 test2 running
8 test1 running
5. In guests test1 and test2, repeatedly write files like the following until the NFS storage is completely full:
# dd if=/dev/zero of=/tmp/write-test1 bs=1024 count=1024000
Check the NFS storage:
# df -h /var/lib/libvirt/migrate/
Filesystem Size Used Avail Use% Mounted on
10.66.93.186:/var/lib/libvirt/images/
29G 29G 0 100% /var/lib/libvirt/migrate
6. Now check the guest status:
# virsh list --all
Id Name State
----------------------------------
3 rhel55 running
7 test2 running
8 test1 paused
Also check whether the host can ping all the guests: guest test1 cannot be pinged successfully, while the other guests respond normally.
# python /usr/share/doc/libvirt-python-0.8.2/events-python/event-test.py qemu:///system
...
myDomainEventIOErrorCallback: Domain test1(8) /var/lib/libvirt/migrate/test1.img ide0-hd0 1
myDomainEventCallback1 EVENT: Domain test1(8) Suspended IOError
myDomainEventCallback2 EVENT: Domain test1(8) Suspended IOError
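The paused state reported for test1 in this step can also be read directly through the libvirt API rather than by parsing virsh output. This is a minimal illustrative sketch, not part of the original verification, assuming a qemu:///system connection and the same libvirt-python bindings used for event-test.py:

import libvirt

STATE_NAMES = {
    libvirt.VIR_DOMAIN_RUNNING: "running",
    libvirt.VIR_DOMAIN_PAUSED:  "paused",
    libvirt.VIR_DOMAIN_SHUTOFF: "shut off",
}

conn = libvirt.openReadOnly("qemu:///system")
for dom_id in conn.listDomainsID():            # IDs of the active domains
    dom = conn.lookupByID(dom_id)
    state = dom.info()[0]                      # first field of virDomainGetInfo() is the state
    print("%-3d %-10s %s" % (dom_id, dom.name(), STATE_NAMES.get(state, "other")))

This is the same state field that virsh list prints, so with the fixed libvirt a guest stopped by QEMU on ENOSPC shows up as paused here as well.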
7. Release space on the NFS storage from the host:
# rm -rf /var/lib/libvirt/migrate/data.img
8. Check whether guest test1 can be resumed:
# virsh resume test1
Domain test1 resumed
# python /usr/share/doc/libvirt-python-0.8.2/events-python/event-test.py qemu:///system
.....
myDomainEventCallback1 EVENT: Domain test1(8) Resumed Unpaused
myDomainEventCallback2 EVENT: Domain test1(8) Resumed Unpaused
Finally, use ping and other commands in guest test1 to make sure it has resumed successfully.
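The resume step can equally be driven from the API. Again, this is only an illustrative sketch, assuming the domain name test1 from this test and a qemu:///system connection:

import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("test1")

if dom.info()[0] == libvirt.VIR_DOMAIN_PAUSED:
    dom.resume()                               # equivalent to "virsh resume test1"

print("test1 is now %s" % ("running" if dom.info()[0] == libvirt.VIR_DOMAIN_RUNNING else "still not running"))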
------------------
This bug can be reproduced with libvirt-0.8.2-10.el5: with # virsh list --all, all the guests show the "running" state the whole time, but the guests on NFS storage are actually paused and cannot be pinged from the host. The libvirt event handler also produces no output for the I/O error. Guests on local storage work fine the whole time.
*** Bug 536946 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2011-0060.html

*** Bug 536947 has been marked as a duplicate of this bug. ***