Red Hat Bugzilla – Bug 771570
Restart libvirtd will get error and fail to reconnect domains on nfs storage
Last modified: 2012-06-20 02:43:26 EDT
Description of problem:
There are defined and running domains on the host, and the domain images are on an NFS share. After libvirtd is restarted, it reports XML parse errors and the running domains crash. SELinux is enforcing and virt_use_nfs is on.

# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
# getsebool virt_use_nfs
virt_use_nfs --> on

Downgrading or updating libvirt shows the same problem. I tried downgrading from 0.9.9-0rc1 to 0.9.4-23.el6_2.2.x86_64 and updating from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3; in both cases the guests crash.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-23.el6_2.3.x86_64

How reproducible:
always

Steps to Reproduce:
1. # virsh list
 Id Name                 State
----------------------------------
  2 apitest1             running
  3 apitest2             running
  4 apitest3             running
  5 apitest4             running
  6 apitest5             running
  7 apitest6             running
  8 apitest7             running
  9 apitest8             running
 10 apitest9             running
 11 apitest10            running

2. Restart libvirtd and check the log:
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

14:12:21.379: 56635: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

3. # virsh list
 Id Name                 State
----------------------------------

Actual results:
The running domains crash.

Expected results:
No crash.

Additional info:
If the domain has not been redefined on the current libvirt build, a different error shows up after the restart:
15:07:54.682: 81520: error : qemuDomainObjPrivateXMLParse:401 : internal error Unknown qemu capabilities flag piix3-usb-uhci
Is the bug present on 0.9.4-23.el6 and libvirt-0.9.4-23.el6_2.1?

To be 100% clear: can you stop and undefine all guests on that machine, install 0.9.4-23.el6, then redefine the guests and restart libvirtd? Is the problem still happening? I'm trying to make sure that the parsing errors you are seeing do not come from the domains having been defined with an incompatible version of libvirt.

Daniel
(In reply to comment #2)
> Is the bug present on 0.9.4-23.el6 and libvirt-0.9.4-23.el6_2.1 ?
>
> To be 100% clear can you stop and undefine all guests on that
> machine, then install 0.9.4-23.el6 on the machine,
> then redefine the guests and restart libvirtd, is the problem
> still happening.
> I'm trying to make sure that the parsing error that you are seeing
> are not coming from the fact that the domains were defined with an
> incompatible version of libvirt.
>
> Daniel

With the domain freshly defined after installing each libvirt build, then restarting libvirtd:

libvirt-0.9.4-23: fine
0.9.4-23.el6_2.1: fine
0.9.4-23.el6_2.2: domain crash
    17:01:10.086: 30314: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing
libvirt-0.9.4-23.el6_2.3: domain crash
    17:04:43.813: 31250: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing
libvirt-0.9.9-0rc1: the running domain crashes
    2012-01-04 08:45:26.266+0000: 26784: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing

So this does not happen only on the z-stream builds; it also occurs with libvirt-0.9.9-0rc1.
What does "domain crash" mean? Is the qemu-kvm process associated with the domain killed and no longer running, or does the domain just not show up in the list produced by libvirt/virsh?

thanks,
Daniel
(In reply to comment #4)
> What does "domain crash" mean ? Is the qemu-kvm process associated to
> the domain killed and doesn't run anymore, or just that the domain
> doesn't show up in the list produced by libvirt/virsh ?
>
> thanks,
>
> Daniel

1. Check the domain status:
# virsh list
 Id Name                 State
----------------------------------
  2 apitest1             running

# ps aux|grep qemu-kvm|grep -v grep
qemu 1703 1.0 0.2 595724 28080 ? Sl 17:21 0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

2. Restart libvirtd:
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

3. Recheck:
# virsh list
 Id Name                 State
----------------------------------

# ps aux|grep qemu-kvm|grep -v grep
qemu 1703 0.9 0.2 595724 28080 ? Sl 17:21 0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

Confirmed on libvirt 0.9.4-23.el6_2.2, 0.9.4-23.el6_2.3 and 0.9.9-0rc1. So virsh list shows no domains, but the qemu-kvm process still exists.
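The check above (virsh list is empty while qemu-kvm is still running) can be automated when there are many guests. The following is only a rough sketch, not part of libvirt: the function names are made up for illustration, and it assumes the plain "-name <guest>" form seen in the ps output in this report.

```python
import re


def listed_domains(virsh_list_output):
    """Return the domain names found in the text output of `virsh list`."""
    names = set()
    for line in virsh_list_output.splitlines():
        fields = line.split()
        # Data rows look like "<id> <name> <state>"; the header and the
        # dashed separator do not start with a numeric id.
        if len(fields) >= 3 and fields[0].isdigit():
            names.add(fields[1])
    return names


def qemu_guest_names(ps_output):
    """Return guest names taken from the -name argument of qemu-kvm processes."""
    names = set()
    for line in ps_output.splitlines():
        if "qemu-kvm" not in line:
            continue
        match = re.search(r"\s-name\s+(\S+)", line)
        if match:
            names.add(match.group(1))
    return names


def orphaned_guests(virsh_list_output, ps_output):
    """Guests that still have a qemu-kvm process but are unknown to libvirtd."""
    return qemu_guest_names(ps_output) - listed_domains(virsh_list_output)


if __name__ == "__main__":
    # Abbreviated sample data modelled on the output in this comment.
    after_restart = " Id Name                 State\n----------------------------------\n"
    ps = ("qemu 1703 0.9 0.2 595724 28080 ? Sl 17:21 0:00 "
          "/usr/libexec/qemu-kvm -S -M rhel6.2.0 -name apitest1 -uuid ce64f9ed")
    print(sorted(orphaned_guests(after_restart, ps)))
```

Feeding it the real output of "virsh list" and "ps aux" before and after the restart should show whether a guest was killed (gone from both) or merely lost by libvirtd (process present, domain not listed).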
Can you attach the /var/run/libvirt/qemu/$dom.xml file of the domain that is failing to restore? I suspect that my recent changes to address bug 746666 have an error where the XML being generated on output is not quite being reparsed correctly back on input when libvirtd restarts, but I need to see the XML in question to make sure. Meanwhile, I'm still investigating, and hope to have a patch soon.
(In reply to comment #0)
> Description of problem:
> There are defined and running domains on host, the domain images are on
> nfs share. After restart libvirtd, there will get xml parse errors and
> crash the running domains. Selinux is enforcing and virt_use_nfs is on.
> # sestatus
> SELinux status: enabled
> SELinuxfs mount: /selinux
> Current mode: enforcing
> Mode from config file: enforcing
> Policy version: 24
> Policy from config file: targeted
> # getsebool virt_use_nfs
> virt_use_nfs --> on
>
>
> downgrade or update libvirt also have this problem.
> I've tried downgrade from 0.9.9.-0rc1 to 0.9.4-23.el6_2.2.x86_64 and
> update from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3, guests will crash.

Downgrading is not generally a supported operation, although if we can trivially support it, we should. That is, once a guest's XML has been written by a newer libvirt, there is no guarantee that an older libvirt will parse it correctly. I'm more worried about the upgrade path: if a guest was started under an older libvirt and libvirt is then upgraded, the restarted libvirtd should not have any problems reading that older XML.
Upstream patch proposed: https://www.redhat.com/archives/libvir-list/2012-January/msg00148.html
commit 302fe95ffa1bc5f1c61c0beb31a1adfbc38c668e
Author: Eric Blake <eblake@redhat.com>
Date:   Wed Jan 4 16:01:24 2012 -0700

    seclabel: fix regression in libvirtd restart

    Commit b434329 has a logic bug: seclabel overrides don't set
    def->type, but the default value is 0 (aka static).  Restarting
    libvirtd would thus reject the XML for any domain with an
    override of <seclabel relabel='no'/> (which happens quite
    easily if a disk image lives on NFS), with a message:

    2012-01-04 22:29:40.949+0000: 6769: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing

    Fix the logic to never read from an override's def->type, and
    to allow a missing <label> subelement when relabel is no.  There's
    a lot of stupid double-negatives in the code (!norelabel) because
    of the way that we want the zero-initialized defaults to behave.

    * src/conf/domain_conf.c (virSecurityLabelDefParseXMLHelper): Use
    type field from correct location.
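The logic bug described in the commit message can be illustrated with a small model. This is a simplified Python sketch, not the libvirt C code: the names LabelType, SecLabel, XMLError and parse_override are hypothetical, the enum value STATIC = 0 simply follows the commit message's statement that the zero default means static, and the real condition in domain_conf.c covers more cases than shown here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class LabelType(IntEnum):
    STATIC = 0   # zero-initialised default, per the commit message
    DYNAMIC = 1


@dataclass
class SecLabel:
    type: LabelType = LabelType.STATIC  # what a zeroed struct would hold
    norelabel: bool = False
    label: Optional[str] = None


class XMLError(Exception):
    pass


def parse_override(xml_text, domain_default, fixed):
    """Parse a <seclabel> override; `fixed` selects the corrected check."""
    node = ET.fromstring(xml_text)
    override = SecLabel()  # an override never sets .type, so it stays STATIC
    override.norelabel = node.get("relabel") == "no"
    override.label = node.findtext("label")  # None when <label> is absent

    if fixed:
        # Read the type from the domain-level label, and tolerate a
        # missing <label> when relabelling is switched off.
        needs_label = (domain_default.type == LabelType.STATIC
                       and not override.norelabel)
    else:
        # Buggy check: consults the override's own, never-assigned type.
        needs_label = override.type == LabelType.STATIC

    if needs_label and override.label is None:
        raise XMLError("security label is missing")
    return override


if __name__ == "__main__":
    domain_default = SecLabel(type=LabelType.DYNAMIC)
    xml_text = "<seclabel relabel='no'/>"
    try:
        parse_override(xml_text, domain_default, fixed=False)
    except XMLError as err:
        print("buggy check rejects the override:", err)
    print("fixed check accepts:", parse_override(xml_text, domain_default, fixed=True))
```

In this model the buggy check rejects the override even though the domain-level label is dynamic, which matches the "security label is missing" error seen on restart; the fixed check accepts the same XML.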
Tested this bug on libvirt-0.9.9-1.el6.x86_64. Following the reproduction steps from this bug, after restarting libvirtd the guest is still listed by virsh list and is still in the running state. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0748.html