Bug 771570 - Restart libvirtd will get error and fail to reconnect domains on nfs storage
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: Unspecified Unspecified
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Eric Blake
QA Contact: Virtualization Bugs
Depends On: 746666
Reported: 2012-01-04 02:27 EST by Wayne Sun
Modified: 2012-06-20 02:43 EDT (History)
Fixed In Version: libvirt-0.9.9-1.el6
Doc Type: Bug Fix
Last Closed: 2012-06-20 02:43:26 EDT


External Trackers
Red Hat Product Errata RHSA-2012:0748 (priority: normal, status: SHIPPED_LIVE, last updated: 2012-06-19 15:31:38 EDT)
Summary: Low: libvirt security, bug fix, and enhancement update

Description Wayne Sun 2012-01-04 02:27:08 EST
Description of problem:
The host has defined, running domains whose disk images are on an
NFS share. After restarting libvirtd, XML parse errors are logged and
the running domains crash. SELinux is enforcing and virt_use_nfs is on.
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
# getsebool virt_use_nfs
virt_use_nfs --> on


Downgrading or updating libvirt also triggers this problem.
I tried downgrading from 0.9.9-0rc1 to 0.9.4-23.el6_2.2.x86_64 and
updating from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3; in both cases the guests crash.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-23.el6_2.3.x86_64


How reproducible:
always

Steps to Reproduce:
1. # virsh list
  Id Name                 State
----------------------------------
   2 apitest1             running
   3 apitest2             running
   4 apitest3             running
   5 apitest4             running
   6 apitest5             running
   7 apitest6             running
   8 apitest7             running
   9 apitest8             running
  10 apitest9             running
  11 apitest10            running

2. Restart libvirtd
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

Check the log:
14:12:21.379: 56635: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

3. # virsh list
  Id Name                 State
----------------------------------


  
Actual results:
running domains crash

Expected results:
no crash

Additional info:
If the domain is not redefined with the current libvirt build, a
different error shows up after the restart:
15:07:54.682: 81520: error : qemuDomainObjPrivateXMLParse:401 : internal error Unknown qemu capabilities flag piix3-usb-uhci
Comment 2 Daniel Veillard 2012-01-04 02:59:56 EST
Is the bug present on 0.9.4-23.el6 and libvirt-0.9.4-23.el6_2.1?

To be 100% clear: can you stop and undefine all guests on that
machine, install 0.9.4-23.el6 on the machine, then redefine
the guests and restart libvirtd? Is the problem still happening?
I'm trying to make sure that the parsing errors you are seeing
are not coming from the fact that the domains were defined with an
incompatible version of libvirt.

Daniel
Comment 3 Wayne Sun 2012-01-04 04:07:56 EST
(In reply to comment #2)

With the domains freshly defined after installing each libvirt build, then restarting libvirtd:

libvirt-0.9.4-23:
fine

0.9.4-23.el6_2.1:
fine

0.9.4-23.el6_2.2:
17:01:10.086: 30314: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

The domain crashes.

libvirt-0.9.4-23.el6_2.3:
17:04:43.813: 31250: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

The domain crashes.

libvirt-0.9.9-0rc1:

2012-01-04 08:45:26.266+0000: 26784: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing

The running domain crashes.

So this is not limited to this z-stream build; libvirt-0.9.9-0rc1 is affected as well.
Comment 4 Daniel Veillard 2012-01-04 04:16:30 EST
What does "domain crash" mean? Is the qemu-kvm process associated with
the domain killed and no longer running, or is it just that the domain
doesn't show up in the list produced by libvirt/virsh?

 thanks,

Daniel
Comment 5 Wayne Sun 2012-01-04 04:34:52 EST
(In reply to comment #4)

1. Check domain status
# virsh list
 Id Name                 State
----------------------------------
  2 apitest1             running


# ps aux|grep qemu-kvm|grep -v grep
qemu      1703  1.0  0.2 595724 28080 ?        Sl   17:21   0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

2. Restart libvirtd
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

3. Recheck
# virsh list
 Id Name                 State
----------------------------------

# ps aux|grep qemu-kvm|grep -v grep
qemu      1703  0.9  0.2 595724 28080 ?        Sl   17:21   0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4


Confirmed on libvirt 0.9.4-23.el6_2.2, 0.9.4-23.el6_2.3 and 0.9.9-0rc1.
So virsh list shows no domains, but the qemu-kvm process still exists.
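
The check above (guest process alive, guest missing from virsh list) can be automated. The following is only a rough sketch in Python: the function names are made up for illustration, and it assumes the plain `-name apitest1` form seen in the ps output above and the three-column `virsh list` table. It works on captured command output rather than running the commands itself.

```python
import re


def parse_virsh_list(output):
    """Return the set of domain names found in `virsh list` table output."""
    names = set()
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like "<id> <name> <state>"; header and dashes are skipped.
        if len(fields) >= 3 and fields[0].isdigit():
            names.add(fields[1])
    return names


def parse_qemu_names(ps_output):
    """Return the guest names passed via `-name` on qemu-kvm command lines."""
    names = set()
    for line in ps_output.splitlines():
        if "qemu-kvm" not in line:
            continue
        match = re.search(r"\s-name\s+(\S+)", line)
        if match:
            names.add(match.group(1))
    return names


def untracked_guests(virsh_output, ps_output):
    """Guests that still have a qemu-kvm process but are not listed by virsh."""
    return parse_qemu_names(ps_output) - parse_virsh_list(virsh_output)


if __name__ == "__main__":
    virsh_after = " Id Name                 State\n" + "-" * 34 + "\n\n"
    ps_after = ("qemu 1703 0.9 0.2 595724 28080 ? Sl 17:21 0:00 "
                "/usr/libexec/qemu-kvm -S -M rhel6.2.0 -name apitest1 -uuid ce64f9ed\n")
    print(untracked_guests(virsh_after, ps_after))
```

A non-empty result after the restart corresponds to the state reported here: libvirtd has lost track of a guest whose process is still running.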
Comment 6 Eric Blake 2012-01-04 16:36:03 EST
Can you attach the /var/run/libvirt/qemu/$dom.xml file of the domain that is failing to restore?  I suspect that my recent changes to address bug 746666 have an error where the XML being generated on output is not quite being reparsed correctly back on input when libvirtd restarts, but I need to see the XML in question to make sure.

Meanwhile, I'm still investigating, and hope to have a patch soon.
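
For anyone wanting to check a status file by hand before attaching it, here is a hedged sketch in Python. It counts `<seclabel relabel='no'>` elements that have no `<label>` child, which is the shape the fix later in this report describes as being rejected. The surrounding element names in the sample (`domstatus`, `domain`, `disk`, `source`) are assumptions for illustration; no real status file is shown in this report.

```python
import xml.etree.ElementTree as ET


def unlabeled_norelabel_overrides(xml_text):
    """Count <seclabel relabel='no'> elements that carry no <label> child."""
    root = ET.fromstring(xml_text)
    count = 0
    for seclabel in root.iter("seclabel"):
        if seclabel.get("relabel") == "no" and seclabel.find("label") is None:
            count += 1
    return count


# Hypothetical fragment, NOT taken from an actual /var/run/libvirt/qemu/$dom.xml.
SAMPLE = """
<domstatus>
  <domain type='kvm'>
    <devices>
      <disk type='file'>
        <source file='/var/lib/libvirt/images/apitest1'>
          <seclabel relabel='no'/>
        </source>
      </disk>
    </devices>
  </domain>
</domstatus>
"""

if __name__ == "__main__":
    print(unlabeled_norelabel_overrides(SAMPLE))
```

A non-zero count only indicates that the file contains such an override; it does not by itself prove that this bug is the cause of a failed reconnect.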
Comment 7 Eric Blake 2012-01-04 17:01:41 EST
(In reply to comment #0)
> downgrade or update libvirt also have this problem.
> I've tried downgrade from 0.9.9.-0rc1 to 0.9.4-23.el6_2.2.x86_64 and 
> update from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3, guests will crash.

Downgrading is not generally a supported operation, although if we can trivially support it, we should.  That is, once a guest XML has been written with a newer libvirt, there is no guarantee that an older libvirt will parse it correctly.

I'm more worried about the upgrade path: if a guest was started under an older libvirt and libvirt is then upgraded, the restarted libvirtd should not have any problems reading that older XML.
Comment 9 Eric Blake 2012-01-04 18:04:49 EST
Upstream patch proposed:
https://www.redhat.com/archives/libvir-list/2012-January/msg00148.html
Comment 10 Eric Blake 2012-01-05 11:00:08 EST
commit 302fe95ffa1bc5f1c61c0beb31a1adfbc38c668e
Author: Eric Blake <eblake@redhat.com>
Date:   Wed Jan 4 16:01:24 2012 -0700

    seclabel: fix regression in libvirtd restart
    
    Commit b434329 has a logic bug: seclabel overrides don't set
    def->type, but the default value is 0 (aka static).  Restarting
    libvirtd would thus reject the XML for any domain with an
    override of <seclabel relabel='no'/> (which happens quite
    easily if a disk image lives on NFS), with a message:
    
    2012-01-04 22:29:40.949+0000: 6769: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing
    
    Fix the logic to never read from an override's def->type, and
    to allow a missing <label> subelement when relabel is no.  There's
    a lot of stupid double-negatives in the code (!norelabel) because
    of the way that we want the zero-initialized defaults to behave.
    
    * src/conf/domain_conf.c (virSecurityLabelDefParseXMLHelper): Use
    type field from correct location.
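
The zero-default pitfall described in the commit message can be illustrated with a toy model. This is a Python sketch, not the libvirt C code: the class, constants, and function names are invented for illustration, and the exact rules in `check_fixed` are an assumption based only on the wording of the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the C enum; the commit only states that 0 means static.
TYPE_STATIC = 0
TYPE_DYNAMIC = 1


@dataclass
class SecLabel:
    """Toy parsed <seclabel>; every field starts at its zero value."""
    type: int = TYPE_STATIC        # overrides never set this, so it stays 0
    norelabel: bool = False        # True models relabel='no'
    label: Optional[str] = None    # text of the <label> subelement, if any


def check_buggy(parsed):
    """Model of the regression: trusts parsed.type even on an override."""
    if parsed.type == TYPE_STATIC and parsed.label is None:
        raise ValueError("XML error: security label is missing")


def check_fixed(parsed, is_override):
    """Model of the fix (assumed rules): an override's type is never read."""
    if is_override:
        # A missing <label> is acceptable when relabeling is turned off.
        if not parsed.norelabel and parsed.label is None:
            raise ValueError("XML error: security label is missing")
        return
    if parsed.type == TYPE_STATIC and parsed.label is None:
        raise ValueError("XML error: security label is missing")


if __name__ == "__main__":
    override = SecLabel(norelabel=True)       # models <seclabel relabel='no'/>
    check_fixed(override, is_override=True)   # accepted
    try:
        check_buggy(override)
    except ValueError as err:
        print(err)
```

The point of the model is that an unset field and an explicit "static" are indistinguishable when static is the zero value, so the check has to know whether it is looking at an override before consulting the type.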
Comment 12 yanbing du 2012-01-10 02:13:07 EST
Tested this bug on libvirt-0.9.9-1.el6.x86_64.
Following the reproduction steps of this bug, after restarting libvirtd the guests are still listed by virsh list and remain in the running state. Moving the bug to VERIFIED.
Comment 14 errata-xmlrpc 2012-06-20 02:43:26 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html
