Bug 771570

Summary: Restarting libvirtd causes an error and fails to reconnect to domains on NFS storage
Product: Red Hat Enterprise Linux 6
Reporter: Wayne Sun <gsun>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.3
CC: acathrow, dallan, mzhan, rwu, veillard, weizhan, ydu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-0.9.9-1.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 06:43:26 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 746666
Bug Blocks:

Description Wayne Sun 2012-01-04 07:27:08 UTC
Description of problem:
There are defined and running domains on the host, and the domain images
are on an NFS share. After restarting libvirtd, XML parse errors appear and
the running domains crash. SELinux is enforcing and virt_use_nfs is on.
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
# getsebool virt_use_nfs
virt_use_nfs --> on


Downgrading or updating libvirt also triggers this problem.
I tried downgrading from 0.9.9-0rc1 to 0.9.4-23.el6_2.2.x86_64 and
updating from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3; in both cases the guests crash.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-23.el6_2.3.x86_64


How reproducible:
always

Steps to Reproduce:
1. # virsh list
  Id Name                 State
----------------------------------
   2 apitest1             running
   3 apitest2             running
   4 apitest3             running
   5 apitest4             running
   6 apitest5             running
   7 apitest6             running
   8 apitest7             running
   9 apitest8             running
  10 apitest9             running
  11 apitest10            running

2. Restart libvirtd:
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

Check the log:
14:12:21.379: 56635: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

3. # virsh list
  Id Name                 State
----------------------------------


  
Actual results:
running domains crash

Expected results:
no crash

Additional info:
If the domain is not redefined on the current libvirt build, a different
error shows up after the restart:
15:07:54.682: 81520: error : qemuDomainObjPrivateXMLParse:401 : internal error Unknown qemu capabilities flag piix3-usb-uhci

Comment 2 Daniel Veillard 2012-01-04 07:59:56 UTC
Is the bug present on 0.9.4-23.el6 and libvirt-0.9.4-23.el6_2.1?

To be 100% clear, can you stop and undefine all guests on that
machine, then install 0.9.4-23.el6 on the machine,
then redefine the guests and restart libvirtd? Is the problem
still happening?
I'm trying to make sure that the parsing errors that you are seeing
are not coming from the fact that the domains were defined with an
incompatible version of libvirt.

Daniel

Comment 3 Wayne Sun 2012-01-04 09:07:56 UTC
(In reply to comment #2)
> Is the bug present on 0.9.4-23.el6 and libvirt-0.9.4-23.el6_2.1?
> 
> To be 100% clear, can you stop and undefine all guests on that
> machine, then install 0.9.4-23.el6 on the machine,
> then redefine the guests and restart libvirtd? Is the problem
> still happening?
> I'm trying to make sure that the parsing errors that you are seeing
> are not coming from the fact that the domains were defined with an
> incompatible version of libvirt.
> 
> Daniel

Results with the domains freshly defined after installing each libvirt version, then restarting libvirtd:

libvirt-0.9.4-23:
fine

0.9.4-23.el6_2.1:
fine

0.9.4-23.el6_2.2:
17:01:10.086: 30314: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

domain crash

libvirt-0.9.4-23.el6_2.3:
17:04:43.813: 31250: error : virSecurityLabelDefParseXMLHelper:2113 : XML error: security label is missing

domain crash

libvirt-0.9.9-0rc1:

2012-01-04 08:45:26.266+0000: 26784: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing

The running domain crashes.

So this does not happen only on the z-stream builds; it is also present in libvirt-0.9.9-0rc1.

Comment 4 Daniel Veillard 2012-01-04 09:16:30 UTC
What does "domain crash" mean? Is the qemu-kvm process associated with
the domain killed and no longer running, or is it just that the domain
doesn't show up in the list produced by libvirt/virsh?

 thanks,

Daniel

Comment 5 Wayne Sun 2012-01-04 09:34:52 UTC
(In reply to comment #4)
> What does "domain crash" mean? Is the qemu-kvm process associated with
> the domain killed and no longer running, or is it just that the domain
> doesn't show up in the list produced by libvirt/virsh?
> 
>  thanks,
> 
> Daniel

1. Check the domain status:
# virsh list
 Id Name                 State
----------------------------------
  2 apitest1             running


# ps aux|grep qemu-kvm|grep -v grep
qemu      1703  1.0  0.2 595724 28080 ?        Sl   17:21   0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

2. Restart libvirtd:
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

3. Recheck:
# virsh list
 Id Name                 State
----------------------------------

# ps aux|grep qemu-kvm|grep -v grep
qemu      1703  0.9  0.2 595724 28080 ?        Sl   17:21   0:00 /usr/libexec/qemu-kvm -S -M rhel6.2.0 -no-kvm -m 215 -smp 1,sockets=1,cores=1,threads=1 -name apitest1 -uuid ce64f9ed-57e5-9d0d-7363-e04f4f9c1094 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/apitest1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -drive file=/var/lib/libvirt/images/apitest1,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -usb -spice port=5900,addr=0.0.0.0,disable-ticketing -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4


Confirmed on libvirt 0.9.4-23.el6_2.2, 0.9.4-23.el6_2.3 and 0.9.9-0rc1.
So virsh list shows no domains, but the qemu-kvm process still exists.
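
The comparison in the steps above (guests that still have a qemu-kvm process but are missing from virsh list) could be automated roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes the "-name <guest>" command-line form and the virsh list table layout shown in this report. It works on captured text and does not call virsh or ps itself.

```python
import re


def virsh_domain_names(virsh_output):
    """Return the domain names found in captured `virsh list` output."""
    names = set()
    for line in virsh_output.splitlines():
        fields = line.split()
        # Data rows look like: "<id> <name> <state>"; header and
        # separator rows do not start with a numeric id.
        if len(fields) >= 3 and fields[0].isdigit():
            names.add(fields[1])
    return names


def qemu_guest_names(ps_output):
    """Return guest names taken from `-name <guest>` in captured ps output."""
    # Assumption: the guest name is the single token after "-name".
    return set(re.findall(r"(?<!\S)-name\s+(\S+)", ps_output))


def untracked_guests(virsh_output, ps_output):
    """Guests with a live qemu-kvm process that virsh no longer lists."""
    return qemu_guest_names(ps_output) - virsh_domain_names(virsh_output)
```

Feeding it the output captured in step 3 would report apitest1 as running but no longer tracked by libvirtd, which is the symptom described here.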

Comment 6 Eric Blake 2012-01-04 21:36:03 UTC
Can you attach the /var/run/libvirt/qemu/$dom.xml file of the domain that is failing to restore?  I suspect that my recent changes to address bug 746666 have an error where the XML being generated on output is not quite being reparsed correctly back on input when libvirtd restarts, but I need to see the XML in question to make sure.

Meanwhile, I'm still investigating, and hope to have a patch soon.

Comment 7 Eric Blake 2012-01-04 22:01:41 UTC
(In reply to comment #0)
> Description of problem:
> There are defined and running domains on the host, and the domain images
> are on an NFS share. After restarting libvirtd, XML parse errors appear and
> the running domains crash. SELinux is enforcing and virt_use_nfs is on.
> # sestatus
> SELinux status:                 enabled
> SELinuxfs mount:                /selinux
> Current mode:                   enforcing
> Mode from config file:          enforcing
> Policy version:                 24
> Policy from config file:        targeted
> # getsebool virt_use_nfs
> virt_use_nfs --> on
> 
> 
> Downgrading or updating libvirt also triggers this problem.
> I tried downgrading from 0.9.9-0rc1 to 0.9.4-23.el6_2.2.x86_64 and
> updating from 0.9.4-23.el6_2.2.x86_64 to 0.9.4-23.el6_2.3; in both cases the guests crash.

Downgrading is not generally a supported operation, although if we can trivially support it, we should.  That is, once a guest XML has been written with a newer libvirt, there is no guarantee that an older libvirt will parse it correctly.

I'm more worried about the upgrade path: if a guest was started under an older libvirt and libvirt is then upgraded, the restarted libvirtd should not have any problems reading that older XML.

Comment 9 Eric Blake 2012-01-04 23:04:49 UTC
Upstream patch proposed:
https://www.redhat.com/archives/libvir-list/2012-January/msg00148.html

Comment 10 Eric Blake 2012-01-05 16:00:08 UTC
commit 302fe95ffa1bc5f1c61c0beb31a1adfbc38c668e
Author: Eric Blake <eblake>
Date:   Wed Jan 4 16:01:24 2012 -0700

    seclabel: fix regression in libvirtd restart
    
    Commit b434329 has a logic bug: seclabel overrides don't set
    def->type, but the default value is 0 (aka static).  Restarting
    libvirtd would thus reject the XML for any domain with an
    override of <seclabel relabel='no'/> (which happens quite
    easily if a disk image lives on NFS), with a message:
    
    2012-01-04 22:29:40.949+0000: 6769: error : virSecurityLabelDefParseXMLHelper:2593 : XML error: security label is missing
    
    Fix the logic to never read from an override's def->type, and
    to allow a missing <label> subelement when relabel is no.  There's
    a lot of stupid double-negatives in the code (!norelabel) because
    of the way that we want the zero-initialized defaults to behave.
    
    * src/conf/domain_conf.c (virSecurityLabelDefParseXMLHelper): Use
    type field from correct location.
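
The logic bug described in the commit message can be illustrated with a small model. This is a simplified sketch and not the libvirt code: the class, the function, and the `fixed` switch are hypothetical, and the only facts taken from the commit message are that an override leaves the type at its zero default (static), that the old check then demanded a <label>, and that the fix allows a missing <label> when relabel is no.

```python
import xml.etree.ElementTree as ET

STATIC, DYNAMIC = 0, 1  # 0 is the zero-initialized default (static)


class SecLabel:
    """Toy stand-in for a parsed <seclabel> override."""

    def __init__(self):
        self.type = STATIC      # never set for overrides, so it stays 0
        self.norelabel = False  # zero-initialized default means "relabel"
        self.label = None


def parse_override(xml_text, fixed):
    """Parse a per-disk <seclabel> override (illustrative model only)."""
    node = ET.fromstring(xml_text)
    sec = SecLabel()
    sec.norelabel = node.get("relabel") == "no"
    label = node.findtext("label")

    if fixed:
        # Fixed behaviour: ignore the override's own type field and
        # allow a missing <label> when relabeling is disabled.
        label_required = not sec.norelabel
    else:
        # Buggy behaviour: the unset type reads as STATIC, and a
        # static label is required to carry a <label> subelement.
        label_required = sec.type == STATIC

    if label is None and label_required:
        raise ValueError("XML error: security label is missing")
    sec.label = label
    return sec
```

With `fixed=False`, parsing `<seclabel relabel='no'/>` raises the same "security label is missing" error seen in the logs; with `fixed=True`, the same input is accepted.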

Comment 12 yanbing du 2012-01-10 07:13:07 UTC
Tested this bug on libvirt-0.9.9-1.el6.x86_64.
Following the reproduction steps of this bug, after restarting libvirtd the guest is still listed by virsh list and is still in the running state. Moving the bug to VERIFIED.

Comment 14 errata-xmlrpc 2012-06-20 06:43:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html