Bug 1183893 - Fail to do managedsave/save/dump with the guest after change the security-driver from selinux to none
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Erik Skultety
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-01-20 06:26 UTC by zhenfeng wang
Modified: 2015-11-19 06:08 UTC (History)

Fixed In Version: libvirt-1.2.16-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 06:08:29 UTC
Target Upstream Version:


Attachments
1.2.15 daemon log (1.06 MB, text/plain)
2015-09-25 11:05 UTC, Erik Skultety
1.2.17 daemon log (1.57 MB, text/plain)
2015-09-25 11:06 UTC, Erik Skultety


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2202 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2015-11-19 08:17:58 UTC

Description zhenfeng wang 2015-01-20 06:26:15 UTC
Description of problem:
managedsave/save/dump fails for a running guest after the security driver is changed from selinux to none.

Version-Release number:
libvirt-1.2.8-13.el7.x86_64
kernel-3.10.0-222.el7.x86_64
qemu-kvm-rhev-2.1.2-20.el7.x86_64
selinux-policy-3.13.1-16.el7.noarch

How reproducible:
100%

Steps to Reproduce:

1. Enable SELinux both on the system and in qemu.conf
# getenforce
Enforcing
# cat /etc/libvirt/qemu.conf
security_driver='selinux'

2. Start a normal guest
# virsh start rhel7f

# ps -efZ|grep qemu
system_u:system_r:svirt_t:s0:c243,c794 qemu 10499  1  1 14:41 ?        00:00:17 /usr/libexec/qemu-kvm -name rhel7f

3. Disable SELinux in qemu.conf and restart libvirtd
# vim /etc/libvirt/qemu.conf
security_driver='none'
# systemctl restart libvirtd

4. Run managedsave/save/dump on the guest; each fails with an unclear error
# virsh managedsave rhel7f
error: Failed to save domain rhel7f state
error: internal error: unable to execute QEMU command 'getfd': No file descriptor supplied via SCM_RIGHTS

# virsh save rhel7f /tmp/rhel7f.save
error: Failed to save domain rhel7f to /tmp/rhel7f.save
error: internal error: unable to execute QEMU command 'getfd': No file descriptor supplied via SCM_RIGHTS

# virsh dump rhel7f /tmp/rhel7f.dump
error: Failed to core dump domain rhel7f to /tmp/rhel7f.dump
error: internal error: unable to execute QEMU command 'getfd': No file descriptor supplied via SCM_RIGHTS

5. Check the error info in the libvirtd log
2015-01-19 07:02:25.911+0000: 10778: debug : virJSONValueToString:1303 : result={"id":"libvirt-72","error":{"class":"GenericError","desc":"No file descriptor supplied via SCM_RIGHTS"}}
2015-01-19 07:02:25.911+0000: 10776: debug : virEventPollCalculateTimeout:340 : Calculate expiry of 2 timers
2015-01-19 07:02:25.911+0000: 10778: debug : qemuMonitorJSONCheckError:370 : unable to execute QEMU command {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-72"}: {"id":"libvirt-72","error":{"class":"GenericError","desc":"No file descriptor supplied via SCM_RIGHTS"}}
2015-01-19 07:02:25.911+0000: 10776: debug : virEventPollCalculateTimeout:348 : Got a timeout scheduled for 1421650950905
2015-01-19 07:02:25.911+0000: 10776: debug : virEventPollCalculateTimeout:361 : Schedule timeout then=1421650950905 now=1421650945911
2015-01-19 07:02:25.911+0000: 10776: debug : virEventPollCalculateTimeout:370 : Timeout at 1421650950905 due in 4994 ms
2015-01-19 07:02:25.911+0000: 10778: error : qemuMonitorJSONCheckError:381 : internal error: unable to execute QEMU command 'getfd': No file descriptor supplied via SCM_RIGHTS

6. managedsave/save/dump work well with the guest after changing security_driver from none to selinux

7. On a RHEL 6.7 host, the guest can be managedsaved successfully after changing the security driver from selinux to none

Actual results:
managedsave/save/dump fail and report an unclear error

Expected results:
If managedsave is supported in this scenario, it should succeed; if these operations are unsupported, a clear error should be reported.

Comment 1 Erik Skultety 2015-05-21 10:44:39 UTC
fixed upstream:

commit fb0b9a2cc50fb7a52eb4afea8ea48e39db74e45b
Author: Erik Skultety <eskultet@redhat.com>
Date:   Tue May 5 13:24:41 2015 +0200

    qemu: Log error if domain uses security driver which is not loaded
    
    When starting a domain, if a domain specifies security drivers we do not have
    loaded, we fail. However we don't check for this during
    reconnect, so any operation relying on security driver functionality would fail.
    If someone e.g. starts a domain with selinux driver loaded, then they change
    the security driver to 'none' in config, restart the daemon and call dump/save/..,
    QEMU will return an error.
    As we shouldn't kill the domain, we should at least log an error to let the
    user know that domain reconnect wasn't completely clean.

v1.2.15-120-gfb0b9a2
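
The behaviour the commit message describes can be modelled roughly as follows. This is a Python sketch, not the libvirt C code: `check_seclabels_on_reconnect`, the `loaded_models` set, and the dict-shaped domain are hypothetical names chosen for illustration, and treating 'none' as the only loaded model after the config change is an assumption. Only the error text is taken from the daemon log quoted in this bug.

```python
import logging

log = logging.getLogger("reconnect-sketch")


def check_seclabels_on_reconnect(domain, loaded_models):
    """Return the security models the domain uses that are not loaded.

    Hypothetical model of the reconnect-time check: a missing driver is
    logged as an error, but the domain itself is left untouched.
    """
    missing = [m for m in domain["seclabel_models"] if m not in loaded_models]
    for model in missing:
        log.error("unsupported configuration: "
                  "Unable to find security driver for model %s", model)
    return missing


# Guest was started under selinux; the daemon was then restarted with
# security_driver='none' (assumed here to load only the 'none' model).
domain = {"name": "rhel7f", "seclabel_models": ["selinux"], "running": True}
missing = check_seclabels_on_reconnect(domain, loaded_models={"none"})
```

The point of the sketch is the design choice in the commit: the check reports the mismatch and returns, and nothing in it changes the state of the running domain.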

Comment 3 zhenfeng wang 2015-07-22 02:10:34 UTC
Hi Erik
I tried to verify this bug with libvirt-1.2.17-2.el7.x86_64 and found that I can still hit the issue from comment 0. Can you help check it? Thanks.

Comment 4 Erik Skultety 2015-07-22 08:43:48 UTC
What kind of issue? The error returned from QEMU hasn't changed. What the patch in https://bugzilla.redhat.com/show_bug.cgi?id=1183893#c1 does is log an error to the daemon log (you might want to check whether I'm correct; there should be an entry "Unable to find security driver for model selinux"). It doesn't change the behaviour in any way, because from the user's point of view the machine is functional (user experience is not affected). From libvirt's point of view it's crippled, but destroying the domain forcefully isn't a great solution, as some users may still be using it.
I don't really think this is a bug, actually. Switching security drivers while domains are active is wrong and should be avoided, and anyone who does it should expect some consequences (like libvirt being unable to properly manage the domain).

Comment 5 zhenfeng wang 2015-09-10 09:42:18 UTC
Hi Erik
Sorry for missing your comments earlier. Yes, forcefully destroying the domain when the security driver is changed from selinux to none is not a great solution. However, I think we could improve the error, since it is unclear. An error like the following might be better than the one in comment 0. What do you think?

 "Unable to find security driver for model selinux"

Comment 6 Erik Skultety 2015-09-24 11:49:06 UTC
Well, if it were a libvirt error, we could think about adding some logic to handle this case with a more specific error message. But it isn't. When QEMU indicates an error to libvirt, we take the error message QEMU provided through the monitor and propagate it upwards without mangling it in any way, so virsh (in this case) outputs exactly what you see right now. Parsing strings just to squeeze out some information would be irrational in my opinion. One way to do it would be to read the "error class" from the JSON string. However, QEMU classified this error as GenericError, which could be almost anything, so there probably isn't any efficient way to improve the error message. Moreover, as comment 4 states, changing security drivers with active domains isn't without consequences. Comment 4 also describes what the patch really does.
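
To illustrate the "error class" point, here is a minimal Python sketch that decodes the QMP reply copied from the libvirtd log in the description and reads its class. `describe_qmp_error` is a hypothetical helper written for this example, not a libvirt or QEMU function.

```python
import json

# QMP reply copied from the libvirtd debug log in the description.
reply = json.loads(
    '{"id":"libvirt-72","error":{"class":"GenericError",'
    '"desc":"No file descriptor supplied via SCM_RIGHTS"}}'
)


def describe_qmp_error(reply):
    """Return (class, desc) for a QMP error reply, or None if no error."""
    err = reply.get("error")
    if err is None:
        return None
    return err["class"], err["desc"]


error_class, desc = describe_qmp_error(reply)
print(error_class, "-", desc)
```

The only structured field besides the free-form description is the class, and here it is GenericError, so there is nothing more specific to key a better message on.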

Comment 7 zhenfeng wang 2015-09-25 04:43:36 UTC
Hi Erik
Thanks for your explanation. OK, the error info is fine with me, but I still have a doubt. In comment 4 you said that what the patch in https://bugzilla.redhat.com/show_bug.cgi?id=1183893#c1 does is log an error to the daemon log; in fact, the error was already included in the libvirtd log when this bug was reported. So I checked the patch in comment 1 and found the following code added:
+    /* if domain requests security driver we haven't loaded, report error, but
+     * do not kill the domain
+     */
+    ignore_value(virSecurityManagerCheckAllLabel(driver->securityManager,
+                                                 obj->def));
+

If I didn't misunderstand it, the checkpoints for verifying this bug with the reproduction steps are that the guest does not shut down or disappear automatically and that the above error is reported, right? Can you check whether anything additional needs to be done? Thanks.

Comment 8 Erik Skultety 2015-09-25 11:05:07 UTC
Created attachment 1077001 [details]
1.2.15 daemon log

Comment 9 Erik Skultety 2015-09-25 11:06:21 UTC
Created attachment 1077013 [details]
1.2.17 daemon log

Comment 10 Erik Skultety 2015-09-25 11:31:15 UTC
(In reply to zhenfeng wang from comment #7)
> Hi Erik
> Thanks for your explanation, ok, the error info is ok for me, but still have
> a doubt that in your comment4 said that what the patch in
> https://bugzilla.redhat.com/show_bug.cgi?id=1183893#c1 does, is that it logs
> an error to the daemon log, in fact, the error have already been included in
> libvirtd log while reported this bug. So check the patch in comment1, found
> following code added

Okay, so I created 2 attachments containing complete logs from the 1.2.15 and 1.2.17 daemons respectively. To demonstrate the difference the patch proposed above (comment 1) makes, let's have a look at the most important part:

1.2.15:
2015-09-25 08:25:55.790+0000: 1322: debug : virDomainPCIAddressReserveAddr:317 : Reserving PCI slot 0000:00:03.0 (multifunction='off')
2015-09-25 08:25:55.790+0000: 1322: debug : virDomainPCIAddressReserveAddr:317 : Reserving PCI slot 0000:00:01.0 (multifunction='off')
2015-09-25 08:25:55.790+0000: 1322: debug : qemuDomainObjEnterMonitorInternal:1598 : Entering monitor (mon=0x7fa7f8000bd0 vm=0x7fa800125c60 name=f20live2)
2015-09-25 08:25:55.790+0000: 1322: info : virObjectRef:296 : OBJECT_REF: obj=0x7fa7f8000bd0


1.2.17:
2015-09-25 08:46:31.462+0000: 3246: debug : virDomainPCIAddressReserveAddr:327 : Reserving PCI slot 0000:00:03.0 (multifunction='off')
2015-09-25 08:46:31.462+0000: 3246: debug : virDomainPCIAddressReserveAddr:327 : Reserving PCI slot 0000:00:01.0 (multifunction='off')

!!!2015-09-25 08:46:31.462+0000: 3246: error : virSecurityManagerCheckModel:725 : unsupported configuration: Unable to find security driver for model selinux!!!

2015-09-25 08:46:31.462+0000: 3246: debug : qemuDomainObjEnterMonitorInternal:1752 : Entering monitor (mon=0x7f9c10001060 vm=0x7f9c24245fd0 name=f20live2)
2015-09-25 08:46:31.462+0000: 3246: info : virObjectRef:296 : OBJECT_REF: obj=0x7f9c10001060

Notice the part emphasized with '!!!'. When QEMU returns this error:

2015-09-25 08:26:40.079+0000: 1271: debug : virJSONValueFromString:1531 : string={"id": "libvirt-12", "error": {"class": "GenericError", "desc": "No file descriptor supplied via SCM_RIGHTS"}}

one can now search the log for all potential causes, and a failed attempt to load a security driver can IMHO be a good hint.
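
Searching the daemon log for error-level entries can be sketched in Python as below. The regular expression is an assumption inferred from the log lines quoted in this bug (timestamp, thread id, level, function:line, message), and `error_entries` is a hypothetical helper, not part of libvirt.

```python
import re

# Assumed layout, inferred from the lines quoted in this bug:
# "<date> <time>: <thread id>: <level> : <function>:<line> : <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, function, message) for every error-level line."""
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "error":
            found.append((m.group("ts"), m.group("func"), m.group("msg")))
    return found


# Lines copied from the 1.2.17 excerpt above.
sample = [
    "2015-09-25 08:46:31.462+0000: 3246: debug : "
    "virDomainPCIAddressReserveAddr:327 : Reserving PCI slot 0000:00:01.0 "
    "(multifunction='off')",
    "2015-09-25 08:46:31.462+0000: 3246: error : "
    "virSecurityManagerCheckModel:725 : unsupported configuration: "
    "Unable to find security driver for model selinux",
    "2015-09-25 08:46:31.462+0000: 3246: info : virObjectRef:296 : "
    "OBJECT_REF: obj=0x7f9c10001060",
]
errors = error_entries(sample)
for ts, func, msg in errors:
    print(ts, func, msg)
```

Run against a full daemon log, a filter like this would surface the security driver error logged at reconnect, well before the later getfd failure.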

Comment 11 zhenfeng wang 2015-09-28 07:18:52 UTC
Thanks for Erik's detailed explanation, and sorry for misunderstanding it earlier. I now get the expected error info when switching security_driver from selinux to none with a guest that was started in selinux mode. The guest also doesn't disappear or shut off during operations such as managedsave/save/dump.

BTW, I also tried another scenario: start a guest with security_driver=none, then switch security_driver from "none" to "selinux". The guest works well.

Based on the results above, marking this bug verified with libvirt-1.2.17-11.el7.x86_64.

Comment 13 errata-xmlrpc 2015-11-19 06:08:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

