Bug 1483466

Summary: Error when trying to start instance after virsh destroy/start of compute [virt setup]
Product: Red Hat Enterprise Linux 7 Reporter: Eran Kuris <ekuris>
Component: libvirtAssignee: Ján Tomko <jtomko>
Status: CLOSED DUPLICATE QA Contact: Jingjing Shao <jishao>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 7.4CC: berrange, bschmaus, chhu, dasmith, dyuan, eglynn, ekuris, jdenemar, jsuchane, kchamart, libvirt-maint, mbooth, owalsh, rbalakri, rbryant, sbauza, sferdjao, sgordon, srevivo, vromanso, xuzhang, yalzhang
Target Milestone: pre-dev-freezeKeywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-24 14:09:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
nova log none

Description Eran Kuris 2017-08-21 08:11:33 UTC
Created attachment 1316147 [details]
nova log

Description of problem:
Deployed virt setup 1 Undercloud, 1 controller, 1 compute. 
Created VM and rebooted the compute node.
after the compute node start I tried to start the instance but it stays on shutdown status. 


Version-Release number of selected component (if applicable):
[root@controller-0 ~]# cat /etc/yum.repos.d/latest-installed 
10   -p 2017-08-18.2
[root@controller-0 ~]# rpm -qa |grep neutron
openstack-neutron-sriov-nic-agent-9.4.0-2.el7ost.noarch
python-neutronclient-6.0.0-2.el7ost.noarch
openstack-neutron-bigswitch-lldp-9.42.7-1.el7ost.noarch
openstack-neutron-9.4.0-2.el7ost.noarch
openstack-neutron-lbaas-9.2.1-3.el7ost.noarch
python-neutron-lib-0.4.0-1.el7ost.noarch
openstack-neutron-common-9.4.0-2.el7ost.noarch
openstack-neutron-bigswitch-agent-9.42.7-1.el7ost.noarch
openstack-neutron-openvswitch-9.4.0-2.el7ost.noarch
openstack-neutron-metering-agent-9.4.0-2.el7ost.noarch
puppet-neutron-9.5.0-3.el7ost.noarch
python-neutron-9.4.0-2.el7ost.noarch
openstack-neutron-ml2-9.4.0-2.el7ost.noarch
python-neutron-tests-9.4.0-2.el7ost.noarch
python-neutron-lbaas-9.2.1-3.el7ost.noarch
[root@controller-0 ~]# rpm -qa |grep nova
python-novaclient-6.0.1-1.el7ost.noarch
openstack-nova-common-14.0.7-11.el7ost.noarch
openstack-nova-api-14.0.7-11.el7ost.noarch
puppet-nova-9.5.4-2.el7ost.noarch
python-nova-14.0.7-11.el7ost.noarch
openstack-nova-conductor-14.0.7-11.el7ost.noarch
openstack-nova-novncproxy-14.0.7-11.el7ost.noarch
openstack-nova-compute-14.0.7-11.el7ost.noarch
openstack-nova-scheduler-14.0.7-11.el7ost.noarch
openstack-nova-console-14.0.7-11.el7ost.noarch
openstack-nova-migration-14.0.7-11.el7ost.noarch
openstack-nova-cert-14.0.7-11.el7ost.noarch
[root@controller-0 ~]# rpm -qa |grep libvirt
libvirt-daemon-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.2.x86_64
libvirt-client-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-kvm-3.2.0-14.el7_4.2.x86_64
libvirt-libs-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-network-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.2.x86_64
libvirt-3.2.0-14.el7_4.2.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-config-network-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.2.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.2.x86_64


How reproducible:
always

Steps to Reproduce:
1.Deployed virt setup 1 Undercloud, 1 controller, 1 compute. 
2. set environment network, subnet. router and instance.
3. reboot compute node
4.start the instance after the compute start-up-
[root@controller-0 ~]# openstack server start 8d448c2b-1d76-435d-a885-e45ef2c43a46 
5. check logs on compute node. 
openstack server list - the instance still on shutdown

Actual results:
the instance still on shutdown

Expected results:
the instance  started 

Additional info:
logs attached

Comment 1 Matthew Booth 2017-08-25 13:41:06 UTC
Could you please attach the following:

* nova.conf from your compute node
* nova-compute.log from your compute node (the whole file)

Also, what storage is /var/lib/nova/instances on the compute host? Is it local filesystem, or nfs, or maybe anything else weird?

How did you deploy the system?

Is there anything at all non-standard about the deployment?

Comment 2 Eran Kuris 2017-08-27 06:35:05 UTC
(In reply to Matthew Booth from comment #1)
> Could you please attach the following:
> 
> * nova.conf from your compute node
> * nova-compute.log from your compute node (the whole file)
> 
> Also, what storage is /var/lib/nova/instances on the compute host? Is it
> local filesystem, or nfs, or maybe anything else weird?
> 
> How did you deploy the system?
> 
> Is there anything at all non-standard about the deployment?

Hi Matthew, 
About the logs, I have no setup now to reproduce the issue. If you have time and resources please reproduce it. 

The storage was a local filesystem.

The deployment done with OSPD used Jenkins job. {link for the in private message}

It's standard virtual deployment

Comment 4 Eran Kuris 2017-08-27 13:20:16 UTC
Found steps to reproduce: 

1.Deployed virt setup 1 Undercloud, 1 controller, 1 compute. 
2. set environment network, subnet. router and instance.
3. virsh destroy compute-0
4. virsh start compute-0
5.start the instance after the compute start-up-
[root@controller-0 ~]# openstack server start 8d448c2b-1d76-435d-a885-e45ef2c43a46 
6. check logs on compute node. 
openstack server list - the instance still on shutdown

you can contact me to take a look at the setup.

Comment 5 Matthew Booth 2017-09-01 12:59:17 UTC
Looks like immediate cause is SELinux preventing virtlogd from rotating the console log:

type=AVC msg=audit(1504270605.922:145): avc:  denied  { unlink } for  pid=2618 com
m="virtlogd" name="console.log" dev="vda2" ino=121634898 scontext=system_u:system_
r:virtlogd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:svirt_image_t:s0:c89,c755 t
class=file
type=SYSCALL msg=audit(1504270605.922:145): arch=c000003e syscall=87 success=no ex
it=-13 a0=7f5118000d30 a1=7f5118000cd9 a2=0 a3=6e6f632f63386334 items=0 ppid=1 pid
=2618 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=
(none) ses=4294967295 comm="virtlogd" exe="/usr/sbin/virtlogd" subj=system_u:syste
m_r:virtlogd_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1504270605.922:145): proctitle="/usr/sbin/virtlogd"

Comment 6 Matthew Booth 2017-09-01 13:13:03 UTC
virtlogd_t has:

manage_files_pattern(virtlogd_t, virt_log_t, virt_log_t)
append_files_pattern(virtlogd_t, svirt_image_t, svirt_image_t)

So for whatever reason it *doesn't* have permissions to unlink console.log because it's svirt_image_t not virt_log_t.

So, should console.log be virt_log_t (seems more intuitive to me), or should virtlogd have manage_files_pattern on svirt_image_t?

Comment 7 Daniel Berrangé 2017-09-01 13:21:51 UTC
When virtlogd is enabled, qemu has no need to be able to write to log files directly and in fact we want to prevent it doing so, because virtlogd is intended to provide security protection against guest DOS on host disk space.

This is why log files should now use virt_log_t, not svirt_image_t.

Comment 8 Matthew Booth 2017-09-01 13:41:36 UTC
Domain xml contains:

    <console type='file'>
      <source path='/var/lib/nova/instances/d317f802-75e7-4cda-b621-37b6fcd14c8c/console.log'/>
      <target type='serial' port='0'/>
    </console>

/var/lib/nova/instances/d317f802-75e7-4cda-b621-37b6fcd14c8c/ has type nova_var_lib_t.

Nova doesn't do anything with SELinux. When Nova initially creates the file, it inherits nova_var_lib_t. When the domain starts this becomes svirt_image_t. When the domain shuts down gracefully this reverts to nova_var_lib_t.

To isolate the problem from Nova, I shut down the domain with:

# virsh destroy instance-00000001

I deleted console.log, then restarted the domain with:

# virsh start instance-00000001

console.log was created automatically, and had svirt_image_t. Note that an ungraceful shutdown at this point will leave the file as svirt_image_t, and it will subsequently not be possible to start the domain.

I think the libvirt folks need to investigate this. I also suspect it affects all OSP versions running on RHEL 7.4. I haven't tested RHEL 7.3, so I don't know if this is a regression.

Comment 10 Jaroslav Suchanek 2017-09-01 14:14:33 UTC

*** This bug has been marked as a duplicate of bug 1371125 ***

Comment 11 Matthew Booth 2017-10-27 12:45:27 UTC
I don't believe this bug is a duplicate of bug 1371125. Note that this bug occurs due to an SELinux label which appears to be explicitly applied to the console log by libvirt, not by the default file labelling policy.

Nova happens to pre-create console.log, but this is irrelevant. The bug occurs whether or not Nova pre-creates console.log, as described in comment 8.

When Nova pre-creates console.log it is given the default selinux context for that directory: nova_var_lib_t.

The context changes automatically to svirt_image_t ***when libvirt starts the domain***.

The context reverts automatically to nova_var_lib_t ***when libvirt shuts down the domain***.

The issue is that if libvirt does not shut down gracefully, the domain will not restart if console.log is labelled svirt_image_t. Recall that this label was applied automatically ***when libvirt started the domain***, not before.

Comment 12 Artom Lifshitz 2017-10-27 13:11:16 UTC
> Nova happens to pre-create console.log, but this is irrelevant.

To further reinforce this, nova's creation of console.log appeared [1] in OSP10. The behaviour described in this and other identical bugzillas goes back to at least OSP 9 [2]. In addition, I hacked up a patch removing nova's creation of console.log [3]. I ran a test build of that patch backported to OSP 10. After a compute node crash, instances were not able to be started because of selinux denying virtlogd's conslog.log unlink. 

[1] https://review.openstack.org/#/c/349541/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1501957#c11
[3] https://review.openstack.org/#/c/515447/

Comment 13 Matthew Booth 2017-10-27 13:31:02 UTC
Also note that this bug differs from bug 1371125 in that the domain is able to start under normal conditions. The issue only occurs when it fails to restart after an ungraceful shutdown. The condition which causes it to fail, the SELinux label of the console log, was created automatically when libvirt started the domain previously.

Comment 14 Artom Lifshitz 2017-10-30 15:05:39 UTC
It looks like the virtlogd policies are in /etc/selinux/targeted/contexts/files/file_contexts, which are form selinux-policy-targeted. The virlogd binary itself is in libvirt-daemon, which doesn't include any selinux stuff. I'm wondering if this bugzilla needs to be retargeted to selinux-policy-targeted in order to make the virtlogd policy change that Matt suggested in comment #6.

Comment 16 Jaroslav Suchanek 2017-11-24 10:52:20 UTC
(In reply to Matthew Booth from comment #11)
> I don't believe this bug is a duplicate of bug 1371125. Note that this bug
> occurs due to an SELinux label which appears to be explicitly applied to the
> console log by libvirt, not by the default file labelling policy.
> 
> Nova happens to pre-create console.log, but this is irrelevant. The bug
> occurs whether or not Nova pre-creates console.log, as described in comment
> 8.
> 
> When Nova pre-creates console.log it is given the default selinux context
> for that directory: nova_var_lib_t.
> 
> The context changes automatically to svirt_image_t ***when libvirt starts
> the domain***.
> 
> The context reverts automatically to nova_var_lib_t ***when libvirt shuts
> down the domain***.
> 
> The issue is that if libvirt does not shut down gracefully, the domain will
> not restart if console.log is labelled svirt_image_t. Recall that this label
> was applied automatically ***when libvirt started the domain***, not before.

Maybe there is still some selinux labelling left from the pre-virtlogd times. It does not change the fact that the parent dir needs to be correctly labelled with virt_log_d. The policy needs to be fixed.

We will investigate the libvirt part. Same time I suggest to clone this to relevant nova/selinux-policy counterpart.

Comment 17 Ján Tomko 2017-11-24 13:28:49 UTC
Since commit e13e8808f9270f4b3b6f4abb8ec473eef81cc1b9
Author:     Pavel Hrdina <phrdina>
CommitDate: 2017-06-16 16:00:10 +0200

    security: don't relabel chardev source if virtlogd is used as stdio handler
    
    In the case that virtlogd is used as stdio handler we pass to QEMU
    only FD to a PIPE connected to virtlogd instead of the file itself.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1430988
    
    Signed-off-by: Pavel Hrdina <phrdina>
    Reviewed-by: Martin Kletzander <mkletzan>

git describe: v3.4.0-129-ge13e8808f contains: v3.5.0-rc1~63

libvirt should no longer set the svirt_image_t label

Comment 18 Ján Tomko 2017-11-24 14:09:56 UTC

*** This bug has been marked as a duplicate of bug 1430988 ***