Bug 1483466
| Summary: | Error when trying to start instance after virsh destroy/start of compute [virt setup] | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Eran Kuris <ekuris> | ||||
| Component: | libvirt | Assignee: | Ján Tomko <jtomko> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Jingjing Shao <jishao> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.4 | CC: | berrange, bschmaus, chhu, dasmith, dyuan, eglynn, ekuris, jdenemar, jsuchane, kchamart, libvirt-maint, mbooth, owalsh, rbalakri, rbryant, sbauza, sferdjao, sgordon, srevivo, vromanso, xuzhang, yalzhang | ||||
| Target Milestone: | pre-dev-freeze | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-11-24 14:09:56 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Could you please attach the following: * nova.conf from your compute node * nova-compute.log from your compute node (the whole file) Also, what storage is /var/lib/nova/instances on the compute host? Is it local filesystem, or nfs, or maybe anything else weird? How did you deploy the system? Is there anything at all non-standard about the deployment? (In reply to Matthew Booth from comment #1) > Could you please attach the following: > > * nova.conf from your compute node > * nova-compute.log from your compute node (the whole file) > > Also, what storage is /var/lib/nova/instances on the compute host? Is it > local filesystem, or nfs, or maybe anything else weird? > > How did you deploy the system? > > Is there anything at all non-standard about the deployment? Hi Matthew, About the logs, I have no setup now to reproduce the issue. If you have time and resources please reproduce it. The storage was a local filesystem. The deployment done with OSPD used Jenkins job. {link for the in private message} It's standard virtual deployment Found steps to reproduce: 1.Deployed virt setup 1 Undercloud, 1 controller, 1 compute. 2. set environment network, subnet. router and instance. 3. virsh destroy compute-0 4. virsh start compute-0 5.start the instance after the compute start-up- [root@controller-0 ~]# openstack server start 8d448c2b-1d76-435d-a885-e45ef2c43a46 6. check logs on compute node. openstack server list - the instance still on shutdown you can contact me to take a look at the setup. Looks like immediate cause is SELinux preventing virtlogd from rotating the console log:
type=AVC msg=audit(1504270605.922:145): avc: denied { unlink } for pid=2618 com
m="virtlogd" name="console.log" dev="vda2" ino=121634898 scontext=system_u:system_
r:virtlogd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:svirt_image_t:s0:c89,c755 t
class=file
type=SYSCALL msg=audit(1504270605.922:145): arch=c000003e syscall=87 success=no ex
it=-13 a0=7f5118000d30 a1=7f5118000cd9 a2=0 a3=6e6f632f63386334 items=0 ppid=1 pid
=2618 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=
(none) ses=4294967295 comm="virtlogd" exe="/usr/sbin/virtlogd" subj=system_u:syste
m_r:virtlogd_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1504270605.922:145): proctitle="/usr/sbin/virtlogd"
virtlogd_t has: manage_files_pattern(virtlogd_t, virt_log_t, virt_log_t) append_files_pattern(virtlogd_t, svirt_image_t, svirt_image_t) So for whatever reason it *doesn't* have permissions to unlink console.log because it's svirt_image_t not virt_log_t. So, should console.log be virt_log_t (seems more intuitive to me), or should virtlogd have manage_files_pattern on svirt_image_t? When virtlogd is enabled, qemu has no need to be able to write to log files directly and in fact we want to prevent it doing so, because virtlogd is intended to provide security protection against guest DOS on host disk space. This is why log files should now use virt_log_t, not svirt_image_t. Domain xml contains:
<console type='file'>
<source path='/var/lib/nova/instances/d317f802-75e7-4cda-b621-37b6fcd14c8c/console.log'/>
<target type='serial' port='0'/>
</console>
/var/lib/nova/instances/d317f802-75e7-4cda-b621-37b6fcd14c8c/ has type nova_var_lib_t.
Nova doesn't do anything with SELinux. When Nova initially creates the file, it inherits nova_var_lib_t. When the domain starts this becomes svirt_image_t. When the domain shuts down gracefully this reverts to nova_var_lib_t.
To isolate the problem from Nova, I shut down the domain with:
# virsh destroy instance-00000001
I deleted console.log, then restarted the domain with:
# virsh start instance-00000001
console.log was created automatically, and had svirt_image_t. Note that an ungraceful shutdown at this point will leave the file as svirt_image_t, and it will subsequently not be possible to start the domain.
I think the libvirt folks need to investigate this. I also suspect it affects all OSP versions running on RHEL 7.4. I haven't tested RHEL 7.3, so I don't know if this is a regression.
*** This bug has been marked as a duplicate of bug 1371125 *** I don't believe this bug is a duplicate of bug 1371125. Note that this bug occurs due to an SELinux label which appears to be explicitly applied to the console log by libvirt, not by the default file labelling policy. Nova happens to pre-create console.log, but this is irrelevant. The bug occurs whether or not Nova pre-creates console.log, as described in comment 8. When Nova pre-creates console.log it is given the default selinux context for that directory: nova_var_lib_t. The context changes automatically to svirt_image_t ***when libvirt starts the domain***. The context reverts automatically to nova_var_lib_t ***when libvirt shuts down the domain***. The issue is that if libvirt does not shut down gracefully, the domain will not restart if console.log is labelled svirt_image_t. Recall that this label was applied automatically ***when libvirt started the domain***, not before. > Nova happens to pre-create console.log, but this is irrelevant. To further reinforce this, nova's creation of console.log appeared [1] in OSP10. The behaviour described in this and other identical bugzillas goes back to at least OSP 9 [2]. In addition, I hacked up a patch removing nova's creation of console.log [3]. I ran a test build of that patch backported to OSP 10. After a compute node crash, instances were not able to be started because of selinux denying virtlogd's conslog.log unlink. [1] https://review.openstack.org/#/c/349541/ [2] https://bugzilla.redhat.com/show_bug.cgi?id=1501957#c11 [3] https://review.openstack.org/#/c/515447/ Also note that this bug differs from bug 1371125 in that the domain is able to start under normal conditions. The issue only occurs when it fails to restart after an ungraceful shutdown. The condition which causes it to fail, the SELinux label of the console log, was created automatically when libvirt started the domain previously. It looks like the virtlogd policies are in /etc/selinux/targeted/contexts/files/file_contexts, which are form selinux-policy-targeted. The virlogd binary itself is in libvirt-daemon, which doesn't include any selinux stuff. I'm wondering if this bugzilla needs to be retargeted to selinux-policy-targeted in order to make the virtlogd policy change that Matt suggested in comment #6. (In reply to Matthew Booth from comment #11) > I don't believe this bug is a duplicate of bug 1371125. Note that this bug > occurs due to an SELinux label which appears to be explicitly applied to the > console log by libvirt, not by the default file labelling policy. > > Nova happens to pre-create console.log, but this is irrelevant. The bug > occurs whether or not Nova pre-creates console.log, as described in comment > 8. > > When Nova pre-creates console.log it is given the default selinux context > for that directory: nova_var_lib_t. > > The context changes automatically to svirt_image_t ***when libvirt starts > the domain***. > > The context reverts automatically to nova_var_lib_t ***when libvirt shuts > down the domain***. > > The issue is that if libvirt does not shut down gracefully, the domain will > not restart if console.log is labelled svirt_image_t. Recall that this label > was applied automatically ***when libvirt started the domain***, not before. Maybe there is still some selinux labelling left from the pre-virtlogd times. It does not change the fact that the parent dir needs to be correctly labelled with virt_log_d. The policy needs to be fixed. We will investigate the libvirt part. Same time I suggest to clone this to relevant nova/selinux-policy counterpart. Since commit e13e8808f9270f4b3b6f4abb8ec473eef81cc1b9
Author: Pavel Hrdina <phrdina>
CommitDate: 2017-06-16 16:00:10 +0200
security: don't relabel chardev source if virtlogd is used as stdio handler
In the case that virtlogd is used as stdio handler we pass to QEMU
only FD to a PIPE connected to virtlogd instead of the file itself.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1430988
Signed-off-by: Pavel Hrdina <phrdina>
Reviewed-by: Martin Kletzander <mkletzan>
git describe: v3.4.0-129-ge13e8808f contains: v3.5.0-rc1~63
libvirt should no longer set the svirt_image_t label
*** This bug has been marked as a duplicate of bug 1430988 *** |
Created attachment 1316147 [details] nova log Description of problem: Deployed virt setup 1 Undercloud, 1 controller, 1 compute. Created VM and rebooted the compute node. after the compute node start I tried to start the instance but it stays on shutdown status. Version-Release number of selected component (if applicable): [root@controller-0 ~]# cat /etc/yum.repos.d/latest-installed 10 -p 2017-08-18.2 [root@controller-0 ~]# rpm -qa |grep neutron openstack-neutron-sriov-nic-agent-9.4.0-2.el7ost.noarch python-neutronclient-6.0.0-2.el7ost.noarch openstack-neutron-bigswitch-lldp-9.42.7-1.el7ost.noarch openstack-neutron-9.4.0-2.el7ost.noarch openstack-neutron-lbaas-9.2.1-3.el7ost.noarch python-neutron-lib-0.4.0-1.el7ost.noarch openstack-neutron-common-9.4.0-2.el7ost.noarch openstack-neutron-bigswitch-agent-9.42.7-1.el7ost.noarch openstack-neutron-openvswitch-9.4.0-2.el7ost.noarch openstack-neutron-metering-agent-9.4.0-2.el7ost.noarch puppet-neutron-9.5.0-3.el7ost.noarch python-neutron-9.4.0-2.el7ost.noarch openstack-neutron-ml2-9.4.0-2.el7ost.noarch python-neutron-tests-9.4.0-2.el7ost.noarch python-neutron-lbaas-9.2.1-3.el7ost.noarch [root@controller-0 ~]# rpm -qa |grep nova python-novaclient-6.0.1-1.el7ost.noarch openstack-nova-common-14.0.7-11.el7ost.noarch openstack-nova-api-14.0.7-11.el7ost.noarch puppet-nova-9.5.4-2.el7ost.noarch python-nova-14.0.7-11.el7ost.noarch openstack-nova-conductor-14.0.7-11.el7ost.noarch openstack-nova-novncproxy-14.0.7-11.el7ost.noarch openstack-nova-compute-14.0.7-11.el7ost.noarch openstack-nova-scheduler-14.0.7-11.el7ost.noarch openstack-nova-console-14.0.7-11.el7ost.noarch openstack-nova-migration-14.0.7-11.el7ost.noarch openstack-nova-cert-14.0.7-11.el7ost.noarch [root@controller-0 ~]# rpm -qa |grep libvirt libvirt-daemon-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-scsi-3.2.0-14.el7_4.2.x86_64 libvirt-client-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-mpath-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-kvm-3.2.0-14.el7_4.2.x86_64 libvirt-libs-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-network-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-interface-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-disk-3.2.0-14.el7_4.2.x86_64 libvirt-3.2.0-14.el7_4.2.x86_64 libvirt-python-3.2.0-3.el7.x86_64 libvirt-daemon-driver-nwfilter-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-secret-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-config-network-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-qemu-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-lxc-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-core-3.2.0-14.el7_4.2.x86_64 libvirt-daemon-driver-storage-logical-3.2.0-14.el7_4.2.x86_64 How reproducible: always Steps to Reproduce: 1.Deployed virt setup 1 Undercloud, 1 controller, 1 compute. 2. set environment network, subnet. router and instance. 3. reboot compute node 4.start the instance after the compute start-up- [root@controller-0 ~]# openstack server start 8d448c2b-1d76-435d-a885-e45ef2c43a46 5. check logs on compute node. openstack server list - the instance still on shutdown Actual results: the instance still on shutdown Expected results: the instance started Additional info: logs attached