Description of problem:
OSP13, 3 controller + 3 compute + 3 ceph nodes, IPv4 + GRE.

{"message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 7e6fc2de-ffed-4a7d-b004-b164d42ab117. Last exception: internal error: process exited while connecting to monitor", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 564, in build_instances | | | filter_properties, instances[0].uuid)

Nova compute logs are empty -> https://bugzilla.redhat.com/show_bug.cgi?id=1541431

Workaround: setenforce 0 on the compute nodes.

Version-Release number of selected component (if applicable):
OSP13
openstack-zaqar-6.0.0-0.20180123030653.1feeb36.el7ost.noarch
openstack-nova-placement-api-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
openstack-selinux-0.8.13-0.20180108164838.c6158ce.el7ost.noarch
openstack-glance-16.0.0-0.20180123074513.466c104.el7ost.noarch
python-openstackclient-lang-3.13.0-0.20180109064941.8c5f755.el7ost.noarch
openstack-swift-object-2.15.2-0.20180118220641.449d83f.el7ost.noarch
openstack-neutron-openvswitch-12.0.0-0.20180123043113.d32ad6e.el7ost.noarch
python2-openstackclient-3.13.0-0.20180109064941.8c5f755.el7ost.noarch
openstack-heat-api-10.0.0-0.20180123134729.85a3e98.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180122224017.el7ost.noarch
openstack-mistral-engine-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
openstack-ironic-common-10.0.1-0.20180123173432.2c630d9.el7ost.noarch
openstack-swift-account-2.15.2-0.20180118220641.449d83f.el7ost.noarch
openstack-heat-engine-10.0.0-0.20180123134729.85a3e98.el7ost.noarch
openstack-tripleo-ui-8.1.1-0.20180122135122.aef02d8.el7ost.noarch
openstack-tripleo-common-containers-8.3.1-0.20180123050219.el7ost.noarch
puppet-openstack_extras-12.2.0-0.20180119072029.b090b97.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180117094122.02d0985.el7ost.noarch
openstack-nova-api-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
openstack-nova-conductor-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
openstack-nova-compute-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
openstack-neutron-ml2-12.0.0-0.20180123043113.d32ad6e.el7ost.noarch
openstack-swift-proxy-2.15.2-0.20180118220641.449d83f.el7ost.noarch
openstack-heat-common-10.0.0-0.20180123134729.85a3e98.el7ost.noarch
openstack-ironic-conductor-10.0.1-0.20180123173432.2c630d9.el7ost.noarch
openstack-tempest-17.2.1-0.20180123175631.fe655fe.el7ost.noarch
openstack-mistral-api-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
openstack-tripleo-validations-8.1.1-0.20180119231917.2ff3c79.el7ost.noarch
openstack-neutron-12.0.0-0.20180123043113.d32ad6e.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20180117092204.120eca8.el7ost.noarch
openstack-nova-common-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
openstack-swift-container-2.15.2-0.20180118220641.449d83f.el7ost.noarch
openstack-neutron-common-12.0.0-0.20180123043113.d32ad6e.el7ost.noarch
openstack-heat-api-cfn-10.0.0-0.20180123134729.85a3e98.el7ost.noarch
openstack-ironic-staging-drivers-0.8.1-0.20180118014800.ae60c0b.el7ost.noarch
puppet-openstacklib-12.2.0-0.20180119083603.12107fd.el7ost.noarch
openstack-tripleo-common-8.3.1-0.20180123050219.el7ost.noarch
openstack-mistral-executor-6.0.0-0.20180122153726.ae7950e.el7ost.noarch
openstack-nova-scheduler-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
python2-openstacksdk-0.9.19-0.20171105041723.bd60aa4.el7ost.noarch
openstack-keystone-13.0.0-0.20180123044720.9cd5f19.el7ost.noarch
openstack-ironic-api-10.0.1-0.20180123173432.2c630d9.el7ost.noarch
openstack-ironic-inspector-7.0.1-0.20180122225049.0cf3316.el7ost.noarch
openstack-puppet-modules-11.0.0-0.20171011152327.71ad01c.el7ost.noarch
openstack-mistral-common-6.0.0-0.20180122153726.ae7950e.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Deploy OSP13 with 3 controller + 3 compute + 3 ceph nodes + IPv4 GRE
2. Try to launch an instance

Actual results:
Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 7e6fc2de-ffed-4a7d-b004-b164d42ab117. Last exception: internal error: process exited while connecting to monitor

Expected results:
The instance is launched.

Additional info:
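Note on the workaround: switching SELinux to permissive mode on the compute nodes is only a stopgap (this assumes the AVC denials shown below are the root cause); a minimal sketch:

# On each compute node (temporary; does not persist across a reboot):
$ sudo getenforce
$ sudo setenforce 0
# Revert with "sudo setenforce 1" once a policy fix is in place.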
Traceback:

Instance failed to spawn: libvirtError: internal error: process exited while connecting to monitor
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2244, in _build_resources
    yield resources
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2037, in _build_and_run_instance
    block_device_info=block_device_info)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2926, in spawn
    destroy_disks_on_failure=True)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5361, in _create_domain_and_network
    destroy_disks_on_failure)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5331, in _create_domain_and_network
    post_xml_callback=post_xml_callback)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5266, in _create_domain
    guest.launch(pause=pause)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
    self._encoded_xml, errors='ignore')
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
    return self._domain.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: process exited while connecting to monitor

[root@compute-0 heat-admin]# grep denied /var/log/audit/audit.log
type=AVC msg=audit(1518136909.628:1688): avc: denied { write } for pid=54924 comm="qemu-kvm" path="pipe:[285743]" dev="pipefs" ino=285743 scontext=system_u:system_r:svirt_t:s0:c780,c989 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518136909.628:1688): avc: denied { write } for pid=54924 comm="qemu-kvm" path="pipe:[285743]" dev="pipefs" ino=285743 scontext=system_u:system_r:svirt_t:s0:c780,c989 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518136909.628:1688): avc: denied { write } for pid=54924 comm="qemu-kvm" path="pipe:[286029]" dev="pipefs" ino=286029 scontext=system_u:system_r:svirt_t:s0:c780,c989 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518188712.923:3319): avc: denied { write } for pid=233891 comm="qemu-kvm" path="pipe:[4062348]" dev="pipefs" ino=4062348 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518188722.025:3332): avc: denied { append } for pid=233891 comm=43505520302F4B564D path="pipe:[4062463]" dev="pipefs" ino=4062463 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518188723.198:3334): avc: denied { append } for pid=233891 comm=43505520302F4B564D path="pipe:[4062463]" dev="pipefs" ino=4062463 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
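Side note: a couple of the denials show the qemu thread name as a hex-encoded comm field (the PROCTITLE records later in this bug are hex-encoded the same way). A quick way to decode such a field, assuming xxd (from vim-common) is available:

$ echo 43505520302F4B564D | xxd -r -p; echo
CPU 0/KVM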
Can you try relabelling the filesystem (and rebooting for it to take effect)?

$ touch /.autorelabel; reboot
Created attachment 1393870 [details] audit.log
This is a bit concerning:

type=SELINUX_ERR msg=audit(1518189489.547:3336): op=security_compute_av reason=bounds scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=dir perms=ioctl,read,getattr,lock,search,open
type=AVC msg=audit(1518189489.547:3336): avc: denied { search } for pid=233891 comm="qemu-kvm" name="48323" dev="proc" ino=188612 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=dir
type=SELINUX_ERR msg=audit(1518189489.547:3336): op=security_compute_av reason=bounds scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=file perms=ioctl,read,getattr,lock,open
type=AVC msg=audit(1518189489.547:3336): avc: denied { read } for pid=233891 comm="qemu-kvm" name="cmdline" dev="proc" ino=189961 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=file
type=AVC msg=audit(1518189489.547:3336): avc: denied { open } for pid=233891 comm="qemu-kvm" path="/proc/48323/cmdline" dev="proc" ino=189961 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=file
type=SYSCALL msg=audit(1518189489.547:3336): arch=c000003e syscall=2 success=yes exit=42 a0=55c60b9e5920 a1=0 a2=7fffe91e8750 a3=5a7dbbb1 items=0 ppid=48307 pid=233891 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c577,c824 key=(null)
type=PROCTITLE msg=audit(1518189489.547:3336): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D696E7374616E63652D30303030303030622C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C69
type=AVC msg=audit(1518189489.550:3337): avc: denied { getattr } for pid=233891 comm="qemu-kvm" path="/proc/48323/cmdline" dev="proc" ino=189961 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=file
type=SYSCALL msg=audit(1518189489.550:3337): arch=c000003e syscall=5 success=yes exit=0 a0=2a a1=7fffe91e8660 a2=7fffe91e8660 a3=5a7dbbb1 items=0 ppid=48307 pid=233891 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=system_u:system_r:svirt_t:s0:c577,c824 key=(null)
type=PROCTITLE msg=audit(1518189489.550:3337): proctitle=2F7573722F6C6962657865632F71656D752D6B766D002D6E616D650067756573743D696E7374616E63652D30303030303030622C64656275672D746872656164733D6F6E002D53002D6F626A656374007365637265742C69643D6D61737465724B6579302C666F726D61743D7261772C66696C653D2F7661722F6C69622F6C69
type=SELINUX_ERR msg=audit(1518189489.550:3338): op=security_compute_av reason=bounds scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file perms=ioctl,read,write,getattr,lock,append,open
type=AVC msg=audit(1518189489.550:3338): avc: denied { write } for pid=233891 comm="qemu-kvm" path="pipe:[4062348]" dev="pipefs" ino=4062348 scontext=system_u:system_r:svirt_t:s0:c577,c824 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file

I don't like the idea that a confined svirt_t domain needs write access to files labelled spc_t. That seems to transcend the boundaries that sVirt is supposed to provide.
Dan, is there anything new in container-selinux for this kind of thing? A VM is being run from an spc_t container, which then transitions to svirt_t, but the VM now needs to do something with spc_t again.
libvirt is running in a container and this is qemu writing to the qemu monitor pipe back to libvirt?
This would be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1538651
(In reply to David Hill from comment #9)
> This would be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1538651

That one was for qemu with an svirt_tcg_t context; here the source context is svirt_t.
*** Bug 1545350 has been marked as a duplicate of this bug. ***
We need clarity on what the 'spc_t' type is used for and why QEMU is attempting to write to it before we can suggest a solution. That is not something I've heard of before, so it's questionable whether a config fix is required vs. an SELinux policy extension.
spc_t is the label of a "super privileged container"; basically, this is the label you get when you run a container out of docker without confining it with SELinux. It is an unconfined domain and we let most domains communicate with it.

These AVCs show that it is walking through the process data of the container process, which is not a problem and should be allowed. These AVCs are allowed in the current selinux-policy.

audit2allow -i /tmp/t

#============= svirt_t ==============

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:dir search;

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:fifo_file write;

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:file { getattr open read };

Are you getting these errors on RHEL 7.5? This might be a change to container-selinux that is causing the typebounds checks, and it would need to be fixed in container-selinux.
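For reference, a minimal way to reproduce this check on a compute node, assuming the standard audit tools are installed (ausearch from audit, audit2allow from policycoreutils-python on RHEL 7):

# Collect recent AVC denials and compare them against the loaded policy
$ ausearch -m AVC -ts recent > /tmp/t
$ audit2allow -i /tmp/t
# Rules annotated "This avc is allowed in the current policy" indicate the
# denial should not occur once the policy/typebounds issue is resolved.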
Also got the following message when the compute node count is 1:

{"message": "Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 23416b2e-c06c-4523-b8d4-58d0d60364df.", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 578, in build_instances | | | raise exception.MaxRetriesExceeded(reason=msg)
SELinux context of virtlogd on a 7.5 deployment:

[root@compute-0 heat-admin]# ps -lxZ | grep virtlogd
system_u:system_r:spc_t:s0      4     0 27956 27939  20   0 196016  4388 poll_s Ssl  ?      0:00 /usr/sbin/virtlogd --config /etc/libvirt/virtlogd.conf
Yes, it's on RHEL 7.5
So, we can wait for a fix to container-selinux and use workarounds a la comment #13 in the meantime. That's fine.
Ollie, can you let us know what container-selinux package is installed?
(In reply to Lon Hohberger from comment #18)
> Ollie, can you let us know what container-selinux package is installed?

owlash is out today but AFAICT we used container-selinux-2.41-1.git126c1c0.el7.noarch.
*** Bug 1545532 has been marked as a duplicate of this bug. ***
(In reply to Lon Hohberger from comment #18)
> Ollie, can you let us know what container-selinux package is installed?

Would need to check with Artem.
Confirmed; this should be working on 7.5 with:

[root@localhost ~]# audit2allow -i audit-1393870.txt

#============= svirt_t ==============

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:dir search;

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:fifo_file { append write };

#!!!! This avc is allowed in the current policy
allow svirt_t spc_t:file { getattr open read };

[root@localhost ~]# rpm -q openstack-selinux selinux-policy container-selinux
openstack-selinux-0.8.13-0.20180108164838.c6158ce.el7ost.noarch
selinux-policy-3.13.1-190.el7.noarch
container-selinux-2.41-1.git126c1c0.el7.noarch
I'm also getting "This avc is allowed in the current policy" for the attached audit.log on CentOS 7.4 with:

selinux-policy-3.13.1-166.el7_4.7.noarch
container-selinux-2.36-1.gitff95335.el7.noarch
openstack-selinux-0.8.13-0.20180108164838.c6158ce.el7.centos.noarch

Please report the versions of the above packages on the test system.
[heat-admin@compute-0 ~]$ sudo rpm -qa "*container-selinux*"
container-selinux-2.41-1.git126c1c0.el7.noarch
openstack-selinux-0.8.13-0.20180108164838.c6158ce.el7ost.noarch
selinux-policy-3.13.1-186.el7.noarch
*** Bug 1551651 has been marked as a duplicate of this bug. ***
*** Bug 1551617 has been marked as a duplicate of this bug. ***
Created attachment 1416727 [details] audit.log with disabled dontaudit rules
type=AVC msg=audit(1522752025.041:226): avc: denied { read } for pid=20007 comm="sshd" name="lastlog" dev="vda2" ino=8628871 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_log_t:s0 tclass=file

This is the same as: https://bugzilla.redhat.com/show_bug.cgi?id=1562586

The problem there is that something recreated /var/log/lastlog with the wrong file label. When you create a file in /var/log, it inherits /var/log's directory type, which is 'var_log_t'. The SELinux base policy in RHEL 7 has /var/log/lastlog labeled as 'lastlog_t', however, so there are two options when creating this file:

- the creating process needs to call restorecon, or
- we need a file transition rule.

Doing the former simply requires calling 'restorecon' on /var/log/lastlog from whatever process recreated it. To do the latter, we would need to know the security context of the process which deleted/recreated /var/log/lastlog; file transitions are extremely precise (we need to know the source context, the target directory label (e.g. var_log_t), the target file label (lastlog_t), and the filename (lastlog)).

Anyway, if the label were correct, lastlog_t would be accessible. The default security policy allows:

allow sshd_t lastlog_t : file { ioctl read write create getattr setattr lock append open } ;
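As an illustration of the first option (a sketch, assuming a standard RHEL 7 targeted policy), the expected label can be compared against the actual one and then reset in place:

# What the policy says the label should be, vs. what the recreated file actually has
$ matchpathcon /var/log/lastlog
$ ls -Z /var/log/lastlog
# Reset the file to the policy default (should restore lastlog_t)
$ restorecon -v /var/log/lastlog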
*** Bug 1562586 has been marked as a duplicate of this bug. ***
At this point, the problem is figuring out what is deleting/recreating /var/log/lastlog.
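One way to catch the culprit (a sketch, assuming auditd is running on the affected node) is to put an audit watch on the path and inspect the records once the file reappears:

# Watch for writes and attribute changes on /var/log/lastlog
$ auditctl -w /var/log/lastlog -p wa -k lastlog-watch
# ...wait for the file to be deleted/recreated, then look up the offending process:
$ ausearch -k lastlog-watch -i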
Noticed this on a custom image (based on the latest overcloud-full) that I couldn't ssh to because of issues with lastlog:

compute-0/var/log/secure:Apr 5 09:08:34 computerealtime-0 sshd[3899]: lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory
compute-0/var/log/secure:Apr 5 09:09:31 computerealtime-0 su: pam_lastlog(su:session): file /var/log/lastlog created
compute-0/var/log/secure:Apr 5 09:11:03 computerealtime-0 sshd[9590]: pam_lastlog(sshd:session): unable to open /var/log/lastlog: Permission denied

I did hit this with a newer build of the custom image. Two significant differences:

1 - I ran virt-customize --selinux-relabel on the image after building it (although lastlog doesn't exist).
2 - I waited much longer to ssh after the overcloud VM was created. When I initially hit this I tried to ssh almost as soon as the VM booted.
s/I did hit this with a newer build of the custom image/I didn't hit this with a newer build of the custom image/
Should we have the image build somehow do a restorecon or something like that? The problem is that something is (re)creating lastlog with the wrong label, but we don't know what it is.
*** Bug 1569226 has been marked as a duplicate of this bug. ***
The goal here is to make the images ship with /var/log/lastlog. The virt-sysprep line is deleting it; the default operations remove /var/log/lastlog*.
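For reference (an illustration only, not necessarily how the overcloud image build invokes it, and the image path is hypothetical), virt-sysprep's default operation set includes log-file cleanup, and individual operations can be excluded:

# Run the default sysprep operations but skip the log-file removal,
# so /var/log/lastlog survives in the image
$ virt-sysprep -a overcloud-full.qcow2 --operations defaults,-logfiles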
pam seems to be recreating it, but with the wrong label.
We're going to recreate lastlog, restorecon on it, and then re-nuke machine-id during image builds to ensure that lastlog exists and has the correct label.
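A minimal sketch of that image-build step, assuming it runs inside the image's root filesystem after sysprep (the exact hook used in the build is not shown here):

# Recreate lastlog with the label the policy expects, then clear machine-id again
$ touch /var/log/lastlog
$ restorecon -v /var/log/lastlog
$ truncate -s 0 /etc/machine-id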
VERIFIED

(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+--------------------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks                             |
+--------------------------------------+--------------+--------+------------+-------------+--------------------------------------+
| 4d621325-ce47-4751-aa2b-c26b7de8f8ac | after_deploy | ACTIVE | -          | Running     | tenantvxlan=192.168.32.7, 10.0.0.185 |
| 0d325810-caf5-4573-95a6-596f85206b92 | after_reboot | ACTIVE | -          | Running     | tenantvxlan=192.168.32.3, 10.0.0.184 |
+--------------------------------------+--------------+--------+------------+-------------+--------------------------------------+

puddle - 2018-05-07.2

[heat-admin@compute-0 ~]$ sudo sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2083