Description of problem: I have observed this last night while testing TLS everywhere (RHOS_TRUNK-15.0-RHEL-8-20190815.n.1 puddle) It seems to me that at a certain point a getcert command (on the haproxy service cert) just sits there hanging. I see many getcerts being successful: Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list' Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "mysql", "-f", "/etc/pki/tls/certs/mysql.crt", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "mysql/controller-0.internalapi.redhat.local", "-D", "overcloud.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-w", "-k", "/etc/pki/tls/private/mysql.key"] Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I mysql -f /etc/pki/tls/certs/mysql.crt -c IPA -N CN=controller-0.internalapi.redhat.local -K mysql/controller-0.internalapi.redhat.local -D overcloud.internalapi.redhat.local -D controller-0.internalapi.redhat.local -w -k /etc/pki/tls/private/mysql.key' Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i mysql' Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "rabbitmq", "-f", "/etc/pki/tls/certs/rabbitmq.crt", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "rabbitmq/controller-0.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-C", "/usr/bin/certmonger-rabbitmq-refresh.sh", "-w", "-k", "/etc/pki/tls/private/rabbitmq.key"] Aug 19 21:13:52 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I rabbitmq -f /etc/pki/tls/certs/rabbitmq.crt -c IPA -N CN=controller-0.internalapi.redhat.local -K rabbitmq/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C /usr/bin/certmonger-rabbitmq-refresh.sh -w -k /etc/pki/tls/private/rabbitmq.key' Aug 19 21:13:53 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i rabbitmq' Aug 19 21:13:53 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "redis", "-f", "/etc/pki/tls/certs/redis.crt", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "redis/controller-0.internalapi.redhat.local", "-D", "overcloud.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-C", "/usr/bin/certmonger-redis-refresh.sh", "-w", "-k", "/etc/pki/tls/private/redis.key"] Aug 19 21:13:53 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I redis -f /etc/pki/tls/certs/redis.crt -c IPA -N CN=controller-0.internalapi.redhat.local -K redis/controller-0.internalapi.redhat.local -D overcloud.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C /usr/bin/certmonger-redis-refresh.sh -w -k /etc/pki/tls/private/redis.key' Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i redis' Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "novnc-proxy", "-f", "/etc/pki/tls/certs/novnc_proxy.crt", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "novnc-proxy/controller-0.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-w", "-k", "/etc/pki/tls/private/novnc_proxy.key"] Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I novnc-proxy -f /etc/pki/tls/certs/novnc_proxy.crt -c IPA -N CN=controller-0.internalapi.redhat.local -K novnc-proxy/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -w -k /etc/pki/tls/private/novnc_proxy.key' Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i novnc-proxy' Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "httpd-ctlplane", "-f", "/etc/pki/tls/certs/httpd/httpd-ctlplane.crt", "-c", "IPA", "-N", "CN=controller-0.ctlplane.redhat.local", "-K", "HTTP/controller-0.ctlplane.redhat.local", "-D", "controller-0.ctlplane.redhat.local", "-C", "pkill -USR1 httpd", "-w", "-k", "/etc/pki/tls/private/httpd/httpd-ctlplane.key"] Aug 19 21:13:54 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I httpd-ctlplane -f /etc/pki/tls/certs/httpd/httpd-ctlplane.crt -c IPA -N CN=controller-0.ctlplane.redhat.local -K HTTP/controller-0.ctlplane.redhat.local -D controller-0.ctlplane.redhat.local -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-ctlplane.key' Aug 19 21:13:55 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i httpd-ctlplane' Aug 19 21:13:55 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "httpd-external", "-f", "/etc/pki/tls/certs/httpd/httpd-external.crt", "-c", "IPA", "-N", "CN=controller-0.external.redhat.local", "-K", "HTTP/controller-0.external.redhat.local", "-D", "controller-0.external.redhat.local", "-C", "pkill -USR1 httpd", "-w", "-k", "/etc/pki/tls/private/httpd/httpd-external.key"] Aug 19 21:13:55 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I httpd-external -f /etc/pki/tls/certs/httpd/httpd-external.crt -c IPA -N CN=controller-0.external.redhat.local -K HTTP/controller-0.external.redhat.local -D controller-0.external.redhat.local -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-external.key' Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i httpd-external' Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "httpd-internal_api", "-f", "/etc/pki/tls/certs/httpd/httpd-internal_api.crt", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "HTTP/controller-0.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-C", "pkill -USR1 httpd", "-w", "-k", "/etc/pki/tls/private/httpd/httpd-internal_api.key"] Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I httpd-internal_api -f /etc/pki/tls/certs/httpd/httpd-internal_api.crt -c IPA -N CN=controller-0.internalapi.redhat.local -K HTTP/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-internal_api.key' Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i httpd-internal_api' Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "httpd-storage", "-f", "/etc/pki/tls/certs/httpd/httpd-storage.crt", "-c", "IPA", "-N", "CN=controller-0.storage.redhat.local", "-K", "HTTP/controller-0.storage.redhat.local", "-D", "controller-0.storage.redhat.local", "-C", "pkill -USR1 httpd", "-w", "-k", "/etc/pki/tls/private/httpd/httpd-storage.key"] Aug 19 21:13:56 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I httpd-storage -f /etc/pki/tls/certs/httpd/httpd-storage.crt -c IPA -N CN=controller-0.storage.redhat.local -K HTTP/controller-0.storage.redhat.local -D controller-0.storage.redhat.local -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-storage.key' Aug 19 21:13:57 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i httpd-storage' Aug 19 21:13:57 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "httpd-storage_mgmt", "-f", "/etc/pki/tls/certs/httpd/httpd-storage_mgmt.crt", "-c", "IPA", "-N", "CN=controller-0.storagemgmt.redhat.local", "-K", "HTTP/controller-0.storagemgmt.redhat.local", "-D", "controller-0.storagemgmt.redhat.local", "-C", "pkill -USR1 httpd", "-w", "-k", "/etc/pki/tls/private/httpd/httpd-storage_mgmt.key"] Aug 19 21:13:57 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I httpd-storage_mgmt -f /etc/pki/tls/certs/httpd/httpd-storage_mgmt.crt -c IPA -N CN=controller-0.storagemgmt.redhat.local -K HTTP/controller-0.storagemgmt.redhat.local -D controller-0.storagemgmt.redhat.local -C pkill -USR1 httpd -w -k /etc/pki/tls/private/httpd/httpd-storage_mgmt.key' Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i httpd-storage_mgmt' Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "libvirt-vnc-client-cert", "-f", "/etc/pki/libvirt-vnc/client-cert.pem", "-c", "IPA", "-N", "CN=controller-0.internalapi.redhat.local", "-K", "libvirt-vnc/controller-0.internalapi.redhat.local", "-D", "controller-0.internalapi.redhat.local", "-C", "systemctl reload libvirtd", "-w", "-k", "/etc/pki/libvirt-vnc/client-key.pem", "-F", "/etc/ipa/ca.crt"] Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I libvirt-vnc-client-cert -f /etc/pki/libvirt-vnc/client-cert.pem -c IPA -N CN=controller-0.internalapi.redhat.local -K libvirt-vnc/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C systemctl reload libvirtd -w -k /etc/pki/libvirt-vnc/client-key.pem -F /etc/ipa/ca.crt' Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I libvirt-vnc-client-cert -f /etc/pki/libvirt-vnc/client-cert.pem -c IPA -N CN=controller-0.internalapi.redhat.local -K libvirt-vnc/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C systemctl reload libvirtd -w -k /etc/pki/libvirt-vnc/client-key.pem -F /etc/ipa/ca.crt' returned 4: New signing request "libvirt-vnc-client-cert" added. Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert list -i libvirt-vnc-client-cert' Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "haproxy-ctlplane-cert", "-f", "/etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.crt", "-c", "IPA", "-N", "CN=controller-0.ctlplane.redhat.local", "-K", "haproxy/controller-0.ctlplane.redhat.local", "-D", "overcloud.ctlplane.redhat.local", "-D", "controller-0.ctlplane.redhat.local", "-U", "id-kp-clientAuth", "-U", "id-kp-serverAuth", "-C", "/usr/bin/certmonger-haproxy-refresh.sh reload ctlplane", "-w", "-k", "/etc/pki/tls/private/haproxy/overcloud-haproxy-ctlplane.key"] Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I haproxy-ctlplane-cert -f /etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.crt -c IPA -N CN=controller-0.ctlplane.redhat.local -K haproxy/controller-0.ctlplane.redhat.local -D overcloud.ctlplane.redhat.local -D controller-0.ctlplane.redhat.local -U id-kp-clientAuth -U id-kp-serverAuth -C /usr/bin/certmonger-haproxy-refresh.sh reload ctlplane -w -k /etc/pki/tls/private/haproxy/overcloud-haproxy-ctlplane.key' And the haproxy one is the last one and then puppet just hangs forever and the jobs timeout: [root@controller-0 ~]# journalctl -t puppet-user |tail -n5 Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert]/File[/etc/pki/libvirt-vnc/client-key.pem]: The container Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert] will propagate my refresh event Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-client-cert]: The container Class[Tripleo::Profile::Base::Certmonger_user] will propagate my refresh event Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Haproxy[haproxy-ctlplane]/Certmonger_certificate[haproxy-ctlplane-cert]/ensure: created Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: Issuing getcert command with args: ["request", "-I", "haproxy-ctlplane-cert", "-f", "/etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.crt", "-c", "IPA", "-N", "CN=controller-0.ctlplane.redhat.local", "-K", "haproxy/controller-0.ctlplane.redhat.local", "-D", "overcloud.ctlplane.redhat.local", "-D", "controller-0.ctlplane.redhat.local", "-U", "id-kp-clientAuth", "-U", "id-kp-serverAuth", "-C", "/usr/bin/certmonger-haproxy-refresh.sh reload ctlplane", "-w", "-k", "/etc/pki/tls/private/haproxy/overcloud-haproxy-ctlplane.key"] - This is the last puppet line which hangs and nothing happens afterwards: Aug 19 21:13:59 controller-0.redhat.local puppet-user[25453]: Debug: Executing: '/usr/bin/getcert request -I haproxy-ctlplane-cert -f /etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.crt -c IPA -N CN=controller-0.ctlplane.redhat.local -K haproxy/controller-0.ctlplane.redhat.local -D overcloud.ctlplane.redhat.local -D controller-0.ctlplane.redhat.local -U id-kp-clientAuth -U id-kp-serverAuth -C /usr/bin/certmonger-haproxy-refresh.sh reload ctlplane -w -k /etc/pki/tls/private/haproxy/overcloud-haproxy-ctlplane.key' After that the deployment times out with: TASK [Run puppet host configuration for step 1] ******************************** Monday 19 August 2019 21:12:46 +0000 (0:00:00.313) 0:07:45.479 ********* Overcloud configuration failed. Ansible timed out at 5180 seconds.
*** Bug 1743523 has been marked as a duplicate of this bug. ***
Created attachment 1606069 [details] permissive audit.log
I am seeing a lot of pkill commands on many different targets. That reminds me of bug 1703242 and the design issues discussed upstream in https://specs.openstack.org/openstack/tripleo-specs/specs/train/certificate-management.html#problem-description, though that other bug was around renewing the certificates as opposed to a new deployment. The THT file you link to (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/apache/apache-baremetal-puppet.j2.yaml#L118) seems different than the file mentioned on that bug (certmonger-haproxy-refresh.sh) although you do have it in comment 3. How did you narrow down to these two denials in A) for pkill? Regarding B), the two denials related to ca.crt mention system_u:object_r:etc_t:s0 as a target so we can probably ignore the unconfined file in /var/lib. I would have expected the file to be of type cert_t rather the more generic etc_t, but I suppose we can allow certmonger to write to etc_t for now. There are a few other denials related to certmonger trying to access hiera files like /etc/puppet/hiera.yaml and /etc/puppet/hieradata/controller_extraconfig.json. Do you reckon these are unlikely to be causing this particular issue at this point?
For better visibility on how many contexts are touched by certmonger in this audit file: #============= certmonger_t ============== allow certmonger_t NetworkManager_t:dir { getattr search }; allow certmonger_t NetworkManager_t:file { open read }; allow certmonger_t auditd_t:dir { getattr search }; allow certmonger_t auditd_t:file { open read }; allow certmonger_t chronyd_t:dir { getattr search }; allow certmonger_t chronyd_t:file { open read }; allow certmonger_t cluster_t:dir { getattr search }; allow certmonger_t cluster_t:file { open read }; allow certmonger_t container_runtime_exec_t:file { execute execute_no_trans getattr open read }; allow certmonger_t container_share_t:blk_file { create unlink }; allow certmonger_t container_share_t:dir { add_name getattr ioctl mounton open read remove_name setattr write }; allow certmonger_t container_share_t:file { getattr lock open read write }; allow certmonger_t container_var_lib_t:dir getattr; allow certmonger_t container_var_lib_t:file { getattr lock open read write }; allow certmonger_t container_var_run_t:dir setattr; allow certmonger_t container_var_run_t:file { getattr lock open read write }; allow certmonger_t crond_t:dir { getattr search }; allow certmonger_t crond_t:file { open read }; allow certmonger_t etc_t:file write; # <---- related to ca.crt, cf. comments above allow certmonger_t fs_t:filesystem unmount; allow certmonger_t getty_t:dir { getattr search }; allow certmonger_t getty_t:file { open read }; allow certmonger_t gssproxy_t:dir { getattr search }; allow certmonger_t gssproxy_t:file { open read }; allow certmonger_t iptables_exec_t:file { execute execute_no_trans getattr open read }; allow certmonger_t irqbalance_t:dir { getattr search }; allow certmonger_t irqbalance_t:file { open read }; allow certmonger_t kernel_t:dir { getattr search }; allow certmonger_t kernel_t:file { open read }; allow certmonger_t oddjob_t:dir { getattr search }; allow certmonger_t oddjob_t:file { open read }; allow certmonger_t openvswitch_t:dir { getattr search }; allow certmonger_t openvswitch_t:file { open read }; allow certmonger_t policykit_t:dir { getattr search }; allow certmonger_t policykit_t:file { open read }; allow certmonger_t puppet_etc_t:file { getattr ioctl open read }; allow certmonger_t rhsmcertd_t:dir { getattr search }; allow certmonger_t rhsmcertd_t:file { open read }; allow certmonger_t rngd_t:dir { getattr search }; allow certmonger_t rngd_t:file { open read }; allow certmonger_t rpcbind_t:dir { getattr search }; allow certmonger_t rpcbind_t:file { open read }; allow certmonger_t self:capability { mknod sys_resource }; allow certmonger_t self:netlink_netfilter_socket { bind create getattr }; allow certmonger_t self:process setrlimit; allow certmonger_t sshd_t:dir { getattr search }; allow certmonger_t sshd_t:file { open read }; allow certmonger_t sssd_t:dir { getattr search }; allow certmonger_t sssd_t:file { open read }; allow certmonger_t sysctl_net_t:dir search; allow certmonger_t sysctl_net_t:file { open read }; allow certmonger_t syslogd_t:dir { getattr search }; allow certmonger_t syslogd_t:file { open read }; allow certmonger_t system_dbusd_t:dir { getattr search }; allow certmonger_t system_dbusd_t:file { open read }; allow certmonger_t systemd_logind_t:dir { getattr search }; allow certmonger_t systemd_logind_t:file { open read }; allow certmonger_t systemd_machined_t:dir { getattr search }; allow certmonger_t systemd_machined_t:file { open read }; allow certmonger_t tuned_t:dir { getattr search }; allow certmonger_t tuned_t:file { open read }; allow certmonger_t udev_t:dir { getattr search }; allow certmonger_t udev_t:file { open read }; allow certmonger_t unconfined_t:dir { getattr search }; allow certmonger_t unconfined_t:file { open read }; allow certmonger_t virt_qemu_ga_t:dir { getattr search }; allow certmonger_t virt_qemu_ga_t:file { open read }; allow certmonger_t virtd_t:dir { getattr search }; allow certmonger_t virtd_t:file { open read }; allow certmonger_t virtd_unit_file_t:service reload; On the plus side, the majority appear to be read-only permissions...?
On B), I can confirm that the issue accessing ca.crt would manifest itself in that way (a timeout on getcert request), a similar timeout occurred in https://bugs.launchpad.net/tripleo/+bug/1830734 . Option "-w" in getcert doesn't timeout.
(In reply to Julie Pichon from comment #8) > I am seeing a lot of pkill commands on many different targets. That reminds > me of bug 1703242 and the design issues discussed upstream in > https://specs.openstack.org/openstack/tripleo-specs/specs/train/certificate- > management.html#problem-description, though that other bug was around > renewing the certificates as opposed to a new deployment. > > The THT file you link to > (https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/ > apache/apache-baremetal-puppet.j2.yaml#L118) seems different than the file > mentioned on that bug (certmonger-haproxy-refresh.sh) although you do have > it in comment 3. How did you narrow down to these two denials in A) for > pkill? Oh so I blindly took the last two pkill denials under the (maybe wrong?) assumption that we had a generic 'allow certmonger to run pkill' kind of issue. In terms of code we have pkill (or kill in the case of 3) in the following places: 1) https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/apache/apache-baremetal-puppet.j2.yaml#L118 2) https://github.com/openstack/puppet-tripleo/blob/master/files/certmonger-redis-refresh.sh#L14 3) https://github.com/openstack/puppet-tripleo/blob/master/files/certmonger-haproxy-refresh.sh#L46 (this one uses kill and not pkill) > Regarding B), the two denials related to ca.crt mention > system_u:object_r:etc_t:s0 as a target so we can probably ignore the > unconfined file in /var/lib. I would have expected the file to be of type > cert_t rather the more generic etc_t, but I suppose we can allow certmonger > to write to etc_t for now. Agree on the /var/lib > There are a few other denials related to certmonger trying to access hiera > files like /etc/puppet/hiera.yaml and > /etc/puppet/hieradata/controller_extraconfig.json. Do you reckon these are > unlikely to be causing this particular issue at this point? I think I missed those tbh. They do look relevant although am not entirely sure. Want to throw a patch my way for fixing B) and we see if the deploy works and then we investigate A) and the hiera denials a bit more in detail?
The denied write to etc_t is likely due to the use of the '-F' option to 'getcert request' seen in this snippet of the provided debug output: Aug 19 21:13:58 controller-0.redhat.local puppet-user[25453]: Warning: Could not get certificate: Execution of '/usr/bin/getcert request -I libvirt-vnc-client-cert -f /etc/pki/libvirt-vnc/client-cert.pem -c IPA -N CN=controller-0.internalapi.redhat.local -K libvirt-vnc/controller-0.internalapi.redhat.local -D controller-0.internalapi.redhat.local -C systemctl reload libvirtd -w -k /etc/pki/libvirt-vnc/client-key.pem -F /etc/ipa/ca.crt' returned 4: New signing request "libvirt-vnc-client-cert" added. The '-F' option will attempt to download and save the CA certificate into the location provided. There was a recent change to make libvirt-vnc use the CA cert from /etc/ipa/ca.crt, as it would previously store its own copy. Perhaps there is a remnant that uses '-F' that needs to be cleaned up. We definitely don't want to have certmonger attempt to overwrite /etc/ipa/ca.crt, as this is obtained when the node is enrolled with IdM. If we find and clean this up, the write AVC should go away. I am not sure if that is related to the timeout.
This problem may have started happening when this change merged: https://review.opendev.org/#/c/673749/ As mentioned in my previous comment, there is likely existing code for using 'getcert request ... -F', the the change in path to the CA cert is causing troubles due to the use of this option.
Useful comments, thank you - dumping some additional information after looking a bit more at the non-read-only denials: container_runtime_exec_t (e.g. execute, execute_no_trans): - /usr/bin/certmonger-rabbitmq-refresh.sh - /usr/bin/certmonger-haproxy-refresh.sh reload ctlplane - podman exec cp /var/lib/kolla/config_files/src-tls/etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.pem /etc/pki/tls/certs/ - podman exec cp /var/lib/kolla/config_files/src-tls/etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.pem /etc/pki/tls/certs/ I suspect we will need these? Though not entirely sure why the podman exec is running with certmonger_t as source context. container_share_t starts to looks more suspicious, even though it starts from the same two scripts: - /usr/bin/certmonger-rabbitmq-refresh.sh - /usr/bin/certmonger-haproxy-refresh.sh reload ctlplane Lots of mentions of the container storage overlays, not sure that we want certmonger to be able to mess with these: type=AVC msg=audit(20/08/19 11:22:32.425:7865) : avc: denied { ioctl } for pid=26655 comm=podman path=/var/lib/containers/storage/overlay dev="vda2" ino=92275029 ioctlcmd=0x581f scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=dir permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7866) : avc: denied { unlink } for pid=26655 comm=podman name=backingFsBlockDev dev="vda2" ino=92275030 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=blk_file permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7866) : avc: denied { remove_name } for pid=26655 comm=podman name=backingFsBlockDev dev="vda2" ino=92275030 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=dir permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7866) : avc: denied { write } for pid=26655 comm=podman name=overlay dev="vda2" ino=92275029 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=dir permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7868) : avc: denied { read write } for pid=26655 comm=podman name=layers.lock dev="vda2" ino=46138520 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=file permissive=1 Similarly with container_var_lib_t: type=AVC msg=audit(20/08/19 11:22:32.423:7856) : avc: denied { lock } for pid=26655 comm=podman path=/var/lib/containers/storage/libpod/bolt_state.db dev="vda2" ino=85984286 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1 type=AVC msg=audit(20/08/19 11:22:32.423:7855) : avc: denied { read write } for pid=26655 comm=podman name=bolt_state.db dev="vda2" ino=85984286 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_var_lib_t:s0 tclass=file permissive=1 And again for container_var_run_t: type=AVC msg=audit(20/08/19 11:22:32.432:7876) : avc: denied { write } for pid=26655 comm=podman name=alive.lck dev="tmpfs" ino=90484 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_var_run_t:s0 tclass=file permissive=1 type=AVC msg=audit(20/08/19 11:22:39.634:7971) : avc: denied { lock } for pid=26953 comm=podman path=/run/libpod/alive.lck dev="tmpfs" ino=90484 scontext=system_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_var_run_t:s0 tclass=file permissive=1 fs_t:filesystem unmount AVCs seems to originate from running "podman ps --format={{.Names}}"? iptables_exec_t (execute, execute_no_trans) - /usr/sbin/iptables --version coming from "comm=podman path=/usr/sbin/xtables-nft-multi"? Again something related to podman... The netlink_netfilter_socket AVCs are all related to iptables as well. self:capability, setrlimit, again, the script and some podman commands that look suspicious: /usr/bin/certmonger-rabbitmq-refresh.sh type=AVC msg=audit(20/08/19 11:22:32.422:7853) : avc: denied { setrlimit } for pid=26655 comm=podman scontext=system_u:system_r:certmonge r_t:s0 tcontext=system_u:system_r:certmonger_t:s0 tclass=process permissive=1 type=AVC msg=audit(20/08/19 11:22:32.422:7853) : avc: denied { sys_resource } for pid=26655 comm=podman capability=sys_resource scontext=system_u:system_r:certmonger_t:s0 tcontext=system_u:system_r:certmonger_t:s0 tclass=capability permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7867) : avc: denied { create } for pid=26655 comm=podman name=backingFsBlockDev scontext=system_ u:system_r:certmonger_t:s0 tcontext=system_u:object_r:container_share_t:s0 tclass=blk_file permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7867) : avc: denied { add_name } for pid=26655 comm=podman name=backingFsBlockDev scontext=syste m_u:system_r:certmonger_t:s0 tcontext=unconfined_u:object_r:container_share_t:s0 tclass=dir permissive=1 type=AVC msg=audit(20/08/19 11:22:32.425:7867) : avc: denied { mknod } for pid=26655 comm=podman capability=mknod scontext=system_u:system_r:certmonger_t:s0 tcontext=system_u:system_r:certmonger_t:s0 tclass=capability permissive=1 "podman exec cp /var/lib/kolla/config_files/src-tls/etc/pki/tls/certs/rabbitmq.crt /etc/pki/tls/certs/rabbitmq.crt" leads to "type=AVC msg=audit(20/08/19 11:22:32.585:7879) : avc: denied { sys_resource } for pid=26669 comm=podman capability=sys_resource scontext=system_u:system_r:certmonger_t:s0 tcontext=system_u:system_r:certmonger_t:s0 tclass=capability permissive=1" There is only the one virtd denial, not sure what it's related to although certmonger_t is the source context again: type=USER_AVC msg=audit(20/08/19 11:22:38.440:7949) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='avc: denied { reload } for auid=unset uid=root gid=root path=/usr/lib/systemd/system/libvirtd.service cmdline="" scontext=system_u:system_r:certmonger_t:s0 tcontext=system_u:object_r:virtd_unit_file_t:s0 tclass=service permissive=1 exe=/usr/lib/systemd/systemd sauid=root hostname=? addr=? terminal=?'
Assuming my theory outlined in comment#12 is correct, these snippets of code are where the option is specified/used for the VNC cert (there may be other locations with the same issue): https://github.com/openstack/puppet-tripleo/blob/6f790d624198eeb9219b26848c05f0edafd09dab/manifests/certmonger/libvirt_vnc.pp#L43-L45 https://github.com/openstack/tripleo-heat-templates/blob/456c8da28cd94dad380da34d5cfd54682dbf9549/deployment/nova/nova-libvirt-container-puppet.yaml#L466-L472
(In reply to Michele Baldessari from comment #11) > Want to throw a patch my way for fixing B) and we see if the deploy works > and then we investigate A) and the hiera denials a bit more in detail? Nathan's comments suggest that this probably should be fixed as a code change somewhere instead of a selinux rule, but I'm not sure how to easily test with a new rule if we wanted to try it. Adding the rule manually won't be enough pre-deploy, since the denials were seen on a controller. If I create a test rpm, would the controllers/overcloud nodes be able to pick it up during the deploy also, rather than whatever is in the compose?
We suspect that our recently merged workaround[1] for an underlying bug in certmonger[2] is the source of this. We have submitted a revert[3] for each release while we investigate further. [1] - https://review.opendev.org/#/c/673887/ [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1710632 [3] - https://review.opendev.org/#/c/677548/
Thank you for the update. Is this expected to resolve the main "hanging" issue, and so we can handle the other AVC denials normally as non-blockers?
There were thoughts that maybe it's hiera trying to parse everything causing all the read-only denials, but from what I can tell the hiera part should be resolved with "allow certmonger_t puppet_etc_t:file { getattr ioctl open read }" as the files appear to be labelled as expected. Matching proctitle to the read-only AVCs, this is what I see: proctitle=pkill -o -HUP stunnel (getattr/open/read/search denials) - auditd_t - chronyd_t - cluster_t - crond_t - getty_t - gssproxy_t - irqbalance_t - kernel_t - NetworkManager_t - oddjob_t - openvswitch_t - policykit_t - rhsmcertd_t - rngd_t - rpcbind_t - sshd_t - sssd_t - syslogd_t - system_dbusd_t - systemd_logind_t - systemd_machined_t - tuned_t - udev_t - unconfined_t:s0-s0:c0.c1023 - virt_qemu_ga_t - virtd_t proctitle=pkill -USR1 httpd (open/search/getattr/read denials) - chronyd_t - gssproxy_t - irqbalance_t - kernel_t - policykit_t - rhsmcertd_t - system_dbusd_t I didn't see any other kill commands. The "podman ps" command at https://github.com/openstack/puppet-tripleo/blob/master/files/certmonger-haproxy-refresh.sh#L36 seems responsible for many of the complicated denials in comment 13.
So, just as a data point. I ran the install with the revert in place and was able to get a successful install. I'm trying again on a different machine. At the same time, though, I noticed that I first needed to manually delete the hosts from IPA from the previous run. This appears to be because the novajoin_notifier is not receiving notifications from nova. It is supposed to receive notifications on server delete on a novajoin_notifications queue. Not sure if this is selinux related - or basically another bug. Still investigating ..
Update: Ade and I poked around on the deployment and found that the notifications driver config setting: [oslo_messaging_notifications]/driver is set to 'noop' which will cause services (like nova) not to emit any notifications. The setting should be [oslo_messaging_notifications]/driver=messagingv2 in order for services to emit notifications. Alex Schultz let us know that this is related to the following recent change to upstream tripleo: https://review.opendev.org/674639 which will result in setting [oslo_messaging_notifications]/driver=noop when telemetry is not enabled. The ability to set driver=noop was added in an older change: https://review.opendev.org/481726 which environments/disable-telemetry.yaml makes use of: https://github.com/openstack/tripleo-heat-templates/blob/stable/stein/environments/disable-telemetry.yaml The recent tripleo change was backported to stein: https://review.opendev.org/675114 and we picked it up in OSP 15 because we're pulling from upstream. A fix is being investigated from the tripleo side in order to enable notifications when novajoin is present in a deployment.
Updating component to THT. We have located the revert in build openstack-tripleo-heat-templates-10.6.1-0.20190823114746.30390bc.el8ost. Moving RHBZ to MODIFIED. Will follow up with Rel/Del for switch to ON_QA.
For reference (hit some issue with the bugzilla 'clone' functionality), opened bug 1745142 to deal with the remaining AVCs.
*** Bug 1747957 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811