Description of problem:
It is impossible to redeploy the overcloud without deleting the stack, because the chronyd restart fails.

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud.
2. Work with the env a few hours.
3. Try to redeploy without deleting the stack.

Actual results:

TASK [chrony : Install chrony configuration file] ******************************
Tuesday 17 December 2019 20:12:16 +0000 (0:00:01.070) 0:07:08.082 ******
ok: [controller-0] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
ok: [controller-1] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
ok: [controller-2] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}

TASK [chrony : Ensure chronyd is running] **************************************
Tuesday 17 December 2019 20:12:17 +0000 (0:00:01.301) 0:07:09.384 ******
fatal: [controller-0]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [controller-1]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [controller-2]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
ceph-0       : ok=71  changed=16 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
compute-0    : ok=68  changed=14 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
compute-1    : ok=68  changed=14 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
controller-0 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
controller-1 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
controller-2 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
undercloud   : ok=3   changed=0  unreachable=0 failed=0 skipped=0   rescued=0 ignored=0

Tuesday 17 December 2019 20:12:18 +0000 (0:00:00.975) 0:07:10.360 ******
===============================================================================
Overcloud configuration failed. Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.

Expected results:
Redeploy succeeds.
Additional info:

2019-12-17 20:12:11,093 p=86790 u=mistral | TASK [Install, Configure and Run Chrony] ***************************************
2019-12-17 20:12:11,094 p=86790 u=mistral | Tuesday 17 December 2019 20:12:11 +0000 (0:00:00.548) 0:07:02.888 ******
2019-12-17 20:12:12,317 p=86790 u=mistral | TASK [chrony : Load distro-specific variables] *********************************
2019-12-17 20:12:12,317 p=86790 u=mistral | Tuesday 17 December 2019 20:12:12 +0000 (0:00:01.223) 0:07:04.112 ******
2019-12-17 20:12:12,584 p=86790 u=mistral | ok: [controller-0] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:12,887 p=86790 u=mistral | ok: [controller-1] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:12,919 p=86790 u=mistral | ok: [controller-2] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:13,088 p=86790 u=mistral | TASK [chrony : Install chronyd] ************************************************
2019-12-17 20:12:13,089 p=86790 u=mistral | Tuesday 17 December 2019 20:12:13 +0000 (0:00:00.771) 0:07:04.883 ******
2019-12-17 20:12:13,954 p=86790 u=mistral | included: /usr/share/ansible/roles/chrony/tasks/install.yml for controller-0, controller-1, controller-2
2019-12-17 20:12:14,131 p=86790 u=mistral | TASK [chrony : Install chronyd package] ****************************************
2019-12-17 20:12:14,132 p=86790 u=mistral | Tuesday 17 December 2019 20:12:14 +0000 (0:00:01.042) 0:07:05.926 ******
2019-12-17 20:12:14,313 p=86790 u=mistral | skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,489 p=86790 u=mistral | skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,506 p=86790 u=mistral | skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,680 p=86790 u=mistral | TASK [chrony : Upgrade chronyd] ************************************************
2019-12-17 20:12:14,680 p=86790 u=mistral | Tuesday 17 December 2019 20:12:14 +0000 (0:00:00.548) 0:07:06.475 ******
2019-12-17 20:12:14,858 p=86790 u=mistral | skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,031 p=86790 u=mistral | skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,048 p=86790 u=mistral | skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,216 p=86790 u=mistral | TASK [chrony : Configure chronyd] **********************************************
2019-12-17 20:12:15,217 p=86790 u=mistral | Tuesday 17 December 2019 20:12:15 +0000 (0:00:00.536) 0:07:07.011 ******
2019-12-17 20:12:16,103 p=86790 u=mistral | included: /usr/share/ansible/roles/chrony/tasks/config.yml for controller-0, controller-1, controller-2
2019-12-17 20:12:16,287 p=86790 u=mistral | TASK [chrony : Install chrony configuration file] ******************************
2019-12-17 20:12:16,287 p=86790 u=mistral | Tuesday 17 December 2019 20:12:16 +0000 (0:00:01.070) 0:07:08.082 ******
2019-12-17 20:12:17,054 p=86790 u=mistral | ok: [controller-0] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,266 p=86790 u=mistral | ok: [controller-1] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,413 p=86790 u=mistral | ok: [controller-2] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,589 p=86790 u=mistral | TASK [chrony : Ensure chronyd is running] **************************************
2019-12-17 20:12:17,589 p=86790 u=mistral | Tuesday 17 December 2019 20:12:17 +0000 (0:00:01.301) 0:07:09.384 ******
2019-12-17 20:12:18,199 p=86790 u=mistral | fatal: [controller-0]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,422 p=86790 u=mistral | fatal: [controller-1]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,561 p=86790 u=mistral | fatal: [controller-2]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,562 p=86790 u=mistral | NO MORE HOSTS LEFT *************************************************************
2019-12-17 20:12:18,563 p=86790 u=mistral | PLAY RECAP *********************************************************************
2019-12-17 20:12:18,564 p=86790 u=mistral | ceph-0       : ok=71  changed=16 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral | compute-0    : ok=68  changed=14 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral | compute-1    : ok=68  changed=14 unreachable=0 failed=0 skipped=158 rescued=0 ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral | controller-0 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
2019-12-17 20:12:18,564 p=86790 u=mistral | controller-1 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
2019-12-17 20:12:18,564 p=86790 u=mistral | controller-2 : ok=152 changed=54 unreachable=0 failed=1 skipped=193 rescued=0 ignored=2
2019-12-17 20:12:18,565 p=86790 u=mistral | undercloud   : ok=3   changed=0  unreachable=0 failed=0 skipped=0   rescued=0 ignored=0
2019-12-17 20:12:18,565 p=86790 u=mistral | Tuesday 17 December 2019 20:12:18 +0000 (0:00:00.975) 0:07:10.360 ******
2019-12-17 20:12:18,565 p=86790 u=mistral | ===============================================================================
It would help if you could provide the output of "systemctl status chronyd.service".
ACK, I will once I reproduce this again.
Maybe related: I have seen this as well with OSP 15 and RHEL 8.1. Switching SELinux to permissive helped. In my case, chronyd could not start at all, since it was not allowed to access /var/run.
Sounds more like an actual chronyd issue since we don't run it in a container or do anything special. Will definitely need a reproducer.
I just ran a deployment and a subsequent redeployment with the latest version of the code, with no chronyd failures. If you are able to reproduce this, please provide the chronyd service failure logs. The templates used would be important as well.

[heat-admin@overcloud-controller-0 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.1 (Ootpa)
[heat-admin@overcloud-controller-0 ~]$ logout
Connection to 192.168.24.8 closed.
(undercloud) [cloud-user@undercloud ~]$ rpm -qa | grep tripleo-heat-tem
openstack-tripleo-heat-templates-11.3.1-0.20191212200219.5ca908c.el8ost.noarch
Just today, I ran into this again. /var/log/audit/audit.log contains this:

[stack@kepler ~]$ sudo cat /var/log/audit/audit.log | grep chronyd
type=AVC msg=audit(1578409563.716:21217): avc: denied { read } for pid=4011 comm="chronyd" name="run" dev="dm-0" ino=144 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:container_file_t:s0 tclass=lnk_file permissive=0
type=SYSCALL msg=audit(1578409563.716:21217): arch=c000003e syscall=87 success=no exit=-13 a0=55a6b7a368b0 a1=7 a2=0 a3=ffffffff items=0 ppid=1 pid=4011 auid=4294967295 uid=990 gid=986 euid=990 suid=990 fsuid=990 egid=986 sgid=986 fsgid=986 tty=(none) ses=4294967295 comm="chronyd" exe="/usr/sbin/chronyd" subj=system_u:system_r:chronyd_t:s0 key=(null)ARCH=x86_64 SYSCALL=unlink AUID="unset" UID="chrony" GID="chrony" EUID="chrony" SUID="chrony" FSUID="chrony" EGID="chrony" SGID="chrony" FSGID="chrony"
type=PROCTITLE msg=audit(1578409563.716:21217): proctitle="/usr/sbin/chronyd"
type=AVC msg=audit(1578409563.717:21218): avc: denied { read } for pid=4011 comm="chronyd" name="run" dev="dm-0" ino=144 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:container_file_t:s0 tclass=lnk_file permissive=0
type=SYSCALL msg=audit(1578409563.717:21218): arch=c000003e syscall=87 success=no exit=-13 a0=55a6b7a368e0 a1=1 a2=0 a3=2ac70da66b8f79 items=0 ppid=1 pid=4011 auid=4294967295 uid=990 gid=986 euid=990 suid=990 fsuid=990 egid=986 sgid=986 fsgid=986 tty=(none) ses=4294967295 comm="chronyd" exe="/usr/sbin/chronyd" subj=system_u:system_r:chronyd_t:s0 key=(null)ARCH=x86_64 SYSCALL=unlink AUID="unset" UID="chrony" GID="chrony" EUID="chrony" SUID="chrony" FSUID="chrony" EGID="chrony" SGID="chrony" FSGID="chrony"
type=PROCTITLE msg=audit(1578409563.717:21218): proctitle="/usr/sbin/chronyd"
type=SERVICE_START msg=audit(1578409563.868:21235): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=chronyd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
type=SERVICE_STOP msg=audit(1578409563.868:21236): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=chronyd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"

rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-11.3.2-0.20200109050651.8f93d27.el8ost.noarch
openstack-selinux-0.8.20-0.20191202205815.09846a2.el8ost.noarch
container-selinux-2.123.0-2.module+el8.1.0+4900+9d7326b8.noarch
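For readers less used to audit records, the AVC lines above already name the culprit: the source context (chronyd_t) is denied read on a symlink (name="run", i.e. /var/run) whose target context is container_file_t. As a small illustration, not part of any tooling here, this sketch pulls those fields out of one of the records quoted above mechanically:

```python
import re

# One of the AVC denial records quoted above, verbatim (line-wrapped).
avc = ('type=AVC msg=audit(1578409563.716:21217): avc: denied { read } '
       'for pid=4011 comm="chronyd" name="run" dev="dm-0" ino=144 '
       'scontext=system_u:system_r:chronyd_t:s0 '
       'tcontext=system_u:object_r:container_file_t:s0 '
       'tclass=lnk_file permissive=0')

def parse_avc(record):
    """Extract the interesting fields from a single AVC denial record."""
    # The denied permission set sits between the braces.
    permission = re.search(r'denied\s+\{ (.+?) \}', record).group(1)
    # Everything else is key=value pairs; values may be quoted.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))
    return {
        'permission': permission,
        'process': fields['comm'].strip('"'),
        # SELinux contexts are user:role:type:level; the type is index 2.
        'source_type': fields['scontext'].split(':')[2],
        'target_type': fields['tcontext'].split(':')[2],
        'target_class': fields['tclass'],
    }

info = parse_avc(avc)
print(info)
# chronyd (running as chronyd_t) was denied 'read' on a lnk_file labeled
# container_file_t - i.e. the host's /var/run symlink has been relabeled.
```

The takeaway matches the next comments: /var/run got relabeled to container_file_t, which the chronyd policy cannot touch.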
Hello there,

SELinux is preventing chronyd_t from accessing container_file_t - I'm pretty sure this is due to a bind-mount with the ":z" flag. It seems to come from some collectd|metrics|monitoring container.

We can work together on this issue if you want, Martin. Let me know if you have an available env so that we can inspect the relevant containers|rights|others.

Cheers,
C.
FYI https://github.com/moby/moby/issues/30934#issuecomment-352015746
I am quite sure we should just drop the z flag https://github.com/openstack/tripleo-heat-templates/blob/ce70482aa2e9cea9fa67bda05efdf27c24e72157/deployment/metrics/collectd-container-puppet.yaml#L633
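For context: the ":z" suffix on a podman/docker volume asks the runtime to relabel the host-side path with a shared container label (container_file_t). When the mounted path is /var/run, that relabeling is exactly what locks the host's chronyd out. A hypothetical sketch of the kind of change meant here - the actual volume list in collectd-container-puppet.yaml may differ:

```yaml
# Illustrative excerpt only, not the verbatim template.
volumes:
  # Before: the ":z" suffix relabels the host path to container_file_t,
  # breaking host services (chronyd) that need to traverse /var/run.
  # - /var/run/:/var/run/:z
  # After: mount without relabeling, leaving the host's SELinux labels alone.
  - /var/run/:/var/run/
```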
According to our records, this is already released and tested/working.