Bug 1784608 - chronyd deploy fails during redeploy
Summary: chronyd deploy fails during redeploy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Cédric Jeanneret
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-17 20:24 UTC by Martin Magr
Modified: 2020-03-25 15:45 UTC
CC List: 6 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200130233044.cc909b6.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 15:44:46 UTC
Target Upstream Version:
Embargoed:




Links
OpenStack gerrit 704039 [MERGED]: "Drop z flag on /var/run, it prevents redeployment" (last updated 2020-06-06 10:51:48 UTC)

Description Martin Magr 2019-12-17 20:24:38 UTC
Description of problem:
It is impossible to redeploy the overcloud without first deleting the stack, because the chronyd restart fails.

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud
2. Work with the environment for a few hours
3. Try to redeploy without deleting the stack
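
For reference, "redeploy" here just means re-running the same overcloud deploy command against the existing stack, roughly as follows (the exact templates and environment files are deployment-specific):

(undercloud) $ source ~/stackrc
(undercloud) $ openstack overcloud deploy --templates \
    -e <same-environment-files-as-the-initial-deploy>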

Actual results:
TASK [chrony : Install chrony configuration file] ******************************
Tuesday 17 December 2019  20:12:16 +0000 (0:00:01.070)       0:07:08.082 ****** 
ok: [controller-0] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
ok: [controller-1] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
ok: [controller-2] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}

TASK [chrony : Ensure chronyd is running] **************************************
Tuesday 17 December 2019  20:12:17 +0000 (0:00:01.301)       0:07:09.384 ****** 
fatal: [controller-0]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [controller-1]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [controller-2]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
ceph-0                     : ok=71   changed=16   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0   
compute-0                  : ok=68   changed=14   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0   
compute-1                  : ok=68   changed=14   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0   
controller-0               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2   
controller-1               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2   
controller-2               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2   
undercloud                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

Tuesday 17 December 2019  20:12:18 +0000 (0:00:00.975)       0:07:10.360 ****** 
=============================================================================== 
Overcloud configuration failed.

Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
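
For reference, the restart can also be attempted by hand on an affected controller, bypassing Ansible; the error above is systemd's generic start-failure message:

[heat-admin@controller-0 ~]$ sudo systemctl restart chronyd.service
Job for chronyd.service failed because the control process exited with error code.
See "systemctl status chronyd.service" and "journalctl -xe" for details.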


Expected results:
Redeploy succeeds.

Additional info:
2019-12-17 20:12:11,093 p=86790 u=mistral |  TASK [Install, Configure and Run Chrony] ***************************************
2019-12-17 20:12:11,094 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:11 +0000 (0:00:00.548)       0:07:02.888 ******
2019-12-17 20:12:12,317 p=86790 u=mistral |  TASK [chrony : Load distro-specific variables] *********************************
2019-12-17 20:12:12,317 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:12 +0000 (0:00:01.223)       0:07:04.112 ******
2019-12-17 20:12:12,584 p=86790 u=mistral |  ok: [controller-0] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:12,887 p=86790 u=mistral |  ok: [controller-1] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:12,919 p=86790 u=mistral |  ok: [controller-2] => (item=/usr/share/ansible/roles/chrony/vars/RedHat.yml) => {"ansible_facts": {"chrony_config_file_location": "/etc/chrony.conf", "chrony_driftfile_path": "/var/lib/chrony/drift", "chrony_logdir_path": "/var/log/chrony", "chrony_makestep": "1.0 3", "chrony_package_name": "chrony", "chrony_rtc_settings": ["rtcsync"], "chrony_service_name": "chronyd"}, "ansible_included_var_files": ["/usr/share/ansible/roles/chrony/vars/RedHat.yml"], "ansible_loop_var": "item", "changed": false, "item": "/usr/share/ansible/roles/chrony/vars/RedHat.yml"}
2019-12-17 20:12:13,088 p=86790 u=mistral |  TASK [chrony : Install chronyd] ************************************************
2019-12-17 20:12:13,089 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:13 +0000 (0:00:00.771)       0:07:04.883 ******
2019-12-17 20:12:13,954 p=86790 u=mistral |  included: /usr/share/ansible/roles/chrony/tasks/install.yml for controller-0, controller-1, controller-2
2019-12-17 20:12:14,131 p=86790 u=mistral |  TASK [chrony : Install chronyd package] ****************************************
2019-12-17 20:12:14,132 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:14 +0000 (0:00:01.042)       0:07:05.926 ******
2019-12-17 20:12:14,313 p=86790 u=mistral |  skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,489 p=86790 u=mistral |  skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,506 p=86790 u=mistral |  skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:14,680 p=86790 u=mistral |  TASK [chrony : Upgrade chronyd] ************************************************
2019-12-17 20:12:14,680 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:14 +0000 (0:00:00.548)       0:07:06.475 ******
2019-12-17 20:12:14,858 p=86790 u=mistral |  skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,031 p=86790 u=mistral |  skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,048 p=86790 u=mistral |  skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2019-12-17 20:12:15,216 p=86790 u=mistral |  TASK [chrony : Configure chronyd] **********************************************
2019-12-17 20:12:15,217 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:15 +0000 (0:00:00.536)       0:07:07.011 ******
2019-12-17 20:12:16,103 p=86790 u=mistral |  included: /usr/share/ansible/roles/chrony/tasks/config.yml for controller-0, controller-1, controller-2
2019-12-17 20:12:16,287 p=86790 u=mistral |  TASK [chrony : Install chrony configuration file] ******************************
2019-12-17 20:12:16,287 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:16 +0000 (0:00:01.070)       0:07:08.082 ******
2019-12-17 20:12:17,054 p=86790 u=mistral |  ok: [controller-0] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,266 p=86790 u=mistral |  ok: [controller-1] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,413 p=86790 u=mistral |  ok: [controller-2] => {"changed": false, "checksum": "58de69866bd94964f6ba27170ebddcb8d87d776c", "dest": "/etc/chrony.conf", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "path": "/etc/chrony.conf", "secontext": "system_u:object_r:etc_t:s0", "size": 256, "state": "file", "uid": 0}
2019-12-17 20:12:17,589 p=86790 u=mistral |  TASK [chrony : Ensure chronyd is running] **************************************
2019-12-17 20:12:17,589 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:17 +0000 (0:00:01.301)       0:07:09.384 ******
2019-12-17 20:12:18,199 p=86790 u=mistral |  fatal: [controller-0]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,422 p=86790 u=mistral |  fatal: [controller-1]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,561 p=86790 u=mistral |  fatal: [controller-2]: FAILED! => {"changed": false, "msg": "Unable to start service chronyd: Job for chronyd.service failed because the control process exited with error code.\nSee \"systemctl status chronyd.service\" and \"journalctl -xe\" for details.\n"}
2019-12-17 20:12:18,562 p=86790 u=mistral |  NO MORE HOSTS LEFT *************************************************************
2019-12-17 20:12:18,563 p=86790 u=mistral |  PLAY RECAP *********************************************************************
2019-12-17 20:12:18,564 p=86790 u=mistral |  ceph-0                     : ok=71   changed=16   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral |  compute-0                  : ok=68   changed=14   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral |  compute-1                  : ok=68   changed=14   unreachable=0    failed=0    skipped=158  rescued=0    ignored=0
2019-12-17 20:12:18,564 p=86790 u=mistral |  controller-0               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2
2019-12-17 20:12:18,564 p=86790 u=mistral |  controller-1               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2
2019-12-17 20:12:18,564 p=86790 u=mistral |  controller-2               : ok=152  changed=54   unreachable=0    failed=1    skipped=193  rescued=0    ignored=2
2019-12-17 20:12:18,565 p=86790 u=mistral |  undercloud                 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
2019-12-17 20:12:18,565 p=86790 u=mistral |  Tuesday 17 December 2019  20:12:18 +0000 (0:00:00.975)       0:07:10.360 ******
2019-12-17 20:12:18,565 p=86790 u=mistral |  ===============================================================================

Comment 1 Emilien Macchi 2019-12-18 13:40:00 UTC
It would help if you could provide the output of "systemctl status chronyd.service".
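
For anyone hitting this, the useful data points are the standard systemd diagnostics:

$ sudo systemctl status chronyd.service
$ sudo journalctl -u chronyd.service    # service-specific log
$ sudo journalctl -xe                   # surrounding context at failure time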

Comment 2 Martin Magr 2019-12-18 19:49:01 UTC
ACK, I will once I reproduce this again.

Comment 3 Matthias Runge 2020-01-03 07:55:28 UTC
Maybe related:

I have seen this as well with OSP 15 and RHEL 8.1; switching SELinux to permissive helped.
In my case, chronyd could not start at all, since it was not allowed to access /var/run.
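
For completeness, that workaround looks like the following; it is runtime-only (lost on reboot) and confirms an SELinux denial rather than fixing it:

$ sudo getenforce       # Enforcing
$ sudo setenforce 0     # switch to permissive mode until next boot
$ sudo systemctl start chronyd.service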

Comment 4 Alex Schultz 2020-01-03 16:58:23 UTC
This sounds more like an actual chronyd issue, since we don't run it in a container or do anything special. We will definitely need a reproducer.

Comment 7 Alex Schultz 2020-01-06 22:01:35 UTC
I just ran a deployment and a subsequent redeployment with the latest version of the code, with no chronyd failures. If you are able to reproduce this, please provide the chronyd service failure logs; the templates used would also be helpful.


[heat-admin@overcloud-controller-0 ~]$ cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.1 (Ootpa)
[heat-admin@overcloud-controller-0 ~]$ logout
Connection to 192.168.24.8 closed.
(undercloud) [cloud-user@undercloud ~]$ rpm -qa | grep tripleo-heat-tem
openstack-tripleo-heat-templates-11.3.1-0.20191212200219.5ca908c.el8ost.noarch

Comment 8 Matthias Runge 2020-01-13 11:51:44 UTC
Just today, I ran into this again. /var/log/audit/audit.log contains this:

[stack@kepler ~]$ sudo cat /var/log/audit/audit.log | grep chronyd
type=AVC msg=audit(1578409563.716:21217): avc:  denied  { read } for  pid=4011 comm="chronyd" name="run" dev="dm-0" ino=144 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:container_file_t:s0 tclass=lnk_file permissive=0
type=SYSCALL msg=audit(1578409563.716:21217): arch=c000003e syscall=87 success=no exit=-13 a0=55a6b7a368b0 a1=7 a2=0 a3=ffffffff items=0 ppid=1 pid=4011 auid=4294967295 uid=990 gid=986 euid=990 suid=990 fsuid=990 egid=986 sgid=986 fsgid=986 tty=(none) ses=4294967295 comm="chronyd" exe="/usr/sbin/chronyd" subj=system_u:system_r:chronyd_t:s0 key=(null)ARCH=x86_64 SYSCALL=unlink AUID="unset" UID="chrony" GID="chrony" EUID="chrony" SUID="chrony" FSUID="chrony" EGID="chrony" SGID="chrony" FSGID="chrony"
type=PROCTITLE msg=audit(1578409563.716:21217): proctitle="/usr/sbin/chronyd"
type=AVC msg=audit(1578409563.717:21218): avc:  denied  { read } for  pid=4011 comm="chronyd" name="run" dev="dm-0" ino=144 scontext=system_u:system_r:chronyd_t:s0 tcontext=system_u:object_r:container_file_t:s0 tclass=lnk_file permissive=0
type=SYSCALL msg=audit(1578409563.717:21218): arch=c000003e syscall=87 success=no exit=-13 a0=55a6b7a368e0 a1=1 a2=0 a3=2ac70da66b8f79 items=0 ppid=1 pid=4011 auid=4294967295 uid=990 gid=986 euid=990 suid=990 fsuid=990 egid=986 sgid=986 fsgid=986 tty=(none) ses=4294967295 comm="chronyd" exe="/usr/sbin/chronyd" subj=system_u:system_r:chronyd_t:s0 key=(null)ARCH=x86_64 SYSCALL=unlink AUID="unset" UID="chrony" GID="chrony" EUID="chrony" SUID="chrony" FSUID="chrony" EGID="chrony" SGID="chrony" FSGID="chrony"
type=PROCTITLE msg=audit(1578409563.717:21218): proctitle="/usr/sbin/chronyd"
type=SERVICE_START msg=audit(1578409563.868:21235): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=chronyd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"
type=SERVICE_STOP msg=audit(1578409563.868:21236): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=chronyd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'UID="root" AUID="unset"


rpm -q openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-11.3.2-0.20200109050651.8f93d27.el8ost.noarch

openstack-selinux-0.8.20-0.20191202205815.09846a2.el8ost.noarch
container-selinux-2.123.0-2.module+el8.1.0+4900+9d7326b8.noarch
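
The denials above can also be pulled out directly with the audit tooling (assuming ausearch and audit2why are installed):

$ sudo ausearch -m AVC -c chronyd
$ sudo ausearch -m AVC -c chronyd | audit2why   # explains why the access was denied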

Comment 10 Cédric Jeanneret 2020-01-13 15:44:58 UTC
Hello there,

SELinux is preventing chronyd_t from accessing container_file_t - I'm pretty sure this is due to a bind mount with the ":z" flag. It seems to come from some collectd|metric|monitoring container. We can work together on this issue if you want, Martin. Lemme know if you have an available env so that we can inspect the relevant containers|rights|others.

Cheers,

C.
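
For context: the ":z" suffix on a bind mount tells the container runtime to relabel the host path as container_file_t - exactly the target context in the AVC denials above - which then locks out confined host services such as chronyd. A minimal illustration (hypothetical image name; the actual fix, gerrit 704039 linked above, drops the flag from the /var/run mount in the templates):

# With ":z", podman relabels the host side of the bind mount:
$ sudo podman run -v /var/run:/var/run:z registry.example.com/some-image
$ ls -Zd /var/run   # now system_u:object_r:container_file_t:s0 -> chronyd_t is denied

# Without the suffix, the host label is left untouched:
$ sudo podman run -v /var/run:/var/run registry.example.com/some-image

On an already-affected node, restoring the default labels should let chronyd start again:

$ sudo restorecon -Rv /var/run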

Comment 19 Martin Magr 2020-03-25 15:44:46 UTC
According to our records, this has already been released and is tested and working.

