Bug 1628705
| Summary: | [OSP13] Rebooting a clustered control node without previously stopping pacemaker takes more than 15 minutes | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Vallee Delisle <dvd> | |
| Component: | puppet-tripleo | Assignee: | Michele Baldessari <michele> | |
| Status: | CLOSED ERRATA | QA Contact: | pkomarov | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 13.0 (Queens) | CC: | chjones, dpeacock, dvd, emacchi, jjoyce, jschluet, lmarsh, mburns, michele, nwahl, sbradley, shdunne, slinaber, tvignaud | |
| Target Milestone: | z3 | Keywords: | Triaged, ZStream | |
| Target Release: | 13.0 (Queens) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | puppet-tripleo-8.3.4-14.el7ost python-paunch-2.5.0-3.el7ost | Doc Type: | Bug Fix | |
| Doc Text: | A faulty interaction between rhel-push-plugin.service and the Docker service occurred during system shutdown, which caused controller reboots to take a long time. With this release, the correct shutdown ordering is enforced for these two services, and rebooting a controller now takes less time. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1634810 | Environment: | ||
| Last Closed: | 2018-11-13 22:28:50 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1634810 | |||
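The Doc Text describes the fix only as enforcing shutdown ordering. The fix check in the verification comment looks for docker.service and rhel-push-plugin.service linked into /etc/systemd/system/resource-agents-deps.target.wants. Our understanding is that pacemaker.service on RHEL 7 is ordered `After=resource-agents-deps.target`. Units wanted by that target therefore start before Pacemaker. Because systemd reverses ordering at shutdown, they are also stopped only after Pacemaker has stopped its bundles. The sketch below reproduces that link layout. It is not the puppet-tripleo code. `link_deps`, `check_deps` and the ROOT prefix argument are hypothetical names, added so the sketch can be tried against a scratch directory. On a real node, `systemctl add-wants resource-agents-deps.target docker.service rhel-push-plugin.service` followed by `systemctl daemon-reload` should produce equivalent links.

```shell
#!/usr/bin/env bash
# Sketch only: mirror the layout checked in the verification comment, where
# units linked into resource-agents-deps.target.wants start before pacemaker
# and, through systemd's reversed shutdown ordering, stop after it.
# link_deps/check_deps and the ROOT argument are hypothetical; pass "" as ROOT
# to operate on the live /etc/systemd/system tree.
set -euo pipefail

# Directory holding the target's Wants= symlinks under a given root prefix.
wants_dir() {
  echo "$1/etc/systemd/system/resource-agents-deps.target.wants"
}

# link_deps ROOT UNIT... : create <unit> -> /usr/lib/systemd/system/<unit> links.
link_deps() {
  local dir unit
  dir="$(wants_dir "$1")"; shift
  mkdir -p "$dir"
  for unit in "$@"; do
    ln -sfn "/usr/lib/systemd/system/$unit" "$dir/$unit"
  done
}

# check_deps ROOT UNIT... : print "ok" if every link exists and points at the
# vendor unit file; otherwise print the first missing unit and return 1.
check_deps() {
  local dir unit
  dir="$(wants_dir "$1")"; shift
  for unit in "$@"; do
    if [ "$(readlink "$dir/$unit" 2>/dev/null)" != "/usr/lib/systemd/system/$unit" ]; then
      echo "missing: $unit"
      return 1
    fi
  done
  echo "ok"
}
```

For example, `check_deps "" docker.service rhel-push-plugin.service` run on a controller should print `ok` when both links from the fix check are present.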
Description
David Vallee Delisle
2018-09-13 18:36:07 UTC
An RFE (bug 1628701) was opened against pcs/pacemaker to allow configuring operation timeouts on bundle resources. This BZ was opened to analyze whether we could set a default operation timeout during deployment, or find another workaround.

Verified:

    (undercloud) [stack@undercloud-0 ~]$ ansible overcloud -b -mshell -a"rpm -qa|grep 'puppet-tripleo\|python-paunch'"
    compute-0 | SUCCESS | rc=0 >>
    puppet-tripleo-8.3.6-2.el7ost.noarch
    python-paunch-2.5.0-3.el7ost.noarch
    compute-1 | SUCCESS | rc=0 >>
    puppet-tripleo-8.3.6-2.el7ost.noarch
    python-paunch-2.5.0-3.el7ost.noarch
    controller-2 | SUCCESS | rc=0 >>
    puppet-tripleo-8.3.6-2.el7ost.noarch
    python-paunch-2.5.0-3.el7ost.noarch
    controller-1 | SUCCESS | rc=0 >>
    puppet-tripleo-8.3.6-2.el7ost.noarch
    python-paunch-2.5.0-3.el7ost.noarch
    controller-0 | SUCCESS | rc=0 >>
    puppet-tripleo-8.3.6-2.el7ost.noarch
    python-paunch-2.5.0-3.el7ost.noarch

    (undercloud) [stack@undercloud-0 ~]$ rhos-release -L
    Installed repositories (rhel-7.6):
    13
    ceph-3
    ceph-osd-3
    rhel-7.6

    (undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
    2018-11-05.3

Fix check:

    [stack@undercloud-0 ~]$ ansible controller -mshell -b -a'ls -l /etc/systemd/system/resource-agents-deps.target.wants'
    [WARNING]: Found both group and host with same name: undercloud
    controller-2 | SUCCESS | rc=0 >>
    total 0
    lrwxrwxrwx. 1 root root 38 Nov 6 21:10 docker.service -> /usr/lib/systemd/system/docker.service
    lrwxrwxrwx. 1 root root 48 Nov 6 21:10 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service
    controller-0 | SUCCESS | rc=0 >>
    total 0
    lrwxrwxrwx. 1 root root 38 Nov 6 21:10 docker.service -> /usr/lib/systemd/system/docker.service
    lrwxrwxrwx. 1 root root 48 Nov 6 21:10 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service
    controller-1 | SUCCESS | rc=0 >>
    total 0
    lrwxrwxrwx. 1 root root 38 Nov 6 21:10 docker.service -> /usr/lib/systemd/system/docker.service
    lrwxrwxrwx. 1 root root 48 Nov 6 21:10 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service

Disable STONITH, as in the reproducer:

    [root@controller-1 ~]# pcs property set stonith-enabled=false
    [root@controller-1 ~]# pcs config|grep stonith-enabled
    stonith-enabled: false

Check the controller reboot time:

    [root@controller-1 ~]# reboot
    Connection to 192.168.24.9 closed by remote host.
    Connection to 192.168.24.9 closed.

The reboot now takes no more than 6 minutes:

    [stack@undercloud-0 ~]$ date;until `nc -zv 192.168.24.9 22`;do date;sleep 1m;done
    Wed Nov 7 03:18:52 EST 2018
    Ncat: Version 7.50 ( https://nmap.org/ncat )
    Ncat: Connection refused.
    Wed Nov 7 03:18:53 EST 2018
    ....
    Wed Nov 7 03:24:03 EST 2018
    Ncat: Version 7.50 ( https://nmap.org/ncat )
    Ncat: Connected to 192.168.24.9:22.
    Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587
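The loop in the verification comment probes SSH once a minute and prints `date` before each sleep, so the measured reboot time is only accurate to roughly a minute. A finer-grained variant could look like the sketch below, assuming bash and the same `nc -z` probe. `wait_until` is a hypothetical helper. The 5-second interval and 1800-second cap in the usage comment are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: measure how long a node takes to come back after `reboot`.
# wait_until is a hypothetical helper, not part of any TripleO tooling.
set -euo pipefail

# wait_until INTERVAL MAX_SECONDS CMD...
# Re-runs CMD every INTERVAL seconds until it succeeds, then prints the
# elapsed whole seconds. Returns 1 if MAX_SECONDS pass without success.
wait_until() {
  local interval="$1" max="$2"; shift 2
  local start=$SECONDS
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS - start >= max )); then
      echo "timed out after $((SECONDS - start))s" >&2
      return 1
    fi
    sleep "$interval"
  done
  echo "$((SECONDS - start))"
}

# Usage on the undercloud, right after issuing `reboot` on the controller
# (address from the verification above; -w 5 caps each connect attempt):
#   wait_until 5 1800 nc -z -w 5 192.168.24.9 22
```

Using the shell's built-in SECONDS counter avoids having to parse `date` output to compute the elapsed time.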