Bug 1634810 - [OSP14] Rebooting a clustered control node without previously stopping pacemaker takes more than 15 minutes
Summary: [OSP14] Rebooting a clustered control node without previously stopping pacema...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Emilien Macchi
QA Contact: pkomarov
URL:
Whiteboard:
Depends On: 1628705
Blocks:
Reported: 2018-10-01 16:57 UTC by Michele Baldessari
Modified: 2021-12-10 17:56 UTC (History)

Fixed In Version: puppet-tripleo-9.3.1-0.20181001112251.a6eaab1.el7ost python-paunch-3.2.0-0.20180921003258.6d2ec11.el7ost
Doc Type: No Doc Update
Doc Text:
Cause: A faulty interaction between rhel-push-plugin.service and docker.service during system shutdown.
Consequence: Rebooting a controller takes a long time.
Fix: Correct shutdown ordering is enforced for these two services.
Result: Rebooting a controller takes a more reasonable amount of time (a couple of minutes).
Clone Of: 1628705
Environment:
Last Closed: 2019-01-11 11:53:30 UTC
Target Upstream Version:
Embargoed:


Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Launchpad                          1792701         0        None      None    None     2018-10-01 16:57:40 UTC
OpenStack gerrit                   606848          0        None      None    None     2018-10-15 08:55:22 UTC
RDO                                16341           0        None      None    None     2018-10-15 08:54:35 UTC
Red Hat Issue Tracker              OSP-11680       0        None      None    None     2021-12-10 17:56:31 UTC
Red Hat Knowledge Base (Solution)  3612971         0        None      None    None     2018-10-01 16:57:40 UTC
Red Hat Product Errata             RHEA-2019:0045  0        None      None    None     2019-01-11 11:53:35 UTC

Comment 4 pkomarov 2018-10-18 07:34:39 UTC
Verified.

[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-10-10.3

[stack@undercloud-0 ~]$ ansible controller -mshell -b -a'ls -l /etc/systemd/system/resource-agents-deps.target.wants'
 [WARNING]: Found both group and host with same name: undercloud

#check fix: 
controller-1 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service

controller-0 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service

controller-2 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service
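The check above can also be scripted so that it fails loudly when a unit is missing. This is only a minimal sketch: the function name check_wants is hypothetical and is not part of puppet-tripleo, and the directory and unit names are assumed to be the ones shown in the listing.

```shell
#!/bin/bash
# Hypothetical helper: verify that every expected unit is linked into a
# systemd ".wants" directory. Returns 0 if all are present, 1 otherwise.
check_wants() {
    local dir="$1"; shift
    local unit missing=0
    for unit in "$@"; do
        # -L is true for a symlink even when its target does not exist
        if [ -L "$dir/$unit" ]; then
            echo "OK: $unit"
        else
            echo "MISSING: $unit"
            missing=1
        fi
    done
    return "$missing"
}
```

On a controller it could be called as: check_wants /etc/systemd/system/resource-agents-deps.target.wants docker.service rhel-push-plugin.service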


#disable the fencing as in the reproducer: 

[root@controller-0 ~]# pcs property set stonith-enabled=false
[root@controller-0 ~]# pcs config|grep stonith-enabled
 stonith-enabled: false



[root@controller-0 ~]# date
Thu Oct 18 07:19:49 UTC 2018

[root@controller-0 ~]# reboot

#A simple SSH port test shows that the controller is back online in less than 3 minutes:

(undercloud) [stack@undercloud-0 ~]$ while true ; do nc -zv 192.168.24.15 22;sleep 10s;done
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
...
Ncat: Connected to 192.168.24.15:22.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.24.15:22.
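
The polling loop above shows when the port comes back but does not record how long that took. A rough sketch of a timed variant follows. The function name time_until_up is hypothetical, and the timeout and interval values in the usage line are assumptions, not values from the original test.

```shell
#!/bin/bash
# Hypothetical helper: poll a probe command until it succeeds and print how
# many seconds that took. Returns 1 if the timeout expires first.
# Arguments: <timeout seconds> <poll interval seconds> <probe command...>
time_until_up() {
    local timeout="$1" interval="$2"; shift 2
    local start now
    start=$(date +%s)
    while ! "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$timeout" ]; then
            echo "timeout after $((now - start))s"
            return 1
        fi
        sleep "$interval"
    done
    now=$(date +%s)
    echo "up after $((now - start))s"
}
```

For example: time_until_up 900 10 nc -z 192.168.24.15 22. Because port 22 keeps answering for a short while after reboot is issued, the helper should be started only once the port has begun refusing connections; otherwise it returns immediately.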

Comment 8 errata-xmlrpc 2019-01-11 11:53:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

