
Bug 1634810

Summary: [OSP14] Rebooting a clustered control node without previously stopping pacemaker takes more than 15 minutes
Product: Red Hat OpenStack
Reporter: Michele Baldessari <michele>
Component: puppet-tripleo
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Docs Contact:
Priority: urgent
Version: 14.0 (Rocky)
CC: agurenko, chjones, dvd, emacchi, jjoyce, jschluet, mburns, michele, nwahl, pkomarov, sbradley, slinaber, tvignaud
Target Milestone: beta
Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: puppet-tripleo-9.3.1-0.20181001112251.a6eaab1.el7ost python-paunch-3.2.0-0.20180921003258.6d2ec11.el7ost
Doc Type: No Doc Update
Doc Text:
Cause: A faulty interaction between rhel-push-plugin.service and the docker service during system shutdown.
Consequence: Rebooting a controller takes a long time (more than 15 minutes).
Fix: Correct shutdown ordering is enforced for these two services.
Result: Rebooting a controller takes a more reasonable amount of time (a couple of minutes).
Story Points: ---
Clone Of: 1628705
Environment:
Last Closed: 2019-01-11 11:53:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1628705
Bug Blocks:

Comment 4 pkomarov 2018-10-18 07:34:39 UTC
Verified.

[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-10-10.3

[stack@undercloud-0 ~]$ ansible controller -mshell -b -a'ls -l /etc/systemd/system/resource-agents-deps.target.wants'
 [WARNING]: Found both group and host with same name: undercloud

#check fix: 
controller-1 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service

controller-0 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service

controller-2 | SUCCESS | rc=0 >>
total 0
lrwxrwxrwx. 1 root root 38 Oct 17 13:02 docker.service -> /usr/lib/systemd/system/docker.service
lrwxrwxrwx. 1 root root 48 Oct 17 13:02 rhel-push-plugin.service -> /usr/lib/systemd/system/rhel-push-plugin.service
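
The same check can be scripted so that it returns a pass/fail status instead of requiring someone to read the listing. This is only a minimal sketch: the function name check_wants_links is hypothetical. The default directory and the two unit names are taken from the listing above.

```shell
# Hypothetical helper (not part of the fix): report whether the two units
# seen in the listing above are symlinked into a target's .wants directory.
check_wants_links() {
    local wants_dir="${1:-/etc/systemd/system/resource-agents-deps.target.wants}"
    local unit
    local missing=0
    for unit in docker.service rhel-push-plugin.service; do
        if [ -L "${wants_dir}/${unit}" ]; then
            # Print the link target so the output resembles the ls -l listing.
            echo "OK: ${unit} -> $(readlink "${wants_dir}/${unit}")"
        else
            echo "MISSING: ${unit}"
            missing=1
        fi
    done
    # 0 when both links exist, 1 when at least one is absent.
    return "${missing}"
}
```

On a controller it could be called as `check_wants_links && echo "fix present"`. Passing a directory as the first argument overrides the default path.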


#disable the fencing as in the reproducer: 

[root@controller-0 ~]# pcs property set stonith-enabled=false
[root@controller-0 ~]# pcs config|grep stonith-enabled
 stonith-enabled: false



[root@controller-0 ~]# date
Thu Oct 18 07:19:49 UTC 2018

[root@controller-0 ~]# reboot

#A simple ssh port test shows that the controller is back online after no more than 3 min:

(undercloud) [stack@undercloud-0 ~]$ while true ; do nc -zv 192.168.24.15 22;sleep 10s;done
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.
...
Ncat: Connected to 192.168.24.15:22.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.24.15:22.
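
The loop above never exits and leaves the timing to be read off by hand. Below is a rough sketch of a variant that stops once the probe succeeds and prints the elapsed seconds. The function name wait_until and its timeout/interval parameters are hypothetical. The probe itself is whatever command is passed in, for example the same `nc -z` check used above.

```shell
# Hypothetical helper: retry a probe command until it succeeds or a timeout
# (in seconds) expires. On success, print the elapsed whole seconds.
wait_until() {
    local timeout="$1"
    local interval="$2"
    shift 2
    local start now
    start=$(date +%s)
    while true; do
        # Run the probe passed as the remaining arguments, discarding its output.
        if "$@" >/dev/null 2>&1; then
            now=$(date +%s)
            echo $(( now - start ))
            return 0
        fi
        now=$(date +%s)
        if [ $(( now - start )) -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
    done
}
```

Started right after the port has gone down, `wait_until 900 10 nc -z 192.168.24.15 22` would print roughly how long the controller took to accept SSH connections again. It would return non-zero if that took longer than 15 minutes.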

Comment 8 errata-xmlrpc 2019-01-11 11:53:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045