Bug 1557513

Summary: [OSP10 UPDATES] RMQ fails to start after minor update and reboot
Product: Red Hat OpenStack
Reporter: John Eckersberg <jeckersb>
Component: puppet-rabbitmq
Assignee: John Eckersberg <jeckersb>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: apevec, augol, chjones, jeckersb, jjoyce, jschluet, lhh, mbultel, michele, nlevinki, pkomarov, slinaber, srevivo, tvignaud, yprokule
Target Milestone: zstream
Keywords: Reopened, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: puppet-rabbitmq-5.6.0-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1533406
: 1557519 (view as bug list)
Environment:
Last Closed: 2018-11-26 18:00:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1533406, 1536064
Bug Blocks: 1557519, 1557522, 1647474, 1647587, 1647593, 1654041, 1654042

Comment 1 Chris Jones 2018-06-08 12:36:51 UTC
OSP11 is now EOL. This was fixed in OSP12.

Comment 2 John Eckersberg 2018-11-06 19:25:07 UTC
Re-opening this; it should be tracked against OSP10, where it is still a problem.

Downstream review - https://code.engineering.redhat.com/gerrit/#/c/132928/

Comment 3 John Eckersberg 2018-11-14 15:16:25 UTC
Fixed in puppet-rabbitmq-5.6.0-3.el7ost - https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=800198
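
One quick way to confirm the fixed package actually landed on the overcloud nodes (a sketch, not part of the original comment; it assumes the same "controller" ansible group used in the verification below):

ansible controller -b -mshell -a'rpm -q puppet-rabbitmq'
# expected: puppet-rabbitmq-5.6.0-3.el7ost (or later) reported on every controller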

Comment 9 pkomarov 2018-11-18 16:10:10 UTC
Verified:

[stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 47aca0f1-a2cd-4592-86e6-3d6948dde115 | overcloud  | UPDATE_COMPLETE | 2018-11-18T10:29:09Z | 2018-11-18T13:29:34Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-11-15.2
[stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.6):
  10
  ceph-2
  ceph-osd-2
  rhel-7.6

# Test reproducer (a follow-up check is sketched after these commands):
ansible controller -mshell -b -a'pcs cluster stop --request-timeout=300'
ansible controller -mreboot
ansible controller -mshell -b -a'pcs cluster start --wait=300'
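
A minimal extra check (a sketch; it was not part of the recorded run and assumes rabbitmqctl is available on the controllers) to confirm that RabbitMQ rejoined the cluster on all three nodes after the restart:

ansible controller -b -mshell -a'rabbitmqctl cluster_status'
# expected: each node lists all three rabbit@controller-* nodes under running_nodes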

[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'hostname -s;uptime;pcs status'
 [WARNING]: Found both group and host with same name: undercloud

controller-1 | SUCCESS | rc=0 >>
controller-1
 16:05:09 up 3 min,  1 user,  load average: 5.26, 3.80, 1.63
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

controller-0 | SUCCESS | rc=0 >>
controller-0
 16:05:09 up 3 min,  1 user,  load average: 5.16, 3.56, 1.53
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

controller-2 | SUCCESS | rc=0 >>
controller-2
 16:05:09 up 4 min,  1 user,  load average: 3.28, 2.69, 1.20
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:10 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0

3 nodes configured
19 resources configured

Online: [ controller-0 controller-1 controller-2 ]

Full list of resources:

 ip-192.168.24.12	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.1.15	(ocf::heartbeat:IPaddr2):	Started controller-1
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-10.0.0.106	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.15	(ocf::heartbeat:IPaddr2):	Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-2 ]
     Slaves: [ controller-0 controller-1 ]
 ip-172.17.4.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.17	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started controller-0

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'ss -anp | grep 4369'
 [WARNING]: Found both group and host with same name: undercloud

controller-0 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=7753,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:55935              
tcp    TIME-WAIT  0      0      127.0.0.1:50839              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:60795              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:41355               users:(("epmd",pid=7753,fd=5))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52420              
tcp    TIME-WAIT  0      0      127.0.0.1:49803              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:33027              
tcp    TIME-WAIT  0      0      127.0.0.1:33871              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:39955              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:49085              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:45503              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:33533              
tcp    TIME-WAIT  0      0      127.0.0.1:34434              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:51121              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:41355              127.0.0.1:4369                users:(("beam.smp",pid=20843,fd=44))
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52936              
tcp    TIME-WAIT  0      0      127.0.0.1:46813              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:52773              
tcp    TIME-WAIT  0      0      127.0.0.1:58945              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:46137              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:44826              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:60743              
tcp    TIME-WAIT  0      0      127.0.0.1:45234              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:50195              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:38399              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:59794              
tcp    TIME-WAIT  0      0      172.17.1.14:4369               172.17.1.14:35890              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=7753,fd=4))

controller-2 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=10762,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33546              
tcp    TIME-WAIT  0      0      127.0.0.1:58205              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:58565              127.0.0.1:4369               
tcp    ESTAB      0      0      127.0.0.1:34438              127.0.0.1:4369                users:(("beam.smp",pid=19277,fd=44))
tcp    TIME-WAIT  0      0      127.0.0.1:59929              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:52958              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:35858              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:59041              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:55288              
tcp    TIME-WAIT  0      0      127.0.0.1:39314              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:37688              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:60427              
tcp    TIME-WAIT  0      0      127.0.0.1:42312              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:35171              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:43145              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:44075              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33722              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:59552              
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:34438               users:(("epmd",pid=10762,fd=5))
tcp    TIME-WAIT  0      0      127.0.0.1:35897              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:60203              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:52215              
tcp    TIME-WAIT  0      0      127.0.0.1:50348              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:33201              
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:32860              
tcp    TIME-WAIT  0      0      127.0.0.1:38523              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.12:4369               172.17.1.12:40896              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=10762,fd=4))

controller-1 | SUCCESS | rc=0 >>
tcp    LISTEN     0      128       *:4369                  *:*                   users:(("epmd",pid=9725,fd=3))
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:48883              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:45415              
tcp    ESTAB      0      0      127.0.0.1:4369               127.0.0.1:41776               users:(("epmd",pid=9725,fd=5))
tcp    TIME-WAIT  0      0      127.0.0.1:39204              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:39352              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:51954              
tcp    TIME-WAIT  0      0      127.0.0.1:43407              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:47398              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:48606              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:56653              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:49594              
tcp    TIME-WAIT  0      0      127.0.0.1:53935              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:58469              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:41873              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:51399              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:54985              
tcp    TIME-WAIT  0      0      127.0.0.1:41007              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:60205              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:49194              
tcp    TIME-WAIT  0      0      127.0.0.1:54824              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      127.0.0.1:40072              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:50862              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:39001              
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:56765              
tcp    ESTAB      0      0      127.0.0.1:41776              127.0.0.1:4369                users:(("beam.smp",pid=18278,fd=44))
tcp    TIME-WAIT  0      0      127.0.0.1:57691              127.0.0.1:4369               
tcp    TIME-WAIT  0      0      172.17.1.18:4369               172.17.1.18:45378              
tcp    LISTEN     0      128      :::4369                 :::*                   users:(("epmd",pid=9725,fd=4))

Comment 10 pkomarov 2018-11-19 17:11:12 UTC
More checks are needed.

Comment 13 errata-xmlrpc 2018-11-26 18:00:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3674