Bug 1557513
| Summary: | [OSP10 UPDATES] RMQ fails to start after minor update and reboot | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | John Eckersberg <jeckersb> |
| Component: | puppet-rabbitmq | Assignee: | John Eckersberg <jeckersb> |
| Status: | CLOSED ERRATA | QA Contact: | pkomarov |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 10.0 (Newton) | CC: | apevec, augol, chjones, jeckersb, jjoyce, jschluet, lhh, mbultel, michele, nlevinki, pkomarov, slinaber, srevivo, tvignaud, yprokule |
| Target Milestone: | zstream | Keywords: | Reopened, Triaged, ZStream |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | puppet-rabbitmq-5.6.0-4.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1533406 | | |
| : | 1557519 (view as bug list) | Environment: | |
| Last Closed: | 2018-11-26 18:00:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1533406, 1536064 | | |
| Bug Blocks: | 1557519, 1557522, 1647474, 1647587, 1647593, 1654041, 1654042 | | |
|
Comment 1
Chris Jones
2018-06-08 12:36:51 UTC

Re-opening this; it should be filed against OSP10 and is still a problem.

Downstream review: https://code.engineering.redhat.com/gerrit/#/c/132928/
Fixed in puppet-rabbitmq-5.6.0-3.el7ost: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=800198

Verified:
[stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID | Stack Name | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 47aca0f1-a2cd-4592-86e6-3d6948dde115 | overcloud | UPDATE_COMPLETE | 2018-11-18T10:29:09Z | 2018-11-18T13:29:34Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@undercloud-0 ~]$ cat core_puddle_version
2018-11-15.2
[stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.6):
10
ceph-2
ceph-osd-2
rhel-7.6
# Test reproducer:
ansible controller -mshell -b -a'pcs cluster stop --request-timeout=300'
ansible controller -mreboot
ansible controller -mshell -b -a'pcs cluster start --wait=300'
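The reproducer above relies on the built-in timeouts of `pcs cluster stop --request-timeout=300` and `pcs cluster start --wait=300`. The same wait-until-healthy pattern can be sketched as a small POSIX-shell retry helper; `wait_for` and the example probe below are hypothetical illustrations, not commands taken from this bug report:

```shell
#!/bin/sh
# wait_for: hypothetical helper that retries a probe command once per
# second until it succeeds or the timeout (in seconds) expires.
wait_for() {
    timeout=$1
    shift
    while [ "$timeout" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0        # probe succeeded
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1                # timed out
}

# Example probe (assumption, mirroring the port check later in this
# report): epmd must be listening on 4369 before rabbitmq can cluster.
# wait_for 300 sh -c 'ss -ltn | grep -q ":4369 "'
```

This is essentially what the `--wait` flags do internally: poll cluster state and fail with a nonzero exit code if the deadline passes, so a CI script can distinguish "started slowly" from "never started".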
[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'hostname -s;uptime;pcs status'
[WARNING]: Found both group and host with same name: undercloud
controller-1 | SUCCESS | rc=0 >>
controller-1
16:05:09 up 3 min, 1 user, load average: 5.26, 3.80, 1.63
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0
3 nodes configured
19 resources configured
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
ip-192.168.24.12 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.1.15 (ocf::heartbeat:IPaddr2): Started controller-1
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: galera-master [galera]
Masters: [ controller-0 controller-1 controller-2 ]
ip-10.0.0.106 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.3.15 (ocf::heartbeat:IPaddr2): Started controller-2
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: redis-master [redis]
Masters: [ controller-2 ]
Slaves: [ controller-0 controller-1 ]
ip-172.17.4.17 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.1.17 (ocf::heartbeat:IPaddr2): Started controller-2
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
controller-0 | SUCCESS | rc=0 >>
controller-0
16:05:09 up 3 min, 1 user, load average: 5.16, 3.56, 1.53
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:09 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0
3 nodes configured
19 resources configured
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
ip-192.168.24.12 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.1.15 (ocf::heartbeat:IPaddr2): Started controller-1
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: galera-master [galera]
Masters: [ controller-0 controller-1 controller-2 ]
ip-10.0.0.106 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.3.15 (ocf::heartbeat:IPaddr2): Started controller-2
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: redis-master [redis]
Masters: [ controller-2 ]
Slaves: [ controller-0 controller-1 ]
ip-172.17.4.17 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.1.17 (ocf::heartbeat:IPaddr2): Started controller-2
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
controller-2 | SUCCESS | rc=0 >>
controller-2
16:05:09 up 4 min, 1 user, load average: 3.28, 2.69, 1.20
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Nov 18 16:05:10 2018
Last change: Sun Nov 18 14:18:37 2018 by root via cibadmin on controller-0
3 nodes configured
19 resources configured
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
ip-192.168.24.12 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.1.15 (ocf::heartbeat:IPaddr2): Started controller-1
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: galera-master [galera]
Masters: [ controller-0 controller-1 controller-2 ]
ip-10.0.0.106 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.3.15 (ocf::heartbeat:IPaddr2): Started controller-2
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: redis-master [redis]
Masters: [ controller-2 ]
Slaves: [ controller-0 controller-1 ]
ip-172.17.4.17 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.1.17 (ocf::heartbeat:IPaddr2): Started controller-2
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'ss -anp | grep 4369'
[WARNING]: Found both group and host with same name: undercloud
controller-0 | SUCCESS | rc=0 >>
tcp LISTEN 0 128 *:4369 *:* users:(("epmd",pid=7753,fd=3))
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:55935
tcp TIME-WAIT 0 0 127.0.0.1:50839 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:60795 127.0.0.1:4369
tcp ESTAB 0 0 127.0.0.1:4369 127.0.0.1:41355 users:(("epmd",pid=7753,fd=5))
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:52420
tcp TIME-WAIT 0 0 127.0.0.1:49803 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:33027
tcp TIME-WAIT 0 0 127.0.0.1:33871 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:39955 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:49085
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:45503
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:33533
tcp TIME-WAIT 0 0 127.0.0.1:34434 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:51121 127.0.0.1:4369
tcp ESTAB 0 0 127.0.0.1:41355 127.0.0.1:4369 users:(("beam.smp",pid=20843,fd=44))
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:52936
tcp TIME-WAIT 0 0 127.0.0.1:46813 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:52773
tcp TIME-WAIT 0 0 127.0.0.1:58945 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:46137
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:44826
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:60743
tcp TIME-WAIT 0 0 127.0.0.1:45234 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:50195 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:38399 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:59794
tcp TIME-WAIT 0 0 172.17.1.14:4369 172.17.1.14:35890
tcp LISTEN 0 128 :::4369 :::* users:(("epmd",pid=7753,fd=4))
controller-2 | SUCCESS | rc=0 >>
tcp LISTEN 0 128 *:4369 *:* users:(("epmd",pid=10762,fd=3))
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:33546
tcp TIME-WAIT 0 0 127.0.0.1:58205 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:58565 127.0.0.1:4369
tcp ESTAB 0 0 127.0.0.1:34438 127.0.0.1:4369 users:(("beam.smp",pid=19277,fd=44))
tcp TIME-WAIT 0 0 127.0.0.1:59929 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:52958 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:35858 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:59041 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:55288
tcp TIME-WAIT 0 0 127.0.0.1:39314 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:37688
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:60427
tcp TIME-WAIT 0 0 127.0.0.1:42312 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:35171 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:43145
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:44075
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:33722
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:59552
tcp ESTAB 0 0 127.0.0.1:4369 127.0.0.1:34438 users:(("epmd",pid=10762,fd=5))
tcp TIME-WAIT 0 0 127.0.0.1:35897 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:60203
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:52215
tcp TIME-WAIT 0 0 127.0.0.1:50348 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:33201
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:32860
tcp TIME-WAIT 0 0 127.0.0.1:38523 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.12:4369 172.17.1.12:40896
tcp LISTEN 0 128 :::4369 :::* users:(("epmd",pid=10762,fd=4))
controller-1 | SUCCESS | rc=0 >>
tcp LISTEN 0 128 *:4369 *:* users:(("epmd",pid=9725,fd=3))
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:48883
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:45415
tcp ESTAB 0 0 127.0.0.1:4369 127.0.0.1:41776 users:(("epmd",pid=9725,fd=5))
tcp TIME-WAIT 0 0 127.0.0.1:39204 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:39352 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:51954
tcp TIME-WAIT 0 0 127.0.0.1:43407 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:47398 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:48606 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:56653 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:49594
tcp TIME-WAIT 0 0 127.0.0.1:53935 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:58469 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:41873
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:51399
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:54985
tcp TIME-WAIT 0 0 127.0.0.1:41007 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:60205
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:49194
tcp TIME-WAIT 0 0 127.0.0.1:54824 127.0.0.1:4369
tcp TIME-WAIT 0 0 127.0.0.1:40072 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:50862
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:39001
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:56765
tcp ESTAB 0 0 127.0.0.1:41776 127.0.0.1:4369 users:(("beam.smp",pid=18278,fd=44))
tcp TIME-WAIT 0 0 127.0.0.1:57691 127.0.0.1:4369
tcp TIME-WAIT 0 0 172.17.1.18:4369 172.17.1.18:45378
tcp LISTEN 0 128 :::4369 :::* users:(("epmd",pid=9725,fd=4))
more checks are needed

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3674