Bug 1659072 - osp8 overcloud deployment fails with : epmd reports: node 'rabbit' not running at all
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Oyvind Albrigtsen
QA Contact: michal novacek
URL:
Whiteboard:
Depends On:
Blocks: 1647587 1692889
 
Reported: 2018-12-13 14:07 UTC by pkomarov
Modified: 2021-03-15 07:32 UTC (History)
CC List: 14 users

Fixed In Version: resource-agents-4.1.1-16.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1692889
Environment:
Last Closed: 2021-03-15 07:32:30 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ClusterLabs resource-agents pull 1276 | 0 | None | closed | rabbitmq-cluster: always use quiet flag for eval calls | 2021-02-04 11:12:54 UTC
Red Hat Knowledge Base (Solution) | 3756061 | 0 | None | None | None | 2018-12-14 09:19:36 UTC

Description pkomarov 2018-12-13 14:07:51 UTC
Description of problem:
osp8 overcloud deployment fails with : epmd reports: node 'rabbit' not running at all  

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. HA deployment of osp8
[stack@undercloud-0 ~]$ cat core_puddle_version
2018-11-15.1
[stack@undercloud-0 ~]$ rhos-release -L

Installed repositories (rhel-7.6):
  8-director
  8
  ceph-2
  ceph-osd-2
  rhel-7.6


[stack@undercloud-0 ~]$ ansible controller -b -mshell -a'pcs status|grep FAILED'
 [WARNING]: Found both group and host with same name: undercloud

controller-2 | SUCCESS | rc=0 >>
     rabbitmq	(ocf::heartbeat:rabbitmq-cluster):	FAILED controller-0 (Monitoring)


controller-0 | SUCCESS | rc=0 >>
     rabbitmq	(ocf::heartbeat:rabbitmq-cluster):	FAILED controller-0 (Monitoring)

controller-1 | SUCCESS | rc=0 >>
     rabbitmq	(ocf::heartbeat:rabbitmq-cluster):	FAILED controller-0 (Monitoring)

[stack@undercloud-0 ~]$ openstack stack resource list overcloud|grep -v COMP
+-------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+
| resource_name                             | physical_resource_id                          | resource_type                                     | resource_status | updated_time        |
+-------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+
| ControllerNodesPostDeployment             | be5be2cf-1615-41a5-89d8-7c9b45af19b6          | OS::TripleO::ControllerPostDeployment             | CREATE_FAILED   | 2018-12-13T06:58:31 |
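+-------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+

[stack@undercloud-0 ~]$ openstack software deployment list |grep -v COMP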
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+
| id                                   | config_id                            | server_id                            | action | status   |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+
| c248efc4-c2a7-4641-a40f-323ff301d3e3 | d84dd9ef-d281-408f-a1f9-ae0ef2bfe9fd | 174e9834-1c28-4ec4-ab4d-b3c54dda5262 | CREATE | FAILED   |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+

[stack@undercloud-0 ~]$ openstack software config show d84dd9ef-d281-408f-a1f9-ae0ef2bfe9fd
The output is in the link below (it is basically the post-deployment pacemaker setup: puppet-tripleo:manifests/profile/base/pacemaker.pp)

http://pastebin.test.redhat.com/683353

#on the controller: from /var/log/messages:

Dec 13 05:29:31 controller-0 su: (to rabbitmq) root on none
Dec 13 05:29:31 controller-0 systemd: Started Session c14 of user rabbitmq.
Dec 13 05:29:32 controller-0 lrmd[8850]:  notice: rabbitmq_monitor_10000:20269:stderr [ /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster: line 206: 1 ]
Dec 13 05:29:32 controller-0 lrmd[8850]:  notice: rabbitmq_monitor_10000:20269:stderr [ ...done. * 2 : syntax error: invalid arithmetic operator (error token is "...done. * 2 ") ]
Dec 13 05:29:32 controller-0 crmd[8853]:  notice: controller-0-rabbitmq_monitor_10000:74 [ /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster: line 206: 1\n...done. * 2 : syntax error: invalid arithmetic operator (error token is "...done. * 2 ")\n ]
Dec 13 05:29:33 controller-0 crmd[8853]:  notice: Result of start operation for memcached on controller-0: 0 (ok)
Dec 13 05:29:34 controller-0 ntpd[7663]: 0.0.0.0 c61c 0c clock_step +0.424865 s
Dec 13 05:29:34 controller-0 ntpd[7663]: 0.0.0.0 c614 04 freq_mode
Dec 13 05:29:34 controller-0 systemd: Time has been changed
Dec 13 05:29:35 controller-0 ntpd[7663]: 0.0.0.0 c618 08 no_sys_peer
Dec 13 05:29:36 controller-0 crmd[8853]:  notice: controller-0-rabbitmq_monitor_10000:74 [ /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster: line 206: 1\n...done. * 2 : syntax error: invalid arithmetic operator (error token is "...done. * 2 ")\n ]
Dec 13 05:29:36 controller-0 su: (to rabbitmq) root on none
Dec 13 05:29:36 controller-0 systemd: Started Session c15 of user rabbitmq.
Dec 13 05:29:36 controller-0 su: (to rabbitmq) root on none
Dec 13 05:29:36 controller-0 systemd: Started Session c16 of user rabbitmq.
Dec 13 05:29:36 controller-0 su: (to rabbitmq) root on none
Dec 13 05:29:36 controller-0 systemd: Started Session c17 of user rabbitmq.
Dec 13 05:29:36 controller-0 su: (to rabbitmq) root on none
Dec 13 05:29:36 controller-0 systemd: Started Session c18 of user rabbitmq.
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ Error: unable to connect to node 'rabbit@controller-0': nodedown ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [  ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ DIAGNOSTICS ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ =========== ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [  ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ attempted to contact: ['rabbit@controller-0'] ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [  ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ rabbit@controller-0: ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [   * connected to epmd (port 4369) on controller-0 ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [   * epmd reports: node 'rabbit' not running at all ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [                   no other nodes on controller-0 ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [   * suggestion: start the node ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [  ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ current node details: ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ - node name: 'rabbitmqctl20809@controller-0' ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ - home dir: /var/lib/rabbitmq ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [ - cookie hash: J9D9g2x1Jd/x/mEaud74Uw== ]
Dec 13 05:29:38 controller-0 lrmd[8850]:  notice: rabbitmq_stop_0:20585:stderr [  ]

Comment 3 John Eckersberg 2018-12-13 17:54:46 UTC
This is because the rabbitmq-server version in OSP8 has a different output format for eval calls than newer OSP versions do.

On 8:

[root@localhost ~]# rabbitmqctl eval 'testing.'
testing
...done.
[root@localhost ~]# 

On 9 through 14:
[root@localhost ~]# rabbitmqctl eval 'testing.'
testing
[root@localhost ~]# 

On the 8 version, the "...done." output can be suppressed by passing the -q flag:

[root@localhost ~]# rpm -q rabbitmq-server
rabbitmq-server-3.3.5-34.el7ost.noarch
[root@localhost ~]# rhos-release -L
Installed repositories (rhel-7.6):
  8
  ceph-1.3
  ceph-osd-1.3
  rhel-7.6
[root@localhost ~]# rabbitmqctl eval -q 'testing.'
testing
[root@localhost ~]#

So we must update the resource agent to always use -q for eval.
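
The failure mode can be reproduced without a RabbitMQ install. The following is a minimal sketch, not the resource agent's actual code: the two functions are hypothetical stand-ins that only print the two output formats shown above, and the doubling mirrors the "* 2" visible in the logged syntax error.

```shell
# Hypothetical stand-ins that only mimic the two output formats;
# they do not call rabbitmqctl.
eval_with_done() { printf '1\n...done.\n'; }   # OSP 8 format without -q
eval_quiet()     { printf '1\n'; }             # format with -q

# Try to double the value printed by the function named in $1.
# The subshell keeps an arithmetic error from aborting the script,
# and stderr is discarded so only the result (if any) is captured.
double_output() {
    ( echo $(( $("$1") * 2 )) ) 2>/dev/null
}

if noisy=$(double_output eval_with_done); then
    noisy_status=ok
else
    noisy_status=failed
fi
quiet=$(double_output eval_quiet)

echo "with ...done.: $noisy_status"
echo "with -q:       $quiet"
```

The first call fails because the shell is handed "1", a newline, and "...done. * 2" as one arithmetic expression, which is the same error token that shows up in /var/log/messages. The second call prints 2.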

Comment 21 RHEL Program Management 2021-03-15 07:32:30 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

