Bug 1362117
Summary: | RabbitMQ fails to rejoin the cluster after some consecutive start/stop actions | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Raoul Scarazzini <rscarazz> |
Component: | resource-agents | Assignee: | John Eckersberg <jeckersb> |
Status: | CLOSED DUPLICATE | QA Contact: | cluster-qe <cluster-qe> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 7.2 | CC: | agk, cluster-maint, fdinitto, michele, plemenko, rscarazz, ushkalim |
Target Milestone: | rc | ||
Target Release: | 7.2 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-12-07 14:00:10 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Raoul Scarazzini
2016-08-01 10:21:44 UTC
All sosreports are available at this address [1] with this directory tree: BZ1362117 |-osp9 | |- with_resource-agents-3.9.5-54.el7 | \- with_resource-agents-3.9.5-80.el7 \-newton |- with_resource-agents-3.9.5-54.el7 \- with_resource-agents-3.9.5-80.el7 In every try the result is the same, what changes is just the number of starts/stops that precedes the failure. [1] http://file.rdu.redhat.com/~rscarazz/BZ1362117/ As an additional test I installed the exact rpm version from zstream (version resource-agents-3.9.5-54.el7_2.16 as from here [1]), and after 88 tries we got the same error: rabbitmq_start_0 on overcloud-controller-0 'unknown error' (1): call=27889, status=complete, exitreason='none', last-rc-change='Tue Aug 2 17:41:29 2016', queued=0ms, exec=8124ms And from the cluster log the error seems pretty much clear: Aug 02 17:41:37 [40741] overcloud-controller-0.localdomain cib: info: crm_compress_string: Compressed 271070 bytes into 11028 (ratio 24:1) in 32ms rabbitmq-cluster(rabbitmq)[23454]: 2016/08/02_17:41:37 INFO: node failed to join even after reseting local data. Check SELINUX policy Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: process_not_running ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: unable to connect to node 'rabbit@overcloud-con troller-0': nodedown ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ DIAGNOSTICS ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ =========== ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ attempted to contact: ['rabbit@overcloud-controller-0' ] ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ rabbit@overcloud-controller-0: ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * connected to epmd (port 4369) on overcloud-control ler-0 ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * epmd reports: node 'rabbit' not running at all ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ no other nodes on overcloud-controller-0 ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * suggestion: start the node ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ current node details: ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - node name: 'rabbitmq-cli-15@overcloud-controller-0' ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - home dir: /var/lib/rabbitmq ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - cookie hash: YB33eAHDuEIsi0xTrQP4DA== ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: {not_a_cluster_node,"The node selected is not in the cluster."} ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: {not_a_cluster_node,"The node selected is not in the cluster."} ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: process_not_running ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ Error: unable to connect to node 'rabbit@overcloud-controller-0': nodedown ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ DIAGNOSTICS ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ =========== ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ attempted to contact: ['rabbit@overcloud-controller-0'] ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ rabbit@overcloud-controller-0: ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * connected to epmd (port 4369) on overcloud-controller-0 ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * epmd reports: node 'rabbit' not running at all ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ no other nodes on overcloud-controll: Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ * suggestion: start the node ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ current node details: ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - node name: 'rabbitmq-cli-77@overcloud-controller-0' ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - home dir: /var/lib/rabbitmq ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ - cookie hash: YB33eAHDuEIsi0xTrQP4DA== ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: notice: operation_finished: rabbitmq_start_0:23454:stderr [ ] Aug 02 17:41:37 [40743] overcloud-controller-0.localdomain lrmd: info: log_finished: finished - rsc:rabbitmq action:start call_id:27889 pid:23454 exit-code:1 exec-time:8124ms queue-time:0ms I attached all the new sosreport here [2], these were taken after while so you need to check the time while the error happened (Aug 02 17:41:37). [1] https://bugzilla.redhat.com/show_bug.cgi?id=1351446 [2] http://file.rdu.redhat.com/~rscarazz/BZ1362117/osp9/with_resource-agents-3.9.5-54.el7_2.16/ *** This bug has been marked as a duplicate of bug 1397393 *** |