Bug 1312094
| Summary: | crmd can crash after unexpected remote connection takeover | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ken Gaillot <kgaillot> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.2 | CC: | abeekhof, cluster-maint, cluster-qe, phagara |
| Target Milestone: | rc | ||
| Target Release: | 7.3 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-1.1.15-1.2c148ac.git.el7 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1312092 | Environment: | |
| Last Closed: | 2016-11-03 18:58:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1304771 | ||
| Bug Blocks: | 1379784 | ||
Comment 3
Mike McCune
2016-03-28 22:52:26 UTC
Setup: 3-node cluster + 1 pacemaker_remote node

Before the fix:

> Sep 08 14:26:11 [27822] virt-166 crmd: error: remote_lrm_op_callback: Unexpected pacemaker_remote client takeover. Disconnecting
> Sep 08 14:26:11 [27822] virt-166 crmd: info: lrmd_api_disconnect: Disconnecting from 3 lrmd service
> Sep 08 14:26:11 [27822] virt-166 crmd: info: lrmd_api_disconnect: Disconnecting from 3 lrmd service
> Sep 08 14:26:11 [27822] virt-166 crmd: info: lrmd_tls_connection_destroy: TLS connection destroyed
> Sep 08 14:26:11 [27816] virt-166 pacemakerd: error: child_waitpid: Managed process 27822 (crmd) dumped core
> Sep 08 14:26:11 [27816] virt-166 pacemakerd: error: pcmk_child_exit: The crmd process (27822) terminated with signal 6 (core=1)

The pacemaker_remote node was disconnected from the cluster, and crmd on the cluster node hosting the pacemaker_remote connection crashed (signal 6, SIGABRT, with a core dump) and was restarted. The cluster returned to a fully operational state shortly thereafter.

After the fix:

> Sep 8 16:23:46 virt-055 pacemaker_remoted[17977]: notice: LRMD client connection established. 0xd8e120 id: f93cb6a1-a321-4ff5-8c75-398190f50b28
> Sep 8 16:23:56 virt-055 pacemaker_remoted[17977]: notice: LRMD client disconnecting remote client - name: <unknown> id: f93cb6a1-a321-4ff5-8c75-398190f50b28
> Sep 8 16:23:56 virt-055 pacemaker_remoted[17977]: error: Remote client authentication timed out

The cluster remained fully operational without service disruption. Nothing was logged on the cluster node hosting the pacemaker_remote connection; only the remote node itself logged the authentication time-out error.

Marking as verified in pacemaker-1.1.15-1.2c148ac.git.el7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2578.html
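The comment does not say how the unexpected takeover was triggered. Judging from the post-fix log (a new client connection that never authenticates and is dropped about 10 s later), one plausible approach, which is an assumption and not taken from the report, is to open a second raw TCP connection to the remote node's pacemaker_remote port (3121 by default) and send nothing, so no TLS handshake ever happens. The following minimal Python sketch does exactly that and reports how long the peer kept the connection open. The function name `hold_unauthenticated_connection` and its parameters are hypothetical.

```python
import socket
import time

# Default TCP port that pacemaker_remoted listens on.
DEFAULT_REMOTE_PORT = 3121


def hold_unauthenticated_connection(host, port=DEFAULT_REMOTE_PORT, max_wait=30.0):
    """Hypothetical test helper (assumed reproduction method, not from the report).

    Opens a raw TCP connection, never starts a TLS handshake, and returns the
    number of seconds until the peer closed the connection, or None if it was
    still open after max_wait seconds.
    """
    start = time.monotonic()
    deadline = start + max_wait
    with socket.create_connection((host, port), timeout=max_wait) as sock:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # peer kept the unauthenticated connection open
            sock.settimeout(remaining)
            try:
                data = sock.recv(4096)
            except socket.timeout:
                return None  # nothing happened before the deadline
            except ConnectionResetError:
                return time.monotonic() - start  # peer aborted the connection
            if not data:
                # recv() returns b"" once the peer has closed its side.
                return time.monotonic() - start
            # Any unexpected bytes are ignored; keep waiting for the close.
```

For example, running `hold_unauthenticated_connection("virt-055")` against a fixed remote node would be expected to return roughly the authentication timeout visible in the log above, while crmd on the cluster node keeps its existing connection; the first log excerpt shows what crmd did instead on the unfixed build.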