Bug 1312094 - crmd can crash after unexpected remote connection takeover
Summary: crmd can crash after unexpected remote connection takeover
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.2
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1304771
Blocks: CVE-2016-7797
Reported: 2016-02-25 17:45 UTC by Ken Gaillot
Modified: 2016-11-03 18:58 UTC (History)

Fixed In Version: pacemaker-1.1.15-1.2c148ac.git.el7
Doc Type: No Doc Update
Doc Text:
Clone Of: 1312092
Environment:
Last Closed: 2016-11-03 18:58:51 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Cluster Labs | 5269 | 0 | None | None | None | 2016-02-25 17:45:07 UTC
Red Hat Product Errata | RHSA-2016:2578 | 0 | normal | SHIPPED_LIVE | Moderate: pacemaker security, bug fix, and enhancement update | 2016-11-03 12:07:24 UTC

Comment 3 Mike McCune 2016-03-28 22:52:26 UTC
This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune with any questions.

Comment 5 Patrik Hagara 2016-09-08 14:28:34 UTC
Setup: 3-node cluster + 1 pacemaker_remote node

Before the fix:

> Sep 08 14:26:11 [27822] virt-166       crmd:    error: remote_lrm_op_callback:	Unexpected pacemaker_remote client takeover. Disconnecting
> Sep 08 14:26:11 [27822] virt-166       crmd:     info: lrmd_api_disconnect:	Disconnecting from 3 lrmd service
> Sep 08 14:26:11 [27822] virt-166       crmd:     info: lrmd_api_disconnect:	Disconnecting from 3 lrmd service
> Sep 08 14:26:11 [27822] virt-166       crmd:     info: lrmd_tls_connection_destroy:	TLS connection destroyed
> Sep 08 14:26:11 [27816] virt-166 pacemakerd:    error: child_waitpid:	Managed process 27822 (crmd) dumped core
> Sep 08 14:26:11 [27816] virt-166 pacemakerd:    error: pcmk_child_exit:	The crmd process (27822) terminated with signal 6 (core=1)

The pacemaker_remote node was disconnected from the cluster. crmd on the cluster node hosting the pacemaker_remote connection crashed and was restarted, and the cluster returned to a fully operational state shortly thereafter.
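
To check other cluster nodes for the same failure signature, the two key messages in the excerpt above (the takeover error from crmd, then the core dump reported by pacemakerd for the same PID) can be matched programmatically. The following is a minimal sketch, not part of pacemaker: the function name `find_takeover_crash` and the regular expression are assumptions made for illustration, and the expression only targets the detail-log line layout shown in this comment.

```python
import re

# Assumed layout, taken from the excerpt above:
# "<Mon> <DD> <HH:MM:SS> [pid] <host> <daemon>: <level>: <function>: <message>"
LINE_RE = re.compile(
    r"^\S+\s+\d+\s+[\d:]+\s+\[(?P<pid>\d+)\]\s+\S+\s+"
    r"(?P<daemon>\w+):\s+(?P<level>\w+):\s+(?P<func>\w+):\s*(?P<msg>.*)$"
)
CORE_RE = re.compile(r"Managed process (\d+) \(crmd\) dumped core")
TAKEOVER_MSG = "Unexpected pacemaker_remote client takeover"


def find_takeover_crash(lines):
    """Return the PID of a crmd that logged a takeover and later dumped core.

    Returns None if no such sequence is found in the given lines.
    """
    takeover_pids = set()
    for raw in lines:
        # Drop the "> " quote prefix used in this bug comment, if present.
        match = LINE_RE.match(raw.lstrip("> ").strip())
        if match is None:
            continue  # not in the assumed detail-log layout
        msg = match.group("msg")
        if match.group("daemon") == "crmd" and TAKEOVER_MSG in msg:
            takeover_pids.add(int(match.group("pid")))
            continue
        core = CORE_RE.search(msg)
        if core and int(core.group(1)) in takeover_pids:
            return int(core.group(1))
    return None
```

Passing the lines of a saved log file to `find_takeover_crash` returns the crashed crmd PID when both messages appear in order. Syslog-style lines without a bracketed PID column do not match the assumed layout and are skipped.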


After the fix:

> Sep  8 16:23:46 virt-055 pacemaker_remoted[17977]:  notice: LRMD client connection established. 0xd8e120 id: f93cb6a1-a321-4ff5-8c75-398190f50b28
> Sep  8 16:23:56 virt-055 pacemaker_remoted[17977]:  notice: LRMD client disconnecting remote client - name: <unknown> id: f93cb6a1-a321-4ff5-8c75-398190f50b28
> Sep  8 16:23:56 virt-055 pacemaker_remoted[17977]:   error: Remote client authentication timed out

The cluster remained fully operational without service disruption. No messages were logged on the cluster node hosting the pacemaker_remote connection, and the remote node itself logged an authentication timeout error.

Marking as verified in pacemaker-1.1.15-1.2c148ac.git.el7

Comment 7 errata-xmlrpc 2016-11-03 18:58:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2578.html

