Bug 855959

Summary: Unable to remove nodes from corosync running config without segfault
Product: Red Hat Enterprise Linux 7
Reporter: Chris Feist <cfeist>
Component: libqb
Assignee: Angus Salkeld <asalkeld>
Status: CLOSED UPSTREAM
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.0
CC: sdake
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libqb-0.14.2-2.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-17 06:12:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Feist 2012-09-10 18:37:36 UTC
When I attempt to remove a node from a running corosync cluster, corosync segfaults.  The commands and output are below.  (The cluster uses the udpu transport.)

[root@rh7-1 ~]# rpm -q corosync
corosync-2.0.1-3.el7.x86_64

I have a 3-node cluster running pacemaker/corosync.  I shut down pacemaker and corosync on the third node (rh7-3), then tried to remove it from the running corosync configuration on another node (rh7-1), which ended in a segfault.  The commands I ran are below.

Before rh7-3 shutdown:
[root@rh7-1 ~]# corosync-quorumtool -l

Membership information
----------------------
    Nodeid      Votes Name
         3          1 rh7-3
         2          1 rh7-2
         1          1 rh7-1

After shutdown:
[root@rh7-1 ~]# corosync-quorumtool -l

Membership information
----------------------
    Nodeid      Votes Name
         2          1 rh7-2
         1          1 rh7-1

[root@rh7-1 ~]# corosync-cmapctl  | grep nodelist
nodelist.local_node_pos (u32) = 0
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = rh7-1
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = rh7-2
nodelist.node.2.nodeid (u32) = 3
nodelist.node.2.ring0_addr (str) = rh7-3

[root@rh7-1 ~]# corosync-cmapctl -d nodelist.node.2.ring0_addr nodelist.node.2.nodeid
Can't delete key nodelist.node.2.ring0_addr. Error CS_ERR_LIBRARY
Can't delete key nodelist.node.2.nodeid. Error CS_ERR_LIBRARY

Then corosync segfaults.  I get these messages in /var/log/messages:

Sep 10 12:00:55 rh7-1 corosync[18699]:  [TOTEM ] removing UDPU member {192.168.122.116}
Sep 10 12:00:56 rh7-1 abrt[22147]: Saved core dump of pid 18699 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2012-09-10-12:00:56-18699 (33681408 bytes)
Sep 10 12:00:57 rh7-1 systemd[1]: corosync.service: main process exited, code=dumped, status=11
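
For reference, the keys passed to corosync-cmapctl -d above can be derived from the nodelist dump instead of being typed by hand. The following is a minimal Python sketch, assuming the "key (type) = value" output format shown above. The names keys_for_node and SAMPLE are hypothetical, and the code only parses text; it does not call corosync or delete anything.

```python
import re

# Matches per-node lines such as "nodelist.node.2.ring0_addr (str) = rh7-3".
# Keys outside nodelist.node.N.* (e.g. nodelist.local_node_pos) are ignored.
NODE_KEY = re.compile(r"^(nodelist\.node\.(\d+)\.(\w+)) \((\w+)\) = (.*)$")


def keys_for_node(cmap_output, ring0_addr):
    """Return all nodelist keys of the node whose ring0_addr matches.

    cmap_output is text in the format printed by `corosync-cmapctl`
    (assumed here to match the transcript in this report).
    """
    entries = {}  # node position -> list of (full_key, field, value)
    for line in cmap_output.splitlines():
        m = NODE_KEY.match(line.strip())
        if m:
            full_key, pos, field, _type, value = m.groups()
            entries.setdefault(int(pos), []).append((full_key, field, value))

    for fields in entries.values():
        if any(f == "ring0_addr" and v == ring0_addr for _, f, v in fields):
            return [key for key, _, _ in fields]
    return []


# Hypothetical sample input, copied from the transcript in this report.
SAMPLE = """\
nodelist.local_node_pos (u32) = 0
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = rh7-1
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = rh7-2
nodelist.node.2.nodeid (u32) = 3
nodelist.node.2.ring0_addr (str) = rh7-3
"""

if __name__ == "__main__":
    # Prints the keys that belong to rh7-3, separated by spaces.
    print(" ".join(keys_for_node(SAMPLE, "rh7-3")))
```

Matching on ring0_addr rather than on the position index avoids assuming that node N in the list always has nodeid N+1.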

Comment 1 Jan Friesse 2012-09-17 06:12:03 UTC
This was a problem with libqb, which should be fixed upstream and in libqb-0.14.2-2.el7.