Bug 1134426

Summary: pcs needs a better parser for corosync.conf
Product: Red Hat Enterprise Linux 7
Component: pcs
Version: 7.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Radek Steiger <rsteiger>
Assignee: Tomas Jelinek <tojeline>
QA Contact: cluster-qe <cluster-qe>
CC: cfeist, cluster-maint, jpokorny, tojeline
Target Milestone: rc
Fixed In Version: pcs-0.9.140-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: The user edits the corosync.conf configuration file manually.
Consequence: Pcs misbehaves, as it is not able to read the file properly.
Fix: Implement a full-featured parser for the corosync.conf file.
Result: Pcs is able to read a manually edited corosync.conf file properly.
Type: Bug
Last Closed: 2015-11-19 09:32:49 UTC
Bug Blocks: 1142126
Attachments:
- corosync.conf
- Example patch (see the comment)
- proposed fix 1/3
- proposed fix 2/3
- proposed fix 3/3

Description Radek Steiger 2014-08-27 13:42:53 UTC
Created attachment 931475 [details]
corosync.conf

Description of problem:

(credit goes to Honza Friesse and Tomas Jelinek)

The internal parser that extracts nodes from corosync.conf uses the following code to grep for node names:

    # matches "ring0_addr: " anywhere on any line, regardless of context
    preg = re.compile(r'.*ring0_addr: (.*)')
    for line in lines:
        match = preg.match(line)
        if match:
            nodes.append(match.group(1))

This basically greps for _anything_ containing "ring0_addr:" and can lead to some crazy results. As an example, I created a cluster with "ring0_addr:" in the cluster name:

[root@virt-041 ~]# pcs cluster setup --name "ring0_addr: blabla" virt-041.cluster-qe.lab.eng.brq.redhat.com virt-042.cluster-qe.lab.eng.brq.redhat.com --start --enable
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
virt-041.cluster-qe.lab.eng.brq.redhat.com: Succeeded
virt-041.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster...
virt-042.cluster-qe.lab.eng.brq.redhat.com: Succeeded
virt-042.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster...
virt-041.cluster-qe.lab.eng.brq.redhat.com: Cluster Enabled
virt-042.cluster-qe.lab.eng.brq.redhat.com: Cluster Enabled


The getNodesFromCorosyncConf() function then produces the following results:

[root@virt-041 ~]# pcs status nodes corosync
Corosync Nodes:
 Online: virt-041.cluster-qe.lab.eng.brq.redhat.com virt-042.cluster-qe.lab.eng.brq.redhat.com 
 Offline: blabla 

[root@virt-041 ~]# pcs status pcsd
  blabla: Offline
  virt-041.cluster-qe.lab.eng.brq.redhat.com: Online
  virt-042.cluster-qe.lab.eng.brq.redhat.com: Online


The relevant sections of corosync.conf:

[root@virt-041 ~]# grep ring0 /etc/corosync/corosync.conf
cluster_name: ring0_addr: blabla
        ring0_addr: virt-041.cluster-qe.lab.eng.brq.redhat.com
        ring0_addr: virt-042.cluster-qe.lab.eng.brq.redhat.com
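
For illustration, the pattern above "finds" a node name in the cluster_name line just as readily as in a genuine node entry. A minimal standalone sketch (not part of pcs):

    import re

    preg = re.compile(r'.*ring0_addr: (.*)')
    # The cluster_name line matches exactly like a real node line:
    print(preg.match("cluster_name: ring0_addr: blabla").group(1))  # prints: blabla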



Version-Release number of selected component (if applicable):

pcs-0.9.115-32.el7

Comment 6 Jan Pokorný [poki] 2014-11-24 20:37:45 UTC
Created attachment 960932 [details]
Example patch (see the comment)

...but in fact this is just a drop in the ocean; a proper parser for the config format is needed to fix the underlying issue once and for all.
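
Such a parser has to be aware of sections. As a rough, hypothetical sketch (the helper name get_node_addrs is mine, not the actual pcs implementation), one can track the currently open corosync.conf sections and collect ring0_addr values only inside nodelist/node blocks:

    # Minimal sketch of a section-aware corosync.conf reader.
    # Not the actual pcs fix; get_node_addrs is an illustrative name.
    def get_node_addrs(conf_text):
        stack = []   # names of the sections currently open
        nodes = []
        for raw in conf_text.splitlines():
            line = raw.strip()
            if line.endswith("{"):
                # e.g. "nodelist {" opens a new section
                stack.append(line[:-1].strip())
            elif line == "}":
                if stack:
                    stack.pop()
            elif ":" in line and stack == ["nodelist", "node"]:
                name, _, value = line.partition(":")
                if name.strip() == "ring0_addr":
                    nodes.append(value.strip())
        return nodes

With this approach the "cluster_name: ring0_addr: blabla" line is ignored, because it sits in the totem section rather than in a nodelist/node block.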

Comment 7 Jan Pokorný [poki] 2014-11-24 22:50:03 UTC
FWIW, this is also the reason I added a tunable as a possible
workaround/preventive measure in the pcs-clufter interaction:

https://github.com/jnpkrn/clufter/commit/84d0e6b8bab10abd3f06db8b6f13967f5a809366

Comment 8 Tomas Jelinek 2015-02-19 15:44:56 UTC
Created attachment 993680 [details]
proposed fix 1/3

Comment 9 Tomas Jelinek 2015-02-19 15:45:15 UTC
Created attachment 993681 [details]
proposed fix 2/3

Comment 10 Tomas Jelinek 2015-02-19 15:45:35 UTC
Created attachment 993683 [details]
proposed fix 3/3

Comment 14 Tomas Jelinek 2015-06-04 14:21:55 UTC
Before Fix:
[root@rh71-node1 ~]# rpm -q pcs
pcs-0.9.137-13.el7_1.2.x86_64
[root@rh71-node1:~]# pcs cluster setup --name 'ring0_addr: blabla' rh71-node1 rh71-node2 --start --enable
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
rh71-node1: Succeeded
rh71-node2: Succeeded
Starting cluster on nodes: rh71-node1, rh71-node2...
rh71-node1: Starting Cluster...
rh71-node2: Starting Cluster...
rh71-node1: Cluster Enabled
rh71-node2: Cluster Enabled
[root@rh71-node1:~]# pcs status nodes corosync
Corosync Nodes:
 Online: rh71-node1 rh71-node2
 Offline: blabla 
[root@rh71-node1:~]# pcs status pcsd
  blabla: Offline
  rh71-node1: Online
  rh71-node2: Online



After Fix:
[root@rh71-node1:~]# rpm -q pcs
pcs-0.9.140-1.el6.x86_64
[root@rh71-node1:~]# pcs cluster setup --name 'ring0_addr: blabla' rh71-node1 rh71-node2 --start --enable
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
rh71-node1: Succeeded
rh71-node2: Succeeded
Starting cluster on nodes: rh71-node1, rh71-node2...
rh71-node1: Starting Cluster...
rh71-node2: Starting Cluster...
rh71-node1: Cluster Enabled
rh71-node2: Cluster Enabled
Synchronizing pcsd certificates on nodes rh71-node1, rh71-node2. pcsd needs to be restarted on the nodes in order to reload the certificates.
[root@rh71-node1:~]# pcs status nodes corosync
Corosync Nodes:
 Online: rh71-node1 rh71-node2 
 Offline: 
[root@rh71-node1:~]# pcs status pcsd
  rh71-node1: Online
  rh71-node2: Online

Comment 18 errata-xmlrpc 2015-11-19 09:32:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2290.html