Bug 1134426
Summary: | pcs needs a better parser for corosync.conf | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Radek Steiger <rsteiger> |
Component: | pcs | Assignee: | Tomas Jelinek <tojeline> |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 7.0 | CC: | cfeist, cluster-maint, jpokorny, tojeline |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | pcs-0.9.140-1.el7 | Doc Type: | Bug Fix |
Doc Text: | Cause: User edits corosync.conf configuration file manually. Consequence: Pcs misbehaves as it is not able to read the file properly. Fix: Implement full-featured parser for corosync.conf file. Result: Pcs is able to read a manually edited corosync.conf file properly. | | |
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2015-11-19 09:32:49 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1142126 | | |
Attachments: | | | |

Created attachment 960932 [details]
Example patch (see the comment)
...but in fact this is just a drop in the sea; a proper config format parser is needed to fix the same underlying issue once and for all.
FWIW, this is also the reason I added a tunable as a possible workaround/preventive measure in the pcs-clufter interaction: https://github.com/jnpkrn/clufter/commit/84d0e6b8bab10abd3f06db8b6f13967f5a809366

Created attachment 993680 [details]
proposed fix 1/3
Created attachment 993681 [details]
proposed fix 2/3
Created attachment 993683 [details]
proposed fix 3/3

Before Fix:

    [root@rh71-node1 ~]# rpm -q pcs
    pcs-0.9.137-13.el7_1.2.x86_64
    [root@rh71-node1:~]# pcs cluster setup --name 'ring0_addr: blabla' rh71-node1 rh71-node2 --start --enable
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    rh71-node1: Succeeded
    rh71-node2: Succeeded
    Starting cluster on nodes: rh71-node1, rh71-node2...
    rh71-node1: Starting Cluster...
    rh71-node2: Starting Cluster...
    rh71-node1: Cluster Enabled
    rh71-node2: Cluster Enabled
    [root@rh71-node1:~]# pcs status nodes corosync
    Corosync Nodes:
    Online: rh71-node1 rh71-node2
    Offline: blabla
    [root@rh71-node1:~]# pcs status pcsd
    blabla: Offline
    rh71-node1: Online
    rh71-node2: Online

After Fix:

    [root@rh71-node1:~]# rpm -q pcs
    pcs-0.9.140-1.el6.x86_64
    [root@rh71-node1:~]# pcs cluster setup --name 'ring0_addr: blabla' rh71-node1 rh71-node2 --start --enable
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    rh71-node1: Succeeded
    rh71-node2: Succeeded
    Starting cluster on nodes: rh71-node1, rh71-node2...
    rh71-node1: Starting Cluster...
    rh71-node2: Starting Cluster...
    rh71-node1: Cluster Enabled
    rh71-node2: Cluster Enabled
    Synchronizing pcsd certificates on nodes rh71-node1, rh71-node2.
    pcsd needs to be restarted on the nodes in order to reload the certificates.
    [root@rh71-node1:~]# pcs status nodes corosync
    Corosync Nodes:
    Online: rh71-node1 rh71-node2
    Offline:
    [root@rh71-node1:~]# pcs status pcsd
    rh71-node1: Online
    rh71-node2: Online

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-2290.html

Created attachment 931475 [details]
corosync.conf

Description of problem:
(credits go to Honza Friesse and Tomas Jelinek)

The internal parser for getting nodes from corosync.conf uses the following code for grepping node names:

    preg = re.compile(r'.*ring0_addr: (.*)')
    for line in lines:
        match = preg.match(line)
        if match:
            nodes.append (match.group(1))

This basically greps _anything_ containing "ring0_addr:" and can lead to some crazy results. As an example, I created a cluster with "ring0_addr:" in the cluster name:

    [root@virt-041 ~]# pcs cluster setup --name "ring0_addr: blabla" virt-041.cluster-qe.lab.eng.brq.redhat.com virt-042.cluster-qe.lab.eng.brq.redhat.com --start --enable
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    virt-041.cluster-qe.lab.eng.brq.redhat.com: Succeeded
    virt-041.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster...
    virt-042.cluster-qe.lab.eng.brq.redhat.com: Succeeded
    virt-042.cluster-qe.lab.eng.brq.redhat.com: Starting Cluster...
    virt-041.cluster-qe.lab.eng.brq.redhat.com: Cluster Enabled
    virt-042.cluster-qe.lab.eng.brq.redhat.com: Cluster Enabled

The getNodesFromCorosyncConf() function then produces the following results:

    [root@virt-041 ~]# pcs status nodes corosync
    Corosync Nodes:
    Online: virt-041.cluster-qe.lab.eng.brq.redhat.com virt-042.cluster-qe.lab.eng.brq.redhat.com
    Offline: blabla
    [root@virt-041 ~]# pcs status pcsd
    blabla: Offline
    virt-041.cluster-qe.lab.eng.brq.redhat.com: Online
    virt-042.cluster-qe.lab.eng.brq.redhat.com: Online

The related sections from corosync.conf:

    [root@virt-041 ~]# grep ring0 /etc/corosync/corosync.conf
    cluster_name: ring0_addr: blabla
    ring0_addr: virt-041.cluster-qe.lab.eng.brq.redhat.com
    ring0_addr: virt-042.cluster-qe.lab.eng.brq.redhat.com

Version-Release number of selected component (if applicable):
pcs-0.9.115-32.el7
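
For illustration only, the over-matching described in the problem statement can be reproduced outside of pcs, and contrasted with a section-aware reader. This is a rough sketch, not the parser that shipped in the fix: the sample text, the host names, and the helper names `nodes_by_regex` and `nodes_by_section` are all made up for this example, and it assumes the usual corosync.conf layout of `name {` ... `}` sections containing `key: value` lines.

```python
import re

# Hypothetical config text; the host names are placeholders.
SAMPLE = """\
totem {
    version: 2
    cluster_name: ring0_addr: blabla
}

nodelist {
    node {
        ring0_addr: node1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com
        nodeid: 2
    }
}
"""


def nodes_by_regex(text):
    """Line-based matching, same pattern as the snippet quoted in the report."""
    preg = re.compile(r'.*ring0_addr: (.*)')
    nodes = []
    for line in text.splitlines():
        match = preg.match(line)
        if match:
            nodes.append(match.group(1))
    return nodes


def nodes_by_section(text):
    """Accept ring0_addr only as a key inside nodelist { node { ... } }."""
    stack = []  # names of the sections that are currently open
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # section opens
        elif line == "}":
            if stack:
                stack.pop()  # section closes
        elif ":" in line:
            # Split on the first colon only, so the value may contain colons.
            key, value = line.split(":", 1)
            if key.strip() == "ring0_addr" and stack == ["nodelist", "node"]:
                nodes.append(value.strip())
    return nodes


print(nodes_by_regex(SAMPLE))
print(nodes_by_section(SAMPLE))
```

The first call reports "blabla" as a node because the pattern matches the cluster_name line; the second ignores it because the key on that line is cluster_name and the line sits in the totem section rather than in a node block.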