Bug 1374857
Summary: | [RFE] Support of 32-node Pacemaker cluster | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Sam Yangsao <syangsao>
Component: | corosync | Assignee: | Christine Caulfield <ccaulfie>
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe>
Severity: | high | Docs Contact: | Steven J. Levine <slevine>
Priority: | medium | |
Version: | 7.4 | CC: | aarnold, aherr, ccaulfie, cfeist, cluster-maint, cluster-qe, cmackows, djansa, dwood, fdanapfe, fdinitto, idevat, jfriesse, kgaillot, mmazoure, mnovacek, omular, royoung, sbradley, slevine, syangsao, syu, tojeline
Target Milestone: | rc | Keywords: | FutureFeature
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: |
.Maximum size of a supported RHEL HA cluster increased from 16 to 32 nodes
With this release, Red Hat supports cluster deployments of up to 32 full cluster nodes.
|
Story Points: | --- | |
Clone Of: | | |
: | 1717098 | Environment: |
Last Closed: | 2019-08-06 13:10:11 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1298243 | |
Bug Blocks: | 1363902, 1420851, 1717098, 1722048 | |
Comment 6
Jan Friesse
2016-09-13 07:13:21 UTC
Quite recently there was a discussion on the upstream list: http://lists.clusterlabs.org/pipermail/users/2017-January/004764.html. It looks like corosync works just fine up to ~70 nodes; beyond that the receive buffer overfills with join messages. So 32 nodes should be doable without changing corosync code or defaults.

As tested and confirmed by Chrissie and Chris Mackowski, corosync seems to work just fine with 32 nodes as it is. No patch is provided and this bug is used as "test only".

A 32-node Pacemaker cluster was created without any problems. Used the generatejob2 command to create the cluster:

# /usr/local/bin/generatejob2.sh --nodes 32 -v 7 --beaker-reserve 1 --disks 1 --ip 1 setup --submit

Snippet from the TESTOUT.log:

...
[2019-06-12 14:33:55.770890] [setup] corosync + pacemaker configure on virt-051, virt-052, virt-053, virt-054, virt-055, virt-056, virt-057, virt-058, virt-059, virt-060, virt-061, virt-062, virt-063, virt-064, virt-065, virt-066, virt-067, virt-074, virt-077, virt-078, virt-079, virt-082, virt-083, virt-084, virt-085, virt-086, virt-087, virt-088, virt-089, virt-090, virt-091, virt-092
...
[2019-06-12 14:42:11.487585] [setup] success
[2019-06-12 14:42:11.487744] [setup] Waiting for clvm lockspace on all nodes...
[2019-06-12 14:42:17.061511] [setup] Stopping and disabling lvmetad...
[2019-06-12 14:42:19.556535] <pass name="setup" id="setup" pid="19644" time="Wed Jun 12 14:42:19 2019 +0200" type="cmd" duration="521" />
[2019-06-12 14:42:19.556664] ------------------- Summary ---------------------
[2019-06-12 14:42:19.556797] Testcase Result
[2019-06-12 14:42:19.556884] -------- ------
[2019-06-12 14:42:19.556968] generic_setup PASS
[2019-06-12 14:42:19.557051] setup PASS
[2019-06-12 14:42:19.557131] =================================================
[2019-06-12 14:42:19.557175] Total Tests Run: 2
[2019-06-12 14:42:19.557220] Total PASS: 2
[2019-06-12 14:42:19.557264] Total FAIL: 0
[2019-06-12 14:42:19.557408] Total TIMEOUT: 0
[2019-06-12 14:42:19.557457] Total KILLED: 0
[2019-06-12 14:42:19.557503] Total STOPPED: 0

Verified for corosync-2.4.3-6.el7

The following have been tested to work (a rough pcs sketch of these steps is included below):
- create cluster with 32 nodes and separate fencing
- create fifty separate Apache resources, move all of them to a different node, disable them, remove them
- recovery: kill pacemaker on fifteen nodes and watch cluster recovery
- recovery: halt fifteen nodes and watch pacemaker fence them, then wait for them to come back

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2245
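For illustration only, and not part of the original test run: the verification steps listed above could be reproduced roughly as follows with the standard RHEL 7 pcs tooling. The node names (node01..node32), the fence_xvm agent and its parameters, and the Apache configuration path are placeholder assumptions, not values taken from this bug; the actual QE runs used the generatejob2 wrapper shown earlier.

# Assumed: 32 hosts named node01..node32 with the hacluster password already set on each.
NODES=$(printf 'node%02d ' $(seq 1 32))

# Authenticate the nodes and create the 32-node cluster (RHEL 7 pcs syntax).
pcs cluster auth $NODES -u hacluster -p 'PASSWORD'
pcs cluster setup --name bigcluster $NODES
pcs cluster start --all
pcs cluster enable --all

# Separate fencing per node (fence_xvm is only an example agent; parameters vary by environment).
for n in $NODES; do
    pcs stonith create fence-$n fence_xvm port=$n pcmk_host_list=$n
done

# Fifty Apache resources, then exercise move/disable/delete on one of them.
for i in $(seq 1 50); do
    pcs resource create webserver-$i ocf:heartbeat:apache \
        configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
done
pcs resource move webserver-1 node02
pcs resource disable webserver-1
pcs resource delete webserver-1

# Recovery checks: kill pacemaker on a few nodes, or fence one outright,
# then watch the cluster recover with pcs status.
for n in node01 node02 node03; do
    ssh $n 'pkill -9 pacemakerd'
done
pcs stonith fence node04
pcs status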
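As a further aside, the corosync 2.x membership on such a cluster can be inspected from any node to confirm that all 32 nodes joined a single ring. The commands below are standard corosync tooling and are only a sketch; they are not output or steps taken from this bug.

# Quorum summary: vote and node counts (expected to show 32 nodes).
corosync-quorumtool -s

# List the individual members known to the quorum service.
corosync-quorumtool -l

# Dump the totem membership keys from the corosync in-memory database (corosync 2.x key names).
corosync-cmapctl | grep runtime.totem.pg.mrp.srp.members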