Bug 1374857 - [RFE] Support of 32-node Pacemaker cluster
Summary: [RFE] Support of 32-node Pacemaker cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On: 1298243
Blocks: 1420851 1363902 1717098 1722048
 
Reported: 2016-09-09 21:07 UTC by Sam Yangsao
Modified: 2020-09-21 07:39 UTC
CC List: 23 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
.Maximum size of a supported RHEL HA cluster increased from 16 to 32 nodes
With this release, Red Hat supports cluster deployments of up to 32 full cluster nodes.
Clone Of:
Cloned to: 1717098
Environment:
Last Closed: 2019-08-06 13:10:11 UTC
Target Upstream Version:


Links
System                              ID               Last Updated
Red Hat Knowledge Base (Article)    3069031          2018-08-03 13:48:14 UTC
Red Hat Product Errata              RHBA-2019:2245   2019-08-06 13:10:24 UTC

Comment 6 Jan Friesse 2016-09-13 07:13:21 UTC
@Chrissie,
Because you were testing these larger clusters, I'm reassigning this to you.

I also believe it would be a good start for QE to try running the test suite on a 32-node cluster and report the results.

Comment 16 Jan Friesse 2017-01-16 12:19:41 UTC
Quite recently there was a discussion on the upstream list: http://lists.clusterlabs.org/pipermail/users/2017-January/004764.html. It looks like corosync works just fine up to ~70 nodes; beyond that, the receive buffer overfills with join messages.

So 32 nodes should be doable without changing the corosync code or defaults.
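
For anyone reproducing the upstream observation, a minimal sketch of how to inspect the effective totem timing and the current membership on a running corosync 2.x node (the cmap key names below are assumed from corosync 2.x, not taken from this bug):

# Illustrative only: runtime totem timing values (token/consensus scale with node count).
corosync-cmapctl | grep -E 'runtime\.config\.totem\.(token|consensus|join)'

# Quorum state and the members of the current ring.
corosync-quorumtool -s
corosync-cmapctl | grep 'runtime.totem.pg.mrp.srp.members'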

Comment 39 Jan Friesse 2019-03-22 08:02:49 UTC
As tested and confirmed by Chrissie and Chris Mackowski, corosync works just fine with 32 nodes as it is, so no patch is provided and this bug is used as "test only".

Comment 40 Michal Mazourek 2019-06-12 13:10:24 UTC
A 32-node Pacemaker cluster was created without any problems.
The generatejob2 command was used to create the cluster:
# /usr/local/bin/generatejob2.sh --nodes 32 -v 7 --beaker-reserve 1 --disks 1 --ip 1 setup --submit

Snippet from the TESTOUT.log:
...
[2019-06-12 14:33:55.770890] [setup] corosync + pacemaker configure on virt-051, virt-052, virt-053, virt-054, virt-055, virt-056, virt-057, virt-058, virt-059, virt-060, virt-061, virt-062, virt-063, virt-064, virt-065, virt-066, virt-067, virt-074, virt-077, virt-078, virt-079, virt-082, virt-083, virt-084, virt-085, virt-086, virt-087, virt-088, virt-089, virt-090, virt-091, virt-092
...
[2019-06-12 14:42:11.487585] [setup]  success
[2019-06-12 14:42:11.487744] [setup] Waiting for clvm lockspace on all nodes...
[2019-06-12 14:42:17.061511] [setup] Stopping and disabling lvmetad...
[2019-06-12 14:42:19.556535] <pass name="setup" id="setup" pid="19644" time="Wed Jun 12 14:42:19 2019 +0200" type="cmd" duration="521" />
[2019-06-12 14:42:19.556664] ------------------- Summary ---------------------
[2019-06-12 14:42:19.556797] Testcase                                 Result    
[2019-06-12 14:42:19.556884] --------                                 ------    
[2019-06-12 14:42:19.556968] generic_setup                            PASS      
[2019-06-12 14:42:19.557051] setup                                    PASS      
[2019-06-12 14:42:19.557131] =================================================
[2019-06-12 14:42:19.557175] Total Tests Run: 2
[2019-06-12 14:42:19.557220] Total PASS:      2
[2019-06-12 14:42:19.557264] Total FAIL:      0
[2019-06-12 14:42:19.557408] Total TIMEOUT:   0
[2019-06-12 14:42:19.557457] Total KILLED:    0
[2019-06-12 14:42:19.557503] Total STOPPED:   0

Verified for corosync-2.4.3-6.el7
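
generatejob2 is internal QE tooling; a rough equivalent of the same 32-node setup with the stock RHEL 7 pcs commands might look like the sketch below (cluster name, node names and the password are placeholders, not the virt-0xx hosts from the log above):

# Illustrative only: authenticate and set up a 32-node cluster (RHEL 7 / pcs 0.9 syntax).
pcs cluster auth node{01..32} -u hacluster -p <password>
pcs cluster setup --name bigcluster node{01..32}
pcs cluster start --all
pcs cluster enable --all

# Check that all 32 nodes joined and that the cluster is quorate.
pcs status nodes
corosync-quorumtool -s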

Comment 43 michal novacek 2019-07-10 17:11:03 UTC

The following have been tested to work (a rough pcs equivalent is sketched after this list):

 - create a cluster with 32 nodes and separate fencing

 - create fifty separate Apache resources, move all of them to a different node, disable them, remove them

 - recovery: kill pacemaker on fifteen nodes and watch the cluster recover

 - recovery: halt fifteen nodes, watch pacemaker fence them, then wait for them to come back
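
A rough pcs sketch of the resource and recovery tests above (resource and node names are illustrative placeholders, not the exact commands used in the tests):

# Illustrative only: fifty separate Apache resources, then move/disable/remove one of them.
for i in $(seq 1 50); do
    pcs resource create apache-$i ocf:heartbeat:apache
done
pcs resource move apache-1 node02
pcs resource disable apache-1
pcs resource delete apache-1

# Recovery: kill pacemaker on a node, then watch the cluster react from a surviving node.
killall -9 pacemakerd
crm_mon -1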

Comment 48 errata-xmlrpc 2019-08-06 13:10:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2245

