1366039 – Coordinate Quorum algorithm descriptions with related man pages

Summary: Coordinate Quorum algorithm descriptions with related man pages

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	doc-High_Availability_Add-On_Reference
Sub Component:
Version:	7.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	7.3
Assignee:	Steven J. Levine
QA Contact:	ecs-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	614122 1129724
Blocks:

Reported:	2016-08-10 20:32 UTC by Steven J. Levine
Modified:	2020-04-15 14:36 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:	1129724
Environment:
Last Closed:	2016-11-23 23:01:59 UTC
Target Upstream Version:
Embargoed:

Comment 3 Jan Friesse 2016-08-22 09:27:53 UTC

Steven,
it may make sense to change the examples to refer to ffsplit rather than lms. LMS has its big strengths but also some weaknesses, and generally ffsplit is "safer" and probably easier to understand than lms (which can give false hopes to some not-so-well-informed users).

Comment 4 Steven J. Levine 2016-10-12 20:47:25 UTC

Moving to 7.4 for now.

Comment 5 Jan Friesse 2016-10-13 07:15:21 UTC

@Steven,
I'm OK with 7.4, but I really have to stress the importance of this bug.

It is really really important to:
- Explain differences between lms and ffsplit
- Change the default "examples" to refer to ffsplit

Otherwise people who just copy and paste from the documentation (which is probably the majority) may find themselves in big trouble. Our QE are the first good example. Basically, with the example you've chosen (configure LMS), you nullified the default in corosync (ffsplit, and for a VERY good reason).

Comment 6 Steven J. Levine 2016-10-13 12:48:23 UTC

@Jan:

It doesn't actually have to wait until 7.4 GA -- I can update the docs on the Portal at any time.  But 7.3 doc builds are happening this week and this won't make those builds.  I can put this in my "immediate backlog" list for post-7.3-GA and move it up in my queue.

Steven

Comment 7 Jan Friesse 2016-10-13 15:00:10 UTC

@Steven

That sounds great. Just let me know what information you need so I can provide it to you.

Comment 8 Steven J. Levine 2016-11-11 20:12:15 UTC

Honza:

I have updated the latest draft of the 7.3 Pacemaker manual with this info -- it wasn't a lot all told. The new draft is here, as chapter 10:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#ch-Quorum-HAAR

In table 10.1 I added the info that auto_tie_breaker and last_man_standing are not compatible with quorum devices, even though pcs takes care of that implementation. But it seemed as if it wouldn't hurt to note this.

In Section 10.5.2 I added two sub-bullets under the description of the net quorum device model, simple overviews of ffsplit and lms with a pointer to the corosync-qdevice man page for the details.

I changed the algorithm for the quorum device add command to ffsplit (in the command in step 2 and in the pcs quorum config output in step 3).

In section 10.5.4.1, changing the device settings, I changed the example so that it now changes the algorithm to lms (since the device we created now has the algorithm of ffsplit).

I think that's it.

I can push this to the Portal at any time. Is there anything that still seems wrong or incomplete to you?

Comment 9 Jan Friesse 2016-11-21 16:44:52 UTC

@Steven:
I'm commenting only on the Qdevice section.

- The example below "The pcs quorum device status shows the quorum device runtime status." contains "Algorithm:              LMS". It may be confusing because the previous example documented ffsplit. The same applies to the next example (below "From the quorum device side, you can execute the following status command, which shows the status of the corosync-qnetd daemon.").

- The description of ffsplit/lms is a move forward, but I'm missing one extremely important piece of information. LMS means that the qdevice has (number_of_nodes - 1) votes. There's nothing bad about that (it's actually how LMS works), but I would like to see a small warning about it. Let me explain what I mean (it may not be evident and takes a little more thinking) and what I would like to see in a "warning" box.

LMS allows the cluster to remain quorate even with only one remaining node, but it also means the qdevice is VERY strong (its voting power is the same as number_of_nodes - 1). This also means that losing the connection with qnetd means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the qdevice); any other cluster becomes inquorate.
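
The vote arithmetic described above can be sketched roughly as follows. This is only an illustration, not corosync code: it assumes one vote per node, assumes the usual majority rule (quorum = expected_votes // 2 + 1), assumes ffsplit gives the quorum device a single vote, and models just one partition that either receives the device's votes or does not. The function names are hypothetical.

```python
def qdevice_votes(nodes: int, algorithm: str) -> int:
    """Votes held by the quorum device (assumption: every node has 1 vote)."""
    if algorithm == "lms":
        return nodes - 1   # as described above: number_of_nodes - 1
    if algorithm == "ffsplit":
        return 1           # assumption: ffsplit provides exactly one vote
    raise ValueError(f"unknown algorithm: {algorithm}")


def is_quorate(nodes: int, active_nodes: int, algorithm: str,
               qdevice_connected: bool) -> bool:
    """Check whether a partition with `active_nodes` nodes is quorate.

    Simplified model: the partition gets all of the device's votes when
    connected to it, and none otherwise.
    """
    device = qdevice_votes(nodes, algorithm)
    expected_votes = nodes + device
    quorum = expected_votes // 2 + 1   # assumed majority rule
    votes = active_nodes + (device if qdevice_connected else 0)
    return votes >= quorum


# 4-node cluster with lms: expected votes = 7, quorum = 4
print(is_quorate(4, 1, "lms", True))    # True: one node plus the device
print(is_quorate(4, 3, "lms", False))   # False: device lost, one node down
print(is_quorate(4, 4, "lms", False))   # True: all nodes overvote the device

# Same cluster with ffsplit: expected votes = 5, quorum = 3
print(is_quorate(4, 3, "ffsplit", False))  # True
```

Under these assumptions, losing the quorum device with lms leaves the cluster quorate only if every node is still active, whereas with ffsplit the same four-node cluster tolerates losing the device and one node.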

Comment 10 Steven J. Levine 2016-11-22 20:27:47 UTC

Honza:

I have fixed the examples so that they show an ffsplit algorithm.

I have added this warning following the description of the lms algorithm:

Warning
The LMS algorithm allows the cluster to remain quorate even with only one remaining node, but it also means that the voting power of the quorum device is great since it is the same as number_of_nodes - 1. Losing connection with the quorum device means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the quorum device); any other cluster becomes inquorate. 

You can see this here, in section 10.5.2:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#s1-quorumdev-HAAR

Is that looking any better?

Steven

Comment 11 Jan Friesse 2016-11-23 08:31:08 UTC

@Steven:
Yes, looks way better and it's exactly what I was asking for, so thanks.

Comment 12 Steven J. Levine 2016-11-23 18:06:05 UTC

Fixes in latest draft which I am preparing for publication. When this build appears on docs-dev I will tag it for publication.

Red_Hat_Enterprise_Linux-High_Availability_Add-On_Reference-7-web-en-US-3.1-6.el6eng

Comment 13 Steven J. Levine 2016-11-23 23:01:59 UTC

These updates are now on the Portal.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html