Bug 1366039

Summary: Coordinate Quorum algorithm descriptions with related man pages
Product: Red Hat Enterprise Linux 7
Component: doc-High_Availability_Add-On_Reference
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Target Milestone: rc
Target Release: 7.3
Hardware: All
OS: Linux
Keywords: Documentation
Doc Type: Enhancement
Reporter: Steven J. Levine <slevine>
Assignee: Steven J. Levine <slevine>
QA Contact: ecs-bugs
CC: ccaulfie, cluster-maint, djuran, ecs-bugs, jfriesse, jkortus, mspqa-list, rhel-docs, rmccabe, slevine, tojeline
Clone Of: 1129724
Bug Depends On: 614122, 1129724
Last Closed: 2016-11-23 23:01:59 UTC

Comment 3 Jan Friesse 2016-08-22 09:27:53 UTC
Steven,
it may make sense to change the examples to refer to ffsplit rather than lms. LMS has its big strengths but also some weaknesses, and generally ffsplit is "safer" and probably easier to understand than lms (which can give some less well-informed users false hopes).

Comment 4 Steven J. Levine 2016-10-12 20:47:25 UTC
Moving to 7.4 for now.

Comment 5 Jan Friesse 2016-10-13 07:15:21 UTC
@Steven,
I'm OK with 7.4, but I really have to stress the importance of this bug.

It is really really important to:
- Explain differences between lms and ffsplit
- Change the default "examples" to refer to ffsplit

Otherwise, people who are just going to copy and paste from the documentation (which is probably the majority) may find themselves in big trouble. Our QE are the first good example. Basically, with the example you've chosen (configuring LMS), you nullified the default in corosync (ffsplit, which is the default for a VERY good reason).

Comment 6 Steven J. Levine 2016-10-13 12:48:23 UTC
@Jan:

It doesn't actually have to wait until 7.4 GA -- I can update the docs on the Portal at any time.  But 7.3 doc builds are happening this week and this won't make those builds.  I can put this in my "immediate backlog" list for post-7.3-GA and move it up in my queue.

Steven

Comment 7 Jan Friesse 2016-10-13 15:00:10 UTC
@Steven

that sounds great. Just let me know what information you need so I can provide it to you.

Comment 8 Steven J. Levine 2016-11-11 20:12:15 UTC
Honza:

I have updated the latest draft of the 7.3 Pacemaker manual with this info -- it wasn't a lot all told. The new draft is here, as chapter 10:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#ch-Quorum-HAAR

In table 10.1 I added a note that auto_tie_breaker and last_man_standing are not compatible with quorum devices, even though pcs takes care of that. It seemed as if it wouldn't hurt to note this.

In Section 10.5.2 I added two sub-bullets under the description of the net quorum device model, simple overviews of ffsplit and lms with a pointer to the corosync-qdevice man page for the details.

I changed the algorithm for the quorum device add command to ffsplit (in the command in step 2 and in the pcs quorum config output in step 3).

In section 10.5.4.1, changing the device settings, I changed the example so that it now changes the algorithm to lms (since the device we created now has the algorithm of ffsplit).

I think that's it.

I can push this to the Portal at any time. Is there anything that still seems wrong or incomplete to you?

Comment 9 Jan Friesse 2016-11-21 16:44:52 UTC
@Steven:
I'm commenting only on the Qdevice section.

- The example below "The pcs quorum device status shows the quorum device runtime status." contains "Algorithm:              LMS". This may be confusing because the previous example documented ffsplit. The same applies to the next example (below "From the quorum device side, you can execute the following status command, which shows the status of the corosync-qnetd daemon.").

- The description of ffsplit/lms is a step forward, but I'm missing one extremely important piece of information. LMS means that qdevice has (number_of_nodes - 1) votes. That's nothing bad (it's actually how LMS works), but I would like to see a small warning about it. Let me explain what I mean (which may not be evident and takes a little more thinking) and what I would like to see in a "warning" box.

LMS allows the cluster to remain quorate even with only one remaining node, but it also means qdevice is VERY strong (its voting power is the same as number_of_nodes - 1). This also means that losing the connection with qnetd means losing number_of_nodes - 1 votes, so only a cluster with all nodes active can remain quorate (by outvoting qdevice); any other cluster becomes unquorate.
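The vote arithmetic behind this warning can be checked with a small model. This is a hypothetical sketch, not corosync code: it assumes every node carries one vote, that quorum is a simple majority (floor(total_votes / 2) + 1) of all expected votes, and it ignores votequorum options such as wait_for_all and two_node. The LMS qdevice weight of number_of_nodes - 1 comes from the comment above; the 1-vote qdevice used for contrast is an illustrative assumption only and does not model how ffsplit picks a partition during a split.

```python
# Hypothetical model of quorum arithmetic for a cluster with a qdevice.
# Assumptions (not taken from corosync source): every node has 1 vote,
# quorum is a simple majority of all expected votes, and options such as
# wait_for_all or two_node are ignored.


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(num_nodes: int, active_nodes: int, qdevice_votes: int,
               qdevice_connected: bool) -> bool:
    """Return True if a partition with active_nodes nodes has quorum."""
    expected = num_nodes + qdevice_votes  # votes of the whole cluster
    present = active_nodes + (qdevice_votes if qdevice_connected else 0)
    return present >= quorum_threshold(expected)


def lms_qdevice_votes(num_nodes: int) -> int:
    """LMS: the qdevice carries number_of_nodes - 1 votes."""
    return num_nodes - 1


if __name__ == "__main__":
    n = 4
    lms = lms_qdevice_votes(n)  # 3 qdevice votes, 7 expected, threshold 4
    single = 1                  # assumed 1-vote qdevice, for contrast only
    for active in range(n, 0, -1):
        print(f"{active}/{n} nodes up | LMS, qdevice up: "
              f"{is_quorate(n, active, lms, True)} | LMS, qdevice lost: "
              f"{is_quorate(n, active, lms, False)} | 1-vote, qdevice lost: "
              f"{is_quorate(n, active, single, False)}")
```

With four nodes and an LMS qdevice (3 votes, 7 expected, threshold 4), a single node plus the qdevice reaches 4 votes and stays quorate, but once the qdevice is lost only all 4 nodes together reach the threshold. In general the expected votes are 2N - 1 and the threshold is N, so the qdevice alone is worth every node but one, which is exactly the risk described above.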

Comment 10 Steven J. Levine 2016-11-22 20:27:47 UTC
Honza:

I have fixed the examples so that they show an ffsplit algorithm.

I have added this warning following the description of the lms algorithm:

Warning
The LMS algorithm allows the cluster to remain quorate even with only one remaining node, but it also means that the voting power of the quorum device is great, since it is the same as number_of_nodes - 1. Losing the connection with the quorum device means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the quorum device); any other cluster becomes inquorate.

You can see this here, in section 10.5.2:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#s1-quorumdev-HAAR

Is that looking any better?

Steven

Comment 11 Jan Friesse 2016-11-23 08:31:08 UTC
@Steven:
Yes, looks way better and it's exactly what I was asking for, so thanks.

Comment 12 Steven J. Levine 2016-11-23 18:06:05 UTC
The fixes are in the latest draft, which I am preparing for publication. When this build appears on docs-dev I will tag it for publication.

Red_Hat_Enterprise_Linux-High_Availability_Add-On_Reference-7-web-en-US-3.1-6.el6eng