Bug 1366039
| Summary: | Coordinate Quorum algorithm descriptions with related man pages | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Steven J. Levine <slevine> |
| Component: | doc-High_Availability_Add-On_Reference | Assignee: | Steven J. Levine <slevine> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ecs-bugs |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.0 | CC: | ccaulfie, cluster-maint, djuran, ecs-bugs, jfriesse, jkortus, mspqa-list, rhel-docs, rmccabe, slevine, tojeline |
| Target Milestone: | rc | Keywords: | Documentation |
| Target Release: | 7.3 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1129724 | Environment: | |
| Last Closed: | 2016-11-23 23:01:59 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 614122, 1129724 | ||
| Bug Blocks: | |||
|
Comment 3
Jan Friesse
2016-08-22 09:27:53 UTC
Moving to 7.4 for now.

@Steven, I'm OK with 7.4, but I really have to stress the importance of this bug. It is really, really important to:

- Explain the differences between lms and ffsplit
- Change the default "examples" to refer to ffsplit

Otherwise people who just copy and paste from the documentation (probably the majority) may find themselves in big trouble. Our QE are the first good example. Basically, with the example you've chosen (configure LMS) you have nullified the default in corosync (FFsplit, and for a VERY good reason).

@Jan: It doesn't actually have to wait until 7.4 GA -- I can update the docs on the Portal at any time. But 7.3 doc builds are happening this week and this won't make those builds. I can put this in my "immediate backlog" list for post-7.3-GA and move it up in my queue.

Steven

@Steven that sounds great. Just let me know what information you need so I can provide it to you.

Honza: I have updated the latest draft of the 7.3 Pacemaker manual with this info -- it wasn't a lot all told. The new draft is here, as chapter 10:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#ch-Quorum-HAAR

In table 10.1 I added the info that auto_tie_breaker and last_man_standing are not compatible with quorum devices, even though pcs takes care of that implementation. It seemed as if it wouldn't hurt to note this.

In Section 10.5.2 I added two sub-bullets under the description of the net quorum device model: simple overviews of ffsplit and lms, with a pointer to the corosync-qdevice man page for the details.

I changed the algorithm for the quorum device add command to ffsplit (in the command in step 2 and in the pcs quorum config output in step 3).

In section 10.5.4.1, changing the device settings, I changed the example so that it now changes the algorithm to lms (since the device we created now has the algorithm of ffsplit).
I think that's it. I can push this to the Portal at any time. Is there anything that still seems wrong or incomplete to you?

@Steven: I'm commenting on just the Qdevice section.

- The example below "The pcs quorum device status shows the quorum device runtime status." contains "Algorithm: LMS". It may be confusing because the previous example documented ffsplit. The same applies to the next example (below "From the quorum device side, you can execute the following status command, which shows the status of the corosync-qnetd daemon.").
- The description of ffsplit/lms is a step forward, but I'm missing one extremely important piece of information. LMS means that qdevice has (number_of_nodes - 1) votes. There is nothing bad about that (it's actually how LMS works), but I would like to see a small warning about it. Let me explain what I mean (it may not be evident and takes a little more thinking) and what I would like to see in a "warning" box. LMS allows the cluster to remain quorate even with only one remaining node, but it also means qdevice is VERY strong (its voting power is the same as number_of_nodes - 1). This also means that losing the connection with qnetd means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting qdevice); any other cluster becomes unquorate.

Honza: I have fixed the examples so that they show an ffsplit algorithm. I have added this warning following the description of the lms algorithm:

Warning

The LMS algorithm allows the cluster to remain quorate even with only one remaining node, but it also means that the voting power of the quorum device is great since it is the same as number_of_nodes - 1. Losing connection with the quorum device means losing number_of_nodes - 1 votes, which means that only a cluster with all nodes active can remain quorate (by overvoting the quorum device); any other cluster becomes inquorate.
You can see this here, in section 10.5.2:

http://jenkinscat.gsslab.pnq.redhat.com:8080/job/doc-Red_Hat_Enterprise_Linux-7-High_Availability_Add-On_Reference%20%28html-single%29/lastSuccessfulBuild/artifact/tmp/en-US/html-single/index.html#s1-quorumdev-HAAR

Is that looking any better?

Steven

@Steven: Yes, looks way better and it's exactly what I was asking for, so thanks.

The fixes are in the latest draft, which I am preparing for publication. When this build appears on docs-dev I will tag it for publication.

Red_Hat_Enterprise_Linux-High_Availability_Add-On_Reference-7-web-en-US-3.1-6.el6eng

These updates are now on the Portal.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html |
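
For readers who want to check the vote arithmetic behind the LMS warning discussed in this thread, here is a minimal Python sketch. It is not taken from corosync or pcs. It assumes each node carries one vote, that quorum is a simple majority of expected votes (floor(expected / 2) + 1), and that a reachable quorum device gives all of its votes to the partition being checked. The function names are hypothetical and exist only for this illustration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority of expected votes (assumed rule: floor(n / 2) + 1)."""
    return expected_votes // 2 + 1


def is_quorate(active_nodes: int, total_nodes: int,
               qdevice_votes: int, qdevice_reachable: bool) -> bool:
    """Return True if a partition with `active_nodes` nodes holds quorum.

    Assumptions (not from the bug report): one vote per node, and a
    reachable quorum device casts all its votes for this partition.
    """
    expected = total_nodes + qdevice_votes
    votes = active_nodes + (qdevice_votes if qdevice_reachable else 0)
    return votes >= quorum_threshold(expected)


if __name__ == "__main__":
    total = 4
    lms_votes = total - 1  # LMS: qdevice has number_of_nodes - 1 votes
    for reachable in (True, False):
        for active in range(1, total + 1):
            state = "quorate" if is_quorate(active, total, lms_votes, reachable) else "inquorate"
            print(f"qdevice reachable={reachable}, active nodes={active}: {state}")
```

Under these assumptions, a four-node cluster with an LMS quorum device has 7 expected votes and a threshold of 4. A single node that still sees the quorum device reaches that threshold, while a partition that has lost the quorum device reaches it only when all four nodes are active, which matches the warning text.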