Bug 620679 - qdiskd should stop voting if no <quorumd config is available
Summary: qdiskd should stop voting if no <quorumd config is available
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 615926
Blocks:
 
Reported: 2010-08-03 09:14 UTC by Fabio Massimo Di Nitto
Modified: 2016-04-26 16:40 UTC
CC List: 8 users

Fixed In Version: cluster-3.0.12-27.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 615926
Environment:
Last Closed: 2011-05-19 12:53:27 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix (1.99 KB, patch), 2010-08-03 16:39 UTC, Lon Hohberger


Links
Red Hat Product Errata RHBA-2011:0537 (priority: normal, status: SHIPPED_LIVE): cluster and gfs2-utils bug fix update. Last updated: 2011-05-18 17:57:40 UTC.

Comment 1 Fabio Massimo Di Nitto 2010-08-03 09:16:33 UTC
When testing bz 615926, I also tested the other direction:

2-node cluster with qdiskd running

remove qdiskd from the configuration

The qdiskd daemon is not killed in this case; the configuration change is dispatched, but qdiskd keeps happily voting:

<?xml version="1.0"?>
<cluster config_version="3" name="rhel6">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="rhel6-node1" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="rhel6-node2" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

[root@rhel6-node2 cluster]# ps ax|grep qdiskd
 2794 ?        SLsl   0:01 qdiskd -Q

[root@rhel6-node2 cluster]# cman_tool status
Version: 6.2.0
Config Version: 3
Cluster Name: rhel6
Cluster Id: 60348
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 1  
Active subsystems: 11
Flags: 2node 
Ports Bound: 0 11 177  
Node name: rhel6-node2
Node ID: 2
Multicast addresses: 239.192.235.168 
Node addresses: 192.168.2.66 

[root@rhel6-node2 cluster]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2010-08-03 11:09:36  /dev/block/252:17
   1   M      8   2010-08-03 11:07:25  rhel6-node1
   2   M      4   2010-08-03 11:07:25  rhel6-node2

The expected behavior from qdiskd is to go idle if it is not configured in cluster.conf.
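
(As a rough sketch of the reproduction above, assuming the <quorumd .../> element was deleted by hand and config_version bumped; the validate/reload commands below are the usual cluster 3.x tools and are not taken from this report:)

[root@rhel6-node1 cluster]# ccs_config_validate          # sanity-check the edited /etc/cluster/cluster.conf
[root@rhel6-node1 cluster]# cman_tool version -r         # dispatch the updated configuration (pass the new config_version explicitly if your cman_tool requires it)
[root@rhel6-node2 cluster]# ps ax | grep [q]diskd        # qdiskd is still running...
[root@rhel6-node2 cluster]# cman_tool status | grep -i votes    # ...and still contributes a quorum device vote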

Comment 4 RHEL Program Management 2010-08-03 09:47:49 UTC
This issue has been proposed at a time when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 5 Lon Hohberger 2010-08-03 14:02:03 UTC
This requires careful consideration.

The reconfiguration order is important.  Consider the case of a 4-node cluster + qdiskd.

Expected votes is 7

If you remove qdiskd from cluster.conf, cman will recalc expected votes to 4.  Then, before qdiskd processes the config change, it calls cman_poll_quorum_device().  This bumps expected votes back to 7.

Then qdiskd exits.

For now, it's much safer to:

(a) ensure all nodes are in the cluster
(b) kill qdiskd with SIGTERM on all nodes
(c) remove qdiskd from cluster.conf
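
(A rough shell equivalent of that procedure, assuming a manual edit and reload; node names are illustrative, and steps (a) and (b) must be run on every node:)

[root@rhel6-node1 ~]# cman_tool nodes                    # (a) confirm every node is listed with state "M" (member)
[root@rhel6-node1 ~]# killall -TERM qdiskd               # (b) stop qdiskd cleanly; repeat on all nodes
[root@rhel6-node1 ~]# vi /etc/cluster/cluster.conf       # (c) remove the <quorumd .../> element and bump config_version
[root@rhel6-node1 ~]# cman_tool version -r               # then dispatch the updated configuration to the cluster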

Comment 6 Fabio Massimo Di Nitto 2010-08-03 14:53:47 UTC
So, I understand all the issues described above.

In this specific case, (a) and (c) are already true: qdiskd is already gone from cluster.conf, and all nodes are in the cluster and active.

We don't have a way to tell qdiskd to die.

Doesn't cman_poll_quorum_device() recalculate every time based on the qdiskd votes? If so, the votes from qdiskd would go down to 0 (no config, no votes ;)) and expected votes would be recalculated.

Comment 7 Lon Hohberger 2010-08-03 16:38:30 UTC
No, cman_poll_quorum_device does not recalculate; you have to tell cman to drop the votes.

I was already working on a patch.  What it does is:

 - if previously configured and device & label are no longer present:
   - print a log message
   - reregister with 0 votes (causes recalculate_quorum())
   - clean shutdown e.g.:
     - write logout message to quorum disk
     - cman_unregister_quorum_device()

Aug  3 12:36:30 crackle modcluster: Updating cluster.conf
Aug  3 12:36:32 crackle corosync[1262]:   [QUORUM] Members[2]: 1 2
Aug  3 12:36:32 crackle corosync[1262]:   [CMAN  ] quorum device re-registered
Aug  3 12:36:32 crackle qdiskd[15384]: Quorum device removed from the configuration.  Shutting down.
Aug  3 12:36:43 crackle corosync[1262]:   [CMAN  ] lost contact with quorum device
Aug  3 12:36:43 crackle corosync[1262]:   [QUORUM] Members[2]: 1 2

Note, however, that because qdiskd was previously a member, it will still appear in both 'clustat' and 'cman_tool nodes' output.

[root@crackle ~]# cman_tool status
Version: 6.2.0
Config Version: 25
Cluster Name: cereal
Cluster Id: 27600
Cluster Member: Yes
Cluster Generation: 1248
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 7
Flags: 
Ports Bound: 0  
Node name: crackle
Node ID: 2
Multicast addresses: 239.192.107.60 
Node addresses: 192.168.122.21 

(I used a two-node cluster to illustrate that the fix works; if it didn't, expected votes would still be 3.)

Comment 8 Lon Hohberger 2010-08-03 16:39:13 UTC
Created attachment 436324 [details]
Fix

Patch not applied to any branches at this point.

Comment 9 Lon Hohberger 2010-08-03 16:57:25 UTC
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=e118d34dce64325a93c92833b1e074fbabb1a516

Updated patch posted to upstream STABLE3 branch.

Comment 10 Lon Hohberger 2010-08-03 16:58:52 UTC
Logs from updated patch:

Aug  3 12:47:56 snap modcluster: Updating cluster.conf
Aug  3 12:47:57 snap corosync[3446]:   [QUORUM] Members[2]: 1 2
Aug  3 12:47:57 snap corosync[3446]:   [CMAN  ] quorum device re-registered
Aug  3 12:47:57 snap corosync[3446]:   [QUORUM] Members[2]: 1 2
Aug  3 12:47:57 snap qdiskd[5751]: Quorum device removed from the configuration.  Shutting down.
Aug  3 12:47:57 snap qdiskd[5751]: Unregistering quorum device.
Aug  3 12:48:10 snap corosync[3446]:   [CMAN  ] lost contact with quorum device
Aug  3 12:48:10 snap corosync[3446]:   [QUORUM] Members[2]: 1 2

Comment 13 Fabio Massimo Di Nitto 2010-11-22 18:11:57 UTC
devel_ack, we already have the fix

Comment 17 errata-xmlrpc 2011-05-19 12:53:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html

