Bug 234589 - rgmanager not working when using a quorum disk
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2007-03-30 08:16 EDT by Robert Hell
Modified: 2010-10-22 10:07 EDT
CC List: 4 users

Fixed In Version: RHBA-2007-0580
Doc Type: Bug Fix
Last Closed: 2007-11-07 11:45:54 EST


Attachments
Cluster Configuration File (1.86 KB, text/xml)
2007-03-30 08:16 EDT, Robert Hell
Fix fix (469 bytes, patch)
2007-04-16 10:50 EDT, Lon Hohberger

Description Robert Hell 2007-03-30 08:16:52 EDT
Description of problem:
If qdiskd is running with a quorum disk, rgmanager is not able to start services.
Without starting qdiskd rgmanager works fine.

Version-Release number of selected component (if applicable):
RHEL 5: cman-2.0.60-1.el5, rgmanager-2.0.23-1 (x86_64)

How reproducible:
Use a quorum disk

Steps to Reproduce:
1. configure quorum disk in cluster.conf
2. start cman
3. start qdiskd
4. start rgmanager  

Actual results:
No services are running; clustat and system-config-cluster both hang on startup.
/var/log/messages:
Mar 30 14:09:27 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts1
Mar 30 14:09:43 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts2

Expected results:
Running services.

Additional info:
I have attached my cluster.conf. Registration of the quorum disk with cman succeeds.
Comment 1 Robert Hell 2007-03-30 08:16:52 EDT
Created attachment 151269
Cluster Configuration File
Comment 2 Scott Bachmann 2007-04-12 11:07:45 EDT
Additional info:
clurgmgrd appears to suffer from the same problem as ccs_tool in bug #223519:
it treats the quorum disk as an actual node. When clurgmgrd first starts, it
attempts to contact the quorum disk "node" to determine the status of the
services it is running. This times out, causing an "abort":

[12453] info: State change: Local UP
[12453] info: State change: sys-b UP
[12453] info: State change: /dev/dm-3 UP #Note: Quorum Disk
...
aight, need responses from 3 guys
VF: Push 2.12453 #1 (X#00020001)
VF: Checking for consensus...
...
VF: YES
VF: YES
VF: Timed out waiting for 1 responses
VF: Broadcasting ABORT (X#00020002)
VF: Aborted!
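The failure mode in the log above can be modelled in a few lines of C. This is a minimal sketch, not rgmanager's actual view-formation code: the `member_t` type and both helper functions are hypothetical, and the sketch assumes the vote coordinator simply waits for one reply per membership entry marked UP. Because the quorum disk is listed as an UP "node" but has no clurgmgrd behind it, one reply can never arrive.

```c
#include <stddef.h>

/* Hypothetical, simplified membership entry (NOT rgmanager's real type). */
typedef struct {
    int nodeid;         /* cluster node ID; the quorum disk is reported as 0 */
    int is_member;      /* nonzero if the membership view marks the entry UP */
    int runs_rgmanager; /* nonzero if a clurgmgrd there can answer votes */
} member_t;

/* Replies the coordinator waits for under this model: one per UP entry. */
static size_t expected_responses(const member_t *view, size_t n)
{
    size_t need = 0;
    for (size_t i = 0; i < n; i++)
        if (view[i].is_member)
            need++;
    return need;
}

/* Replies that can never arrive: UP entries with no rgmanager behind them.
 * In this model, a nonzero result means the vote times out and is aborted. */
static size_t missing_responses(const member_t *view, size_t n)
{
    size_t missing = 0;
    for (size_t i = 0; i < n; i++)
        if (view[i].is_member && !view[i].runs_rgmanager)
            missing++;
    return missing;
}
```

With three UP entries (the local node, sys-b, and /dev/dm-3 as node 0), `expected_responses()` returns 3 and `missing_responses()` returns 1, mirroring "need responses from 3 guys" and "Timed out waiting for 1 responses". Clearing `is_member` on the node-0 entry brings the missing count to 0.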

I was able to construct a proof of concept by adding code to
rgmanager/src/daemons/main.c:membership_update() that sets cn_member to 0 for
the cml_members element whose cn_nodeid is 0. Afterwards, the resource manager
appears to function as expected, and clustat no longer hangs with a “Timed out
waiting for a response from Resource Group Manager” message.

I hope this information helps lead to a proper patch, as mine was a rather
brute-force solution.
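The workaround described above can be sketched as follows, assuming simplified stand-ins for rgmanager's membership structures. Only the field names `cml_members`, `cn_nodeid` and `cn_member` come from the comment; the struct layouts and the helper `drop_qdisk_pseudo_node()` are hypothetical, and this is not the attached patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins named after the fields mentioned above;
 * the real rgmanager definitions differ and contain more fields. */
typedef struct {
    int cn_nodeid; /* cluster node ID; the quorum disk shows up as 0 */
    int cn_member; /* nonzero if the entry is currently a member */
} cluster_member_t;

typedef struct {
    size_t cml_count;              /* number of entries in cml_members */
    cluster_member_t *cml_members; /* membership entries */
} cluster_member_list_t;

/* Mark the quorum-disk pseudo-node (node ID 0) as a non-member so it is not
 * treated as a peer that must answer votes. Returns the number of entries
 * whose cn_member flag was cleared. */
static size_t drop_qdisk_pseudo_node(cluster_member_list_t *ml)
{
    size_t changed = 0;
    for (size_t i = 0; i < ml->cml_count; i++) {
        cluster_member_t *m = &ml->cml_members[i];
        if (m->cn_nodeid == 0 && m->cn_member) {
            m->cn_member = 0;
            changed++;
        }
    }
    return changed;
}
```

Calling such a helper on every membership update is idempotent: a second call on the same list changes nothing and returns 0.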
Comment 3 Lon Hohberger 2007-04-16 10:50:36 EDT
Created attachment 152699
Fix fix

Hi, this should fix it.
Comment 4 Lon Hohberger 2007-04-16 10:52:00 EDT
Actually, it sounds like exactly what you did, but in a different location. ;)
Comment 5 Robert Hell 2007-04-16 13:24:23 EDT
Thanks for that!

Will there be an official errata for this problem?
Comment 6 Lon Hohberger 2007-04-19 16:22:46 EDT
I can't confirm one way or the other at this point, but it looks like it will be
in update 1 for certain.
Comment 7 Kiersten (Kerri) Anderson 2007-04-23 13:23:54 EDT
Fixing Product Name. Cluster Suite was integrated into Red Hat Enterprise Linux
as of version 5.0.
Comment 8 RHEL Product and Program Management 2007-04-25 16:16:21 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 9 Robert Hell 2007-06-05 03:30:52 EDT
Hi!

Do you have any news on whether this fix will be in an upcoming errata or in
the next update for RHEL5?

Regards,
Robert
Comment 10 Lon Hohberger 2007-06-21 12:20:42 EDT
Update 1 for RHEL5 :)
Comment 12 Leo Pleiman 2007-08-06 11:41:40 EDT
lpleiman@redhat.com
Comment 14 errata-xmlrpc 2007-11-07 11:45:54 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0580.html
