Bug 234589 - rgmanager not working when using a quorum disk
Summary: rgmanager not working when using a quorum disk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-03-30 12:16 UTC by Robert Hell
Modified: 2018-10-19 23:14 UTC (History)

Fixed In Version: RHBA-2007-0580
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 16:45:54 UTC
Target Upstream Version:
Embargoed:


Attachments
Cluster Configuration File (1.86 KB, text/xml)
2007-03-30 12:16 UTC, Robert Hell
Fix fix (469 bytes, patch)
2007-04-16 14:50 UTC, Lon Hohberger


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0580 0 normal SHIPPED_LIVE rgmanager bug fix and enhancement update 2007-10-30 15:37:24 UTC

Description Robert Hell 2007-03-30 12:16:52 UTC
Description of problem:
When qdiskd is running with a quorum disk, rgmanager is not able to start services.
If qdiskd is not started, rgmanager works fine.

Version-Release number of selected component (if applicable):
RHEL 5: cman-2.0.60-1.el5, rgmanager-2.0.23-1 (x86_64)

How reproducible:
Use a quorum disk

Steps to Reproduce:
1. configure quorum disk in cluster.conf
2. start cman
3. start qdiskd
4. start rgmanager  

Actual results:
No services are running; clustat and system-config-cluster both hang when
started. From /var/log/messages:
Mar 30 14:09:27 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts1
Mar 30 14:09:43 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts2

Expected results:
Running services.

Additional info:
I attached my cluster.conf. Registration of quorum succeeds in cman.

Comment 1 Robert Hell 2007-03-30 12:16:52 UTC
Created attachment 151269 [details]
Cluster Configuration File

Comment 2 Scott Bachmann 2007-04-12 15:07:45 UTC
Additional info:
clurgmgrd appears to be suffering the same fate as ccs_tool in bug #223519: 
it treats the quorum disk as an actual node.  When clurgmgrd first starts, it 
attempts to contact the quorum disk "node" to determine the status of the 
services it is running.  This times out, causing an "abort":

[12453] info: State change: Local UP
[12453] info: State change: sys-b UP
[12453] info: State change: /dev/dm-3 UP #Note: Quorum Disk
...
aight, need responses from 3 guys
VF: Push 2.12453 #1 (X#00020001)
VF: Checking for consensus...
...
VF: YES
VF: YES
VF: Timed out waiting for 1 responses
VF: Broadcasting ABORT (X#00020002)
VF: Aborted!

I was able to construct a proof of concept by adding code to 
rgmanager/src/daemons/main.c:membership_update() that sets cn_member to 0 for 
the cml_members element which has a cn_nodeid of 0.  Afterwards, the resource 
manager appears to function as expected.  Additionally, clustat no longer 
hangs with a “Timed out waiting for a response from Resource Group Manager” 
message.

I hope that this information assists in leading to a proper patch, as mine was 
a rather brute force solution.
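
For readers following along, the workaround described above can be sketched roughly as follows. This is only an illustration, not the attached patch or the actual rgmanager code: the node_t and member_list_t types, MAX_MEMBERS, and the helper names mask_quorum_disk() and expected_responses() are hypothetical stand-ins. Only the field names cn_nodeid, cn_member, and cml_members, and the use of node ID 0 for the quorum disk, come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMBERS  16
#define QDISK_NODEID 0   /* per the report: the quorum disk appears with cn_nodeid 0 */

/* Hypothetical, simplified stand-ins for rgmanager's membership types. */
typedef struct {
    int         cn_nodeid;  /* cluster node ID */
    int         cn_member;  /* nonzero if counted as an online member */
    const char *cn_name;    /* node name, or the device path for the quorum disk */
} node_t;

typedef struct {
    int    cml_count;                 /* number of valid entries */
    node_t cml_members[MAX_MEMBERS];
} member_list_t;

/*
 * Clear the member flag on every entry whose node ID is QDISK_NODEID,
 * so later code no longer waits for a reply from the quorum disk.
 * Returns the number of entries that were changed.
 */
int mask_quorum_disk(member_list_t *list)
{
    int changed = 0;

    assert(list != NULL);
    assert(list->cml_count >= 0 && list->cml_count <= MAX_MEMBERS);

    for (int i = 0; i < list->cml_count; i++) {
        node_t *n = &list->cml_members[i];
        if (n->cn_nodeid == QDISK_NODEID && n->cn_member) {
            n->cn_member = 0;
            changed++;
        }
    }
    return changed;
}

/*
 * Count the entries still flagged as members, i.e. how many responses
 * a membership-wide vote would wait for in this simplified model.
 */
int expected_responses(const member_list_t *list)
{
    int count = 0;

    assert(list != NULL);

    for (int i = 0; i < list->cml_count; i++) {
        if (list->cml_members[i].cn_member)
            count++;
    }
    return count;
}
```

With a list holding the three entries from the log above (the local node, sys-b, and /dev/dm-3), expected_responses() yields 3 before masking and 2 afterwards, which lines up with the "need responses from 3" and "Timed out waiting for 1 responses" messages.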


Comment 3 Lon Hohberger 2007-04-16 14:50:36 UTC
Created attachment 152699 [details]
Fix fix

Hi, this should fix it.

Comment 4 Lon Hohberger 2007-04-16 14:52:00 UTC
Actually, it sounds like exactly what you did, but in a different location. ;)

Comment 5 Robert Hell 2007-04-16 17:24:23 UTC
Thanks for that!

Will there be an official errata for this problem?

Comment 6 Lon Hohberger 2007-04-19 20:22:46 UTC
I can't confirm one way or the other at this point, but it looks like it will be
in update 1 for certain.

Comment 7 Kiersten (Kerri) Anderson 2007-04-23 17:23:54 UTC
Fixing Product Name.  Cluster Suite was integrated into Enterprise Linux for
version 5.0.

Comment 8 RHEL Program Management 2007-04-25 20:16:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Robert Hell 2007-06-05 07:30:52 UTC
Hi!

Is there any news on whether this fix will be in an upcoming erratum or in the 
next update for RHEL 5?

Regards,
Robert

Comment 10 Lon Hohberger 2007-06-21 16:20:42 UTC
Update 1 for RHEL5 :)

Comment 12 Leo Pleiman 2007-08-06 15:41:40 UTC
lpleiman

Comment 14 errata-xmlrpc 2007-11-07 16:45:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0580.html


