Bug 689466

Summary: Issue with RGManager and CLVMD
Product: Red Hat Enterprise Linux 5
Reporter: rauch
Component: lvm2-cluster
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: medium
Version: 5.6
CC: agk, ccaulfie, cluster-maint, dwysocha, edamato, heinzm, jbrassow, jwest, mbroz, nkim, onong.tayeng, pep, prajnoha, prockai, rdassen, stroetgen, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-02-10 18:21:13 UTC

Description rauch 2011-03-21 15:49:07 UTC
Description of problem:

We have the following issue on a two-node cluster (node1 + node2) with qdisk.

On node1, the RGManager command clustat no longer shows the cluster services, and it no longer lists the RGManager status for either cluster member.

On cluster member node2, clustat shows the cluster services and reports RGManager as running on both nodes.

RGManager status logging stopped on the cluster members two days ago, without any errors.

The clustat output from both cluster nodes is shown below:


node1 ~ # clustat
Service states unavailable: Temporary failure; try again
Cluster Status for CL_XY @ Fri Mar 18 08:51:21 2011
Member Status: Quorate

Member Name     ID   Status
------ ----     ---- ------
node1           1    Online, Local
node2           2    Online
/dev/dm-11      0    Online, Quorum Disk


node2 ~ # clustat
Cluster Status for CL_XY @ Fri Mar 18 08:53:33 2011
Member Status: Quorate

Member Name     ID   Status
------ ----     ---- ------
node1           1    Online, RG-Worker
node2           2    Online, Local, RG-Master
/dev/dm-11      0    Online, Quorum Disk

Service Name    Owner (Last)    State
------- ----    ----- ------    -----
service:1       node1           started
service:2       node1           started
service:3       node2           started
service:4       node2           started
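
The difference between the two outputs above can be checked mechanically. The following is a minimal sketch, not part of rgmanager: the function name summarize_clustat is hypothetical, and it assumes the clustat text looks like the output pasted in this report (a "Service states unavailable" line when rgmanager cannot be reached, and service rows starting with "service:").

```python
def summarize_clustat(output):
    """Summarize clustat text: are service states available, and which services are listed?"""
    lines = output.splitlines()
    # node1 in this report prints this line instead of a service table.
    unavailable = any(
        line.strip().startswith("Service states unavailable") for line in lines
    )
    services = {}
    for line in lines:
        fields = line.split()
        # Service rows in this report look like: "service:1  node1  started"
        if len(fields) >= 3 and fields[0].startswith("service:"):
            services[fields[0]] = (fields[1], fields[2])
    return {"service_states_available": not unavailable, "services": services}


# Sample taken from the node1 output in this report (member table omitted).
node1_output = """Service states unavailable: Temporary failure; try again
Cluster Status for CL_XY @ Fri Mar 18 08:51:21 2011
Member Status: Quorate
"""
print(summarize_clustat(node1_output))
```

For the node1 sample this reports that service states are not available and lists no services. Fed the node2 output, it would list the four services with their owners and states.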

Furthermore, LVM commands do not complete. They simply hang, although the processes can be killed by pressing Ctrl+C.

The exact same issue happened a month ago on this system with RHEL 5.5. In the meantime the cluster was updated to RHEL 5.6. Debugging for RGManager and cman has now been enabled in the /etc/cluster/cluster.conf file.


Version-Release number of selected component (if applicable):

kernel-2.6.18-238.5.1.el5.x86_64
cman-2.0.115-68.el5_6.1.x86_64
rgmanager-2.0.52-9.el5.x86_64

How reproducible:
not reproducible

Actual results:
RGManager stops writing logs.
The clustat command does not work on one node.
LVM commands stop working.

Additional info:
The cluster services include mounted disks.
This setup uses an HA LVM configuration (RHEL 5.6).

Comment 1 ot 2011-08-26 15:44:29 UTC
I am facing the same issue. Any idea when a fix will be available? Are there any workarounds?

(In reply to comment #0)

Comment 2 Lon Hohberger 2012-02-10 17:54:11 UTC
Looks like this slipped through the cracks.

clustat is just a victim here; if the cluster is locked up (e.g. fencing causing a problem or clvmd causing a problem), the errors will bubble up to the top.

That lvm commands were hanging indicates that this is not an rgmanager issue.

Comment 4 Alasdair Kergon 2012-02-10 18:21:13 UTC
If anyone sees this again with the current versions of the packages, then please reopen this and attach relevant LVM diagnostics to this bug.
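
Because the LVM commands in this report hang rather than fail, it may help to record which commands return and which do not before attaching diagnostics. The following is a minimal sketch under stated assumptions, not an official collection procedure. The helper names probe and collect are hypothetical, the command list (vgs, lvs, cman_tool status, group_tool ls, clustat) is only an assumed starting set, and the script needs Python 3.5 or later, which is not the default interpreter on RHEL 5.

```python
import subprocess

# Assumed starting set of commands; adjust to what is installed on the node.
DIAGNOSTIC_COMMANDS = [
    ["vgs"],
    ["lvs"],
    ["cman_tool", "status"],
    ["group_tool", "ls"],
    ["clustat"],
]


def probe(command, timeout_seconds=30):
    """Run one command and report whether it finished, failed, is missing, or hung."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child once the timeout expires.
        return {"command": command, "status": "hung", "output": ""}
    except FileNotFoundError:
        return {"command": command, "status": "missing", "output": ""}
    status = "ok" if result.returncode == 0 else "failed"
    return {
        "command": command,
        "status": status,
        "output": result.stdout.decode(errors="replace"),
    }


def collect(commands, timeout_seconds=30):
    """Probe each command in turn and return the list of reports."""
    return [probe(command, timeout_seconds) for command in commands]
```

Calling collect(DIAGNOSTIC_COMMANDS) on an affected node returns one report per command. Entries with status "hung" identify the commands that did not return within the timeout, which narrows down whether the lock-up is in LVM/clvmd or elsewhere in the cluster stack.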