Description of problem:
After restarting the services (cman, clvmd), the clvmd service startup fails and freezes at "Activating VGs:".

These messages are visible in /var/log/messages:

Apr  1 04:13:55 a2 clogd[26973]: [NqVaQc2F] Failed to open checkpoint: SA_AIS_ERR_LIBRARY
Apr  1 04:13:55 a2 clogd[26973]: [NqVaQc2F] Failed to import checkpoint from 3
Apr  1 04:13:55 a2 kernel: clogd(26973): unaligned access to 0x6000000000010bdc, ip=0x4000000000056821
Apr  1 04:14:10 a2 kernel: device-mapper: dm-log-clustered: [NqVaQc2F] Request timed out: [DM_CLOG_RESUME/25] - retrying

The last message keeps repeating, with the sequence number (25) increasing.

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.40-7.el5
cman-2.0.98-1.el5
Linux a3 2.6.18-128.el5 #1 SMP Wed Dec 17 11:44:28 EST 2008 ia64 ia64 ia64 GNU/Linux

How reproducible:
Every time.

Steps to Reproduce:
1. Reboot two nodes (tested with a 2-node cluster).
2. The nodes form quorum and operate as expected; VGs are up and visible.
3. Log in to the first node and run: service clvmd stop; service cman stop; service cman start; service clvmd start
4. The last command freezes as described.

Actual results:
'service clvmd start' freezes, and the log keeps repeating the DM_CLOG_RESUME error.

Expected results:
clvmd should come up, properly register with the cluster, and activate all VGs, as it does after a reboot.

Additional info:

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="a-cluster" config_version="8">
  <cman two_node="1">
  </cman>
  <fence_daemon post_join_delay="20" post_fail_delay="20" clean_start="1"/>
  <clusternodes>
    <!--
    <clusternode name="a1" votes="1" nodeid="1">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
    -->
    <clusternode name="a2" votes="1" nodeid="2">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
    <clusternode name="a3" votes="1" nodeid="3">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="link-apc" login="xxx" passwd="xxx"/>
  </fencedevices>
</cluster>

All following commands are run from the 'healthy' node, which still sees all VGs:

# group_tool
type             level name     id       state
fence            0     default  00010002 none
[2 3]
dlm              1     clvmd    00020002 none
[2 3]

# cman_tool status
Version: 6.1.0
Config Version: 8
Cluster Name: a-cluster
Cluster Id: 43956
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: a3
Node ID: 3
Multicast addresses: 239.192.171.96
Node addresses: 10.15.89.181

# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  GFSMIRROR    3   1   0 wz--nc   1.45T   1.43T
  VolGroup00   1   2   0 wz--n- 136.62G       0
  clustervg    1   1   0 wz--n- 495.71G 485.71G

# lvs
  LV       VG         Attr   LSize   Origin Snap%  Move Log         Copy%  Convert
  GFSM-1   GFSMIRROR  mwi-a-  10.00G                    GFSM-1_mlog 100.00
  LogVol00 VolGroup00 -wi-ao 126.81G
  LogVol01 VolGroup00 -wi-ao   9.81G
  first    clustervg  -wi-a-  10.00G
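For triage while 'service clvmd start' is hung, a minimal diagnostic sketch (these are standard procps/device-mapper tools, not commands taken from this report; output and names will differ per system):

# Is clvmd or clogd stuck in uninterruptible sleep (D state)?
ps -eo pid,stat,wchan:20,cmd | egrep 'clvmd|clogd'
# Were any dm devices left SUSPENDED by the failed resume?
dmsetup info -c
# Is the DM_CLOG_RESUME retry counter still climbing?
grep -c 'DM_CLOG_RESUME' /var/log/messages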
I think there is a related bug for this already, but in any case it is a clustered mirror log problem.
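For anyone hitting a similar hang, a hedged sketch for confirming that the clustered mirror log is the blocked component (standard dmsetup/lvs invocations, not taken from this report; dm device names double any internal dash, so GFSMIRROR/GFSM-1 maps to GFSMIRROR-GFSM--1, and the exact log type string in the output depends on the cmirror version):

# The mirror target's table/status should reference the clustered log
dmsetup table GFSMIRROR-GFSM--1
dmsetup status GFSMIRROR-GFSM--1
# List the mirror's hidden _mlog sub-LV and its backing device
lvs -a -o +devices GFSMIRROR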
Created attachment 337579 [details]
/var/log/messages covering the services startup which ends up hanging
This bug/component is not included in the scope of RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX, at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative if you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).