Description of problem:
After restarting the services (cman, clvmd), the clvmd service startup fails and freezes at "Activating VGs:".

These messages are visible in /var/log/messages:

Apr  1 04:13:55 a2 clogd[26973]: [NqVaQc2F] Failed to open checkpoint: SA_AIS_ERR_LIBRARY
Apr  1 04:13:55 a2 clogd[26973]: [NqVaQc2F] Failed to import checkpoint from 3
Apr  1 04:13:55 a2 kernel: clogd(26973): unaligned access to 0x6000000000010bdc, ip=0x4000000000056821
Apr  1 04:14:10 a2 kernel: device-mapper: dm-log-clustered: [NqVaQc2F] Request timed out: [DM_CLOG_RESUME/25] - retrying

The last message keeps repeating, with the sequence number (25) increasing.

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.40-7.el5
cman-2.0.98-1.el5
Linux a3 2.6.18-128.el5 #1 SMP Wed Dec 17 11:44:28 EST 2008 ia64 ia64 ia64 GNU/Linux

How reproducible:
Every time.

Steps to Reproduce:
1. Reboot two nodes (tested with a 2-node cluster).
2. The nodes form quorum and operate as expected; VGs are up and visible.
3. Log in to the first node and run: service clvmd stop; service cman stop; service cman start; service clvmd start
4. The last command freezes as described.

Actual results:
'service clvmd start' freezes, and the log keeps repeating the DM_CLOG_RESUME error.

Expected results:
clvmd should come up, properly register with the cluster, and activate all VGs, as it does after a reboot.

Additional info:

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="a-cluster" config_version="8">
  <cman two_node="1">
  </cman>
  <fence_daemon post_join_delay="20" post_fail_delay="20" clean_start="1"/>
  <clusternodes>
    <!--
    <clusternode name="a1" votes="1" nodeid="1">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
    -->
    <clusternode name="a2" votes="1" nodeid="2">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
    <clusternode name="a3" votes="1" nodeid="3">
      <fcdriver>lpfc</fcdriver>
      <arch>ia64</arch>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="link-apc" login="xxx" passwd="xxx"/>
  </fencedevices>
</cluster>

All following commands are run from the 'healthy' node, which still sees all VGs:

# group_tool
type             level name     id       state
fence            0     default  00010002 none
[2 3]
dlm              1     clvmd    00020002 none
[2 3]

# cman_tool status
Version: 6.1.0
Config Version: 8
Cluster Name: a-cluster
Cluster Id: 43956
Cluster Member: Yes
Cluster Generation: 420
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: a3
Node ID: 3
Multicast addresses: 239.192.171.96
Node addresses: 10.15.89.181

# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  GFSMIRROR    3   1   0 wz--nc   1.45T   1.43T
  VolGroup00   1   2   0 wz--n- 136.62G       0
  clustervg    1   1   0 wz--n- 495.71G 485.71G

# lvs
  LV       VG         Attr   LSize   Origin Snap%  Move Log         Copy%  Convert
  GFSM-1   GFSMIRROR  mwi-a-  10.00G                    GFSM-1_mlog 100.00
  LogVol00 VolGroup00 -wi-ao 126.81G
  LogVol01 VolGroup00 -wi-ao   9.81G
  first    clustervg  -wi-a-  10.00G
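For triage while 'service clvmd start' is hung, a minimal diagnostic sketch (these are standard procps/device-mapper tools, not commands taken from this report; output and names will differ per system):

# Is clvmd or clogd stuck in uninterruptible sleep (D state)?
ps -eo pid,stat,wchan:20,cmd | egrep 'clvmd|clogd'
# Were any dm devices left SUSPENDED by the failed resume?
dmsetup info -c
# Is the DM_CLOG_RESUME retry counter still climbing?
grep -c 'DM_CLOG_RESUME' /var/log/messages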
I think there is a related bug for this already, but in any case it is a clustered mirror log problem.
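For anyone hitting a similar hang, a hedged sketch for confirming that the clustered mirror log is the blocked component (standard dmsetup/lvs invocations, not taken from this report; dm device names double any internal dash, so GFSMIRROR/GFSM-1 maps to GFSMIRROR-GFSM--1, and the exact log type string in the output depends on the cmirror version):

# The mirror target's table/status should reference the clustered log
dmsetup table GFSMIRROR-GFSM--1
dmsetup status GFSMIRROR-GFSM--1
# List the mirror's hidden _mlog sub-LV and its backing device
lvs -a -o +devices GFSMIRROR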
Created attachment 337579 [details]
/var/log/messages covering the services startup which ends up hanging
This bug/component is not included in the scope of RHEL-5.11.0, which is the last RHEL 5 minor release. This Bugzilla will soon be CLOSED as WONTFIX, at the end of the RHEL 5.11 development phase (Apr 22, 2014). Please contact your account manager or support representative if you need to escalate this bug.
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in the RHEL 5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).