Description of problem:
Attempting to create many cluster mirrors (cmirror) eventually results in a deadlock.

SCENARIO - [many_mirrors]
Recreating VG and PVs to increase metadata size
  Writing physical volume data to disk "/dev/sdd1"
  Writing physical volume data to disk "/dev/sdd2"
  Writing physical volume data to disk "/dev/sde1"
  Writing physical volume data to disk "/dev/sde2"
  Writing physical volume data to disk "/dev/sdf1"
  Writing physical volume data to disk "/dev/sdf2"
  Writing physical volume data to disk "/dev/sdg1"
  Writing physical volume data to disk "/dev/sdg2"
  Writing physical volume data to disk "/dev/sdh1"
  Writing physical volume data to disk "/dev/sdh2"
Making 200 mirrors...
1 taft-04: lvcreate -m 1 -n 200_1 -L 25M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
2 taft-04: lvcreate -m 1 -n 200_2 -L 25M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
[...]
129 taft-02: lvcreate -m 1 -n 200_129 -L 25M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
130 taft-03: lvcreate -m 1 -n 200_130 -L 25M --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
[DEADLOCK]

Dec 12 13:38:38 taft-01 qarshd[18232]: Running cmdline: lvcreate -m 1 -n 500_122 -L 25M --nosync mirror_sanity
Dec 12 13:38:43 taft-01 clogd[6351]: cpg_initialize failed: Cannot join cluster
Dec 12 13:38:43 taft-01 clogd[6351]: clog_resume: Failed to create cluster CPG
Dec 12 13:38:43 taft-01 lvm[6597]: Monitoring mirror device mirror_sanity-500_122 for events.
Dec 12 13:38:48 taft-01 clogd[6351]: cpg_initialize failed: Cannot join cluster
Dec 12 13:38:48 taft-01 clogd[6351]: clog_resume: Failed to create cluster CPG
Dec 12 13:38:48 taft-01 lvm[6597]: Monitoring mirror device mirror_sanity-500_123 for events.
Dec 12 13:38:48 taft-01 qarshd[18367]: Running cmdline: lvcreate -m 1 -n 500_124 -L 25M --nosync mirror_sanity
Dec 12 13:38:53 taft-01 clogd[6351]: cpg_initialize failed: Cannot join cluster
Dec 12 13:38:53 taft-01 clogd[6351]: clog_resume: Failed to create cluster CPG
Dec 12 13:38:53 taft-01 lvm[6597]: Monitoring mirror device mirror_sanity-500_124 for events.
Dec 12 13:38:53 taft-01 qarshd[18435]: Running cmdline: lvcreate -m 1 -n 500_125 -L 25M --nosync mirror_sanity
Dec 12 13:38:59 taft-01 clogd[6351]: cpg_initialize failed: Cannot join cluster
Dec 12 13:38:59 taft-01 clogd[6351]: clog_resume: Failed to create cluster CPG
Dec 12 13:38:59 taft-01 lvm[6597]: Monitoring mirror device mirror_sanity-500_125 for events.
Dec 12 13:39:04 taft-01 clogd[6351]: cpg_initialize failed: Cannot join cluster
Dec 12 13:39:04 taft-01 clogd[6351]: clog_resume: Failed to create cluster CPG
Dec 12 13:39:04 taft-01 lvm[6597]: Monitoring mirror device mirror_sanity-500_126 for events.

Version-Release number of selected component (if applicable):
2.6.18-274.el5

lvm2-2.02.88-5.el5                  BUILT: Fri Dec 2 12:25:45 CST 2011
lvm2-cluster-2.02.88-5.el5          BUILT: Fri Dec 2 12:48:37 CST 2011
device-mapper-1.02.67-2.el5         BUILT: Mon Oct 17 08:31:56 CDT 2011
device-mapper-event-1.02.67-2.el5   BUILT: Mon Oct 17 08:31:56 CDT 2011
cmirror-1.1.39-14.el5               BUILT: Wed Nov 2 17:25:33 CDT 2011
kmod-cmirror-0.1.22-3.el5           BUILT: Tue Dec 22 13:39:47 CST 2009
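For anyone trying to reproduce this outside the test harness, the many_mirrors scenario above can be approximated with a loop like the one below. This is only a sketch: the lvcreate options, the 25M size and the mirror_sanity VG name come from the output in this report, while the make_mirrors function, the DRY_RUN switch and running everything from a single node are assumptions (the harness spreads the commands across the taft nodes).

```shell
#!/bin/sh
# Sketch of the many_mirrors loop. The lvcreate arguments are copied from this
# report; make_mirrors, DRY_RUN and the single-node loop are hypothetical.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands, change nothing

make_mirrors() {
    count="$1"            # number of mirrors to create
    prefix="${2:-200}"    # LV name prefix, as in 200_1, 200_2, ...
    i=1
    while [ "$i" -le "$count" ]; do
        cmd="lvcreate -m 1 -n ${prefix}_${i} -L 25M --nosync mirror_sanity"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first failure so the failing index is visible.
            $cmd || { echo "lvcreate failed at mirror $i" >&2; return 1; }
        fi
        i=$((i + 1))
    done
}
```

Calling `make_mirrors 200` prints the commands; with `DRY_RUN=0` it runs them. Note that the deadlock described here shows up as a hung lvcreate rather than a failing one, so the loop would simply stop making progress.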
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
*** Bug 782156 has been marked as a duplicate of this bug. ***
Encountered the same issue while testing mirrors in a cluster. With reserved_memory set to 32768, the error appears around the 220th mirror.

[lvm_cluster_mirror] [lvm_cluster_mirror_sanity] 230 a1: lvcreate -m 1 -n 500_230 -L 25M --nosync mirror_sanity
[lvm_cluster_mirror] [lvm_cluster_mirror_sanity] WARNING: New mirror won't be synchronised. Don't read what you didn't write!

The errors in /var/log/messages are:

(08:25:09) [root@a1:/var/log]$ tail /var/log/messages
Jan 16 08:26:23 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x4000000000005ef0
Jan 16 08:26:23 a1 kernel: clogd(26041): unaligned access to 0x6000000000011614, ip=0x4000000000005f10
Jan 16 08:26:23 a1 clogd[26041]: cpg_mcast_joined error: SA_AIS_ERR_BAD_HANDLE
Jan 16 08:26:28 a1 last message repeated 36 times
Jan 16 08:26:28 a1 kernel: kernel unaligned access to 0xe0000001f1a60394, ip=0xa00000020371e4d0
Jan 16 08:26:28 a1 kernel: kernel unaligned access to 0xe0000001f1a6043c, ip=0xa00000020371e560
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x400000000006f9f0
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x4000000000005ef0
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x6000000000011614, ip=0x4000000000005f10
Jan 16 08:26:28 a1 clogd[26041]: cpg_mcast_joined error: SA_AIS_ERR_BAD_HANDLE

The operation can be unblocked by running the vgs or vgscan command on the active node. That command can itself get stuck; in that case, running the same command on another node unblocks both. After that, mirror creation continues for perhaps another 10 to 15 mirrors and then deadlocks again. As far as I can tell, this can be repeated indefinitely.
The errors showing in /var/log/messages then are:

Jan 16 11:04:40 a1 kernel: kernel unaligned access to 0xe0000001f3490714, ip=0xa000000202f424d0
Jan 16 11:04:40 a1 kernel: kernel unaligned access to 0xe0000001f34907bc, ip=0xa000000202f42560
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x600000000001160c, ip=0x400000000006f9f0
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x60000000000116b4, ip=0x4000000000057f60
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x60000000000116b4, ip=0x4000000000058260
Jan 16 11:04:40 a1 clogd[4059]: cpg_initialize failed: Cannot join cluster
Jan 16 11:04:40 a1 clogd[4059]: clog_resume: Failed to create cluster CPG
Jan 16 11:04:40 a1 lvm[4524]: Monitoring mirror device mirror_sanity-500_322 for events
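To compare runs, it may help to tally the clogd failures quoted in this report. The helper below is a rough sketch: the three message strings are taken from the logs above, while the count_clogd_errors name and the default log path argument are assumptions. It counts matching lines only, so syslog's "last message repeated N times" entries are not added in.

```shell
# count_clogd_errors [FILE]: tally the clogd failure messages seen in this report.
# Hypothetical helper; defaults to /var/log/messages when no file is given.
count_clogd_errors() {
    log="${1:-/var/log/messages}"
    for pattern in \
        'cpg_initialize failed' \
        'clog_resume: Failed to create cluster CPG' \
        'cpg_mcast_joined error'
    do
        # Restrict to lines logged by the clogd daemon, then count the pattern.
        n=$(grep 'clogd\[' "$log" | grep -c "$pattern")
        echo "$pattern: $n"
    done
}
```

For example, `count_clogd_errors /var/log/messages` prints one line per message type with its count.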
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
There simply may need to be limits placed on the number of cluster mirrors that are allowed. It doesn't look like checkpointing/CPG can handle the load of all the mirrors.