Description of problem:
Every so often my hayes cluster (which uses AOE storage) will be unable to complete the simple task of creating/activating cmirrors. I'm not sure how I get into this state, but once it's there, it's hard to get out of it.

[root@hayes-01 ~]# service clvmd start
Starting clvmd: [ OK ]
Activating VGs: [HANG DUE TO EXISTING MIRROR]

Jul 30 16:28:38 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/20] - retrying
Jul 30 16:28:53 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/20] - retrying
Jul 30 16:28:53 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/21] - retrying
Jul 30 16:29:08 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/21] - retrying
Jul 30 16:29:08 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/22] - retrying
Jul 30 16:29:23 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/22] - retrying
Jul 30 16:29:23 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/23] - retrying
Jul 30 16:29:38 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/23] - retrying
Jul 30 16:29:38 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/24] - retrying
Jul 30 16:29:53 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/24] - retrying
Jul 30 16:29:53 hayes-03 clogd[4395]: Invalid log request received, ignoring.
device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/25] - retrying
Jul 30 16:30:08 hayes-03 kernel: device-mapper: dm-log-clustered: [U9LjlBOE] Request timed out: [DM_CLOG_CTR/25] - retrying
Jul 30 16:30:08 hayes-03 clogd[4395]: Invalid log request received, ignoring.

Nothing shows up as being bad in the debugging:

Jul 30 16:30:39 hayes-02 clogd[4346]:
Jul 30 16:30:39 hayes-02 clogd[4346]: LOG COMPONENT DEBUGGING::
Jul 30 16:30:39 hayes-02 clogd[4346]: Official log list:
Jul 30 16:30:39 hayes-02 clogd[4346]: Pending log list:
Jul 30 16:30:39 hayes-02 clogd[4346]: Resync request history:
Jul 30 16:30:39 hayes-02 clogd[4346]:
Jul 30 16:30:39 hayes-02 clogd[4346]: CLUSTER COMPONENT DEBUGGING::
Jul 30 16:30:39 hayes-02 clogd[4346]: Command History:
Jul 30 16:30:42 hayes-02 clogd[4346]:
Jul 30 16:30:42 hayes-02 clogd[4346]: LOG COMPONENT DEBUGGING::
Jul 30 16:30:42 hayes-02 clogd[4346]: Official log list:
Jul 30 16:30:42 hayes-02 clogd[4346]: Pending log list:
Jul 30 16:30:42 hayes-02 clogd[4346]: Resync request history:
Jul 30 16:30:42 hayes-02 clogd[4346]:
Jul 30 16:30:42 hayes-02 clogd[4346]: CLUSTER COMPONENT DEBUGGING::
Jul 30 16:30:42 hayes-02 clogd[4346]: Command History:

This has been reproduced on this cluster quite a few times.
Version-Release number of selected component (if applicable):
2.6.18-158.el5
lvm2-2.02.46-8.el5            BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5    BUILT: Thu Jun 18 08:05:27 CDT 2009
device-mapper-1.02.32-1.el5   BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5          BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.21-14.el5    BUILT: Thu May 21 08:28:17 CDT 2009
This doesn't appear to be a network device issue, as I just hit this on the grant cluster as well (FC storage). Right after clvmd segfaulted (due to bz 506986), I restarted the cluster and it hung attempting to start clvmd. Now, even after cleaning up the volumes and restarting everything, any cmirror create attempt hangs the cluster.
Looks like the log daemon is failing to do something. After dd'ing to really wipe the storage clean, I was still unable to create anything.

[root@hayes-01 ~]# lvcreate -m 1 -n mirror -L 10G VG
  Aborting. Failed to activate mirror log.
  Failed to create mirror log.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cluster mirrors larger than 1.5TB require larger region sizes, or will not work.

Due to limitations in the cluster infrastructure, cluster mirrors greater than 1.5TB cannot be created with the default region size. Users that require larger mirrors should increase the region size from its default (512k) to something larger.

Example:
# -R <region_size_in_MiB>
lvcreate -m1 -L 2T -R 2 -n mirror vol_group

Failure to increase the region size will result in hung LVM creation and possibly hanging other LVM commands as well.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,9 +1,6 @@
-Cluster mirrors larger than 1.5TB require larger region sizes, or will not work.
-
-Due to limitations in the cluster infrastructure, cluster mirrors greater than 1.5TB cannot be created with the default region size. Users that require larger mirrors should increase the region size from its default (512k) to something larger.
-
-Example:
-# -R <region_size_in_MiB>
+Due to limitations in the cluster infrastructure, cluster mirrors greater than 1.5TB cannot be created with the default region size. If larger mirrors are required, the region size should be increased from its default (512kB), for example:
+<screen>
+# -R <region_size_in_MiB>
 lvcreate -m1 -L 2T -R 2 -n mirror vol_group
-
+</screen>
-Failure to increase the region size will result in hung LVM creation and possibly hanging other LVM commands as well.
+Failure to increase the region size will result in the LVM creation process hanging and may cause other LVM commands to hang.
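As a follow-up to the release note example, a quick way to sanity-check that the larger region size was actually applied is to report it back from LVM after creation. This is only a sketch: it assumes the standard lvs reporting fields (lv_name, lv_size, regionsize); if your lvm2 build names the region-size field differently, adjust accordingly.

# Create the 2TB mirror with a 2MiB region size, as in the release note example,
# then report the region size LVM recorded for the mirror.
lvcreate -m1 -L 2T -R 2 -n mirror vol_group
lvs -o lv_name,lv_size,regionsize vol_group/mirror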
The given workaround will have to suffice for RHEL 5. The OpenAIS checkpoint limits will not be increased in RHEL 5; therefore, a solution would involve automatically picking a region_size based on these limits, which would be different behavior from single-machine mirroring. I am closing this bug with no intent to fix it in RHEL 5. It is possible to make up for the limitations in OpenAIS checkpoints within LVM, but I don't see any demand for this, especially with such a simple workaround.
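For illustration only, here is a rough sketch of what that automatic region-size selection could look like. It assumes, based on the figures in this report (1.5TB works at the 512KiB default), that the usable region count tops out around 1.5TiB / 512KiB = 3,145,728 regions; that cap is inferred from this bug, not an official OpenAIS constant.

# Hypothetical helper: pick a region size for a cluster mirror of a given size
# so that the region count stays under the limit inferred from this report.
MAX_REGIONS=$((3 * 1024 * 1024))     # ~3,145,728 regions (inferred, not official)
mirror_bytes=$((2 * 1024 ** 4))      # example: a 2TiB mirror
region_kib=512                       # start from the 512KiB default
while [ $(( mirror_bytes / (region_kib * 1024) )) -gt "$MAX_REGIONS" ]; do
    region_kib=$((region_kib * 2))   # double until the region count fits
done
echo "suggested region size: ${region_kib}KiB"
# For a 2TiB mirror this prints 1024KiB (1MiB); the release note example
# simply rounds up further and uses -R 2.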
Hi, I seem to be running into similar problems when trying to pvmove stuff - weirdly enough, the pvmove mirrors are much smaller than 1.5 TB, though. Is there a way to increase the mirror region size for the pvmove mirrors?