Bug 436507
Summary: | RHEL5 cmirror tracker: failed checkpoint issues can cause volume activation hang | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.2 | CC: | agk, ccaulfie, dwysocha, edamato, heinzm, mbroz |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-20 21:25:39 UTC | Type: | --- |
Description
Corey Marthaler
2008-03-07 17:08:38 UTC
This is reproducible:

SCENARIO - [extend_mirror_into_non_contig_space_on_primary_leg]
Create a mirror with non contiguous space on the primary leg only for expansion, do the expansion, and verify the expansion didn't spoil mirror redundancy

lvcreate -m 1 -n non_contig_prim_expand -L 20M mirror_sanity /dev/sdd1:0-500 /dev/sdb1:0-500 /dev/sdf1:0-50
Error locking on node taft-02: Command timed out
Aborting. Failed to activate new LV to wipe the start of it.
Error locking on node taft-02: Command timed out
couldn't create mirror non_contig_prim_expand

[...]

Mar 7 13:34:53 taft-03 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_SET_REGION_SYNC:158788 - retrying
Mar 7 13:35:08 taft-03 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_SET_REGION_SYNC:158789 - retrying
Mar 7 13:35:23 taft-03 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_SET_REGION_SYNC:158790 - retrying
Mar 7 13:35:23 taft-03 clogd[6706]: kernel_recv: Preallocated transfer structs exhausted
Mar 7 13:35:23 taft-03 clogd[6706]: cpg_message_callback: Preallocated transfer structs exhausted
Mar 7 13:35:38 taft-03 last message repeated 2 times
Mar 7 13:35:38 taft-03 clogd[6706]: kernel_recv: Preallocated transfer structs exhausted
Mar 7 13:35:38 taft-03 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_SET_REGION_SYNC:158791 - retrying
Mar 7 13:35:38 taft-03 clogd[6706]: cpg_message_callback: Preallocated transfer structs exhausted
Mar 7 13:35:53 taft-03 last message repeated 2 times
Mar 7 13:35:53 taft-03 clogd[6706]: kernel_recv: Preallocated transfer structs exhausted
Mar 7 13:35:53 taft-03 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_SET_REGION_SYNC:158792 - retrying
Mar 7 13:35:53 taft-03 clogd[6706]: cpg_message_callback: Preallocated transfer structs exhausted
Mar 7 13:36:08 taft-03 last message repeated 2 times

Believe this to be a dupe of Bugzilla #400941.
*** This bug has been marked as a duplicate of 400941 ***

This is still reproducible with openais-0.80.3-17.el5.

SCENARIO - [extend_mirror_into_non_contig_space_on_primary_leg]
Create a mirror with non contiguous space on the primary leg only for expansion, do the expansion, and verify the expansion didn't spoil mirror redundancy

lvcreate -m 1 -n non_contig_prim_expand -L 20M mirror_sanity /dev/etherd/e1.1p4:0-500 /dev/etherd/e1.1p3:0-500 /dev/etherd/e1.1p2:0-50
Error locking on node hayes-03: Command timed out
Error locking on node hayes-01: Command timed out
Aborting. Failed to activate new LV to wipe the start of it.
Error locking on node hayes-03: Command timed out
Error locking on node hayes-01: Command timed out
couldn't create mirror non_contig_prim_expand

I'm seeing tons of the following messages in the logs:

Jul 9 10:46:39 hayes-01 kernel: device-mapper: dm-log-clustered: [V21KYIHt] Request timed out: [DM_CLOG_RESUME/34996] - retrying
Jul 9 10:46:39 hayes-01 clogd[3162]: kernel_recv: Preallocated transfer structs exhausted
Jul 9 10:46:39 hayes-01 clogd[3162]: cpg_message_callback: Preallocated transfer structs exhausted
Jul 9 10:46:53 hayes-01 clogd[3162]: cpg_message_callback: Preallocated transfer structs exhausted
Jul 9 10:46:54 hayes-01 clogd[3162]: kernel_recv: Preallocated transfer structs exhausted
Jul 9 10:46:54 hayes-01 kernel: device-mapper: dm-log-clustered: [V21KYIHt] Request timed out: [DM_CLOG_RESUME/34997] - retrying

The messages:

Mar 7 10:51:57 taft-02 clogd[6689]: [R5GGAijX] Failed to open checkpoint for 3
Mar 7 10:51:57 taft-02 clogd[6689]: Failed to export checkpoint

are exactly what I would expect from bug 455453. The later messages ("Request timed out" and "Preallocated transfer structs exhausted") can happen due to a massive backlog of requests that need processing, something that will occur because the nodes cannot get their checkpoints.
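The distinction drawn above, between the checkpoint failures and the later backlog messages, can be sketched as a small log triage helper. This is only an illustration: the function name and the two category labels are assumptions made here, and the only strings taken from this report are the message fragments themselves.

```python
# Message fragments copied from the log lines quoted in this report.
CHECKPOINT_FAILURES = ("Failed to open checkpoint", "Failed to export checkpoint")
BACKLOG_SYMPTOMS = ("Preallocated transfer structs exhausted", "Request timed out")


def classify(lines):
    """Count checkpoint failures versus backlog symptoms in syslog lines.

    Hypothetical helper, not part of clogd or any Red Hat tool.
    """
    counts = {"checkpoint_failure": 0, "backlog_symptom": 0}
    for line in lines:
        if any(fragment in line for fragment in CHECKPOINT_FAILURES):
            counts["checkpoint_failure"] += 1
        elif any(fragment in line for fragment in BACKLOG_SYMPTOMS):
            counts["backlog_symptom"] += 1
    return counts


sample = [
    "Mar 7 10:51:57 taft-02 clogd[6689]: [R5GGAijX] Failed to open checkpoint for 3",
    "Mar 7 10:51:57 taft-02 clogd[6689]: Failed to export checkpoint",
    "Mar 7 13:35:23 taft-03 clogd[6706]: kernel_recv: Preallocated transfer structs exhausted",
    "Mar 7 13:35:23 taft-03 kernel: device-mapper: dm-log-clustered: "
    "Request timed out on DM_CLOG_SET_REGION_SYNC:158790 - retrying",
]
counts = classify(sample)
```

If a log shows any checkpoint failures at all, the backlog messages are probably a consequence and not a separate problem.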
I'm assuming this is fixed by the following check-in:

commit 6c8d7408095782bb00b5361a7df5973f3dcda183
Author: Jonathan Brassow <jbrassow>
Date: Tue Jul 15 11:58:26 2008 -0500

clogd: Fix for bug 455453: small mirror creation fails

Was setting the checkpoint attribute 'attr.maxSectionSize' with the size of the bitmap. However, when mirrors are really small (<= 30M) other sections may have a larger size and need to be considered.

No longer seeing this issue with the latest code. Marking verified.

2.6.18-98.el5
lvm2-2.02.32-4.el5 BUILT: Fri Apr 4 06:15:19 CDT 2008
lvm2-cluster-2.02.32-4.el5 BUILT: Wed Apr 2 03:56:50 CDT 2008
device-mapper-1.02.24-1.el5 BUILT: Thu Jan 17 16:46:05 CST 2008
cmirror-1.1.22-1.el5 BUILT: Thu Jul 24 15:59:03 CDT 2008

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html
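The idea behind the check-in quoted above (size 'attr.maxSectionSize' from the largest section, not from the bitmap alone) can be illustrated with a minimal sketch. It is not the clogd code: the section names, the byte sizes, and both helper functions are hypothetical, chosen only to show why a bitmap-sized limit fails when another section is larger.

```python
def max_section_size(section_sizes):
    """Return the largest section size, the value the limit must cover."""
    if not section_sizes:
        raise ValueError("at least one section is required")
    return max(section_sizes.values())


def all_sections_fit(section_sizes, limit):
    """True if every section is no larger than the given limit."""
    return all(size <= limit for size in section_sizes.values())


# Illustrative values only: on a very small mirror the bitmap can be
# smaller than some other section written to the checkpoint.
small_mirror = {"bitmap": 8, "other_section": 64}

old_limit = small_mirror["bitmap"]           # behaviour before the fix
new_limit = max_section_size(small_mirror)   # behaviour described by the fix

old_ok = all_sections_fit(small_mirror, old_limit)
new_ok = all_sections_fit(small_mirror, new_limit)
```

With these assumed sizes the bitmap-based limit rejects the larger section, which matches the failed checkpoint export seen on small mirrors, while the maximum-based limit accepts every section.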