Bug 225337
Summary: | conversion of mirrors can cause the sync percent to get stuck at different spots below 100% | | |
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 4 | CC: | agk, dwysocha, jbrassow, mbroz, prockai |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2010-04-27 14:57:34 UTC | Type: | --- |
Description
Corey Marthaler
2007-01-29 22:59:03 UTC
This is reproducible. I caused it to happen after rebooting the cluster due to bz 199433.

    [root@salem ~]# lvs -a -o +devices
      LV               VG  Attr   LSize  Origin Snap% Move Log        Copy% Devices
      kool             new -wi-a- 10.00G                                    /dev/sdc(2560)
      [kool_mimage_0]  new vwi-a- 10.00G
      kool_mimage_1    new -wi-a- 10.00G                                    /dev/sde(0)
      kool_mlog        new -wi-a- 4.00M                                     /dev/sdd(0)
      salem            new mwi-a- 10.00G                   salem_mlog 7.70  salem_mimage_0(0),salem_mimage_1(0)
      [salem_mimage_0] new iwi-ao 10.00G                                    /dev/sdc(0)
      [salem_mimage_1] new iwi-ao 10.00G                                    /dev/sde(2560)
      [salem_mlog]     new lwi-ao 4.00M                                     /dev/sdd(1)
    [root@salem ~]#
    [root@salem ~]#
    [root@salem ~]# lvconvert -m 0 /dev/new/kool
      Logical volume kool is already not mirrored.
    [root@salem ~]# lvconvert -m 1 /dev/new/kool
      Internal error: Duplicate LV name kool_mlog detected in new.
      Failed to create mirror log.

This can apparently happen by just doing mirror up and down converts.

    [root@link-02 ~]# lvs -a -o +devices
      LV                 VG Attr   LSize Origin Snap% Move Log          Copy%  Devices
      mirror1            vg Mwi-so 5.00G                   mirror1_mlog 99.92  mirror1_mimage_0(0),mirror1_mimage_1(0)
      [mirror1_mimage_0] vg iwi-so 5.00G                                       /dev/sdh1(1280)
      [mirror1_mimage_1] vg iwi-so 5.00G                                       /dev/sdb1(0)
      [mirror1_mlog]     vg lwi-so 4.00M                                       /dev/sda1(1283)
      mirror2            vg Mwi-so 5.00G                   mirror2_mlog 99.92  mirror2_mimage_0(0),mirror2_mimage_1(0)
      [mirror2_mimage_0] vg iwi-so 5.00G                                       /dev/sdh1(2560)
      [mirror2_mimage_1] vg iwi-so 5.00G                                       /dev/sdc1(0)
      [mirror2_mlog]     vg lwi-so 4.00M                                       /dev/sda1(1284)
      mirror3            vg Mwi-so 5.00G                   mirror3_mlog 100.00 mirror3_mimage_0(0),mirror3_mimage_1(0)
      [mirror3_mimage_0] vg iwi-so 5.00G                                       /dev/sdh1(3840)
      [mirror3_mimage_1] vg iwi-so 5.00G                                       /dev/sdd1(0)
      [mirror3_mlog]     vg lwi-so 4.00M                                       /dev/sda1(1282)

I suspect that in the quick failure case, only a recovery write failed. It seems there is currently no mechanism to raise an event (which is what causes dmeventd to reconfigure the mirror) when an error happens during recovery. This is probably by design and probably the right thing to do. This is certainly what is causing your results.
If you lvchange -an; lvchange -ay, the problem will resolve itself. This is less than ideal though...

We keep track of where we are in the recovery process through a variable called 'sync_search'. This never gets set back to zero unless the mirror table is reloaded (lvchange -an; lvchange -ay). It might be nice to reset 'sync_search' if it is >= 'region_count' and we receive a 'get_resync_work' request, since we won't get those requests if the client thinks the mirror is in-sync.

I'll have to give the above more thought, and this would definitely involve kernel changes... I am certain that this problem affects single machine mirroring too. Careful consideration will need to be made for mirrors that wish to ignore failures (e.g. pvmove).

Wait... 'sync_search' is in the logging code. This allows me to make the change without affecting the kernel. The change is:

    static int _core_get_resync_work(struct log_c *lc, region_t *region)
    {
            if (lc->sync_search >= lc->region_count) {
                    /*
                     * FIXME: pvmove is not supported yet, but when it is,
                     * an audit of sync_count changes will need to be made
                     */
                    if (lc->sync_count < lc->region_count) {
                            lc->sync_search = 0;
                    } else {
                            return 0;
                    }
            }
    ...

This has not been reproduced after running many up and down cmirror convert operations. Marking VERIFIED.

Assuming this VERIFIED fix got released. Closing. Reopen if it's not yet resolved.
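
For readers who want to see why resetting 'sync_search' unsticks the sync percent, here is a minimal user-space sketch of the idea. It is not the cmirror code: the `toy_` names, the fixed region count, the per-region flag array and the `allow_rescan` switch are all assumptions made for illustration, and the real `struct log_c` and its bitmaps are not modelled. The scenario assumed is the one described above: a single recovery write fails once, no event is raised, and the search position has already run off the end of the region list.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_REGION_COUNT 8u

/* Toy stand-in for the log context (hypothetical, not struct log_c). */
struct toy_log {
    bool in_sync[TOY_REGION_COUNT]; /* one flag per region */
    unsigned region_count;          /* total number of regions */
    unsigned sync_count;            /* regions currently in sync */
    unsigned sync_search;           /* next region to examine */
};

static void toy_log_init(struct toy_log *lc)
{
    unsigned i;

    for (i = 0; i < TOY_REGION_COUNT; i++)
        lc->in_sync[i] = false;
    lc->region_count = TOY_REGION_COUNT;
    lc->sync_count = 0;
    lc->sync_search = 0;
}

/*
 * Hand out the next out-of-sync region. Returns 1 and fills *region,
 * or 0 when no work is handed out. With allow_rescan set, the search
 * restarts from zero when it has reached the end while regions are
 * still out of sync, mirroring the check in the change quoted above.
 */
static int toy_get_resync_work(struct toy_log *lc, unsigned *region,
                               bool allow_rescan)
{
    if (lc->sync_search >= lc->region_count) {
        if (allow_rescan && lc->sync_count < lc->region_count)
            lc->sync_search = 0;
        else
            return 0;
    }

    while (lc->sync_search < lc->region_count) {
        unsigned r = lc->sync_search++;

        if (!lc->in_sync[r]) {
            *region = r;
            return 1;
        }
    }
    return 0;
}

/* Record the outcome of recovering one region. */
static void toy_complete_resync(struct toy_log *lc, unsigned region,
                                bool success)
{
    assert(region < lc->region_count);
    if (success && !lc->in_sync[region]) {
        lc->in_sync[region] = true;
        lc->sync_count++;
    }
}

/*
 * Drive recovery to completion. Region 3 fails on its first attempt
 * only. Returns the final number of in-sync regions.
 */
static unsigned toy_run_recovery(bool allow_rescan)
{
    struct toy_log lc;
    unsigned region;
    bool failed_once = false;
    int guard = 0; /* bounds the loop in case a region never recovers */

    toy_log_init(&lc);
    while (guard++ < 100 && toy_get_resync_work(&lc, &region, allow_rescan)) {
        bool ok = true;

        if (region == 3 && !failed_once) {
            failed_once = true;
            ok = false;
        }
        toy_complete_resync(&lc, region, ok);
    }
    return lc.sync_count;
}
```

In this model, `toy_run_recovery(false)` ends with 7 of 8 regions in sync and no further work handed out, which corresponds to a copy percent stuck below 100%. `toy_run_recovery(true)` restarts the scan, retries region 3, and finishes with all 8 regions in sync.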