Bug 445456
Summary: | RHEL5 cmirror tracker: multiple mirrors can cause copy percent to get stuck | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal>
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow>
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list>
Severity: | high | |
Priority: | medium | |
Version: | 5.3 | CC: | agk, ccaulfie, dwysocha, edamato, heinzm, mbroz
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Doc Type: | Bug Fix | Story Points: | ---
Last Closed: | 2009-01-20 21:26:08 UTC | Type: | ---
Description
Corey Marthaler
2008-05-06 21:53:11 UTC
Reproduced this, and found that not only can the different sync counts be stuck, but they can also be off from node to node.

[root@hayes-03 ~]# lvs
  LV            VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  hayes-01.4588 lock_stress mwi-a- 500.00M                        96.80
  hayes-01.4592 lock_stress mwi-a- 500.00M                        11.20
  hayes-01.4594 lock_stress mwi-a- 500.00M                        100.00
  hayes-02.4582 lock_stress mwi-a- 500.00M                        94.40
  hayes-02.4586 lock_stress mwi-a- 500.00M                        96.80
  hayes-02.4590 lock_stress mwi-a- 500.00M                        17.60
  hayes-03.4584 lock_stress mwi-a- 500.00M                        100.00
  hayes-03.4596 lock_stress mwi-a- 500.00M                        8.80
  hayes-03.4598 lock_stress mwi-a- 500.00M                        24.80

[root@hayes-02 ~]# lvs
  LV            VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  hayes-01.4588 lock_stress mwi-a- 500.00M                        93.60
  hayes-01.4592 lock_stress mwi-a- 500.00M                        16.00
  hayes-01.4594 lock_stress mwi-a- 500.00M                        94.40
  hayes-02.4582 lock_stress mwi-a- 500.00M                        100.00
  hayes-02.4586 lock_stress mwi-a- 500.00M                        95.20
  hayes-02.4590 lock_stress mwi-a- 500.00M                        16.00
  hayes-03.4584 lock_stress mwi-a- 500.00M                        96.80
  hayes-03.4596 lock_stress mwi-a- 500.00M                        13.60
  hayes-03.4598 lock_stress mwi-a- 500.00M                        18.40

What is the build date of the cmirror rpm you are using? You mention kmod-cmirror, but not cmirror.

[root@hayes-02 ~]# rpm -qi cmirror
Name        : cmirror                      Relocations: (not relocatable)
Version     : 1.1.15                            Vendor: Red Hat, Inc.
Release     : 1.el5                         Build Date: Thu 28 Feb 2008 01:04:29 PM CST
Install Date: Mon 05 May 2008 12:01:16 PM CDT      Build Host: hs20-bc1-7.build.redhat.com

Jon and I have narrowed this down to just needing to create 8 mirrors. Doing so usually leaves one of the mirrors' copy percent stuck at something less than 100%.
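The node-to-node discrepancy in the two lvs listings above can be checked mechanically. What follows is only a minimal sketch, not part of any cmirror or LVM tooling: the function names are hypothetical, and it assumes every data row starts with the LV name and ends with the Copy% value, which holds for the mirror rows in this report. The sample strings are the first three rows from each node.

```python
# Sketch: compare Copy% for the same LVs as reported by two cluster nodes.
# Function names are hypothetical; the input is plain `lvs` text as pasted above.

HAYES_03 = """
  LV            VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  hayes-01.4588 lock_stress mwi-a- 500.00M                        96.80
  hayes-01.4592 lock_stress mwi-a- 500.00M                        11.20
  hayes-01.4594 lock_stress mwi-a- 500.00M                        100.00
"""

HAYES_02 = """
  LV            VG          Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  hayes-01.4588 lock_stress mwi-a- 500.00M                        93.60
  hayes-01.4592 lock_stress mwi-a- 500.00M                        16.00
  hayes-01.4594 lock_stress mwi-a- 500.00M                        94.40
"""


def parse_copy_percent(lvs_output):
    """Map LV name -> Copy%, assuming Copy% is the last field of each row."""
    result = {}
    for line in lvs_output.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "LV":
            continue  # skip blank lines and the header row
        try:
            result[fields[0]] = float(fields[-1])
        except ValueError:
            continue  # skip shell prompts and rows without a numeric last field
    return result


def diff_nodes(a, b):
    """Return {lv: (pct_on_a, pct_on_b)} for LVs whose Copy% differs."""
    return {lv: (a[lv], b[lv]) for lv in sorted(a.keys() & b.keys()) if a[lv] != b[lv]}


def below_full(copy_percent):
    """Return the LVs that have not reached 100% yet, sorted by name."""
    return sorted(lv for lv, pct in copy_percent.items() if pct < 100.0)


if __name__ == "__main__":
    node_a = parse_copy_percent(HAYES_03)
    node_b = parse_copy_percent(HAYES_02)
    for lv, (pa, pb) in diff_nodes(node_a, node_b).items():
        print(f"{lv}: hayes-03={pa} hayes-02={pb}")
```

A single mismatch is not conclusive while resync is still running, since the two nodes are sampled at slightly different times; the difference only indicates this bug if the values stay apart across repeated samples.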
2.6.18-92.1.5.el5

lvm2-2.02.39-2.el5           BUILT: Wed Jul 9 07:26:29 CDT 2008
lvm2-cluster-2.02.39-1.el5   BUILT: Thu Jul 3 09:31:57 CDT 2008
device-mapper-1.02.27-1.el5  BUILT: Thu Jul 3 03:22:29 CDT 2008
cmirror-1.1.19-2.el5         BUILT: Tue Jul 8 11:15:54 CDT 2008
kmod-cmirror-0.1.10-1.el5    BUILT: Tue May 20 14:55:48 CDT 2008

  sync_check_1 mirror_sanity mwi-a- 1.00G sync_check_1_mlog 100.00
  sync_check_2 mirror_sanity mwi-a- 1.00G sync_check_2_mlog 100.00
  sync_check_3 mirror_sanity mwi-a- 1.00G sync_check_3_mlog 100.00
  sync_check_4 mirror_sanity mwi-a- 1.00G sync_check_4_mlog 100.00
  sync_check_5 mirror_sanity mwi-a- 1.00G sync_check_5_mlog 100.00
  sync_check_6 mirror_sanity mwi-a- 1.00G sync_check_6_mlog 100.00
  sync_check_7 mirror_sanity mwi-a- 1.00G sync_check_7_mlog 100.00
  sync_check_8 mirror_sanity mwi-a- 1.00G sync_check_8_mlog 40.23

I appear to have hit this on hayes-01 by just doing a simple upconvert:

Create a mirror and then attempt to up convert it
lvcreate -m 1 -n mirror_up_converts -L 100M --corelog mirror_sanity

Upconvert to 2 redundant legs
lvconvert -m 2 --corelog /dev/mirror_sanity/mirror_up_converts

  mirror_up_converts mirror_sanity mwi-a- 100.00M 4.00

Copy percent is stuck at 4% now, and the process is looping with the following output I'll attach...

Created attachment 311963: strace output
The strace prints the following over and over...
I think upconverts are broken on cluster mirrors. It seems that lvm is not properly suspending the first mirror before activating the new one... leaving two different mirrors with overlapping devices active. (That's bad.)

Pretty sure this should fix the bug... at the very least, it will fix several problems:

commit 613d97438673200c87e4b07e3c4ee659c01acf65
Author: Jonathan Brassow <jbrassow>
Date:   Wed Jul 23 15:12:24 2008 -0500

    dm-log-clustered: Fix bug 445456

    I was resetting some common memory outside of the protection of a lock.

Fix verified with the latest code.

2.6.18-98.el5

lvm2-2.02.32-4.el5           BUILT: Fri Apr 4 06:15:19 CDT 2008
lvm2-cluster-2.02.32-4.el5   BUILT: Wed Apr 2 03:56:50 CDT 2008
device-mapper-1.02.24-1.el5  BUILT: Thu Jan 17 16:46:05 CST 2008
cmirror-1.1.22-1.el5         BUILT: Thu Jul 24 15:59:03 CDT 2008
kmod-cmirror-0.1.13-2.el5    BUILT: Thu 24 Jul 2008 04:00:48 PM CD

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html
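The commit message above describes the root cause as resetting shared memory outside the protection of a lock. The sketch below illustrates that general pattern only. It is not the dm-log-clustered code, which is kernel C and is not shown in this report, and the class and function names are hypothetical. The point is that the reset, the fill, and the read of a shared buffer all happen inside the same critical section.

```python
import threading


class SharedRequest:
    """Hypothetical stand-in for a buffer shared by several callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._payload = []

    def submit(self, items):
        # Reset, fill, and snapshot all happen while holding the lock, so no
        # other thread can observe or clobber a half-built payload.
        with self._lock:
            self._payload.clear()         # reset the shared memory
            self._payload.extend(items)   # fill it for this caller
            return list(self._payload)    # copy taken under the same lock


def run_workers(shared, n_threads=4, rounds=200):
    """Hammer `shared` from several threads; return how many results were corrupted."""
    mismatches = []

    def worker(tag):
        items = [tag] * 8
        for _ in range(rounds):
            if shared.submit(items) != items:
                mismatches.append(tag)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(mismatches)


if __name__ == "__main__":
    print("corrupted results:", run_workers(SharedRequest()))
```

If the `clear()` call were moved before the `with` block, one thread could wipe the buffer while another was still filling or reading it, which is the class of bug the commit message describes.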