Bug 436839
Summary: RHEL5 cmirror tracker: left over dm devices after deactivation and removal

Product: Red Hat Enterprise Linux 5
Component: cmirror
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Keywords: Reopened
Reporter: Corey Marthaler <cmarthal>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: agk, ccaulfie, dwysocha, edamato, happy, heinzm, mbroz
Doc Type: Bug Fix
Last Closed: 2010-02-01 21:45:15 UTC
Description
Corey Marthaler
2008-03-10 18:55:52 UTC
This bug still exists in the latest packages that I have:

2.6.18-85.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5
device-mapper-1.02.24-1.el5
openais-0.80.3-15.el5

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

One of the cmirror release requirements for RHEL5 is to have all components of a cmirror removed after deletion. This really needs to be fixed in order for cmirrors to make RHEL5.3.

Deletion: Must be able to remove a cluster mirror. A removal must remove all components of the cluster mirror, including the dm devices.

Do you have a guide for hitting this bug? Having trouble reproducing this issue with the latest build...

Closing this bug since I'm currently unable to reproduce it. Will reopen if seen again.

I appear to have just hit this, so I'm going to reopen and put into NEEDINFO. Here's the case that I saw this with:

============================================================
Iteration 378 of 10000 started at Tue Sep 23 10:03:27 CDT 2008
============================================================
SCENARIO - [disklog_convert_during_down_convert]
Create a 3-way core log mirror and attempt to down convert it to a 2-way disk log mirror
hayes-02: lvcreate -m 2 --corelog -n double_convert -L 1G mirror_sanity
Converting both the log (from core to disk) and the number of legs (from 3 to 2)
Verify that both converts occurred
Deactivating mirror double_convert... and removing all
double_convert mirror images weren't removed from dm on hayes-03

After the remove, the dm devices were still present on hayes-03 only, but properly deleted on the other 2 machines in the cluster.

[root@hayes-03 ~]# dmsetup ls
mirror_sanity-double_convert_mimage_0 (253, 2)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
mirror_sanity-double_convert_mimage_1 (253, 3)

As you can see from the iteration banner, this exact test case had previously run 377 times in a row without issue before this occurred.

Reproduced this again after running just the disklog_convert_during_down_convert scenario of mirror_sanity. It took about an hour of running that test case before it tripped this issue.

FYI - here are the latest pkg versions that this has been seen with:

2.6.18-115.gfs2abhi.001
lvm2-2.02.40-2.el5 BUILT: Fri Sep 19 09:46:26 CDT 2008
lvm2-cluster-2.02.40-2.el5 BUILT: Fri Sep 19 09:49:59 CDT 2008
device-mapper-1.02.28-2.el5 BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5 BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5 BUILT: Fri Sep 19 16:27:33 CDT 2008
openais-0.80.3-19.el5 BUILT: Tue 23 Sep 2008 12:58:51 PM CDT

Unable to reproduce this issue with the latest rpms. Taking this BZ off the 5.3 list but will leave it open for documentation in case this issue still exists.

2.6.18-116.el5
lvm2-2.02.40-3.el5 BUILT: Thu Sep 25 14:59:07 CDT 2008
lvm2-cluster-2.02.40-3.el5 BUILT: Thu Sep 25 15:00:54 CDT 2008
device-mapper-1.02.28-2.el5 BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5 BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5 BUILT: Fri Sep 19 16:27:33 CDT 2008

Hit this during a random lvm_config iteration involving a cmirror.
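The failure mode above is a node-local set of stale dm entries belonging to the removed VG. A minimal sketch of how such leftovers can be spotted in `dmsetup ls` output (the `leftover_dm` helper name is mine, not from the test suite); note that device-mapper names double any hyphen inside a VG or LV name, so the VG name is escaped before matching:

```shell
#!/bin/sh
# Sketch: filter `dmsetup ls` output down to entries that belong to one VG,
# to spot leftovers after `lvremove`. dm names double hyphens inside the
# VG/LV name (e.g. LV "taft-02-bond.22182" appears as "taft--02--bond.22182"),
# so escape the VG name first.
leftover_dm() {
    esc=$(printf '%s' "$1" | sed 's/-/--/g')
    grep "^${esc}-"      # reads `dmsetup ls` output on stdin
}

# Captured output from hayes-03 above, where cleanup failed:
dmsetup_out='mirror_sanity-double_convert_mimage_0 (253, 2)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
mirror_sanity-double_convert_mimage_1 (253, 3)'

# On a live node this would be: dmsetup ls | leftover_dm mirror_sanity
printf '%s\n' "$dmsetup_out" | leftover_dm mirror_sanity
# -> mirror_sanity-double_convert_mimage_0 (253, 2)
#    mirror_sanity-double_convert_mimage_1 (253, 3)
```

Run on every node in the cluster, an empty result on each would confirm a clean removal; here only hayes-03 reports entries.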
Test output:

DEACTIVATING LVs IN mirror_1_5115
REMOVING LVs IN mirror_1_5115
Quick vgreduce/vgextend regression check (bz 427382) before cleaning up the volume group
reducing vg by removing /dev/etherd/e1.1p3
verifying that /dev/etherd/e1.1p3 is actually gone and that all other pvs are not
extending vg by added /dev/etherd/e1.1p3 back in
REMOVING VG mirror_1_5115
REMOVING PVs /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3

The mirror is removed and deactivated on hayes-01:

Oct 9 14:25:51 hayes-01 qarshd[23879]: Running cmdline: vgchange -an mirror_1_5115
Oct 9 14:25:51 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct 9 14:25:52 hayes-01 xinetd[2744]: EXIT: qarsh status=0 pid=23879 duration=1(sec)
Oct 9 14:25:52 hayes-01 xinetd[2744]: START: qarsh pid=23902 from=10.15.80.47
Oct 9 14:25:52 hayes-01 qarshd[23902]: Talking to peer 10.15.80.47:46428
Oct 9 14:25:52 hayes-01 qarshd[23902]: Running cmdline: lvremove -f mirror_1_5115
Oct 9 14:25:52 hayes-01 [3427]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct 9 14:25:53 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events

Hayes-03 realizes that the mirror has been deactivated/removed:

Oct 9 14:26:55 hayes-03 [3441]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct 9 14:26:56 hayes-03 lvm[3441]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events

The dm devices remain on hayes-03:

[root@hayes-03 ~]# lvs -a -o +devices
  LV       VG         Attr   LSize  Origin Snap% Move Log Copy% Convert Devices
  LogVol00 VolGroup00 -wi-ao 72.44G                                     /dev/sda2(0)
  LogVol01 VolGroup00 -wi-ao  1.94G                                     /dev/sda2(2318)
[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
mirror_1_5115-mirror_1_51150_mimage_1 (253, 4)
mirror_1_5115-mirror_1_51150_mimage_0 (253, 3)
mirror_1_5115-mirror_1_51150_mlog (253, 2)

Name:              mirror_1_5115-mirror_1_51150_mimage_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8RbqlJtK6LXqplJuB1Z2U1sqnxsaswtrbo

Name:              mirror_1_5115-mirror_1_51150_mimage_0
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R2esCsbPCK9aG0PRL8pSgty9MWCznExS9

Name:              mirror_1_5115-mirror_1_51150_mlog
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R7TedoKZZetO55lYWOd4i2gG9Nznbcf7v

[root@hayes-03 ~]# /usr/tests/sts-rhel5.3/lvm2/bin/lvm_rpms
2.6.18-117.el5
lvm2-2.02.40-4.el5 BUILT: Mon Oct 6 05:21:53 CDT 2008
lvm2-cluster-2.02.40-4.el5 BUILT: Mon Oct 6 05:23:25 CDT 2008
device-mapper-1.02.28-2.el5 BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.30-1.el5 BUILT: Wed Oct 8 18:18:36 CDT 2008
kmod-cmirror-0.1.18-1.el5 BUILT: Mon Sep 29 16:20:21 CDT 2008

Jon, besides a nice and easily reproducible test case (which doesn't appear to exist), what else can I provide for this BZ?

Proposing this issue for 5.4 consideration.

I'm currently able to reproduce this every so often by running the verify_nosync_corelog_copy_percents test case of mirror_sanity.

Just a note that I'm able to reproduce this bug with the 'thirty_two_mirrors' test case of mirror_sanity as well (though not every time).

Removing the thirty two mirrors
32 Deactivating mirror thirty_32... and removing all
thirty_32 mirror images weren't removed from dm on hayes-02

[root@hayes-01 ~]# dmsetup ls | grep thirty_32
[root@hayes-01 ~]#
[root@hayes-02 ~]# dmsetup ls | grep thirty_32
mirror_sanity-thirty_32_mimage_1 (253, 128)
mirror_sanity-thirty_32_mimage_0 (253, 127)
mirror_sanity-thirty_32_mlog (253, 126)
[root@hayes-02 ~]#
[root@hayes-03 ~]# dmsetup ls | grep thirty_32
[root@hayes-03 ~]#

Version:
2.6.18-125.el5
lvm2-2.02.40-6.el5 BUILT: Fri Oct 24 07:37:33 CDT 2008
lvm2-cluster-2.02.40-7.el5 BUILT: Wed Nov 26 07:19:19 CST 2008
device-mapper-1.02.28-2.el5 BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.36-1.el5 BUILT: Tue Dec 9 16:38:13 CST 2008
kmod-cmirror-0.1.21-4.el5 BUILT: Tue Nov 18 14:49:49 CST 2008

I spoke with the other guys on the team. They seem to think this is udev related (i.e. udev has the device open, therefore it cannot be deactivated). I am suspicious that it /could/ be something else that is keeping the device open... like the cluster log server. A verbose trace of the command used to remove the mirror could tell me why it is failing. Alternatively, if you have a simple reproducer that doesn't take too long, I'd be willing to spend some time hunting this myself. Otherwise, I need to defer to rhel5.5.

I was able to reproduce this today on taft-02 running cmirror_lock_stress, though I wouldn't call that a reliable reproducer. I'll attach the syslog with all the debugging info in case that helps.
[root@taft-01 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap% Move Log                     Copy% Convert
  LogVol00           VolGroup00  -wi-ao  58.38G
  LogVol01           VolGroup00  -wi-ao   9.75G
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog
[root@taft-01 ~]# dmsetup ls
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)

[root@taft-02 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap% Move Log                     Copy% Convert
  LogVol00           VolGroup00  -wi-ao  58.38G
  LogVol01           VolGroup00  -wi-ao   9.75G
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog
[root@taft-02 ~]# dmsetup ls
lock_stress-taft--03--bond.20643_mimage_0 (253, 3)
lock_stress-taft--02--bond.22182_mimage_2 (253, 21)
lock_stress-taft--02--bond.22182_mimage_1 (253, 20)
lock_stress-taft--02--bond.22182_mimage_0 (253, 19)
lock_stress-taft--02--bond.22182_mlog (253, 12)
lock_stress-taft--03--bond.20643_mlog (253, 2)
lock_stress-taft--03--bond.20643_mimage_4 (253, 7)
VolGroup00-LogVol01 (253, 1)
lock_stress-taft--03--bond.20643_mimage_3 (253, 6)
VolGroup00-LogVol00 (253, 0)
lock_stress-taft--03--bond.20643_mimage_2 (253, 5)
lock_stress-taft--03--bond.20643_mimage_1 (253, 4)

Created attachment 338799 [details]
syslog from taft-02
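The comments above suspect udev (or possibly the cluster log server) is holding the leftover devices open. `dmsetup info` reports an "Open count" per device, and in the hayes-03 capture earlier it was 0 for all three leftovers, which argues against an active holder. A small sketch that pulls that field out of captured `dmsetup info` text (the `open_count` helper name is mine, not from the test suite):

```shell
#!/bin/sh
# Sketch: extract the "Open count" field from `dmsetup info <dev>` output.
# 0 means nothing holds the device open and the entry is merely stale;
# nonzero points at a holder (udev, a log server, a mount) to chase first.
open_count() {
    awk -F': *' '$1 == "Open count" { print $2 }'
}

# `dmsetup info` output for one of the leftover devices in the report:
info='Name:              mirror_1_5115-mirror_1_51150_mimage_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1'

printf '%s\n' "$info" | open_count     # prints 0
```

On a live node, a nonzero count could be chased with `fuser -v /dev/mapper/<name>`, while a zero count suggests `dmsetup remove <name>` would clear the stale entry.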
We haven't tried to reproduce it, but it looks like we hit this bug just by creating and removing LVs on a RHEL 5.2 box.

FYI - hit this over the weekend.

SCENARIO - [force_mirror_layout]
Create a mirror but force the layout of each leg on a specified device and then verify
Verifying that the mirror is laid out properly
  image1 should be on /dev/etherd/e1.1p2
  image2 should be on /dev/etherd/e1.1p4
  log should be on /dev/etherd/e1.1p1
Deactivating mirror force_layout... and removing all
force_layout mirror images weren't removed from dm on hayes-02

[root@hayes-01 ~]# dmsetup ls
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
[root@hayes-02 ~]# dmsetup ls
mirror_sanity-force_layout_mimage_0 (253, 3)
mirror_sanity-force_layout_mlog (253, 2)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
mirror_sanity-force_layout_mimage_1 (253, 4)
[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)

2.6.18-162.el5
lvm2-2.02.46-8.el5 BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5 BUILT: Thu Jun 18 08:05:27 CDT 2009
device-mapper-1.02.32-1.el5 BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5 BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5 BUILT: Mon Jul 27 15:28:46 CDT 2009

SCENARIO - [corelog_mirror]
Create a mirror using a log in memory
grant-02: lvcreate -m 1 -n core_log -L 2G --corelog mirror_sanity
Deactivating mirror core_log... and removing all
core_log mirror images weren't removed from dm on grant-03

2.6.18-164.el5
lvm2-2.02.46-8.el5_4.2 BUILT: Thu Oct 15 09:28:12 CDT 2009
lvm2-cluster-2.02.46-8.el5_4.1 BUILT: Wed Sep 16 05:31:16 CDT 2009
device-mapper-1.02.32-1.el5 BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5 BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5 BUILT: Mon Jul 27 15:28:46 CDT 2009

I'll try to spend some time on this now. Would you be willing to try to reproduce with the latest rpms (>= the following)?
device-mapper-1.02.39-1.el5 BUILT: Wed Nov 11 12:31:44 CST 2009
device-mapper-event-1.02.39-1.el5 BUILT: Wed Nov 11 12:31:44 CST 2009
lvm2-2.02.56-1.el5 BUILT: Tue Nov 24 13:24:02 CST 2009
lvm2-cluster-2.02.56-1.el5 BUILT: Tue Nov 24 13:27:05 CST 2009

I wasn't able to reproduce this issue with the latest rpms (lvm2-2.02.56-6.el5/lvm2-cluster-2.02.56-6.el5). Marking closed, and will reopen if seen again.
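Should this reappear, the verbose trace requested in the comments above could be captured with something like the following sketch (hypothetical helper, not from the test suite; `RUN=echo` keeps it a dry run that only prints the commands, so it is safe to preview before running for real with `RUN=`):

```shell
#!/bin/sh
# Sketch: gather a verbose lvremove trace plus dm state before and after,
# on the node that keeps the stale entries. `lvremove -vvvv` stacks the
# verbosity flag for maximum LVM debug output; `dmsetup info` with no
# argument reports every mapped device.
RUN=${RUN:-echo}    # dry run by default; set RUN= to actually execute

collect_removal_trace() {
    vg=$1
    $RUN dmsetup ls
    $RUN dmsetup info
    $RUN lvremove -f -vvvv "$vg"
    $RUN dmsetup ls
    $RUN dmsetup info
}

collect_removal_trace mirror_1_5115
```

Redirecting the real run to a file per node and attaching it here would show exactly which deactivation step the failing node skips.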