Bug 436839 - RHEL5 cmirror tracker: left over dm devices after deactivation and removal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-10 18:55 UTC by Corey Marthaler
Modified: 2010-11-09 12:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-01 21:45:15 UTC
Target Upstream Version:
Embargoed:


Attachments
syslog from taft-02 (2.10 MB, text/plain)
2009-04-08 20:44 UTC, Corey Marthaler

Description Corey Marthaler 2008-03-10 18:55:52 UTC
Description of problem:
Device mapper devices are being left around after deactivating and deleting the
volumes. I've reproduced this now a few times after successfully converting
mirrors to and from disk and core log. The remove typically does what it's
supposed to, but not always. So far the leftover devices have only existed on
one of the nodes in the cluster.

taft-02: lvchange -an /dev/mirror_sanity/mirror_log_convert
taft-02: lvremove -f /dev/mirror_sanity/mirror_log_convert

[root@taft-01 ~]# dmsetup ls
mirror_sanity-mirror_log_convert_mimage_1       (253, 3)
VolGroup00-LogVol01     (253, 1)
mirror_sanity-mirror_log_convert_mimage_0       (253, 2)
VolGroup00-LogVol00     (253, 0)

Version-Release number of selected component (if applicable):
Linux taft-01 2.6.18-71.el5 #1 SMP Sat Jan 19 22:03:16 EST 2008 x86_64 x86_64
x86_64 GNU/Linux

lvm2-2.02.32-2.el5
lvm2-cluster-2.02.32-2.el5
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5
device-mapper-1.02.24-1.el5

Comment 1 Corey Marthaler 2008-04-10 21:30:44 UTC
This bug still exists in the latest packages that I have:

2.6.18-85.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5
device-mapper-1.02.24-1.el5
openais-0.80.3-15.el5

Comment 2 RHEL Program Management 2008-07-14 15:01:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 RHEL Program Management 2008-07-14 20:58:17 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 

Comment 4 Corey Marthaler 2008-07-15 13:53:30 UTC
One of the cmirror release requirements for RHEL5 is to have all components of a
cmirror removed after deletion. This really needs to be fixed in order for
cmirrors to make RHEL5.3.

Deletion:
       Must be able to remove a cluster mirror
       A removal must remove all components of the cluster mirror including
       the dm devices
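
The requirement above can be checked mechanically. The following is a minimal sketch, not part of the QE test suite: the function name leftover_dm_devices is hypothetical, and it assumes the mirror's dm name (for example mirror_sanity-mirror_log_convert) is passed in and that `dmsetup ls` output arrives on stdin.

```shell
# leftover_dm_devices PREFIX   (hypothetical helper, for illustration only)
# Reads `dmsetup ls` output on stdin and prints the names of devices that
# belong to the mirror whose dm name is PREFIX: the mirror device itself,
# its _mimage_N legs, and its _mlog log device.
leftover_dm_devices() {
    awk -v p="$1" '
        # The top-level mirror device or its log device.
        $1 == p || $1 == (p "_mlog") { print $1; next }
        # A mirror leg: PREFIX_mimage_ followed by digits only.
        index($1, (p "_mimage_")) == 1 &&
            substr($1, length(p "_mimage_") + 1) ~ /^[0-9]+$/ { print $1 }
    '
}
```

Running `dmsetup ls | leftover_dm_devices mirror_sanity-mirror_log_convert` on each node after the lvremove should print nothing; any name printed is a component that was left behind.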


Comment 5 Jonathan Earl Brassow 2008-07-21 14:44:32 UTC
Do you have a guide for hitting this bug?


Comment 6 Corey Marthaler 2008-08-20 19:24:42 UTC
Having trouble reproducing this issue with the latest build...

Comment 7 Corey Marthaler 2008-08-25 22:34:50 UTC
Closing this bug since I'm currently unable to reproduce it. Will reopen if seen again.

Comment 8 Corey Marthaler 2008-09-23 16:42:16 UTC
I appear to have just hit this, so I'm going to reopen and put into NEEDINFO.

Here's the case that I saw this with:

============================================================
Iteration 378 of 10000 started at Tue Sep 23 10:03:27 CDT 2008
============================================================
SCENARIO - [disklog_convert_during_down_convert]
Create a 3-way core log mirror and attempt to down convert it to a 2-way disk log mirror
hayes-02: lvcreate -m 2 --corelog -n double_convert -L 1G mirror_sanity
Converting both the log (from core to disk) and of legs (from 3 to 2)
Verify that both converts occured
Deactivating mirror double_convert... and removing
all double_convert mirror images weren't removed from dm on hayes-03

After the remove the dm devices were still present on hayes-03 only, but properly deleted on the other 2 machines in the cluster.

[root@hayes-03 ~]# dmsetup ls
mirror_sanity-double_convert_mimage_0   (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_sanity-double_convert_mimage_1   (253, 3)

As you can see from the iteration banner, this exact test case had previously run 377 times in a row without issue before this occurred.

Comment 9 Corey Marthaler 2008-09-23 19:14:33 UTC
Reproduced this again after running just the disklog_convert_during_down_convert scenario of mirror_sanity. It took about an hour of running that test case before it tripped this issue.

Comment 10 Corey Marthaler 2008-09-23 20:16:26 UTC
FYI - here are the latest pkg versions that this has been seen with.

2.6.18-115.gfs2abhi.001

lvm2-2.02.40-2.el5    BUILT: Fri Sep 19 09:46:26 CDT 2008
lvm2-cluster-2.02.40-2.el5    BUILT: Fri Sep 19 09:49:59 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5    BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008
openais-0.80.3-19.el5     BUILT: Tue 23 Sep 2008 12:58:51 PM CDT

Comment 11 Corey Marthaler 2008-09-30 14:56:24 UTC
Unable to reproduce this issue with the latest rpms. Taking this BZ off the 5.3 list but will leave it open for documentation in case this issue still exists.

2.6.18-116.el5

lvm2-2.02.40-3.el5    BUILT: Thu Sep 25 14:59:07 CDT 2008
lvm2-cluster-2.02.40-3.el5    BUILT: Thu Sep 25 15:00:54 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5    BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008

Comment 12 Corey Marthaler 2008-10-13 14:56:41 UTC
Hit this during a random lvm_config iteration involving a cmirror.

Test output:
DEACTIVATING LVs IN mirror_1_5115
REMOVING LVs IN mirror_1_5115

Quick vgreduce/vgextend regression check (bz 427382) before cleaning up the volume group
reducing vg by removing /dev/etherd/e1.1p3
verifying that /dev/etherd/e1.1p3 is actually gone and that all other pvs are not
extending vg by added /dev/etherd/e1.1p3 back in

REMOVING VG mirror_1_5115
REMOVING PVs /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3


The mirror is removed and deactivated on hayes-01:

Oct  9 14:25:51 hayes-01 qarshd[23879]: Running cmdline: vgchange -an mirror_1_5115
Oct  9 14:25:51 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:25:52 hayes-01 xinetd[2744]: EXIT: qarsh status=0 pid=23879 duration=1(sec)
Oct  9 14:25:52 hayes-01 xinetd[2744]: START: qarsh pid=23902 from=10.15.80.47
Oct  9 14:25:52 hayes-01 qarshd[23902]: Talking to peer 10.15.80.47:46428
Oct  9 14:25:52 hayes-01 qarshd[23902]: Running cmdline: lvremove -f mirror_1_5115
Oct  9 14:25:52 hayes-01 [3427]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:25:53 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events


Hayes-03 realizes that the mirror has been deactivated/removed:

Oct  9 14:26:55 hayes-03 [3441]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:26:56 hayes-03 lvm[3441]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events


The dm devices remain on hayes-03:

[root@hayes-03 ~]# lvs -a -o +devices
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Convert Devices        
  LogVol00 VolGroup00 -wi-ao 72.44G                                       /dev/sda2(0)   
  LogVol01 VolGroup00 -wi-ao  1.94G                                       /dev/sda2(2318)

[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_1_5115-mirror_1_51150_mimage_1   (253, 4)
mirror_1_5115-mirror_1_51150_mimage_0   (253, 3)
mirror_1_5115-mirror_1_51150_mlog       (253, 2)

Name:              mirror_1_5115-mirror_1_51150_mimage_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8RbqlJtK6LXqplJuB1Z2U1sqnxsaswtrbo

Name:              mirror_1_5115-mirror_1_51150_mimage_0
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R2esCsbPCK9aG0PRL8pSgty9MWCznExS9

Name:              mirror_1_5115-mirror_1_51150_mlog
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R7TedoKZZetO55lYWOd4i2gG9Nznbcf7v
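
All three blocks above report an open count of 0. When many devices are involved, the same information can be condensed to one line per device. This is a rough sketch only: dm_open_counts is a hypothetical name, and it assumes the plain `dmsetup info` layout shown above (a "Name:" line followed later by an "Open count:" line).

```shell
# dm_open_counts   (hypothetical helper, for illustration only)
# Reads plain `dmsetup info` output on stdin and prints
# "<name> <open count>" for each device block.
dm_open_counts() {
    awk '
        /^Name:/       { name = $2 }        # remember the current device
        /^Open count:/ { print name, $3 }   # third field is the count
    '
}
```

Usage would be `dmsetup info | dm_open_counts`. A non-zero count would indicate that something still holds the device open.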



[root@hayes-03 ~]# /usr/tests/sts-rhel5.3/lvm2/bin/lvm_rpms 
2.6.18-117.el5

lvm2-2.02.40-4.el5    BUILT: Mon Oct  6 05:21:53 CDT 2008
lvm2-cluster-2.02.40-4.el5    BUILT: Mon Oct  6 05:23:25 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.30-1.el5    BUILT: Wed Oct  8 18:18:36 CDT 2008
kmod-cmirror-0.1.18-1.el5    BUILT: Mon Sep 29 16:20:21 CDT 2008

Comment 13 Corey Marthaler 2008-10-13 14:58:00 UTC
Jon,

Besides a nice and easily reproducible test case (which doesn't appear to exist), what else can I provide for this BZ?

Comment 14 Corey Marthaler 2008-11-21 21:45:48 UTC
Proposing this issue for 5.4 consideration. I'm currently able to reproduce this every so often by running the verify_nosync_corelog_copy_percents test case of mirror_sanity.

Comment 15 Corey Marthaler 2008-12-10 20:23:28 UTC
Just a note that I'm able to reproduce this bug with the 'thirty_two_mirrors' test case of mirror_sanity as well (though not every time).

Removing the thirty two mirrors
32 Deactivating mirror thirty_32... and removing
all thirty_32 mirror images weren't removed from dm on hayes-02

[root@hayes-01 ~]# dmsetup ls | grep thirty_32
[root@hayes-01 ~]# 

[root@hayes-02 ~]# dmsetup ls | grep thirty_32
mirror_sanity-thirty_32_mimage_1        (253, 128)
mirror_sanity-thirty_32_mimage_0        (253, 127)
mirror_sanity-thirty_32_mlog    (253, 126)
[root@hayes-02 ~]# 

[root@hayes-03 ~]# dmsetup ls | grep thirty_32
[root@hayes-03 ~]# 
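
The per-node check above could be scripted along these lines. This is only a sketch under assumptions: check_nodes is a hypothetical name, and it assumes root ssh access to each node (the QE harness in this report uses qarsh rather than ssh).

```shell
# check_nodes PATTERN NODE...   (hypothetical helper, for illustration only)
# Runs `dmsetup ls` on each node over ssh and prints "<node>: <dm name>"
# for every device whose name contains PATTERN.
# Returns 1 if any matching device was found, 0 otherwise.
check_nodes() {
    pattern=$1
    shift
    found=0
    for node in "$@"; do
        # -n keeps ssh from reading this script's stdin.
        out=$(ssh -n "$node" dmsetup ls |
              awk -v n="$node" -v p="$pattern" 'index($1, p) { print n ": " $1 }')
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            found=1
        fi
    done
    return "$found"
}
```

For the case above, `check_nodes thirty_32 hayes-01 hayes-02 hayes-03` would list only the hayes-02 devices and return 1.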

Version:
2.6.18-125.el5

lvm2-2.02.40-6.el5    BUILT: Fri Oct 24 07:37:33 CDT 2008
lvm2-cluster-2.02.40-7.el5    BUILT: Wed Nov 26 07:19:19 CST 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.36-1.el5    BUILT: Tue Dec  9 16:38:13 CST 2008
kmod-cmirror-0.1.21-4.el5    BUILT: Tue Nov 18 14:49:49 CST 2008

Comment 16 Jonathan Earl Brassow 2009-03-02 17:51:33 UTC
I spoke with the other guys on the team.  They seem to think this is udev related (i.e. udev has the device open, therefore, it cannot be deactivated).  I am suspicious that it /could/ be something else that is keeping the device open... like the cluster log server.

A verbose trace of the command used to remove the mirror could tell me why it is failing.  Alternatively, if you have a simple reproducer that doesn't take too long, I'd be willing to spend some time hunting this myself.  Otherwise, I need to defer to rhel5.5.
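
One way such a trace might be captured is sketched below. The wrapper name trace_lvremove and the log path are hypothetical; -vvvv is LVM's maximum verbosity level, and its debug messages go to stderr. In a clustered VG the deactivation on the other nodes is carried out by clvmd, so the trace from the issuing node may not show why a remote node kept its devices.

```shell
# trace_lvremove VG/LV LOGFILE   (hypothetical wrapper, for illustration only)
# Removes the LV with maximum LVM verbosity and saves the debug trace.
# LVM writes its -vvvv messages to stderr, so stderr is what gets captured.
trace_lvremove() {
    lvremove -vvvv -f "$1" 2> "$2"
}
```

Example: `trace_lvremove mirror_sanity/double_convert /tmp/lvremove.trace`.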

Comment 17 Corey Marthaler 2009-04-08 20:40:13 UTC
I was able to reproduce this today on taft-02 running cmirror_lock_stress, though I wouldn't call that a reliable reproducer. 

I'll attach the syslog with all the debugging info in case that helps.

[root@taft-01 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap%  Move Log                     Copy%  Convert
  LogVol00           VolGroup00  -wi-ao  58.38G                                                          
  LogVol01           VolGroup00  -wi-ao   9.75G                                                          
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog               

[root@taft-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)

[root@taft-02 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap%  Move Log                     Copy%  Convert
  LogVol00           VolGroup00  -wi-ao  58.38G                                                          
  LogVol01           VolGroup00  -wi-ao   9.75G                                                          
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog               

[root@taft-02 ~]# dmsetup ls
lock_stress-taft--03--bond.20643_mimage_0       (253, 3)
lock_stress-taft--02--bond.22182_mimage_2       (253, 21)
lock_stress-taft--02--bond.22182_mimage_1       (253, 20)
lock_stress-taft--02--bond.22182_mimage_0       (253, 19)
lock_stress-taft--02--bond.22182_mlog   (253, 12)
lock_stress-taft--03--bond.20643_mlog   (253, 2)
lock_stress-taft--03--bond.20643_mimage_4       (253, 7)
VolGroup00-LogVol01     (253, 1)
lock_stress-taft--03--bond.20643_mimage_3       (253, 6)
VolGroup00-LogVol00     (253, 0)
lock_stress-taft--03--bond.20643_mimage_2       (253, 5)
lock_stress-taft--03--bond.20643_mimage_1       (253, 4)
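
Note that the dm names above do not match the LV names in the lvs output character for character: hyphens inside a VG or LV name are doubled in the dm name, and a single hyphen separates the two parts. A small sketch of that mapping, useful when grepping `dmsetup ls` for a given LV (dm_name is a hypothetical helper name):

```shell
# dm_name VG LV   (hypothetical helper, for illustration only)
# Prints the device-mapper name for VG/LV: hyphens inside each component
# are doubled, then the two parts are joined with a single hyphen.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s\n' "$vg" "$lv"
}
```

For example, `dm_name lock_stress taft-02-bond.22182_mlog` yields the lock_stress-taft--02--bond.22182_mlog entry seen in the listing.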

Comment 18 Corey Marthaler 2009-04-08 20:44:51 UTC
Created attachment 338799
syslog from taft-02

Comment 19 Mark Plaksin 2009-06-11 19:54:32 UTC
We haven't tried to reproduce it but it looks like we hit this bug just by creating and removing LVs on a RHEL 5.2 box.

Comment 20 Corey Marthaler 2009-08-31 14:53:13 UTC
FYI - Hit this over the weekend.

SCENARIO - [force_mirror_layout]
Create a mirror but force the layout of each leg on a specified device and then verify
Verifing that the mirror is laid out properly
image1 should be on /dev/etherd/e1.1p2
image2 should be on /dev/etherd/e1.1p4
log should be on /dev/etherd/e1.1p1
Deactivating mirror force_layout... and removing
all force_layout mirror images weren't removed from dm on hayes-02


[root@hayes-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)

[root@hayes-02 ~]# dmsetup ls
mirror_sanity-force_layout_mimage_0     (253, 3)
mirror_sanity-force_layout_mlog (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_sanity-force_layout_mimage_1     (253, 4)

[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)


2.6.18-162.el5

lvm2-2.02.46-8.el5    BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5    BUILT: Thu Jun 18 08:05:27 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

Comment 21 Corey Marthaler 2009-10-23 17:51:18 UTC
SCENARIO - [corelog_mirror]
Create a mirror using a log in memory
grant-02: lvcreate -m 1 -n core_log -L 2G --corelog mirror_sanity
Deactivating mirror core_log... and removing
all core_log mirror images weren't removed from dm on grant-03

2.6.18-164.el5

lvm2-2.02.46-8.el5_4.2    BUILT: Thu Oct 15 09:28:12 CDT 2009
lvm2-cluster-2.02.46-8.el5_4.1    BUILT: Wed Sep 16 05:31:16 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009

Comment 22 Jonathan Earl Brassow 2009-12-03 18:16:59 UTC
I'll try to spend some time on this now.  Would you be willing to try to reproduce with the latest rpms (>= the following)?
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
device-mapper-event-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
lvm2-2.02.56-1.el5    BUILT: Tue Nov 24 13:24:02 CST 2009
lvm2-cluster-2.02.56-1.el5    BUILT: Tue Nov 24 13:27:05 CST 2009

Comment 25 Corey Marthaler 2010-02-01 21:45:15 UTC
I wasn't able to reproduce this issue with the latest rpms (lvm2-2.02.56-6.el5/lvm2-cluster-2.02.56-6.el5). 

Marking closed, and will reopen if seen again.

