Bug 436839 - RHEL5 cmirror tracker: left over dm devices after deactivation and removal
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cmirror
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: Cluster QE
Keywords: Reopened
Reported: 2008-03-10 14:55 EDT by Corey Marthaler
Modified: 2010-11-09 07:47 EST
CC List: 7 users

Doc Type: Bug Fix
Last Closed: 2010-02-01 16:45:15 EST


Attachments
syslog from taft-02 (2.10 MB, text/plain)
2009-04-08 16:44 EDT, Corey Marthaler

Description Corey Marthaler 2008-03-10 14:55:52 EDT
Description of problem:
Device-mapper devices are being left behind after the volumes are deactivated
and removed. I have now reproduced this a few times after successfully
converting mirrors between disk and core logs. The removal usually does what
it is supposed to, but not always. So far the leftover devices have only
appeared on one node in the cluster.

taft-02: lvchange -an /dev/mirror_sanity/mirror_log_convert
taft-02: lvremove -f /dev/mirror_sanity/mirror_log_convert

[root@taft-01 ~]# dmsetup ls
mirror_sanity-mirror_log_convert_mimage_1       (253, 3)
VolGroup00-LogVol01     (253, 1)
mirror_sanity-mirror_log_convert_mimage_0       (253, 2)
VolGroup00-LogVol00     (253, 0)
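The per-node check implied by the listing above can be sketched as two shell functions. This is a minimal sketch, assuming LVM's usual device-mapper naming (VG and LV names joined by a single `-`, with every `-` inside either name doubled) and the `_mimage_N`/`_mlog` suffixes seen in these listings; `dm_name` and `leftover_dm_devices` are hypothetical helpers, not part of lvm2.

```shell
#!/bin/sh
# Hypothetical helpers; not part of lvm2 or device-mapper.

# Print the device-mapper name LVM uses for VG/LV (assumed convention):
# every "-" inside each name is doubled, then the two are joined with "-".
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '%s-%s\n' "$vg" "$lv"
}

# Read "dmsetup ls" output on stdin and print every device that still
# belongs to VG/LV: the top-level LV, its mirror images, or its log.
leftover_dm_devices() {
    prefix=$(dm_name "$1" "$2")
    awk -v p="$prefix" '
        ($1 == p) || ($1 == p "_mlog") || (index($1, p "_mimage_") == 1) { print $1 }
    '
}
```

Run as `dmsetup ls | leftover_dm_devices mirror_sanity mirror_log_convert` on each node. On the taft-01 listing above it would print the two `_mimage_` devices; empty output means nothing was left behind.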

Version-Release number of selected component (if applicable):
Linux taft-01 2.6.18-71.el5 #1 SMP Sat Jan 19 22:03:16 EST 2008 x86_64 x86_64
x86_64 GNU/Linux

lvm2-2.02.32-2.el5
lvm2-cluster-2.02.32-2.el5
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5
device-mapper-1.02.24-1.el5
Comment 1 Corey Marthaler 2008-04-10 17:30:44 EDT
This bug still exists in the latest packages that I have:

2.6.18-85.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5
cmirror-1.1.15-1.el5
kmod-cmirror-0.1.8-1.el5
device-mapper-1.02.24-1.el5
openais-0.80.3-15.el5
Comment 2 RHEL Product and Program Management 2008-07-14 11:01:23 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 RHEL Product and Program Management 2008-07-14 16:58:17 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 
Comment 4 Corey Marthaler 2008-07-15 09:53:30 EDT
One of the cmirror release requirements for RHEL5 is that all components of a
cmirror are removed after deletion. This really needs to be fixed for cmirrors
to make RHEL 5.3.

Deletion:
       Must be able to remove a cluster mirror
       A removal must remove all components of the cluster mirror including
       the dm devices
Comment 5 Jonathan Earl Brassow 2008-07-21 10:44:32 EDT
Do you have a guide for hitting this bug?
Comment 6 Corey Marthaler 2008-08-20 15:24:42 EDT
Having trouble reproducing this issue with the latest build...
Comment 7 Corey Marthaler 2008-08-25 18:34:50 EDT
Closing this bug since I'm currently unable to reproduce it. Will reopen if seen again.
Comment 8 Corey Marthaler 2008-09-23 12:42:16 EDT
I appear to have just hit this, so I'm going to reopen it and put it into NEEDINFO.

Here's the case that I saw this with:

============================================================
Iteration 378 of 10000 started at Tue Sep 23 10:03:27 CDT 2008
============================================================
SCENARIO - [disklog_convert_during_down_convert]
Create a 3-way core log mirror and attempt to down convert it to a 2-way disk log mirror
hayes-02: lvcreate -m 2 --corelog -n double_convert -L 1G mirror_sanity
Converting both the log (from core to disk) and of legs (from 3 to 2)
Verify that both converts occured
Deactivating mirror double_convert... and removing
all double_convert mirror images weren't removed from dm on hayes-03

After the removal, the dm devices were still present on hayes-03 only; they were properly deleted on the other two machines in the cluster.

[root@hayes-03 ~]# dmsetup ls
mirror_sanity-double_convert_mimage_0   (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_sanity-double_convert_mimage_1   (253, 3)

As you can see from the iteration banner, this exact test case had previously run 377 times in a row without issue before this occurred.
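The disklog_convert_during_down_convert scenario can be approximated by hand with the sketch below. It is not the mirror_sanity harness: the single `lvconvert -m 1 --mirrorlog disk` call that drops a leg and switches the log from core to disk is an assumption about what the test does, and `run`, `DRY_RUN` and `double_convert_scenario` are hypothetical names. With `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Hypothetical re-creation of the disklog_convert_during_down_convert scenario.
# Set DRY_RUN=1 to print the commands instead of executing them.
VG=${VG:-mirror_sanity}
LV=${LV:-double_convert}

# Echo a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

double_convert_scenario() {
    # 3-way mirror with an in-memory (core) log, as in the test output above
    run lvcreate -m 2 --corelog -n "$LV" -L 1G "$VG" || return 1
    # Assumed: one convert that goes from 3 to 2 images and from core to disk log
    run lvconvert -m 1 --mirrorlog disk "$VG/$LV" || return 1
    # Tear down: deactivate, then remove
    run lvchange -an "$VG/$LV" || return 1
    run lvremove -f "$VG/$LV"
}
```

After each real run, `dmsetup ls` on every node should show no `double_convert` devices.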
Comment 9 Corey Marthaler 2008-09-23 15:14:33 EDT
Reproduced this again after running just the disklog_convert_during_down_convert scenario of mirror_sanity. It took about an hour of running that test case before it tripped this issue.
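Because the failure only shows up after hundreds of iterations, reproduction attempts need a loop like the one below. This is a minimal sketch in the spirit of the harness's "Iteration N of 10000" banner; `repeat_until_failure` is a hypothetical name, and the command it runs is expected to exit non-zero when leftover dm devices are found.

```shell
#!/bin/sh
# Hypothetical loop: run a command up to MAX times and stop at the first failure.
# Usage: repeat_until_failure MAX command [args...]
repeat_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        echo "Iteration $i of $max started at $(date)"
        if ! "$@"; then
            echo "failed on iteration $i of $max" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, `repeat_until_failure 10000 ./check_once.sh`, where `check_once.sh` is a hypothetical script that runs one create/convert/remove cycle and then checks `dmsetup ls` on every node.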
Comment 10 Corey Marthaler 2008-09-23 16:16:26 EDT
FYI, here are the latest package versions this has been seen with:

2.6.18-115.gfs2abhi.001

lvm2-2.02.40-2.el5    BUILT: Fri Sep 19 09:46:26 CDT 2008
lvm2-cluster-2.02.40-2.el5    BUILT: Fri Sep 19 09:49:59 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5    BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008
openais-0.80.3-19.el5     BUILT: Tue 23 Sep 2008 12:58:51 PM CDT
Comment 11 Corey Marthaler 2008-09-30 10:56:24 EDT
Unable to reproduce this issue with the latest rpms. Taking this BZ off the 5.3 list but will leave it open for documentation in case this issue still exists.

2.6.18-116.el5

lvm2-2.02.40-3.el5    BUILT: Thu Sep 25 14:59:07 CDT 2008
lvm2-cluster-2.02.40-3.el5    BUILT: Thu Sep 25 15:00:54 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.25-1.el5    BUILT: Fri Sep 19 16:27:46 CDT 2008
kmod-cmirror-0.1.17-1.el5    BUILT: Fri Sep 19 16:27:33 CDT 2008
Comment 12 Corey Marthaler 2008-10-13 10:56:41 EDT
Hit this during a random lvm_config iteration involving a cmirror.

Test output:
DEACTIVATING LVs IN mirror_1_5115
REMOVING LVs IN mirror_1_5115

Quick vgreduce/vgextend regression check (bz 427382) before cleaning up the volume group
reducing vg by removing /dev/etherd/e1.1p3
verifying that /dev/etherd/e1.1p3 is actually gone and that all other pvs are not
extending vg by added /dev/etherd/e1.1p3 back in

REMOVING VG mirror_1_5115
REMOVING PVs /dev/etherd/e1.1p1 /dev/etherd/e1.1p2 /dev/etherd/e1.1p3


The mirror is deactivated and removed on hayes-01:

Oct  9 14:25:51 hayes-01 qarshd[23879]: Running cmdline: vgchange -an mirror_1_5115
Oct  9 14:25:51 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:25:52 hayes-01 xinetd[2744]: EXIT: qarsh status=0 pid=23879 duration=1(sec)
Oct  9 14:25:52 hayes-01 xinetd[2744]: START: qarsh pid=23902 from=10.15.80.47
Oct  9 14:25:52 hayes-01 qarshd[23902]: Talking to peer 10.15.80.47:46428
Oct  9 14:25:52 hayes-01 qarshd[23902]: Running cmdline: lvremove -f mirror_1_5115
Oct  9 14:25:52 hayes-01 [3427]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:25:53 hayes-01 lvm[3427]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events


hayes-03 also notices that the mirror has been deactivated/removed:

Oct  9 14:26:55 hayes-03 [3441]: Monitoring mirror device mirror_1_5115-mirror_1_51150 for events
Oct  9 14:26:56 hayes-03 lvm[3441]: No longer monitoring mirror device mirror_1_5115-mirror_1_51150 for events


The dm devices remain on hayes-03:

[root@hayes-03 ~]# lvs -a -o +devices
  LV       VG         Attr   LSize  Origin Snap%  Move Log Copy%  Convert Devices        
  LogVol00 VolGroup00 -wi-ao 72.44G                                       /dev/sda2(0)   
  LogVol01 VolGroup00 -wi-ao  1.94G                                       /dev/sda2(2318)

[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_1_5115-mirror_1_51150_mimage_1   (253, 4)
mirror_1_5115-mirror_1_51150_mimage_0   (253, 3)
mirror_1_5115-mirror_1_51150_mlog       (253, 2)

Name:              mirror_1_5115-mirror_1_51150_mimage_1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8RbqlJtK6LXqplJuB1Z2U1sqnxsaswtrbo

Name:              mirror_1_5115-mirror_1_51150_mimage_0
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R2esCsbPCK9aG0PRL8pSgty9MWCznExS9

Name:              mirror_1_5115-mirror_1_51150_mlog
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-fBFpfzPpaMbP355RHXcAB8yWHcyNAa8R7TedoKZZetO55lYWOd4i2gG9Nznbcf7v
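All three leftover devices report `Open count: 0`, so at the time this output was captured nothing held them open, and they should be removable by hand with `dmsetup remove`. This does not show what held them, if anything, while `lvremove` ran. The sketch below, using a hypothetical `removal_plan` function, turns `dmsetup info` output into a cleanup plan that is printed for review rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: read "dmsetup info" output on stdin and print a
# manual cleanup plan. Devices with an open count of 0 get a
# "dmsetup remove" line; anything still open is only reported.
removal_plan() {
    awk '
        /^Name:/       { name = $2 }
        /^Open count:/ {
            if ($3 == 0)
                print "dmsetup remove " name
            else
                print "# skipping " name " (open count " $3 ")"
        }
    '
}
```

Every unused device reports an open count of 0, not only the leftovers, so filter the plan before running anything, for example `dmsetup info | removal_plan | grep mirror_1_51150`.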



[root@hayes-03 ~]# /usr/tests/sts-rhel5.3/lvm2/bin/lvm_rpms 
2.6.18-117.el5

lvm2-2.02.40-4.el5    BUILT: Mon Oct  6 05:21:53 CDT 2008
lvm2-cluster-2.02.40-4.el5    BUILT: Mon Oct  6 05:23:25 CDT 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.30-1.el5    BUILT: Wed Oct  8 18:18:36 CDT 2008
kmod-cmirror-0.1.18-1.el5    BUILT: Mon Sep 29 16:20:21 CDT 2008
Comment 13 Corey Marthaler 2008-10-13 10:58:00 EDT
Jon,

Besides a nice and easily reproducible test case (which doesn't appear to exist), what else can I provide for this BZ?
Comment 14 Corey Marthaler 2008-11-21 16:45:48 EST
Proposing this issue for 5.4 consideration. I'm currently able to reproduce this every so often by running the verify_nosync_corelog_copy_percents test case of mirror_sanity.
Comment 15 Corey Marthaler 2008-12-10 15:23:28 EST
Just a note that I'm able to reproduce this bug with the 'thirty_two_mirrors' test case of mirror_sanity as well (though not every time).

Removing the thirty two mirrors
32 Deactivating mirror thirty_32... and removing
all thirty_32 mirror images weren't removed from dm on hayes-02

[root@hayes-01 ~]# dmsetup ls | grep thirty_32
[root@hayes-01 ~]# 

[root@hayes-02 ~]# dmsetup ls | grep thirty_32
mirror_sanity-thirty_32_mimage_1        (253, 128)
mirror_sanity-thirty_32_mimage_0        (253, 127)
mirror_sanity-thirty_32_mlog    (253, 126)
[root@hayes-02 ~]# 

[root@hayes-03 ~]# dmsetup ls | grep thirty_32
[root@hayes-03 ~]# 

Version:
2.6.18-125.el5

lvm2-2.02.40-6.el5    BUILT: Fri Oct 24 07:37:33 CDT 2008
lvm2-cluster-2.02.40-7.el5    BUILT: Wed Nov 26 07:19:19 CST 2008
device-mapper-1.02.28-2.el5    BUILT: Fri Sep 19 02:50:32 CDT 2008
cmirror-1.1.36-1.el5    BUILT: Tue Dec  9 16:38:13 CST 2008
kmod-cmirror-0.1.21-4.el5    BUILT: Tue Nov 18 14:49:49 CST 2008
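The per-node `dmsetup ls | grep thirty_32` check above can be scripted across the cluster. This is a sketch, assuming passwordless root ssh to every node; `nodes_with_leftovers` is a hypothetical helper that prints each node still listing a device whose name contains the given string (a plain substring match, like the `grep` above).

```shell
#!/bin/sh
# Hypothetical cross-node check; assumes passwordless root ssh to each node.
# Usage: nodes_with_leftovers PATTERN node...
# Prints every node whose "dmsetup ls" still lists a device containing PATTERN
# and returns non-zero if any node did.
nodes_with_leftovers() {
    pattern=$1
    shift
    status=0
    for node in "$@"; do
        if ssh "$node" dmsetup ls | grep -F -q -- "$pattern"; then
            echo "$node"
            status=1
        fi
    done
    return "$status"
}
```

Here, `nodes_with_leftovers thirty_32 hayes-01 hayes-02 hayes-03` would have printed only hayes-02 and returned non-zero.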
Comment 16 Jonathan Earl Brassow 2009-03-02 12:51:33 EST
I spoke with the other guys on the team.  They seem to think this is udev related (i.e. udev has the device open, therefore, it cannot be deactivated).  I am suspicious that it /could/ be something else that is keeping the device open... like the cluster log server.

A verbose trace of the command used to remove the mirror could tell me why it is failing. Alternatively, if you have a simple reproducer that doesn't take too long, I'd be willing to spend some time hunting this myself. Otherwise, I need to defer this to RHEL 5.5.
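One way to collect the verbose trace requested above, as a sketch: lvm2 commands accept `-v` repeated for more detail, and the wrapper below keeps stdout and stderr together, followed by a `dmsetup ls` snapshot taken right after the removal. `trace_lvremove` and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a maximally verbose lvremove and keep
# everything it prints (stdout and stderr) in LOGFILE, followed by a
# snapshot of the local node's dm devices.
# Usage: trace_lvremove VG LV LOGFILE
trace_lvremove() {
    vg=$1
    lv=$2
    log=$3
    {
        echo "### $(uname -n) $(date)"
        lvremove -vvvv -f "$vg/$lv"
        rc=$?
        echo "### lvremove exit status: $rc"
        echo "### dmsetup ls after removal"
        dmsetup ls
    } >"$log" 2>&1
    return "$rc"
}
```

For example, `trace_lvremove mirror_sanity core_log /tmp/lvremove-$(uname -n).log` on the node issuing the removal. Since the leftovers have appeared on a node other than the one running `lvremove`, the `dmsetup ls` snapshot in this log only covers the local node.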
Comment 17 Corey Marthaler 2009-04-08 16:40:13 EDT
I was able to reproduce this today on taft-02 running cmirror_lock_stress, though I wouldn't call that a reliable reproducer. 

I'll attach the syslog with all the debugging info in case that helps.

[root@taft-01 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap%  Move Log                     Copy%  Convert
  LogVol00           VolGroup00  -wi-ao  58.38G                                                          
  LogVol01           VolGroup00  -wi-ao   9.75G                                                          
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog               

[root@taft-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)

[root@taft-02 ~]# lvs
  LV                 VG          Attr   LSize   Origin Snap%  Move Log                     Copy%  Convert
  LogVol00           VolGroup00  -wi-ao  58.38G                                                          
  LogVol01           VolGroup00  -wi-ao   9.75G                                                          
  taft-02-bond.22182 lock_stress mwi--- 500.00M                    taft-02-bond.22182_mlog               

[root@taft-02 ~]# dmsetup ls
lock_stress-taft--03--bond.20643_mimage_0       (253, 3)
lock_stress-taft--02--bond.22182_mimage_2       (253, 21)
lock_stress-taft--02--bond.22182_mimage_1       (253, 20)
lock_stress-taft--02--bond.22182_mimage_0       (253, 19)
lock_stress-taft--02--bond.22182_mlog   (253, 12)
lock_stress-taft--03--bond.20643_mlog   (253, 2)
lock_stress-taft--03--bond.20643_mimage_4       (253, 7)
VolGroup00-LogVol01     (253, 1)
lock_stress-taft--03--bond.20643_mimage_3       (253, 6)
VolGroup00-LogVol00     (253, 0)
lock_stress-taft--03--bond.20643_mimage_2       (253, 5)
lock_stress-taft--03--bond.20643_mimage_1       (253, 4)
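Reading these listings requires undoing LVM's name escaping: `lock_stress-taft--02--bond.22182_mimage_0` is LV `taft-02-bond.22182_mimage_0` in VG `lock_stress`. A minimal sketch of the decoding, assuming every `-` inside a VG or LV name is doubled so that the first single `-` separates them; `split_dm_name` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: split an LVM device-mapper name back into VG and LV.
# Assumes LVM doubles every "-" inside a VG or LV name, so the first single
# "-" is the separator. ":" is a temporary marker because it is not a legal
# character in LVM names.
split_dm_name() {
    printf '%s\n' "$1" | awk '{
        s = $0
        gsub(/--/, ":", s)                                # protect escaped hyphens
        n = index(s, "-")                                 # first single hyphen
        if (n == 0) { gsub(/:/, "-", s); print s; next }  # not an LVM-style name
        vg = substr(s, 1, n - 1)
        lv = substr(s, n + 1)
        gsub(/:/, "-", vg)
        gsub(/:/, "-", lv)
        print vg, lv
    }'
}
```

Decoded this way, the taft-02 listing holds sub-devices of two LVs: taft-02-bond.22182, which `lvs` still shows but inactive (`mwi---`), and taft-03-bond.20643, which `lvs` no longer lists at all.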
Comment 18 Corey Marthaler 2009-04-08 16:44:51 EDT
Created attachment 338799 [details]
syslog from taft-02
Comment 19 Mark Plaksin 2009-06-11 15:54:32 EDT
We haven't tried to reproduce it, but it looks like we hit this bug just by creating and removing LVs on a RHEL 5.2 box.
Comment 20 Corey Marthaler 2009-08-31 10:53:13 EDT
FYI - Hit this over the weekend.

SCENARIO - [force_mirror_layout]
Create a mirror but force the layout of each leg on a specified device and then verify
Verifing that the mirror is laid out properly
image1 should be on /dev/etherd/e1.1p2
image2 should be on /dev/etherd/e1.1p4
log should be on /dev/etherd/e1.1p1
Deactivating mirror force_layout... and removing
all force_layout mirror images weren't removed from dm on hayes-02


[root@hayes-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)

[root@hayes-02 ~]# dmsetup ls
mirror_sanity-force_layout_mimage_0     (253, 3)
mirror_sanity-force_layout_mlog (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
mirror_sanity-force_layout_mimage_1     (253, 4)

[root@hayes-03 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)


2.6.18-162.el5

lvm2-2.02.46-8.el5    BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5    BUILT: Thu Jun 18 08:05:27 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009
Comment 21 Corey Marthaler 2009-10-23 13:51:18 EDT
SCENARIO - [corelog_mirror]
Create a mirror using a log in memory
grant-02: lvcreate -m 1 -n core_log -L 2G --corelog mirror_sanity
Deactivating mirror core_log... and removing
all core_log mirror images weren't removed from dm on grant-03

2.6.18-164.el5
lvm2-2.02.46-8.el5_4.2    BUILT: Thu Oct 15 09:28:12 CDT 2009
lvm2-cluster-2.02.46-8.el5_4.1    BUILT: Wed Sep 16 05:31:16 CDT 2009
device-mapper-1.02.32-1.el5    BUILT: Thu May 21 02:18:23 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009
Comment 22 Jonathan Earl Brassow 2009-12-03 13:16:59 EST
I'll try to spend some time on this now.  Would you be willing to try to reproduce with the latest rpms (>= the following)?
device-mapper-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
device-mapper-event-1.02.39-1.el5    BUILT: Wed Nov 11 12:31:44 CST 2009
lvm2-2.02.56-1.el5    BUILT: Tue Nov 24 13:24:02 CST 2009
lvm2-cluster-2.02.56-1.el5    BUILT: Tue Nov 24 13:27:05 CST 2009
Comment 25 Corey Marthaler 2010-02-01 16:45:15 EST
I wasn't able to reproduce this issue with the latest rpms (lvm2-2.02.56-6.el5/lvm2-cluster-2.02.56-6.el5). 

Marking closed, and will reopen if seen again.
