Bug 225487

Summary: mirror remove leaves device mapper devices behind (dmeventd messages need id)
Product: Red Hat Enterprise Linux 4 Reporter: Ben Marzinski <bmarzins>
Component: device-mapperAssignee: Petr Rockai <prockai>
Status: CLOSED ERRATA QA Contact: Corey Marthaler <cmarthal>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.5CC: agk, christophe.varoqui, dwysocha, egoggin, jbrassow, junichi.nomura, kueda, lmb, mbroz, prockai, rkenna, tranlan
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0749 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-15 15:58:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 221860, 244799    
Bug Blocks: 237673    
Attachments:
Description Flags
patch to add IDs to dmeventd messages
none
strace -f -s 200 of dmeventd while running test case
none
dmeventd-cleanup-fix.patch none

Comment 1 Ben Marzinski 2007-01-30 21:31:40 UTC
The real issue with this is that dmeventd uses fifos to communicate with its
clients.  If a client fails to read the reply for a request from the reply fifo,
that reply will simply sit in the fifo.  The next time a client issues a request
to dmeventd, it will read the old reply, instead of the current one.  This will
continue to happen until dmeventd shuts down. To deal with this, all messages
that are send to dmeventd must include an id, so that the client can verify that
it is receiving the correct reply.

Comment 3 Ben Marzinski 2007-01-31 04:23:16 UTC
Created attachment 146999 [details]
patch to add IDs to dmeventd messages

This patch adds a message id of the form <pid>:<sequence_nr> to the messages
passed between dmeventd and its clients. If a client receives a reply that
doesn't have correct id, it discards that reply and reads a new reply from
dmeventd. This solves the issue of old replies sitting in the fifo.

Comment 8 Alasdair Kergon 2007-02-02 20:36:28 UTC
Original bug text:


Description of problem:

While running lvm_config on the latest trees I hit a case where the underlying
device mapper devices are not removed when a mirror volume is removed.

Version-Release number of selected component (if applicable):
lvm2-2.02.16-3.el5
kernel-2.6.18-1.2961.el5
device-mapper-1.02.13-1.el5


How reproducible:
100%

Steps to Reproduce:
1. lvm_config -o host
2.
3.
  
Actual results:

Full test output attached.

cleaning up LVs if need be on morph-01...
deactivating and then removing lv /dev/linear_2_602/linear_2_6020 on node morph-01

cleaning up VGs if need be on morph-01...
deactivating vg linear_2_602 on node morph-01
removing vg linear_2_602 on node morph-01

cleaning up PVs if need be on morph-01...
removing pv /dev/sda1 on node morph-01
  Can't open /dev/sda1 exclusively - not removing. Mounted filesystem?
couldn't remove pv /dev/sda1 on morph-01

[root@morph-01 ~]# pvs
  PV         VG   Fmt  Attr PSize   PFree  
  /dev/sda1       lvm2 --   118.71G 118.71G
  /dev/sda2       lvm2 --   118.71G 118.71G
  /dev/sdb1       lvm2 --   118.71G 118.71G
  /dev/sdb2       lvm2 --   118.71G 118.71G
  /dev/sdc1       lvm2 --   118.71G 118.71G
  /dev/sdc2       lvm2 --   118.71G 118.71G
[root@morph-01 ~]# vgs
[root@morph-01 ~]# lvs
[root@morph-01 ~]# dmsetup ls
linear_2_602-linear_2_6020_mlog (253, 2)
VolGroup00-LogVol01     (253, 1)
linear_2_602-linear_2_6020_mimage_1     (253, 4)
VolGroup00-LogVol00     (253, 0)
linear_2_602-linear_2_6020_mimage_0     (253, 3)


Expected results:

All device mapper devices related to the mirror should be removed.


Comment 10 Nate Straz 2007-04-23 15:09:24 UTC
I am hitting this bug still with device-mapper-1.02.17-3.el4 and
lvm2-2.02.21-5.el4.  

Comment 11 Nate Straz 2007-04-23 15:12:10 UTC
Output from the test case:

deactivating volume group
CHANGING LV WRITE ACCESS TO RO on morph-01
activating volume group
verifying ro access flag on on morph-01
attempting to write to lv on morph-01
dd: writing to `/dev/mirror_1_8606/mirror_1_86060': Operation not permitted
1+0 records in
0+0 records out
deactivating volume group
CHANGING LV WRITE ACCESS BACK TO RW on morph-01


====================================================================
   Iter 4 started at Mon Apr 23 10:05:40 CDT 2007
====================================================================
VOLUME CREATION: stripe_7_2820, convert= snap=
deactivating LV mirror_1_8606/mirror_1_86060 on morph-01
removing LV mirror_1_8606/mirror_1_86060 on morph-01
removing VG mirror_1_8606 on morph-01
removing PV /dev/sda1 on morph-01
removing PV /dev/sdb1 on morph-01
removing PV /dev/sdc1 on morph-01
  Can't open /dev/sdc1 exclusively - not removing. Mounted filesystem?
unable to remove /dev/sdc1 on morph-01

[root@morph-01 ~]# dmsetup ls
mirror_1_8606-mirror_1_86060_mlog       (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)


Comment 12 Nate Straz 2007-04-23 16:26:02 UTC
This was trivial to reproduce.  All it it took was converting a linear to a
mirror, activating, then deactivating the new mirror.  The device mapper devices
were not removed after volume deactivation.

[root@morph-01 ~]# pvcreate /dev/sda3 /dev/sda2 /dev/sda1 /dev/sdb3 /dev/sdb2
/dev/sdb1 /dev/sdc3 /dev/sdc2 /dev/sdc1 
  Physical volume "/dev/sda3" successfully created
  Physical volume "/dev/sda2" successfully created
  Physical volume "/dev/sda1" successfully created
  Physical volume "/dev/sdb3" successfully created
  Physical volume "/dev/sdb2" successfully created
  Physical volume "/dev/sdb1" successfully created
  Physical volume "/dev/sdc3" successfully created
  Physical volume "/dev/sdc2" successfully created
  Physical volume "/dev/sdc1" successfully created
[root@morph-01 ~]# vgcreate   linear_7_2848 /dev/sda3 /dev/sda2 /dev/sda1
/dev/sdb3 /dev/sdb2 /dev/sdb1 /dev/sdc3 /dev/sdc2 /dev/sdc1 
  Volume group "linear_7_2848" successfully created
[root@morph-01 ~]# vgchange -ay linear_7_2848 
  0 logical volume(s) in volume group "linear_7_2848" now active
[root@morph-01 ~]# lvcreate -l 60771 -n linear_7_28480 linear_7_2848 
  Logical volume "linear_7_28480" created
[root@morph-01 ~]# lvchange -ay linear_7_2848/linear_7_28480
[root@morph-01 ~]# dmsetup ls
linear_7_2848-linear_7_28480    (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# lvchange -an linear_7_2848/linear_7_28480
[root@morph-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# lvconvert -m 1 /dev/linear_7_2848/linear_7_28480 
  Logical volume linear_7_28480 converted.
[root@morph-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# lvchange -ay /dev/linear_7_2848/linear_7_28480
[root@morph-01 ~]# dmsetup ls
linear_7_2848-linear_7_28480_mimage_1   (253, 4)
linear_7_2848-linear_7_28480_mimage_0   (253, 3)
linear_7_2848-linear_7_28480_mlog       (253, 2)
linear_7_2848-linear_7_28480    (253, 5)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# lvchange -an /dev/linear_7_2848/linear_7_28480
[root@morph-01 ~]# dmsetup ls
linear_7_2848-linear_7_28480_mimage_1   (253, 4)
linear_7_2848-linear_7_28480_mimage_0   (253, 3)
linear_7_2848-linear_7_28480_mlog       (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# lvremove /dev/linear_7_2848/linear_7_28480
  Logical volume "linear_7_28480" successfully removed
[root@morph-01 ~]# dmsetup ls
linear_7_2848-linear_7_28480_mimage_1   (253, 4)
linear_7_2848-linear_7_28480_mimage_0   (253, 3)
linear_7_2848-linear_7_28480_mlog       (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)
[root@morph-01 ~]# vgremove linear_7_2848
  Volume group "linear_7_2848" successfully removed
[root@morph-01 ~]# dmsetup ls
linear_7_2848-linear_7_28480_mimage_1   (253, 4)
linear_7_2848-linear_7_28480_mimage_0   (253, 3)
linear_7_2848-linear_7_28480_mlog       (253, 2)
VolGroup00-LogVol01     (253, 1)
VolGroup00-LogVol00     (253, 0)


Comment 13 Nate Straz 2007-04-23 19:01:23 UTC
I was able to reproduce this on an x86_64 node.  I also found these log messages
on all nodes where the commands were failing.

lvm[4673]: dm_task_run failed, errno = 4, Interrupted system call


Comment 14 Nate Straz 2007-04-23 19:11:39 UTC
Created attachment 153306 [details]
strace -f -s 200 of dmeventd while running test case

Comment 15 Nate Straz 2007-04-23 19:25:54 UTC
More data points.

The systems I'm hitting this on are running device-mapper-1.02.17-3.el4 and
kernel-smp-2.6.9-55.EL.

The systems I'm NOT hitting this on are running device-mapper-1.02.17-2.el4 and
kernel-2.6.9-54.EL.

Comment 16 Nate Straz 2007-04-23 20:14:26 UTC
I upgraded device-mapper from -2 to -3 on a system running kernel -54.  The test
case started failing on me.

I downgraded device-mapper from -3 to -2 on a system running kernel -55.  The
test case started passing for me.

Ugh.  It seems device-mapper-1.02.17-3.el4 reintroduced this problem. :(

Comment 17 Nate Straz 2007-04-23 20:31:22 UTC
I upgraded my system with device-mapper-1.02.17-2.el4 to
device-mapper-1.02.17-2.0.1.el4 and was able to hit the problem again.

Comment 18 Jonathan Earl Brassow 2007-04-23 20:54:38 UTC
Seems to be the memory leak patch...

Specifically, when i put back in the 'pthread_cancel' section, it works again.



Comment 19 Petr Rockai 2007-04-24 07:32:46 UTC
Diagnosed the bug, fix pending (expect patch in a few hours).

Comment 20 Petr Rockai 2007-04-24 13:22:27 UTC
Created attachment 153349 [details]
dmeventd-cleanup-fix.patch

The attached patch should fix the issue encountered in latest (-2.0.1 and
later) builds. The problem was, that after removing pthread_cancel, the thread
itself would not realise it has to terminate itself right away, but would retry
waiting for event on the device and would only terminate when the device
disappeared. However, this is too late, as lvchange would try to remove the
remaining (mimage/mlog) tables, but fail (presumably due to dmeventd). This
particular race condition remains in the code (it's been always there, even),
but since the dmeventd thread cleanup runs much sooner now, it should not
trigger the race (ie, the timings should be roughly equivalent to those before
the leak fix).

Comment 21 Kiersten (Kerri) Anderson 2007-04-24 16:30:07 UTC
Setting RHEL4.6 flag request as well as the 4.5.z request.  We are blocked on
committing a patch until the 4.5.z flags gets updated.

Comment 23 Corey Marthaler 2007-05-09 15:53:58 UTC
This bug was verified as fixed in rhel4 device-mapper-1.02.17-3.0.1.el4.

Comment 26 Corey Marthaler 2007-11-05 22:21:11 UTC
Marking verified.

Comment 28 errata-xmlrpc 2007-11-15 15:58:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0749.html