Bug 225487
Summary: | mirror remove leaves device mapper devices behind (dmeventd messages need id) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ben Marzinski <bmarzins> | ||||||||
Component: | device-mapper | Assignee: | Petr Rockai <prockai> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 4.5 | CC: | agk, christophe.varoqui, dwysocha, egoggin, jbrassow, junichi.nomura, kueda, lmb, mbroz, prockai, rkenna, tranlan | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | RHBA-2007-0749 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2007-11-15 15:58:04 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | 221860, 244799 | ||||||||||
Bug Blocks: | 237673 | ||||||||||
Attachments: |
|
Comment 1
Ben Marzinski
2007-01-30 21:31:40 UTC
Created attachment 146999 [details]
patch to add IDs to dmeventd messages
This patch adds a message id of the form <pid>:<sequence_nr> to the messages
passed between dmeventd and its clients. If a client receives a reply that
doesn't have correct id, it discards that reply and reads a new reply from
dmeventd. This solves the issue of old replies sitting in the fifo.
Original bug text: Description of problem: While running lvm_config on the latest trees I hit a case where the underlying device mapper devices are not removed when a mirror volume is removed. Version-Release number of selected component (if applicable): lvm2-2.02.16-3.el5 kernel-2.6.18-1.2961.el5 device-mapper-1.02.13-1.el5 How reproducible: 100% Steps to Reproduce: 1. lvm_config -o host 2. 3. Actual results: Full test output attached. cleaning up LVs if need be on morph-01... deactivating and then removing lv /dev/linear_2_602/linear_2_6020 on node morph-01 cleaning up VGs if need be on morph-01... deactivating vg linear_2_602 on node morph-01 removing vg linear_2_602 on node morph-01 cleaning up PVs if need be on morph-01... removing pv /dev/sda1 on node morph-01 Can't open /dev/sda1 exclusively - not removing. Mounted filesystem? couldn't remove pv /dev/sda1 on morph-01 [root@morph-01 ~]# pvs PV VG Fmt Attr PSize PFree /dev/sda1 lvm2 -- 118.71G 118.71G /dev/sda2 lvm2 -- 118.71G 118.71G /dev/sdb1 lvm2 -- 118.71G 118.71G /dev/sdb2 lvm2 -- 118.71G 118.71G /dev/sdc1 lvm2 -- 118.71G 118.71G /dev/sdc2 lvm2 -- 118.71G 118.71G [root@morph-01 ~]# vgs [root@morph-01 ~]# lvs [root@morph-01 ~]# dmsetup ls linear_2_602-linear_2_6020_mlog (253, 2) VolGroup00-LogVol01 (253, 1) linear_2_602-linear_2_6020_mimage_1 (253, 4) VolGroup00-LogVol00 (253, 0) linear_2_602-linear_2_6020_mimage_0 (253, 3) Expected results: All device mapper devices related to the mirror should be removed. I am hitting this bug still with device-mapper-1.02.17-3.el4 and lvm2-2.02.21-5.el4. Output from the test case: deactivating volume group CHANGING LV WRITE ACCESS TO RO on morph-01 activating volume group verifying ro access flag on on morph-01 attempting to write to lv on morph-01 dd: writing to `/dev/mirror_1_8606/mirror_1_86060': Operation not permitted 1+0 records in 0+0 records out deactivating volume group CHANGING LV WRITE ACCESS BACK TO RW on morph-01 ==================================================================== Iter 4 started at Mon Apr 23 10:05:40 CDT 2007 ==================================================================== VOLUME CREATION: stripe_7_2820, convert= snap= deactivating LV mirror_1_8606/mirror_1_86060 on morph-01 removing LV mirror_1_8606/mirror_1_86060 on morph-01 removing VG mirror_1_8606 on morph-01 removing PV /dev/sda1 on morph-01 removing PV /dev/sdb1 on morph-01 removing PV /dev/sdc1 on morph-01 Can't open /dev/sdc1 exclusively - not removing. Mounted filesystem? unable to remove /dev/sdc1 on morph-01 [root@morph-01 ~]# dmsetup ls mirror_1_8606-mirror_1_86060_mlog (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) This was trivial to reproduce. All it it took was converting a linear to a mirror, activating, then deactivating the new mirror. The device mapper devices were not removed after volume deactivation. [root@morph-01 ~]# pvcreate /dev/sda3 /dev/sda2 /dev/sda1 /dev/sdb3 /dev/sdb2 /dev/sdb1 /dev/sdc3 /dev/sdc2 /dev/sdc1 Physical volume "/dev/sda3" successfully created Physical volume "/dev/sda2" successfully created Physical volume "/dev/sda1" successfully created Physical volume "/dev/sdb3" successfully created Physical volume "/dev/sdb2" successfully created Physical volume "/dev/sdb1" successfully created Physical volume "/dev/sdc3" successfully created Physical volume "/dev/sdc2" successfully created Physical volume "/dev/sdc1" successfully created [root@morph-01 ~]# vgcreate linear_7_2848 /dev/sda3 /dev/sda2 /dev/sda1 /dev/sdb3 /dev/sdb2 /dev/sdb1 /dev/sdc3 /dev/sdc2 /dev/sdc1 Volume group "linear_7_2848" successfully created [root@morph-01 ~]# vgchange -ay linear_7_2848 0 logical volume(s) in volume group "linear_7_2848" now active [root@morph-01 ~]# lvcreate -l 60771 -n linear_7_28480 linear_7_2848 Logical volume "linear_7_28480" created [root@morph-01 ~]# lvchange -ay linear_7_2848/linear_7_28480 [root@morph-01 ~]# dmsetup ls linear_7_2848-linear_7_28480 (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# lvchange -an linear_7_2848/linear_7_28480 [root@morph-01 ~]# dmsetup ls VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# lvconvert -m 1 /dev/linear_7_2848/linear_7_28480 Logical volume linear_7_28480 converted. [root@morph-01 ~]# dmsetup ls VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# lvchange -ay /dev/linear_7_2848/linear_7_28480 [root@morph-01 ~]# dmsetup ls linear_7_2848-linear_7_28480_mimage_1 (253, 4) linear_7_2848-linear_7_28480_mimage_0 (253, 3) linear_7_2848-linear_7_28480_mlog (253, 2) linear_7_2848-linear_7_28480 (253, 5) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# lvchange -an /dev/linear_7_2848/linear_7_28480 [root@morph-01 ~]# dmsetup ls linear_7_2848-linear_7_28480_mimage_1 (253, 4) linear_7_2848-linear_7_28480_mimage_0 (253, 3) linear_7_2848-linear_7_28480_mlog (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# lvremove /dev/linear_7_2848/linear_7_28480 Logical volume "linear_7_28480" successfully removed [root@morph-01 ~]# dmsetup ls linear_7_2848-linear_7_28480_mimage_1 (253, 4) linear_7_2848-linear_7_28480_mimage_0 (253, 3) linear_7_2848-linear_7_28480_mlog (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@morph-01 ~]# vgremove linear_7_2848 Volume group "linear_7_2848" successfully removed [root@morph-01 ~]# dmsetup ls linear_7_2848-linear_7_28480_mimage_1 (253, 4) linear_7_2848-linear_7_28480_mimage_0 (253, 3) linear_7_2848-linear_7_28480_mlog (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) I was able to reproduce this on an x86_64 node. I also found these log messages on all nodes where the commands were failing. lvm[4673]: dm_task_run failed, errno = 4, Interrupted system call Created attachment 153306 [details]
strace -f -s 200 of dmeventd while running test case
More data points. The systems I'm hitting this on are running device-mapper-1.02.17-3.el4 and kernel-smp-2.6.9-55.EL. The systems I'm NOT hitting this on are running device-mapper-1.02.17-2.el4 and kernel-2.6.9-54.EL. I upgraded device-mapper from -2 to -3 on a system running kernel -54. The test case started failing on me. I downgraded device-mapper from -3 to -2 on a system running kernel -55. The test case started passing for me. Ugh. It seems device-mapper-1.02.17-3.el4 reintroduced this problem. :( I upgraded my system with device-mapper-1.02.17-2.el4 to device-mapper-1.02.17-2.0.1.el4 and was able to hit the problem again. Seems to be the memory leak patch... Specifically, when i put back in the 'pthread_cancel' section, it works again. Diagnosed the bug, fix pending (expect patch in a few hours). Created attachment 153349 [details]
dmeventd-cleanup-fix.patch
The attached patch should fix the issue encountered in latest (-2.0.1 and
later) builds. The problem was, that after removing pthread_cancel, the thread
itself would not realise it has to terminate itself right away, but would retry
waiting for event on the device and would only terminate when the device
disappeared. However, this is too late, as lvchange would try to remove the
remaining (mimage/mlog) tables, but fail (presumably due to dmeventd). This
particular race condition remains in the code (it's been always there, even),
but since the dmeventd thread cleanup runs much sooner now, it should not
trigger the race (ie, the timings should be roughly equivalent to those before
the leak fix).
Setting RHEL4.6 flag request as well as the 4.5.z request. We are blocked on committing a patch until the 4.5.z flags gets updated. This bug was verified as fixed in rhel4 device-mapper-1.02.17-3.0.1.el4. Marking verified. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0749.html |