Bug 857554
Summary: | segfault in dmeventd, lvremove, vgreduce when removing part of an LV on a missing PV with issue_discards enabled | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> |
Component: | lvm2 | Assignee: | Alasdair Kergon <agk> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.3 | CC: | agk, dwysocha, heinzm, jbrassow, jentrena, jpallich, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | lvm2-2.02.98-1.el6 | Doc Type: | Bug Fix |
Doc Text: |
If issue_discards is enabled in the configuration and lvremove is run against a Logical Volume that partially uses a Physical Volume that is missing, lvremove would crash.
This is now fixed. Also, because issuing discards is irreversible, the user is now prompted to confirm their intention.
A new attribute 'p' in 'lvs' is set when the Logical Volume is partially missing.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-02-21 08:13:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 867009 |
Description
Corey Marthaler
2012-09-14 19:56:47 UTC
This appears to be 100% reproducible if you run enough iterations of helter_skelter. The most it's taken for me to hit this bug was 11. ./helter_skelter -R ../../var/share/resource_files/taft.xml -l /usr/tests/sts-rhel6.3 -r /usr/tests/sts-rhel6.3 ================================================================================ Iteration 0.11 started at Mon Sep 17 12:06:45 CDT 2012 ================================================================================ Scenario kill_primary_and_log_synced_2_legs: Kill primary leg and disk log of synced 2 leg mirror(s) [...] Sep 17 12:08:17 taft-01 kernel: dmeventd[7243]: segfault at 10 ip 00007f533d8c79d0 sp 00007f533d7112c0 error 4 in liblvm2cmd.so.2.02[7f533d8a9000+da000] This exists in the latest rhel6.4 rpms as well. lvm2-2.02.97-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 lvm2-libs-2.02.97-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 lvm2-cluster-2.02.97-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 udev-147-2.41.el6 BUILT: Thu Mar 1 13:01:08 CST 2012 device-mapper-1.02.76-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 device-mapper-libs-1.02.76-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 device-mapper-event-1.02.76-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 device-mapper-event-libs-1.02.76-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 cmirror-2.02.97-3.el6 BUILT: Tue Sep 11 05:06:56 CDT 2012 Core was generated by `/sbin/dmeventd'. Program terminated with signal 11, Segmentation fault. #0 _dev_topology_attribute (attribute=0x7f5488136019 "queue/discard_max_bytes", sysfs_dir=0x1544254 "/sys", dev=0x0) at device/device.c:377 377 if (dm_snprintf(path, PATH_MAX, sysfs_fmt_str, sysfs_dir, Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libsepol-2.0.41-4.el6.x86_64 libudev-147-2.41.el6.x86_64 (gdb) bt #0 _dev_topology_attribute (attribute=0x7f5488136019 "queue/discard_max_bytes", sysfs_dir=0x1544254 "/sys", dev=0x0) at device/device.c:377 #1 0x00007f54880e520c in discard_pv_segment (peg=0x7f548006da10, discard_area_reduction=125) at metadata/pv_manip.c:209 #2 0x00007f54880cbf5b in _release_and_discard_lv_segment_area (seg=0x7f548006e850, s=<value optimized out>, area_reduction=<value optimized out>, with_discard=<value optimized out>) at metadata/lv_manip.c:309 #3 0x00007f54880cc849 in _lv_segment_reduce (lv=0x7f548006e110, extents=125, delete=1) at metadata/lv_manip.c:519 #4 _lv_reduce (lv=0x7f548006e110, extents=125, delete=1) at metadata/lv_manip.c:560 #5 0x00007f54880cca5c in lv_remove (lv=<value optimized out>) at metadata/lv_manip.c:632 #6 0x00007f54880e0100 in _delete_lv (mirror_lv=0x7f548006de98, lv=0x7f548006e110) at metadata/mirror.c:445 #7 0x00007f54880e304c in _remove_mirror_images (lv=<value optimized out>, num_removed=1, is_removable=0x7f548810f230 <_is_partial_lv>, removable_baton=0x0, remove_log=1, collapse=0, removed=0x7f54887e95fc, preferred_only=1) at metadata/mirror.c:1077 #8 0x00007f54880e37eb in remove_mirror_images (lv=0x7f548006de98, num_mirrors=<value optimized out>, is_removable=0x7f548810f230 <_is_partial_lv>, removable_baton=0x0, remove_log=1) at metadata/mirror.c:1131 #9 0x00007f5488110517 in mirror_remove_missing (cmd=0x15410a0, lv=0x7f548006de98, force=<value optimized out>) at lvconvert.c:1362 #10 0x00007f54881116f6 in _lvconvert_mirrors_repair (cmd=0x15410a0, lv=0x7f548006de98, lp=0x7f54887e9840) at lvconvert.c:1414 #11 _lvconvert_mirrors (cmd=0x15410a0, lv=0x7f548006de98, lp=0x7f54887e9840) at lvconvert.c:1519 #12 0x00007f5488112636 in _lvconvert_single (cmd=0x15410a0, lv=0x7f548006de98, handle=0x7f54887e9840) at lvconvert.c:2033 #13 0x00007f5488112fb4 in lvconvert_single (cmd=0x15410a0, lp=0x7f54887e9840) at lvconvert.c:2135 #14 0x00007f5488113ac6 in lvconvert (cmd=0x15410a0, argc=1, argv=0x7f54887e99c8) at lvconvert.c:2216 #15 0x00007f548811984c in lvm_run_command (cmd=0x15410a0, argc=1, argv=0x7f54887e99c8) at lvmcmdline.c:1116 #16 0x00007f5488131ead in lvm2_run (handle=0x15410a0, cmdline=<value optimized out>) at lvmcmdlib.c:89 #17 0x00007f548859815a in _remove_failed_devices (dmt=<value optimized out>, event=<value optimized out>, unused=<value optimized out>) at dmeventd_mirror.c:145 #18 process_event (dmt=<value optimized out>, event=<value optimized out>, unused=<value optimized out>) at dmeventd_mirror.c:189 #19 0x000000000040408d in _do_process_event (arg=0x153dff0) at dmeventd.c:710 #20 _monitor_thread (arg=0x153dff0) at dmeventd.c:833 #21 0x0000003997c07851 in ?? () #22 0x00007f54887ea700 in ?? () #23 0x0000000000000000 in ?? () pv->dev is NULL if a PV is missing. Should discard_pv_segment just return 1 silently if (!peg->pv->dev) ? Here's the easiest way I've found to reproduce this so far: # With discards turned on in lvm.conf [root@taft-04 ~]# pvscan PV /dev/sdb1 VG test [135.66 GiB / 135.66 GiB free] PV /dev/sdc1 VG test [135.66 GiB / 135.66 GiB free] PV /dev/sdd1 VG test [135.66 GiB / 135.66 GiB free] [root@taft-04 ~]# lvcreate -m 1 -n mirror -L 100M test Logical volume "mirror" created [root@taft-04 ~]# lvs -a -o +devices LV VG Attr LSize Log Copy% Devices mirror test mwi-a-m- 100.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0) [mirror_mimage_0] test iwi-aom- 100.00m /dev/sdb1(0) [mirror_mimage_1] test iwi-aom- 100.00m /dev/sdc1(0) [mirror_mlog] test lwi-aom- 4.00m /dev/sdd1(0) [root@taft-04 ~]# echo offline > /sys/block/sdc/device/state [root@taft-04 ~]# lvs -a -o +devices /dev/sdc1: read failed after 0 of 512 at 145669554176: Input/output error /dev/sdc1: read failed after 0 of 512 at 145669664768: Input/output error /dev/sdc1: read failed after 0 of 512 at 0: Input/output error /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error /dev/sdc1: read failed after 0 of 2048 at 0: Input/output error Couldn't find device with uuid rh2PwK-WkZ1-pEWB-glsX-SudD-BlpK-B2xh11. LV VG Attr LSize Log Copy% Devices mirror test mwi-a-m- 100.00m mirror_mlog 100.00 mirror_mimage_0(0),mirror_mimage_1(0) [mirror_mimage_0] test iwi-aom- 100.00m /dev/sdb1(0) [mirror_mimage_1] test iwi-aom- 100.00m unknown device(0) [mirror_mlog] test lwi-aom- 4.00m /dev/sdd1(0) [root@taft-04 ~]# vgreduce --removemissing --force test /dev/sdc1: read failed after 0 of 512 at 145669554176: Input/output error /dev/sdc1: read failed after 0 of 512 at 145669664768: Input/output error /dev/sdc1: read failed after 0 of 512 at 0: Input/output error /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error /dev/sdc1: read failed after 0 of 2048 at 0: Input/output error Couldn't find device with uuid rh2PwK-WkZ1-pEWB-glsX-SudD-BlpK-B2xh11. Segmentation fault (core dumped) Core was generated by `vgreduce --removemissing --force test'. Program terminated with signal 11, Segmentation fault. #0 0x00000000004a13a0 in ?? () Missing separate debuginfos, use: debuginfo-install lvm2-2.02.97-3.el6.x86_64 (gdb) bt #0 0x00000000004a13a0 in ?? () #1 0x000000000047bebc in discard_pv_segment () #2 0x0000000000462c0b in ?? () #3 0x00000000004634f9 in ?? () #4 0x000000000046370c in lv_remove () #5 0x0000000000476db0 in ?? () #6 0x0000000000479cfc in ?? () #7 0x000000000047a49b in remove_mirror_images () #8 0x0000000000417e27 in mirror_remove_missing () #9 0x0000000000437589 in vgreduce () #10 0x000000000042115c in lvm_run_command () #11 0x0000000000423b92 in lvm2_main () #12 0x0000003a4101ecdd in __libc_start_main () from /lib64/libc.so.6 #13 0x0000000000413df9 in _start () I've reproduced this with a simple lvremove too under the same circumstances. lvremove should not be removing LVs where the underlying PVs are missing as it does not yet offer all the protections that vgreduce offers. A regression introduced by 8c5bcdabab5b40498f2051715639cf07f14d877b I think. Also partial LVs are not highlighted in lvs. http://git.fedorahosted.org/cgit/lvm2.git/commit/?id=b737ff01e4038d45798521448dc4ae4864d1b29a (fix) http://git.fedorahosted.org/cgit/lvm2.git/commit/?id=290ae4791ef5eedaae2bc1cfa3645ac8073faf54 (partial attribute - not for z-stream) Well, the fix is not very good, since lvremove is actually required to work with missing PVs. If it's not safe, that's a bug and should be fixed, but making it impossible to remove broken LVs manually is a bad idea, in my opinion. I'm quite sure this is a bug in the discard code (which seems to be much newer). Do you still see problems in lvremove after fixing the discard code? I guess there might be multiple problems together - where it's not quite clear what is the proper fix. i.e.: I've 3 PVs - and multiple LVs spread over them - LV1 on PV1 and PV2 LV2 on PV2 and PV3 LV3 on PV3 Now - if PV1 and PV2 is missing - and I run 'lvremove LV1' - should we remove only LV1 - or also LV2 ? And the situation gets probably more complex with mirrors in this game. Petr - are these 2 upstream commits resolving this BZ: https://www.redhat.com/archives/lvm-devel/2012-September/msg00057.html https://www.redhat.com/archives/lvm-devel/2012-September/msg00058.html Zdenek, you need the latter, but also the changes to discard by Alasdair. First of the patches is behaviour-changing and shouldn't go into .z I think. Did manual tests and ran a full helter_skelter group of tests. Marking this verified with: lvm2-2.02.98-2.el6.x86_64 device-mapper-1.02.77-2.el6.x86_64 kernel-2.6.32-336.el6.x86_64 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0501.html |