Bug 985584 - lvm segfaults after failed pvmove leaves devices in odd state
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: Cluster QE
Docs Contact:
Depends On:
Blocks:
Reported: 2013-07-17 15:49 EDT by Corey Marthaler
Modified: 2014-01-29 18:53 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-29 18:53:46 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
archive file (7.26 KB, text/plain)
2013-07-18 17:07 EDT, Corey Marthaler

Description Corey Marthaler 2013-07-17 15:49:23 EDT
Description of problem:
I was running a thinp pvmove test case, which ended up failing/hanging, so I rebooted the system.

When it came back up, lvremove segfaulted when I attempted to clean up the leftover devices.

[root@qalvm-01 ~]# lvs
  LV     VG            Attr      LSize  Pool Origin
  POOL   snapper_thinp twi---tz-  4.00g
  origin snapper_thinp Vwi---tz-  1.00g POOL
  other1 snapper_thinp Vwi---tz-  1.00g POOL
  other2 snapper_thinp Vwi---tz-  1.00g POOL
  other3 snapper_thinp Vwi---tz-  1.00g POOL
  other4 snapper_thinp Vwi---tz-  1.00g POOL
  other5 snapper_thinp Vwi---tz-  1.00g POOL
  snap1  snapper_thinp Vwi---tz-  1.00g POOL origin
  snap2  snapper_thinp Vwi---tz-  1.00g POOL origin

[root@qalvm-01 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@qalvm-01 ~]# lvremove snapper_thinp
Removing pool "POOL" will remove 8 dependent volume(s). Proceed? [y/n]: y
Segmentation fault (core dumped)

[root@qalvm-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize  Pool Move      Devices
  POOL         snapper_thinp twi---tz-  4.00g                POOL_tdata(0)
  [POOL_tdata] snapper_thinp TwI------  4.00g                pvmove0(0)
  [POOL_tdata] snapper_thinp TwI------  4.00g                /dev/vdg1(0)
  [POOL_tdata] snapper_thinp TwI------  4.00g                /dev/vdf1(0)
  [POOL_tmeta] snapper_thinp ewi------  4.00m                /dev/vdd1(0)
  other1       snapper_thinp Vwi---tz-  1.00g POOL
  other2       snapper_thinp Vwi---tz-  1.00g POOL
  other3       snapper_thinp Vwi---tz-  1.00g POOL
  other4       snapper_thinp Vwi---tz-  1.00g POOL
  other5       snapper_thinp Vwi---tz-  1.00g POOL
  [pvmove0]    snapper_thinp p-C---m--  2.00g      /dev/vde1 /dev/vde1(0),/dev/vdh1(0)
  snap1        snapper_thinp Vwi---tz-  1.00g POOL
  snap2        snapper_thinp Vwi---tz-  1.00g POOL


Core was generated by `lvremove snapper_thinp'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000047c72d in print_log (level=7, file=0x36cb63dcbd "ioctl/libdm-iface.c", line=1724, dm_errno_or_class=<error reading variable: Cannot access memory at address 0x7fff75f72ffc>, 
    format=<error reading variable: Cannot access memory at address 0x7fff75f72ff0>) at log/log.c:184
184     {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-8.el7.x86_64 libgcc-4.8.1-2.el7.x86_64 libselinux-2.1.13-15.el7.x86_64 libsepol-2.1.9-1.el7.x86_64 ncurses-libs-5.9-11.20130511.el7.x86_64 pcre-8.32-7.el7.x86_64 readline-6.2-6.el7.x86_64 systemd-libs-204-8.el7.x86_64
(gdb) bt
#0  0x000000000047c72d in print_log (level=7, file=0x36cb63dcbd "ioctl/libdm-iface.c", line=1724, dm_errno_or_class=<error reading variable: Cannot access memory at address 0x7fff75f72ffc>, 
    format=<error reading variable: Cannot access memory at address 0x7fff75f72ff0>) at log/log.c:184
#1  0x00000036cb638019 in _do_dm_ioctl (dmt=0x1292af0, command=3241737479, buffer_repeat_count=0, retry_repeat_count=1, retryable=0x7fff75f746f0) at ioctl/libdm-iface.c:1703
#2  0x00000036cb638897 in dm_task_run (dmt=0x1292af0) at ioctl/libdm-iface.c:1844
#3  0x00000000004ce659 in _info_run (name=0x0, dlid=0x3251be8 "LVM-CAHTocYoJ6pWNfKb4xzKYddd6BrcMH18HwSxeUFIrpWpC1DhvTRD2CF2FUM1syfi", info=0x7fff75f74840, read_ahead=0x0, mknodes=0, with_open_count=1, 
    with_read_ahead=0, major=0, minor=0) at activate/dev_manager.c:116
#4  0x00000000004cf766 in _info (dlid=0x3251be8 "LVM-CAHTocYoJ6pWNfKb4xzKYddd6BrcMH18HwSxeUFIrpWpC1DhvTRD2CF2FUM1syfi", with_open_count=1, with_read_ahead=0, info=0x7fff75f74840, read_ahead=0x0)
    at activate/dev_manager.c:442
#5  0x00000000004d1e24 in _add_dev_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a91a8, layer=0x0) at activate/dev_manager.c:1432
#6  0x00000000004d2ae9 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a91a8, origin_only=0) at activate/dev_manager.c:1672
#7  0x00000000004d3187 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a9070, origin_only=0) at activate/dev_manager.c:1744
#8  0x00000000004d2ea2 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a91a8, origin_only=0) at activate/dev_manager.c:1717
#9  0x00000000004d3187 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a9070, origin_only=0) at activate/dev_manager.c:1744
#10 0x00000000004d2ea2 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a91a8, origin_only=0) at activate/dev_manager.c:1717
#11 0x00000000004d3187 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a9070, origin_only=0) at activate/dev_manager.c:1744
#12 0x00000000004d2ea2 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a91a8, origin_only=0) at activate/dev_manager.c:1717
#13 0x00000000004d3187 in _add_lv_to_dtree (dm=0x12abc50, dtree=0x1290570, lv=0x12a9070, origin_only=0) at activate/dev_manager.c:1744
[...]
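
For what it's worth, the fault address (0x7fff75f72ffc is a stack address) and the endlessly alternating pair of _add_lv_to_dtree frames (the same two lv structs, 0x12a91a8 and 0x12a9070, re-entering through dev_manager.c:1717 and :1744) look like stack exhaustion from unbounded recursion, i.e. the post-failure metadata appears to describe LVs whose segments lead back to each other. A minimal sketch of that failure mode, assuming a recursive walk with no visited/cycle check (hypothetical code, not the actual lvm2 implementation):

/* Hypothetical sketch of the apparent failure mode -- NOT the actual
 * lvm2 code.  A recursive dependency walk with no "already visited"
 * check never terminates once the metadata describes a cycle, and the
 * process eventually faults on stack exhaustion, which would explain
 * print_log() crashing while reading its arguments at 0x7fff... (stack)
 * addresses. */
#include <stdio.h>

struct lv {
    const char *name;
    struct lv *dep;                 /* device this LV is stacked on */
};

static unsigned long depth;

static void add_lv_to_dtree(const struct lv *lv)
{
    ++depth;
    if (lv->dep)
        add_lv_to_dtree(lv->dep);   /* no visited set: loops on a cycle */
    /* Work after the recursive call keeps every frame live on the
     * stack (no tail-call elimination), so the recursion segfaults. */
    printf("added %s (depth %lu)\n", lv->name, depth--);
}

int main(void)
{
    struct lv a = { "lv_a", NULL };
    struct lv b = { "lv_b", &a };
    a.dep = &b;                     /* cycle: a -> b -> a */

    add_lv_to_dtree(&a);            /* SIGSEGV once the stack runs out */
    return 0;
}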


Version-Release number of selected component (if applicable):
3.10.0-0.rc5.61.el7.x86_64

lvm2-2.02.99-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
lvm2-libs-2.02.99-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
lvm2-cluster-2.02.99-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
device-mapper-1.02.78-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
device-mapper-libs-1.02.78-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
device-mapper-event-1.02.78-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
device-mapper-event-libs-1.02.78-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
cmirror-2.02.99-0.79.el7    BUILT: Mon Jul  8 08:28:24 CDT 2013
Comment 1 Zdenek Kabelac 2013-07-17 17:57:35 EDT
Would it be possible to obtain/attach metadata for this case?

There are currently known problems working with an unusable thin pool - the tool currently has no support for moving on when the pool metadata gets damaged.

In this case it appears the metadata led to a weird construction of the activation tree.
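
For reference, the current VG metadata could be captured with something like the following (the output path is only an example; the automatic copies lvm keeps under /etc/lvm/archive would serve equally well):

# vgcfgbackup -f /tmp/snapper_thinp.vg snapper_thinp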
Comment 2 Corey Marthaler 2013-07-18 17:04:52 EDT
Jul 18 14:30:01 qalvm-01 qarshd[1333]: Running cmdline: pvmove /dev/vde1
Jul 18 14:30:01 qalvm-01 kernel: [  389.900657] bio: create slab <bio-3> at 3
Jul 18 14:30:49 qalvm-01 kernel: [  437.842500] end_request: I/O error, dev vde, sector 1031231
Jul 18 14:30:50 qalvm-01 kernel: [  438.125662] device-mapper: raid1: Unable to read primary mirror during recovery
Jul 18 14:31:00 qalvm-01 kernel: [  448.641639] end_request: I/O error, dev vde, sector 1051583
Jul 18 14:31:01 qalvm-01 kernel: [  449.088028] device-mapper: raid1: Unable to read primary mirror during recovery
Jul 18 14:31:14 qalvm-01 kernel: [  461.879531] end_request: I/O error, dev vde, sector 1071167
Jul 18 14:31:14 qalvm-01 kernel: [  462.430659] device-mapper: raid1: Unable to read primary mirror during recovery
Jul 18 14:31:17 qalvm-01 kernel: [  465.708705] end_request: I/O error, dev vde, sector 1077183
Jul 18 14:31:18 qalvm-01 kernel: [  466.125900] device-mapper: raid1: Unable to read primary mirror during recovery
[...]


[root@qalvm-01 ~]# lvs -a -o +devices
 LV           Attr      LSize  Pool Origin Data%  Move     Cpy%Sync Devices
 POOL         twi-a-tz-  4.00g               7.41                   POOL_tdata(0)
 [POOL_tdata] TwI-ao---  4.00g                                      pvmove0(0)
 [POOL_tdata] TwI-ao---  4.00g                                      /dev/vdg1(0)
 [POOL_tdata] TwI-ao---  4.00g                                      /dev/vdf1(0)
 [POOL_tmeta] ewi-ao---  4.00m                                      /dev/vdd1(0)
 origin       Vwi-aotz-  1.00g POOL         29.39
 other1       Vwi-a-tz-  1.00g POOL          0.00
 other2       Vwi-a-tz-  1.00g POOL          0.00
 other3       Vwi-a-tz-  1.00g POOL          0.00
 other4       Vwi-a-tz-  1.00g POOL          0.00
 other5       Vwi-a-tz-  1.00g POOL          0.00
 [pvmove0]    p-C-aom--  2.00g                    /dev/vde1 100.00 /dev/vde1(0),/dev/vdh1(0)
 snap1        Vwi-aotz-  1.00g POOL origin  15.52
 snap2        Vwi-a-tz-  1.00g POOL origin  29.39

[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta

[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta
<superblock uuid="" time="0" transaction="0" data_block_size="128" nr_data_blocks="65536">
</superblock>

[root@qalvm-01 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@qalvm-01 ~]# lvremove snapper_thinp
Removing pool "POOL" will remove 8 dependent volume(s). Proceed? [y/n]: y
Segmentation fault (core dumped)

I'll attach the archive file as well, though I'm not sure that will help.
Comment 3 Corey Marthaler 2013-07-18 17:07:41 EDT
Created attachment 775504
archive file
Comment 4 Zdenek Kabelac 2013-08-02 05:50:53 EDT
OK - so this involves stacking of devices and a pvmove operation on RAID, which has no pvmove support.

As a short-term fix I think we will need to disable this.
Comment 5 Jonathan Earl Brassow 2014-01-27 17:57:33 EST
pvmove should work for thin on RAID.  We certainly test that in pvmove-thin-segtypes.sh.

also, "device-mapper: raid1:" does not indicate the RAID module, but the old mirror module.  Those messages are likely coming from the mirror constructed by pvmove.  Seems like it is unable to read from the source device, so it's unable to perform the copy.

I'll try to take a better look later.
Comment 6 Corey Marthaler 2014-01-29 18:53:46 EST
I haven't been able to reproduce this issue in the latest kernel/rpms. Closing for now and will reopen if seen again.

3.10.0-80.el7.x86_64
lvm2-2.02.105-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
lvm2-libs-2.02.105-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
lvm2-cluster-2.02.105-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
device-mapper-1.02.84-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
device-mapper-libs-1.02.84-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
device-mapper-event-1.02.84-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
device-mapper-event-libs-1.02.84-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
device-mapper-persistent-data-0.2.8-3.el7    BUILT: Fri Dec 27 13:40:56 CST 2013
cmirror-2.02.105-2.el7    BUILT: Sun Jan 26 09:00:43 CST 2014
