Bug 1463794

Summary: RAID TAKEOVER: offline (umount) hack for takeover attempt segfaults trying to determine if lv is eligible for takeover
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Mirroring and RAID QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: agk, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, prockai, zkabelac
Version: 7.4 Keywords: Regression
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.171-7.el7 Doc Type: No Doc Update
Doc Text:
In release bug
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 21:54:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Corey Marthaler 2017-06-21 19:19:49 UTC
Description of problem:

  Reshape is only supported when centipede2/takeover is not in use (e.g. unmount filesystem).

# I unmounted the filesystem and tried again...

[root@host-073 ~]# lvs -a -o +devices
  LV                  VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  takeover            centipede2    rwi-a-r---  <4.01g                                    100.00           takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] centipede2    iwi-aor---  <1.34g                                                     /dev/sde1(1)
  [takeover_rimage_1] centipede2    iwi-aor---  <1.34g                                                     /dev/sdb1(1)
  [takeover_rimage_2] centipede2    iwi-aor---  <1.34g                                                     /dev/sdf1(1)
  [takeover_rimage_3] centipede2    iwi-aor---  <1.34g                                                     /dev/sdd1(1)
  [takeover_rimage_4] centipede2    iwi-aor---  <1.34g                                                     /dev/sdh1(1)
  [takeover_rmeta_0]  centipede2    ewi-aor---   4.00m                                                     /dev/sde1(0)
  [takeover_rmeta_1]  centipede2    ewi-aor---   4.00m                                                     /dev/sdb1(0)
  [takeover_rmeta_2]  centipede2    ewi-aor---   4.00m                                                     /dev/sdf1(0)
  [takeover_rmeta_3]  centipede2    ewi-aor---   4.00m                                                     /dev/sdd1(0)
  [takeover_rmeta_4]  centipede2    ewi-aor---   4.00m                                                     /dev/sdh1(0)

[root@host-073 ~]# lvconvert --yes -R 8192.00k  --type raid6_n_6 centipede2/takeover
  Using default stripesize 64.00 KiB.
  Converting raid6_ra_6 LV centipede2/takeover to raid6_n_6.
Segmentation fault (core dumped)


Core was generated by `lvconvert --yes -R 16384.00k --type raid6_zr centipede2/takeover'.
Program terminated with signal 11, Segmentation fault.
#0  0x000055e9a48e9fb3 in lv_is_cow (lv=lv@entry=0x0) at metadata/snapshot_manip.c:34
34              return (!lv_is_thin_volume(lv) && !lv_is_origin(lv) && lv->snapshot) ? 1 : 0;
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.168-5.el7.x86_64 elfutils-libs-0.168-5.el7.x86_64 glibc-2.17-194.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-39.el7.x86_64 libcap-2.22-9.el7.x86_64 libgcc-4.8.5-14.el7.x86_64 libselinux-2.5-11.el7.x86_64 libsepol-2.5-6.el7.x86_64 libuuid-2.23.2-39.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 pcre-8.32-17.el7.x86_64 readline-6.2-10.el7.x86_64 systemd-libs-219-38.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x000055e9a48e9fb3 in lv_is_cow (lv=lv@entry=0x0) at metadata/snapshot_manip.c:34
#1  0x000055e9a48a6e98 in lv_lock_holder (lv=lv@entry=0x0) at metadata/lv.c:1600
#2  0x000055e9a48e6215 in _vg_write_lv_suspend_commit_backup (lv=0x0, origin_only=1, do_backup=1, vg=0x55e9a53e1d00) at metadata/raid_manip.c:2120
#3  _vg_write_commit_backup (vg=0x55e9a53e1d00) at metadata/raid_manip.c:2150
#4  _raid_reshape (allocate_pvs=0x55e9a53e1dd8, new_stripe_size=128, new_stripes=<optimized out>, new_region_size=32768, new_data_copies=<optimized out>, force=<optimized out>,
    yes=<optimized out>, new_segtype=0x55e9a5354900, lv=0x55e9a53e2b40) at metadata/raid_manip.c:2376
#5  lv_raid_convert (lv=lv@entry=0x55e9a53e2b40, new_segtype=0x55e9a5354900, yes=<optimized out>, force=<optimized out>, new_stripes=<optimized out>,
    new_stripe_size_supplied=<optimized out>, new_stripe_size=128, new_region_size=32768, allocate_pvs=0x55e9a53e1dd8) at metadata/raid_manip.c:6360
#6  0x000055e9a482f3dd in _lvconvert_raid (lv=lv@entry=0x55e9a53e2b40, lp=lp@entry=0x7fff9d8a0250) at lvconvert.c:1408
#7  0x000055e9a483123c in _convert_striped_raid (cmd=<optimized out>, lp=0x7fff9d8a0250, lv=0x55e9a53e2b40) at lvconvert.c:1617
#8  _convert_striped (lp=<optimized out>, lv=<optimized out>, cmd=<optimized out>) at lvconvert.c:1684
#9  _lvconvert_raid_types (cmd=cmd@entry=0x55e9a5316020, lv=lv@entry=0x55e9a53e2b40, lp=lp@entry=0x7fff9d8a0250) at lvconvert.c:1757
#10 0x000055e9a483145a in _lvconvert_raid_types_single (cmd=cmd@entry=0x55e9a5316020, lv=0x55e9a53e2b40, handle=handle@entry=0x55e9a5361ff8) at lvconvert.c:4249
#11 0x000055e9a4856578 in process_each_lv_in_vg (cmd=cmd@entry=0x55e9a5316020, vg=vg@entry=0x55e9a53e1d00, arg_lvnames=arg_lvnames@entry=0x7fff9d8a0130, tags_in=tags_in@entry=0x7fff9d8a00e0,
    stop_on_error=stop_on_error@entry=0, handle=handle@entry=0x55e9a5361ff8, check_single_lv=check_single_lv@entry=0x55e9a482b4b0 <_lvconvert_raid_types_check>,
    process_single_lv=process_single_lv@entry=0x55e9a48313e0 <_lvconvert_raid_types_single>) at toollib.c:3144
#12 0x000055e9a48579c4 in _process_lv_vgnameid_list (process_single_lv=0x55e9a48313e0 <_lvconvert_raid_types_single>, check_single_lv=0x55e9a482b4b0 <_lvconvert_raid_types_check>,
    handle=0x55e9a5361ff8, arg_tags=0x7fff9d8a00e0, arg_lvnames=0x7fff9d8a0100, arg_vgnames=0x7fff9d8a00f0, vgnameids_to_process=0x7fff9d8a0120, read_flags=1048576, cmd=0x55e9a5316020)
    at toollib.c:3639
#13 process_each_lv (cmd=cmd@entry=0x55e9a5316020, argc=argc@entry=1, argv=<optimized out>, one_vgname=one_vgname@entry=0x0, one_lvname=one_lvname@entry=0x0,
    read_flags=read_flags@entry=1048576, handle=handle@entry=0x55e9a5361ff8, check_single_lv=check_single_lv@entry=0x55e9a482b4b0 <_lvconvert_raid_types_check>,
    process_single_lv=process_single_lv@entry=0x55e9a48313e0 <_lvconvert_raid_types_single>) at toollib.c:3791
#14 0x000055e9a48336b8 in lvconvert_raid_types_cmd (cmd=0x55e9a5316020, argc=<optimized out>, argv=<optimized out>) at lvconvert.c:4336
#15 0x000055e9a483f478 in lvm_run_command (cmd=cmd@entry=0x55e9a5316020, argc=1, argc@entry=7, argv=0x7fff9d8a0718, argv@entry=0x7fff9d8a06e8) at lvmcmdline.c:2951
#16 0x000055e9a48404d3 in lvm2_main (argc=7, argv=0x7fff9d8a06e8) at lvmcmdline.c:3485
#17 0x00007f1191264c05 in __libc_start_main () from /lib64/libc.so.6
#18 0x000055e9a481efae in _start ()
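
Note on the frame arguments: assuming the sizes in frame #4 are counted in 512-byte sectors (the trace itself does not state the unit), new_region_size=32768 works out to 16384 KiB and new_stripe_size=128 to 64 KiB. That would tie the core to the `-R 16384.00k` invocation named in the "Core was generated by" line, and the stripe size to the "Using default stripesize 64.00 KiB" message. A minimal sketch of the conversion follows; the helper name is hypothetical and not part of lvm2.

```python
# Assumption: sizes printed in the backtrace are in 512-byte sectors.
SECTOR_BYTES = 512


def sectors_to_kib(sectors: int) -> float:
    """Convert a sector count to KiB (hypothetical helper, not an lvm2 function)."""
    return sectors * SECTOR_BYTES / 1024


# Values copied from frame #4 (_raid_reshape) of the backtrace.
print(sectors_to_kib(32768))  # new_region_size
print(sectors_to_kib(128))    # new_stripe_size
```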


Version-Release number of selected component (if applicable):
3.10.0-685.el7.x86_64

lvm2-2.02.171-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
lvm2-libs-2.02.171-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
lvm2-cluster-2.02.171-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
device-mapper-1.02.140-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
device-mapper-libs-1.02.140-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
device-mapper-event-1.02.140-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
device-mapper-event-libs-1.02.140-6.el7    BUILT: Wed Jun 21 09:35:03 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017


How reproducible:
Every time

Comment 2 Heinz Mauelshagen 2017-06-21 22:58:46 UTC
Upstream commit 64fac77e8a551f4dfe8f4cfaaf1ca984c9b5146c

Comment 5 Corey Marthaler 2017-06-22 21:28:31 UTC
Fix verified in the latest rpms. Many takeover/reshape iterations on different systems that were failing in 171-6 passed with 171-7.
 
3.10.0-685.el7.x86_64
lvm2-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-libs-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-cluster-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017



================================================================================
Iteration 9.1 started at Thu Jun 22 15:50:33 CDT 2017
================================================================================
Scenario raid6_ra_6: Convert Striped raid6_ra_6 volume
********* Take over hash info for this scenario *********
* from type:    raid6_ra_6
* to type:      raid6_nr
* from legs:    3
* to legs:      5
* from region:  1024.00k
* to region:    256.00k
* contiguous:   1
******************************************************


Creating original volume on host-073...
host-073: lvcreate  --type raid6_ra_6 -R 1024.00k -i 3 -n takeover -L 4G centipede2
  WARNING: Not using lvmetad because a repair command was run.

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 11.76% )
   0/1 mirror(s) are fully synced: ( 25.72% )
   0/1 mirror(s) are fully synced: ( 42.19% )
   0/1 mirror(s) are fully synced: ( 61.72% )
   0/1 mirror(s) are fully synced: ( 76.41% )
   0/1 mirror(s) are fully synced: ( 91.85% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  WARNING: Not using lvmetad because a repair command was run.
  LV                  Attr       LSize   Cpy%Sync Devices                                                                                                 
  takeover            rwi-a-r---  <4.01g 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] iwi-aor---  <1.34g          /dev/sdf1(1)                                                                                            
  [takeover_rimage_1] iwi-aor---  <1.34g          /dev/sda1(1)                                                                                            
  [takeover_rimage_2] iwi-aor---  <1.34g          /dev/sde1(1)                                                                                            
  [takeover_rimage_3] iwi-aor---  <1.34g          /dev/sdc1(1)                                                                                            
  [takeover_rimage_4] iwi-aor---  <1.34g          /dev/sdg1(1)                                                                                            
  [takeover_rmeta_0]  ewi-aor---   4.00m          /dev/sdf1(0)                                                                                            
  [takeover_rmeta_1]  ewi-aor---   4.00m          /dev/sda1(0)                                                                                            
  [takeover_rmeta_2]  ewi-aor---   4.00m          /dev/sde1(0)                                                                                            
  [takeover_rmeta_3]  ewi-aor---   4.00m          /dev/sdc1(0)                                                                                            
  [takeover_rmeta_4]  ewi-aor---   4.00m          /dev/sdg1(0)                                                                                            

Creating xfs on top of mirror(s) on host-073...
Mounting mirrored xfs filesystems on host-073...

Writing verification files (checkit) to mirror(s) on...
        ---- host-073 ----

Verifying files (checkit) on mirror(s) on...
        ---- host-073 ----

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting xfs and removing mnt point on host-073...

TAKEOVER: lvconvert --yes -R 256.00k  --type raid6_nr centipede2/takeover
  WARNING: Not using lvmetad because a repair command was run.
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 6.04% )
   [...]
   0/1 mirror(s) are fully synced: ( 97.00% )
   1/1 mirror(s) are fully synced: ( 100.00% )

RESHAPE: lvconvert --yes  --stripes 5 centipede2/takeover
  WARNING: Not using lvmetad because a repair command was run.
  WARNING: Adding stripes to active and open logical volume centipede2/takeover will grow it from 1026 to 1710 extents!
Waiting until all mirror|raid volumes become fully syncd...
[...]
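
The extent counts in the reshape warning above are consistent with simple per-stripe arithmetic. Assuming a 4 MiB extent size (the lvm2 default, not stated in the log), the 4G request is 1024 extents, which rounds up to 1026 so that it divides evenly across the 3 data stripes from `-i 3`. Going to 5 data stripes at the same per-stripe size then gives 1710. The sketch below only reproduces that arithmetic. The function names are hypothetical and this is not lvm2's allocation code.

```python
def round_up_to_stripes(extents: int, stripes: int) -> int:
    """Round an extent count up to the next multiple of the stripe count."""
    return -(-extents // stripes) * stripes


def extents_after_restripe(extents: int, old_stripes: int, new_stripes: int) -> int:
    """Scale the LV size to a new stripe count, keeping the per-stripe size."""
    per_stripe, remainder = divmod(extents, old_stripes)
    if remainder:
        raise ValueError("extent count is not a multiple of the stripe count")
    return per_stripe * new_stripes


EXTENT_MIB = 4                        # assumption: default extent size
requested = 4 * 1024 // EXTENT_MIB    # -L 4G expressed in extents
initial = round_up_to_stripes(requested, 3)
grown = extents_after_restripe(initial, 3, 5)
print(requested, initial, grown)
```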

Comment 6 errata-xmlrpc 2017-08-01 21:54:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2222