Bug 1102919

Summary: vgsplitting volume groups without lvmetad running can produce a "Checksum error" for each PV split
Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Alasdair Kergon <agk>
lvm2 sub component: Changing Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: medium
Priority: low
CC: agk, heinzm, jbrassow, lmiksik, msnitzer, nperic, prajnoha, prockai, rbednar, zkabelac
Version: 7.0
Keywords: Triaged
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.175-1.el7
Clones: 1133116
Last Closed: 2018-04-10 15:16:02 UTC
Type: Bug
Bug Blocks: 1133116, 1469559
Attachments: -vvvv of the vgsplit (flags: none)

Description Corey Marthaler 2014-05-29 20:10:32 UTC
Description of problem:
Without lvmetad running, vgsplit reports errors; with lvmetad running, there are no errors.

[root@harding-03 ~]# pvscan
  PV /dev/sdc2   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdc3   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdc4   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdc1   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdb4   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdb3   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]
  PV /dev/sdb2   VG seven             lvm2 [23.29 GiB / 23.29 GiB free]

[root@harding-03 ~]# vgsplit seven ten /dev/sdc2 /dev/sdc3 /dev/sdc4 /dev/sdc1 /dev/sdb4 /dev/sdb3 /dev/sdb2
  /dev/sdc2: Checksum error
  /dev/sdc3: Checksum error
  /dev/sdc4: Checksum error
  /dev/sdc1: Checksum error
  /dev/sdb4: Checksum error
  /dev/sdb3: Checksum error
  /dev/sdb2: Checksum error
  New volume group "ten" successfully split from "seven"


Version-Release number of selected component (if applicable):
3.10.0-110.el7.x86_64
lvm2-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-libs-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-cluster-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-persistent-data-0.2.8-4.el7    BUILT: Fri Jan 24 14:28:55 CST 2014
cmirror-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014

Comment 1 Corey Marthaler 2014-05-29 20:16:43 UTC
Created attachment 900505 [details]
-vvvv of the vgsplit

Comment 2 Corey Marthaler 2014-05-29 20:17:44 UTC
# From -vvvv output

#metadata/vg.c:60         Allocated VG ten at 0x7f7dc216f770.
#label/label.c:155       /dev/sdb4: lvm2 label detected at sector 1
#format_text/text_label.c:421         /dev/sdb4: PV header extension version 1 found
#config/config.c:411   /dev/sdb4: Checksum error
#format_text/import.c:55         <backtrace>
#format_text/format-text.c:1178         <backtrace>

Comment 4 Alasdair Kergon 2014-07-16 19:07:02 UTC
(I reproduced a simpler version of this but got diverted onto something else before I found the cause.)

Comment 6 Jonathan Earl Brassow 2017-07-26 21:05:52 UTC
[root@bp-01 ~]# pvs
  PV         VG         Fmt  Attr PSize    PFree
  /dev/sda2  rhel_bp-01 lvm2 a--  <464.76g    4.00m
  /dev/sdb1             lvm2 ---  <836.69g <836.69g
  /dev/sdc1             lvm2 ---  <836.69g <836.69g
  /dev/sdd1             lvm2 ---  <836.69g <836.69g
  /dev/sde1             lvm2 ---  <836.69g <836.69g
  /dev/sdf1             lvm2 ---  <836.69g <836.69g
  /dev/sdg1             lvm2 ---  <836.69g <836.69g
  /dev/sdh1             lvm2 ---  <836.69g <836.69g
  /dev/sdi1             lvm2 ---  <836.69g <836.69g
[root@bp-01 ~]# pvscan --cache
[root@bp-01 ~]# vgcreate vg /dev/sd[bcdefghi]1
  Volume group "vg" successfully created
[root@bp-01 ~]# pvscan --cache
[root@bp-01 ~]# lvmconfig global/use_lvmetad
use_lvmetad=0
[root@bp-01 ~]# vgsplit vg new /dev/sd[cdefg]1
  /dev/sdc1: Checksum error
  Couldn't read volume group metadata.
  /dev/sdd1: Checksum error
  Couldn't read volume group metadata.
  /dev/sde1: Checksum error
  Couldn't read volume group metadata.
  /dev/sdf1: Checksum error
  Couldn't read volume group metadata.
  /dev/sdg1: Checksum error
  Couldn't read volume group metadata.
  New volume group "new" successfully split from "vg"

Still appears to be a problem in:

# lvm version
  LVM version:     2.02.172(2)-git (2017-05-03)
  Library version: 1.02.141-git (2017-05-03)
  Driver version:  4.35.0
  Configuration:   ./configure --enable-lvm1_fallback --enable-fsadm --with-pool=internal --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-units-compat --with-optimisation=-g --enable-cmdlib --enable-dmeventd --libdir=/usr/lib64 --with-usrlibdir=/usr/lib64 --with-pool=internal --enable-applib --enable-python2-bindings --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal

Comment 8 Alasdair Kergon 2017-09-21 16:07:40 UTC
This seems to be a very old problem that occurs when there are no LVs in the VG.

Comment 9 Alasdair Kergon 2017-09-22 01:43:12 UTC
There are a number of problems with vgsplit.  The main one is that the change is not atomic and recovery may be awkward if the process gets interrupted.  With a bit of thought, this could be improved considerably.

The bug itself arises because vgsplit sometimes overwrites part of the old on-disk metadata with the new metadata, then tries to read the old metadata back and finds it corrupted: hence the checksum error. When I added an LV, the metadata shifted within the buffer and the error did not occur.
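To illustrate this failure mode, here is a simplified C model (not lvm2 code) of a metadata area in which each metadata copy is stored together with a checksum computed at write time. The names `struct meta_locn`, `write_meta`, `meta_is_intact` and `AREA_SIZE` are hypothetical, and the bitwise CRC-32 is a generic stand-in for lvm2's own checksum routine and seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AREA_SIZE 256 /* hypothetical size of one metadata area */

/* Hypothetical location record for one copy of the VG metadata. */
struct meta_locn {
    size_t offset;
    size_t size;
    uint32_t checksum;
};

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320).
 * A stand-in only: lvm2 uses its own CRC routine and seed. */
uint32_t crc32_calc(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Copy text into the area at offset and record its checksum. */
struct meta_locn write_meta(unsigned char *area, size_t offset, const char *text)
{
    struct meta_locn locn = { offset, strlen(text), 0 };
    assert(offset + locn.size <= AREA_SIZE);
    memcpy(area + offset, text, locn.size);
    locn.checksum = crc32_calc(area + offset, locn.size);
    return locn;
}

/* Re-read a copy and compare it against the checksum stored at write time. */
int meta_is_intact(const unsigned char *area, struct meta_locn locn)
{
    return crc32_calc(area + locn.offset, locn.size) == locn.checksum;
}
```

If a new copy is written starting inside the old one (for example at offset 8 when the old copy begins at offset 0), `meta_is_intact` returns 0 for the old copy, which is the situation reported as "Checksum error". Writing the new copy after the end of the old one leaves the old copy verifiable.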

The vgrename mechanism is used for the new VG, but it incorrectly sets the 'old' VG name to the same value as the 'new' one (because the VG structure was created afresh with the new name). Setting it to the correct old name instead makes the errors disappear.

--- a/tools/vgsplit.c
+++ b/tools/vgsplit.c
@@ -705,6 +705,9 @@ int vgsplit(struct cmd_context *cmd, int argc, char **argv)
        if (!vg_rename(cmd, vg_to, vg_name_to))
                goto_bad;
 
+       /* Set old VG name so the metadata operations recognise that the PVs are in an existing VG */
+       vg_to->old_name = vg_from->name;
+
        /* store it on disks */
        log_verbose("Writing out updated volume groups");
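The effect of `old_name` can be sketched with a second simplified model. It assumes, without quoting lvm2's format_text code, that the metadata writer looks for the VG's existing copy in each metadata area by name (using the old name when one is set), appends the new copy after it when it is found, and otherwise starts again at the beginning of the area. `struct mda_model`, `struct vg_model`, `next_write_offset`, `overlaps_existing` and `AREA_START` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AREA_START 512 /* hypothetical offset just past the area header */

/* Hypothetical, heavily simplified view of one PV's metadata area. */
struct mda_model {
    const char *stored_vg_name; /* VG name recorded in the existing copy */
    size_t stored_offset;       /* where that copy starts */
    size_t stored_size;         /* how many bytes it occupies */
};

/* Hypothetical VG handle carrying both names, like name/old_name in the patch. */
struct vg_model {
    const char *name;
    const char *old_name;
};

/* Pick the offset for the next metadata copy. The existing copy is only
 * recognised if its recorded name matches the lookup name; otherwise the
 * writer starts again at the beginning of the area. */
size_t next_write_offset(const struct mda_model *mda, const struct vg_model *vg)
{
    const char *lookup = vg->old_name ? vg->old_name : vg->name;

    if (mda->stored_vg_name && strcmp(mda->stored_vg_name, lookup) == 0)
        return mda->stored_offset + mda->stored_size; /* append after it */
    return AREA_START;                                /* not recognised */
}

/* Does writing new_size bytes at offset clobber the existing copy? */
int overlaps_existing(const struct mda_model *mda, size_t offset, size_t new_size)
{
    assert(new_size > 0);
    return offset < mda->stored_offset + mda->stored_size &&
           mda->stored_offset < offset + new_size;
}
```

With the existing copy recorded under "seven", a handle for "ten" whose `old_name` is also "ten" (the pre-patch situation) gets `AREA_START` back and overlaps the old copy, whereas `old_name = "seven"` (what the patch sets) places the new copy just after the old one.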

Comment 12 Roman Bednář 2017-10-16 13:41:02 UTC
Marking verified with the latest RPMs. The checksum error no longer appears when splitting a PV from a VG while lvmetad is not running. Adding a regression check to the seven_ten test suite to cover this.

(Note: pvscan has to be run before splitting in order to trigger this bug.)
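A regression check of this kind can be as simple as scanning the command's captured output. The C helpers below are hypothetical (the actual seven_ten check is not included in this report) and assume that vgsplit's stdout and stderr have been captured into a single string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count lines of captured vgsplit output that report a checksum error,
 * e.g. "  /dev/sdb1: Checksum error". Hypothetical helper. */
size_t count_checksum_errors(const char *output)
{
    static const char needle[] = ": Checksum error";
    size_t count = 0;
    const char *p = output;

    assert(output != NULL);
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += sizeof(needle) - 1; /* continue searching after this match */
    }
    return count;
}

/* Treat the split as clean only if it reported success and printed no
 * checksum errors. */
int vgsplit_output_is_clean(const char *output)
{
    return strstr(output, "successfully split from") != NULL &&
           count_checksum_errors(output) == 0;
}
```

Applied to the BEFORE PATCH output, `count_checksum_errors` returns 1 and `vgsplit_output_is_clean` returns 0; applied to the AFTER PATCH output, they return 0 and 1 respectively.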


BEFORE PATCH:

# pvs
  PV         VG            Fmt  Attr PSize  PFree 
  /dev/sda1  vg            lvm2 a--  29.98g 29.98g
  /dev/sdb1  vg            lvm2 a--  29.98g 29.98g
  /dev/vda2  rhel_virt-366 lvm2 a--   7.51g 40.00m

# systemctl is-active lvm2-lvmetad
inactive

# pvscan --cache

# vgsplit vg vg2 /dev/sdb1
  /dev/sdb1: Checksum error
  Couldn't read volume group metadata.
  New volume group "vg2" successfully split from "vg"

===================================================
AFTER PATCH:

# pvs
  PV         VG            Fmt  Attr PSize   PFree  
  /dev/sda1  vg            lvm2 a--  <29.99g <29.99g
  /dev/sdb1  vg            lvm2 a--  <29.99g <29.99g
  /dev/sdc1                lvm2 ---  <30.00g <30.00g
  /dev/sdd1                lvm2 ---  <30.00g <30.00g
  /dev/sde1                lvm2 ---  <30.00g <30.00g
  /dev/sdf1                lvm2 ---  <30.00g <30.00g
  /dev/sdg1                lvm2 ---  <30.00g <30.00g
  /dev/sdh1                lvm2 ---  <30.00g <30.00g
  /dev/sdi1                lvm2 ---  <30.00g <30.00g
  /dev/sdj1                lvm2 ---  <30.00g <30.00g
  /dev/vda2  rhel_virt-371 lvm2 a--   <7.00g      0 

# systemctl is-active lvm2-lvmetad
inactive

# pvscan --cache

# vgsplit vg vg2 /dev/sdb1
  New volume group "vg2" successfully split from "vg"



===================================================


3.10.0-727.el7.x86_64

lvm2-2.02.175-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
lvm2-libs-2.02.175-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
lvm2-cluster-2.02.175-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
device-mapper-1.02.144-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
device-mapper-libs-1.02.144-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
device-mapper-event-1.02.144-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
device-mapper-event-libs-1.02.144-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 11:00:07 CEST 2017
cmirror-2.02.175-2.el7    BUILT: Fri Oct 13 13:31:22 CEST 2017

Comment 15 errata-xmlrpc 2018-04-10 15:16:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853