Bug 1644206

Summary: lvm can overwrite extents beyond metadata area [rhel-7.6.z]
Product: Red Hat Enterprise Linux 7 Reporter: Oneata Mircea Teodor <toneata>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Command-line tools QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact: Marek Suchánek <msuchane>
Severity: urgent    
Priority: urgent CC: agk, amarecek, cluster-qe, cmarthal, heinzm, jbrassow, mcsontos, msnitzer, msuchane, prajnoha, prockai, rhandlin, salmy, teigland, thornber, zkabelac
Version: 7.6 Keywords: ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: lvm2-2.02.180-10.el7_6.2 Doc Type: Bug Fix
Doc Text:
A bug in the I/O layer of LVM caused LVM to read and write back the first 128kB of data that immediately followed the LVM metadata on the disk. If another program or the file system was modifying these blocks while an LVM command was running, those changes might have been lost. As a consequence, this might have led to data corruption in rare cases. With this update, LVM no longer writes past the metadata area. As a result, the data corruption no longer occurs in the described scenario.
Story Points: ---
Clone Of: 1643651 Environment:
Last Closed: 2018-11-01 10:34:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1643651    
Bug Blocks:    

Description Oneata Mircea Teodor 2018-10-30 09:01:02 UTC
This bug has been copied from bug #1643651 and has been proposed to be backported to 7.6 z-stream (EUS).

Comment 8 Corey Marthaler 2018-10-31 17:00:26 UTC
This does appear fixed in lvm2-2.02.180-10.el7_6.2. 

I will continue with additional iterations of this test scenario. The first iteration did not trigger this bug with 180-10.el7_6.2, but it did with the older 180-10.el7_6.1.


### HAYES-01 w/o the fix

3.10.0-958.el7.x86_64
lvm2-2.02.180-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
lvm2-libs-2.02.180-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
lvm2-cluster-2.02.180-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
lvm2-lockd-2.02.180-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
device-mapper-1.02.149-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
device-mapper-libs-1.02.149-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
device-mapper-event-1.02.149-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018
device-mapper-event-libs-1.02.149-10.el7_6.1    BUILT: Wed Oct 10 12:43:42 CDT 2018

[root@hayes-01 ~]# vgcreate --config devices/default_data_alignment=0 --metadatasize 520k gg /dev/sdg1
  Physical volume "/dev/sdg1" successfully created.
  Volume group "gg" successfully created
[root@hayes-01 ~]# vgs -o+pe_start gg
  VG #PV #LV #SN Attr   VSize  VFree  1st PE 
  gg   1   0   0 wz--n- <1.82t <1.82t 576.00k
[root@hayes-01 ~]# lvcreate -l1 -n test gg
  Logical volume "test" created.
[root@hayes-01 ~]# sanlock daemon -w 0
[root@hayes-01 ~]# sanlock client init -s LS:0:/dev/gg/test:0 -o 1
init
init done 0
[root@hayes-01 ~]# sanlock client add_lockspace -s LS:1:/dev/gg/test:0 -o 1
add_lockspace_timeout 1
add_lockspace_timeout done 0
[root@hayes-01 ~]# tail -f /var/log/sanlock.log
2018-10-31 11:34:41 1324 [25811]: sanlock daemon started 3.6.0 host 4c0475af-605a-4ea6-bed9-fb20027d8bb3.hayes-01.l
2018-10-31 11:35:01 1345 [25814]: s1 lockspace LS:1:/dev/gg/test:0
2018-10-31 11:35:05 1348 [25811]: s1 host 1 1 1345 4c0475af-605a-4ea6-bed9-fb20027d8bb3.hayes-01.l

# this will fill the VG metadata area
[root@hayes-01 ~]# for i in `seq 1 1000`; do lvcreate -an -l1 -n lv$i gg; done

2018-10-31 11:38:00 1524 [25823]: s1 delta_renew long write time 1 sec
2018-10-31 11:40:25 1669 [25823]: s1 delta_renew long write time 1 sec

[root@hayes-01 ~]# lvremove -f gg

### This is the corruption ###
2018-10-31 11:44:03 1886 [25823]: s1 delta_renew reread mismatch
2018-10-31 11:44:03 1886 [25823]: leader1 delta_renew_last error 0 lockspace LS host_id 1
2018-10-31 11:44:03 1886 [25823]: leader2 path /dev/gg/test offset 0
2018-10-31 11:44:03 1886 [25823]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 1 og 1 lv 0
2018-10-31 11:44:03 1886 [25823]: leader4 sn LS rn 4c0475af-605a-4ea6-bed9-fb20027d8bb3.hayes-01.l ts 1884 cs da5a1805
2018-10-31 11:44:03 1886 [25823]: leader1 delta_renew_read error 0 lockspace LS host_id 1
2018-10-31 11:44:03 1886 [25823]: leader2 path /dev/gg/test offset 0
2018-10-31 11:44:03 1886 [25823]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 1 og 1 lv 0
2018-10-31 11:44:03 1886 [25823]: leader4 sn LS rn 4c0475af-605a-4ea6-bed9-fb20027d8bb3.hayes-01.l ts 1881 cs ee52b910
2018-10-31 11:44:03 1886 [25823]: s1 renewal error -261 delta_length 0 last_success 1884
2018-10-31 11:44:03 1887 [25811]: s1 check_our_lease corrupt -261
2018-10-31 11:44:03 1887 [25811]: s1 all pids clear
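
For repeated iterations of this scenario, the corruption markers in the log above can be searched for instead of being spotted by eye in `tail -f` output. The following is only a sketch: `scan_sanlock_log` is a hypothetical helper name, and the two patterns are copied from the excerpt above, so other sanlock versions may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of sanlock or lvm2): print the lines of a
# sanlock log that match the corruption markers seen in this report.
# Exit status follows grep: 0 if at least one marker was found, 1 if none.
scan_sanlock_log() {
    log=${1:-/var/log/sanlock.log}   # default path as used in the transcript
    grep -E 'delta_renew reread mismatch|check_our_lease corrupt' "$log"
}
```

A run such as `scan_sanlock_log /var/log/sanlock.log && echo "lease corruption reported"` would print the matching lines followed by the message. The `delta_renew long write time` lines are deliberately not matched, since they appear in both the broken and the fixed run.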




### HAYES-03 w/ the fix

3.10.0-958.el7.x86_64

lvm2-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
lvm2-libs-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
lvm2-cluster-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
lvm2-lockd-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-libs-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-event-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-event-libs-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018

[root@hayes-03 ~]# vgcreate --config devices/default_data_alignment=0 --metadatasize 520k gg /dev/sdg1
  Physical volume "/dev/sdg1" successfully created.
  Volume group "gg" successfully created
[root@hayes-03 ~]# vgs -o+pe_start gg
  VG #PV #LV #SN Attr   VSize  VFree  1st PE 
  gg   1   0   0 wz--n- <1.82t <1.82t 576.00k
[root@hayes-03 ~]# lvcreate -l1 -n test gg
  Logical volume "test" created.
[root@hayes-03 ~]# sanlock daemon -w 0
[root@hayes-03 ~]# sanlock client init -s LS:0:/dev/gg/test:0 -o 1
init
init done 0
[root@hayes-03 ~]# sanlock client add_lockspace -s LS:1:/dev/gg/test:0 -o 1
add_lockspace_timeout 1
add_lockspace_timeout done 0
[root@hayes-03 ~]# tail -f /var/log/sanlock.log
2018-10-31 11:07:50 71163 [25791]: helper pid 25792 term signal 15
2018-10-31 11:34:38 1160 [16993]: sanlock daemon started 3.6.0 host 897a5627-94f7-43a2-ab1c-72093b575fc0.hayes-03.l
2018-10-31 11:34:58 1181 [16996]: s1 lockspace LS:1:/dev/gg/test:0
2018-10-31 11:35:02 1184 [16993]: s1 host 1 1 1181 897a5627-94f7-43a2-ab1c-72093b575fc0.hayes-03.l

# this will fill the VG metadata area
[root@hayes-03 ~]# for i in `seq 1 1000`; do lvcreate -an -l1 -n lv$i gg; done


2018-10-31 11:42:41 1644 [17005]: s1 delta_renew long write time 1 sec
2018-10-31 11:43:58 1721 [17005]: s1 delta_renew long write time 1 sec
2018-10-31 11:49:29 2052 [17005]: s1 delta_renew long write time 1 sec
2018-10-31 11:51:26 2169 [17005]: s1 delta_renew long write time 1 sec
2018-10-31 11:52:48 2251 [17005]: s1 delta_renew long write time 1 sec
2018-10-31 11:54:00 2323 [17005]: s1 delta_renew long write time 1 sec

### no reported corruption ###
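
Both runs report a first PE of 576.00k, and the sanlock lease is written at offset 0 of /dev/gg/test. The arithmetic below is a rough sketch of where that puts the lease on the PV. It rests on assumptions that the report does not state: that "test", as the first LV in a fresh VG, occupies the first extent; that the `k` in the `vgs` output and the `kB` in the Doc Text both mean KiB; and that the Doc Text's 128kB figure is an upper bound on the affected range.

```shell
#!/bin/sh
# Values taken from the report; the variable names are illustrative only.
pe_start_kib=576   # "1st PE" column of vgs -o+pe_start
window_kib=128     # size of the read/write-back window named in the Doc Text
sector_bytes=512   # "ss 512" in the sanlock leader3 lines (assumed sector size)

pe_start_bytes=$((pe_start_kib * 1024))
pe_start_sector=$((pe_start_bytes / sector_bytes))
window_end_kib=$((pe_start_kib + window_kib))

echo "first PE starts at byte $pe_start_bytes (sector $pe_start_sector)"
echo "affected range is at most ${pe_start_kib}KiB..${window_end_kib}KiB of the PV"
```

Under those assumptions the lease sector sits at the very start of the range that the unfixed build could rewrite, which would be consistent with the `delta_renew reread mismatch` seen only on the older build.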

Comment 9 Corey Marthaler 2018-10-31 18:42:27 UTC
Additional testing was not able to hit this issue with lvm2-2.02.180-10.el7_6.2 while continuing to reproduce it with lvm2-2.02.180-10.el7_6.1.

Marking verified.

3.10.0-958.el7.x86_64

lvm2-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
lvm2-libs-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
lvm2-lockd-2.02.180-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-libs-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-event-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
device-mapper-event-libs-1.02.149-10.el7_6.2    BUILT: Wed Oct 31 03:55:58 CDT 2018
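
To check whether a given host already has the fixed build, its lvm2 version-release string can be compared with the "Fixed In Version" value. This is a minimal sketch: `version_at_least` is a hypothetical helper, and it relies on GNU `sort -V` as an approximation of RPM version ordering, which is an assumption that holds for the two builds named in this report but not necessarily for every version string.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version-release string $1 is at least $2.
# GNU sort -V only approximates rpm's own comparison rules.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "Fixed In Version" from this report, without the package name prefix.
fixed="2.02.180-10.el7_6.2"
```

On a RHEL 7 host this could be used as `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' lvm2)" "$fixed" && echo "fix present"`.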

Comment 11 errata-xmlrpc 2018-11-01 10:34:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3442