Bug 1829597

Summary: Require lvm2 with supporting locking_type=4 for pvs command on CentOS
Product: [oVirt] vdsm Reporter: Nir Soffer <nsoffer>
Component: Core Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE QA Contact: Ilan Zuckerman <izuckerm>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.30.45 CC: aefrat, bugs, sbonazzo, tnisan
Target Milestone: ovirt-4.3.10   
Target Release: 4.30.46   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.30.46 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-10 10:57:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nir Soffer 2020-04-29 20:41:55 UTC
Description of problem:

Bug 1811391 was fixed in 4.3.9 for RHEL 7.8. On CentOS the fix was partial
because the lvm2 package providing the fix for bug 1809660 was not available
when vdsm was released.

Running the pvs command with locking_type=1 on a non-SPM host can lead to VG
metadata corruption if the SPM is modifying the VG metadata at the same time.
The command may see inconsistent metadata and try to fix it, corrupting the
VG metadata.

The package is available since 2020-04-08:
http://mirror.centos.org/centos/7/updates/x86_64/Packages/lvm2-2.02.186-7.el7_8.1.x86_64.rpm

Vdsm should require this package and enable locking_type=4 when running on
CentOS 7.8.
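
The requested behavior can be sketched as follows. This is only an
illustration, not vdsm's actual code: the function name lvm_global_config and
the is_spm flag are hypothetical. The option names and the spacing of the
"global { ... }" section are copied from the lvm commands that vdsm logs.

```python
def lvm_global_config(is_spm):
    """Build the "global" section of an lvm --config string.

    Hypothetical helper for illustration; not part of vdsm.
    """
    # locking_type=1: local file-based locking, metadata changes allowed.
    # locking_type=4: read-only locking, lvm refuses to change metadata.
    locking_type = 1 if is_spm else 4
    options = [
        f"locking_type={locking_type}",
        "prioritise_write_locks=1",
        "wait_for_locks=1",
        "use_lvmetad=0",
    ]
    return "global {  " + "  ".join(options) + " }"


if __name__ == "__main__":
    print(lvm_global_config(is_spm=False))
```

With read-only locking, a pvs command on a non-SPM host cannot attempt a
metadata repair while the SPM is in the middle of an update.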

Version-Release number of selected component (if applicable):
4.30.45

How reproducible:
Always

Steps to Reproduce:
1. Activate a non-SPM host on CentOS 7.8

Actual results:
Vdsm using locking_type=1 in pvs command when host is not the SPM.

Expected results:
Vdsm using locking_type=4 in pvs command when host is not the SPM.

Comment 1 Nir Soffer 2020-04-29 20:43:52 UTC
Milestone 4.3.10 is not available, setting 4.3.9-1.

Sandro, we need to deliver this fix in the next ovirt release.

Comment 2 Avihai 2020-05-17 08:04:10 UTC
Hi Nir,
Please provide a clear verification scenario.

Comment 3 Nir Soffer 2020-05-18 09:30:18 UTC
This bug is relevant only to CentOS - on RHEL this was fixed in 4.3.9.

To verify, check the vdsm DEBUG logs on a non-SPM host. Before this fix vdsm
ran the "pvs" command with locking_type=1. With the fix, the pvs command runs
with locking_type=4.

Note that on the SPM host, the pvs command still runs with locking_type=1.
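
A check like this can be automated by parsing the logged command line. The
sketch below is a hypothetical helper, not part of vdsm. It assumes the log
format seen in this report, where the lvm subcommand is followed by a
single-quoted --config string.

```python
import re

# Matches ".../lvm <subcommand> --config '<config>'" as logged by vdsm.
_LVM_CMD = re.compile(r"/lvm (?P<subcommand>\w+) --config '(?P<config>[^']*)'")
_LOCKING = re.compile(r"\blocking_type=(?P<value>\d+)")


def parse_lvm_command(line):
    """Return (subcommand, locking_type) for an lvm log line, or None."""
    cmd = _LVM_CMD.search(line)
    if cmd is None:
        return None
    locking = _LOCKING.search(cmd.group("config"))
    if locking is None:
        return None
    return cmd.group("subcommand"), int(locking.group("value"))


if __name__ == "__main__":
    # Shortened sample line; real log lines carry more config options.
    sample = (
        "DEBUG (jsonrpc/2) [storage.Misc.excCmd] /usr/sbin/lvm pvs "
        "--config 'global {  locking_type=4  use_lvmetad=0 }' --noheadings"
    )
    print(parse_lvm_command(sample))
```

Returning the subcommand together with the locking type makes it explicit
which lvm command a given log line belongs to.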

Comment 4 Ilan Zuckerman 2020-05-18 13:20:07 UTC
Verified on a specially built CentOS environment:

ovirt-engine-4.3.10.3-0.0.master.20200513172426.git741b00b.el7.noarch
vdsm-4.30.45-1.el7.x86_64


[root@storage-ge4-vdsm1 ~]# cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)


Vdsm log on a non-SPM host showing locking_type=4:

2020-05-18 14:10:48,891+0300 DEBUG (monitor/52a60f2) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm vgs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/360002ac0000000000000001600021f6b$|^/dev/mapper/360002ac0000000000000001700021f6b$|^/dev/mapper/360002ac0000000000000001800021f6b$|^/dev/mapper/360002ac0000000000000001900021f6b$|^/dev/mapper/360002ac0000000000000001a00021f6b$|^/dev/mapper/360002ac0000000000000001b00021f6b$|", "r|.*|"] } global {  locking_type=4  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name --select 'vg_name = 52a60f26-b230-4505-a80e-5f9c2305cc62' (cwd None) (commands:198)


Vdsm log on the SPM host showing locking_type=1:

2020-05-18 13:10:57,923+0300 DEBUG (monitor/0e3a104) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm vgs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/360002ac0000000000000001600021f6b$|^/dev/mapper/360002ac0000000000000001700021f6b$|^/dev/mapper/360002ac0000000000000001800021f6b$|^/dev/mapper/360002ac0000000000000001900021f6b$|^/dev/mapper/360002ac0000000000000001a00021f6b$|^/dev/mapper/360002ac0000000000000001b00021f6b$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name --select 'vg_name = 0e3a104d-cf58-4446-8e88-1a93e653331c' (cwd None) (commands:198)


Nir, please confirm whether this can be considered a valid verification. I did not find the 'pvs' command anywhere in the vdsm log, only 'vgs'.

Comment 5 Nir Soffer 2020-05-18 14:54:36 UTC
(In reply to Ilan Zuckerman from comment #4)
> Verified on specially build centos env:

The verification is incorrect: the issue was only in the pvs command. The vgs
command has used the correct locking type since 4.3.9.

To trigger the pvs command you can do:

# vdsm-client Host setLogLevel level=DEBUG 
true

# echo -n >/var/log/vdsm/vdsm.log

# vdsm-client Host getDeviceList > /dev/null

# grep pvs /var/log/vdsm/vdsm.log | tail -1
2020-05-18 17:43:43,211+0300 DEBUG (jsonrpc/1) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm pvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["r|.*|"]  hints="none" } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count (cwd None) (commands:153)

This was on the SPM host. On other hosts we should see locking_type=4.
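
The same check can be scripted over the whole log. The sketch below is a
hypothetical helper, not part of vdsm. It looks only at pvs commands, so a log
containing nothing but vgs commands is reported as not verified. It assumes
the log format shown above.

```python
import re

# Capture locking_type from the --config string of pvs commands only.
_PVS = re.compile(r"/lvm pvs --config '[^']*\blocking_type=(\d+)")


def pvs_locking_types(log_text):
    """Return the locking_type of every pvs command in the log, in order."""
    return [int(m.group(1)) for m in _PVS.finditer(log_text)]


def verify_host(log_text, is_spm):
    """Return True if every logged pvs command used the expected locking type."""
    expected = 1 if is_spm else 4
    found = pvs_locking_types(log_text)
    # An empty list means no pvs command was logged, which proves nothing.
    return bool(found) and all(value == expected for value in found)


if __name__ == "__main__":
    with open("/var/log/vdsm/vdsm.log") as f:
        print(verify_host(f.read(), is_spm=False))
```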

Comment 7 Ilan Zuckerman 2020-05-19 10:17:09 UTC
Verified on:
ovirt-engine-4.3.10.3-0.0.master.20200513172426.git741b00b.el7.noarch
vdsm-4.30.46-1.el7.x86_64

On the SPM host, pvs runs with locking_type=1:

[root@storage-ge4-vdsm3 ~]# grep pvs /var/log/vdsm/vdsm.log | grep locking_type
2020-05-19 13:11:05,410+0300 DEBUG (jsonrpc/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm pvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/360002ac0000000000000001600021f6b$|^/dev/mapper/360002ac0000000000000001700021f6b$|^/dev/mapper/360002ac0000000000000001800021f6b$|^/dev/mapper/360002ac0000000000000001900021f6b$|^/dev/mapper/360002ac0000000000000001a00021f6b$|^/dev/mapper/360002ac0000000000000001b00021f6b$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count (cwd None) (commands:198)


On a regular (non-SPM) host, pvs runs with locking_type=4:

[root@storage-ge4-vdsm1 ~]# grep pvs /var/log/vdsm/vdsm.log | grep locking_type
2020-05-19 13:13:22,538+0300 DEBUG (jsonrpc/2) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm pvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/360002ac0000000000000001600021f6b$|^/dev/mapper/360002ac0000000000000001700021f6b$|^/dev/mapper/360002ac0000000000000001800021f6b$|^/dev/mapper/360002ac0000000000000001900021f6b$|^/dev/mapper/360002ac0000000000000001a00021f6b$|^/dev/mapper/360002ac0000000000000001b00021f6b$|", "r|.*|"] } global {  locking_type=4  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size,mda_used_count (cwd None) (commands:198)

Comment 8 Michal Skrivanek 2020-06-10 10:57:08 UTC
4.3.10 was released