Bug 1290775

Summary: PV appears only on the second and subsequent calls after it becomes present in the system, if /etc/lvm/cache/.cache did not previously record it
Product: Red Hat Enterprise Linux 6 Reporter: Peter Rajnoha <prajnoha>
Component: lvm2 Assignee: David Teigland <teigland>
lvm2 sub component: Devices, Filtering and Stacking (RHEL6) QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, teigland, zkabelac
Version: 6.8   
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: lvm2-2.02.140-1.el6 Doc Type: Bug Fix
Last Closed: 2016-05-11 01:19:06 UTC Type: Bug

Description Peter Rajnoha 2015-12-11 12:21:35 UTC
(Originally spotted while investigating bug #1190120.)

The problem was traced to commit c1f246fedfc349c25749da501e68a7f70bd122b0 (https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=c1f246fedfc349c25749da501e68a7f70bd122b0), which introduced the regression.

(Here "dev del" stands for "echo 1 > /sys/block/.../device/delete", and "dev rescan" stands for "echo "- - -" > /sys/class/scsi_host/host0/scan", which makes the deleted device present in the system again.)
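For anyone reproducing this, the two shorthands above amount to writing a value into a sysfs attribute. A minimal C sketch of the same two writes follows. The helper names `write_sysfs_attr`, `dev_del` and `dev_rescan` are hypothetical (they are not part of lvm2 or of the dev_del script used later in this bug), and the trailing newline mirrors what `echo` writes. Like the shell commands, it needs root and really removes the device.

```c
#include <assert.h>
#include <stdio.h>

/* Write a string to a sysfs attribute file. Returns 0 on success, -1 on error. */
int write_sysfs_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    /* stdio buffers the write, so an error returned by the kernel only
     * surfaces when fclose() flushes it; check that result too. */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Hypothetical "dev del": echo 1 > /sys/block/<dev>/device/delete */
int dev_del(const char *dev)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/sys/block/%s/device/delete", dev);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* device name too long for the buffer */
    return write_sysfs_attr(path, "1\n");
}

/* Hypothetical "dev rescan": echo "- - -" > /sys/class/scsi_host/host<N>/scan
 * ("- - -" is a wildcard for channel, target and LUN). */
int dev_rescan(int host)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/sys/class/scsi_host/host%d/scan", host);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return write_sysfs_attr(path, "- - -\n");
}
```

With these, the reproduction below is `dev_del("sda")` followed by `dev_rescan(0)`.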

Before the commit:

# pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda          lvm2 ---  128.00m 128.00m

# dev del sda
  Deleting device sda.

# pvs

# dev rescan
  Rescanning SCSI host /sys/class/scsi_host/host0.

# pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda          lvm2 ---  128.00m 128.00m


At the commit and later:

# pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda          lvm2 ---  128.00m 128.00m

# dev del sda
  Deleting device sda.

# pvs

# dev rescan
  Rescanning SCSI host /sys/class/scsi_host/host0.

# pvs
--> pvs does not show /dev/sda here, even though it is already present in the system!

# pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda          lvm2 ---  128.00m 128.00m

Comment 2 Peter Rajnoha 2015-12-11 12:36:41 UTC
Note: this is with devices/obtain_device_list_from_udev=0, in which case /etc/lvm/cache/.cache is used.

Comment 3 Peter Rajnoha 2015-12-11 12:44:20 UTC
(In reply to Peter Rajnoha from comment #2)
> Note: this is with devices/obtain_device_list_from_udev=0 in which case the
> /etc/lvm/cache/.cache is used.

...and with global/use_lvmetad=0.

Comment 4 David Teigland 2015-12-11 16:11:34 UTC
If I use 'pvs -a', I do see the device after the rescan, but it's not recognized as a PV until the second 'pvs -a'.


[root@null-03 ~]# ./dev_del sdf

[root@null-03 ~]# pvs -a
  PV                     VG           Fmt  Attr PSize   PFree
  /dev/sda1                                ---       0     0 
  /dev/sda2              rhel_null-03 lvm2 a--  465.27g    0 

[root@null-03 ~]# for i in `seq 0 7`; do echo $i; echo "- - -" > /sys/class/scsi_host/host$i/scan; done

[root@null-03 ~]# pvs -a
  PV                     VG           Fmt  Attr PSize   PFree
  /dev/sda1                                ---       0     0 
  /dev/sda2              rhel_null-03 lvm2 a--  465.27g    0 
  /dev/sdf                                 ---       0     0 

[root@null-03 ~]# pvs -a
  PV                     VG           Fmt  Attr PSize   PFree  
  /dev/sda1                                ---       0       0 
  /dev/sda2              rhel_null-03 lvm2 a--  465.27g      0 
  /dev/sdf                            lvm2 ---  931.01g 931.01g

Comment 5 David Teigland 2015-12-11 16:31:08 UTC
With plain 'pvs' instead of 'pvs -a', the first 'pvs' also seems to know about the device (but does not display it, because it is not recognized as a PV). I'm inferring this from the fact that the device appears in .cache after the first 'pvs'.

[root@null-03 ~]# grep sdf /etc/lvm/cache/.cache 
                "/dev/sdf",

[root@null-03 ~]# pvs
  PV         VG           Fmt  Attr PSize   PFree  
  /dev/sda2  rhel_null-03 lvm2 a--  465.27g      0 
  /dev/sdf                lvm2 ---  931.01g 931.01g

[root@null-03 ~]# ./dev_del sdf

[root@null-03 ~]# pvs
  PV         VG           Fmt  Attr PSize   PFree
  /dev/sda2  rhel_null-03 lvm2 a--  465.27g    0 

[root@null-03 ~]# grep sdf /etc/lvm/cache/.cache 

[root@null-03 ~]# for i in `seq 0 7`; do echo $i; echo "- - -" > /sys/class/scsi_host/host$i/scan; done

[root@null-03 ~]# grep sdf /etc/lvm/cache/.cache 

[root@null-03 ~]# pvs
  PV         VG           Fmt  Attr PSize   PFree
  /dev/sda2  rhel_null-03 lvm2 a--  465.27g    0 

[root@null-03 ~]# grep sdf /etc/lvm/cache/.cache 
                "/dev/sdf",

[root@null-03 ~]# pvs
  PV         VG           Fmt  Attr PSize   PFree  
  /dev/sda2  rhel_null-03 lvm2 a--  465.27g      0 
  /dev/sdf                lvm2 ---  931.01g 931.01g

Comment 6 David Teigland 2015-12-11 16:41:30 UTC
This sequence causes sdf to appear in the first pvs:
rescan, rm /etc/lvm/cache/.cache, pvs

Comment 7 David Teigland 2015-12-11 17:39:13 UTC
I've traced the 'pvs' commands to compare the behavior of the first and the second one.

The device label scanning is not doing a "full scan", so it uses .cache.
The first 'pvs' does not see sdf in .cache, so it doesn't scan it.
The first 'pvs' does add sdf to .cache for the next command to use.
The second 'pvs' finds sdf in .cache, and scans it.

If I force a "full scan", then the first 'pvs' command does scan sdf and displays it.

What is supposed to cause 'pvs' to do a full scan (full_scan=2) vs. not doing a full scan (full_scan=0)?

pvs ->
process_each_pv() ->
get_vgnameids() ->
lvmcache_get_vgnameids() ->
lvmcache_label_scan(cmd, 0);

Should pvs detect something is changed and switch the '0' to '2' when calling lvmcache_label_scan()?
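Comment 7's explanation can be reduced to a toy model. The sketch below is not lvm2 code: `struct dev_set`, `dev_set_has` and `run_pvs` are hypothetical names, the persistent device list stands in for /etc/lvm/cache/.cache, and the only behaviours modelled are the ones described in this bug (a non-full scan reads labels only from devices listed in .cache, and each command rewrites .cache with the devices currently present, as Comment 5 shows).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

/* Hypothetical set of device paths; names point at caller-owned strings. */
struct dev_set {
    const char *names[MAX_DEVS];
    int count;
};

bool dev_set_has(const struct dev_set *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

/*
 * Model of one 'pvs' run.
 *   present:   devices currently in the system
 *   cache:     stand-in for .cache, read at start and rewritten at the end
 *   full_scan: true  -> read labels from every present device
 *              false -> read labels only from devices listed in the cache
 *   seen:      output, devices whose labels were read (i.e. shown as PVs)
 */
void run_pvs(const struct dev_set *present, struct dev_set *cache,
             bool full_scan, struct dev_set *seen)
{
    assert(present->count <= MAX_DEVS);

    seen->count = 0;
    for (int i = 0; i < present->count; i++) {
        const char *name = present->names[i];
        if (full_scan || dev_set_has(cache, name))
            seen->names[seen->count++] = name;
    }

    /* The cache is updated with every present device, including ones this
     * run did not scan, so only the *next* command benefits. */
    *cache = *present;
}
```

Calling `run_pvs` twice with `full_scan` false, where /dev/sdf is present but not yet cached, reproduces the report: the first run does not read sdf's label but records it in the cache, and the second run reads it. Passing `full_scan` true lets the first run see sdf, matching the forced full scan described above.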

Comment 8 David Teigland 2015-12-11 18:05:12 UTC
This problem appears after commit c1f246fedfc349c25749da501e68a7f70bd122b0 because that commit changed the order in which get_vgnameids() and _get_all_devices() are called.

The source of the trouble is that the code tries to hide device scanning within other functions that are not directly related. Conceptually simple functions, such as those that get lists of devices or lists of VG names, have critical side effects like scanning devices. Device scanning should not be a hidden side effect; it is an important primary operation, central to what the command does. It needs to be called explicitly at a high level, before the various functions that depend on it are used.

Before the commit above, _get_all_devices() was called first.  This includes code that triggers a full scan.  get_vgnameids() is called second, and benefits from the full scan that was already done, so it finds sdf.

After the commit above, _get_all_devices() is called second. get_vgnameids() is called first and does not include code to trigger a full scan, so sdf is missed.
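The ordering argument above can be made concrete with another toy model. It is not lvm2 code and does not reproduce the shipped fix (this bug does not say how lvm2-2.02.140-1.el6 changed the code). The `model_*` functions, `new_pv_recognized`, and the third ordering, an explicit full scan issued before anything else as recommended here, are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks only one fact: whether a full label scan has happened yet. */
struct cmd_state {
    bool full_scan_done;  /* has anything triggered a full scan so far? */
    bool new_pv_found;    /* did the VG-name lookup see the rescanned PV? */
};

/* Stand-in for _get_all_devices(): described above as triggering a full scan. */
static void model_get_all_devices(struct cmd_state *st)
{
    st->full_scan_done = true;
}

/* Stand-in for get_vgnameids(): does not force a full scan, so it only
 * finds the new PV if a full scan already happened earlier in the command. */
static void model_get_vgnameids(struct cmd_state *st)
{
    st->new_pv_found = st->full_scan_done;
}

/* Hypothetical explicit, up-front scan. */
static void model_label_scan_full(struct cmd_state *st)
{
    st->full_scan_done = true;
}

enum call_order { BEFORE_COMMIT, AFTER_COMMIT, EXPLICIT_SCAN_FIRST };

bool new_pv_recognized(enum call_order order)
{
    struct cmd_state st = { false, false };

    switch (order) {
    case BEFORE_COMMIT:        /* devices first, then VG names */
        model_get_all_devices(&st);
        model_get_vgnameids(&st);
        break;
    case AFTER_COMMIT:         /* VG names first, then devices */
        model_get_vgnameids(&st);
        model_get_all_devices(&st);
        break;
    case EXPLICIT_SCAN_FIRST:  /* scan first; later ordering no longer matters */
        model_label_scan_full(&st);
        model_get_vgnameids(&st);
        model_get_all_devices(&st);
        break;
    }

    /* Every ordering ends up doing the full scan; what differs is whether it
     * happens before the VG-name lookup that depends on it. */
    assert(st.full_scan_done);
    return st.new_pv_found;
}
```

The invariant asserted at the end is the point of the comment: the scan is never skipped, it just runs too late once it is buried as a side effect of a function that now happens to be called second.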

Comment 12 Roman Bednář 2016-02-29 14:28:48 UTC
Verified. The first call of 'pvs' now displays devices even if they are not present in /etc/lvm/cache/.cache.


=========================================================================================
# pvcreate /dev/sda
  Physical volume "/dev/sda" successfully created

# echo "1" > /sys/block/sda/device/delete

# grep sda /etc/lvm/cache/.cache 
		"/dev/sda",
# pvs
  PV         VG         Fmt  Attr PSize PFree
  /dev/vda2  vg_virt274 lvm2 a--  7.71g    0 

# grep sda /etc/lvm/cache/.cache 

# for i in `seq 0 4`; do echo $i; echo "- - -" > /sys/class/scsi_host/host$i/scan; done
0
1
2
3
4

# grep sda /etc/lvm/cache/.cache 

(/dev/sda is present and is not cached at this point)

# pvs
  PV         VG         Fmt  Attr PSize  PFree 
  /dev/sda              lvm2 ---  10.00g 10.00g
  /dev/vda2  vg_virt274 lvm2 a--   7.71g     0 

# grep sda /etc/lvm/cache/.cache 
		"/dev/sda",

(the first call of 'pvs' now both caches the device and displays it correctly)
=========================================================================================

Tested on: 
2.6.32-616.el6.x86_64

lvm2-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
lvm2-libs-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
lvm2-cluster-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
udev-147-2.71.el6    BUILT: Wed Feb 10 14:07:17 CET 2016
device-mapper-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-libs-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-event-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-event-libs-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc1.el6    BUILT: Wed Feb 10 16:52:15 CET 2016
cmirror-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016

Comment 14 errata-xmlrpc 2016-05-11 01:19:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html