Bug 1259587

Summary: pvchange fails to find physical volume
Product: Red Hat Enterprise Linux 7
Reporter: Maor <mlipchuk>
Component: lvm2
Assignee: Peter Rajnoha <prajnoha>
lvm2 sub component: Changing Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: agk, amureini, cmarthal, frolland, heinzm, jbrassow, lmiksik, mlipchuk, msnitzer, nsoffer, prajnoha, prockai, ylavi, zkabelac
Version: 7.2
Keywords: Regression, Triaged
Target Milestone: rc
Target Release: 7.3
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.130-2.el7
Doc Type: Bug Fix
Doc Text: Intra-release bug. No documentation needed.
Type: Bug
Bug Blocks: 1272035
Last Closed: 2015-11-19 12:47:59 UTC
Attachments:
- lvm verbose log with pvcreate and pvchange
- rpm -qa log
- rpm log
- lvm conf

Description Maor 2015-09-03 06:26:37 UTC
Created attachment 1069624 [details]
lvm verbose log with pvcreate and pvchange

Description of problem:
As part of the process of adding an iSCSI storage domain, LVM failed to execute pvchange on a physical volume with the following message:
"Failed to find physical volume "/dev/mapper/36001405b37c244cc7cd4fc9ae147faab"

This is how pvcreate is called from VDSM:
usr/bin/sudo -n /usr/sbin/lvm pvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/36001405547004ac2ada44e78a0ed8ec1|/dev/mapper/36001405b37c244cc7cd4fc9ae147faab|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --metadatasize 128m --metadatacopies 2 --metadataignore y /dev/mapper/36001405b37c244cc7cd4fc9ae147faab /dev/mapper/36001405547004ac2ada44e78a0ed8ec1

This is how pvchange is called from VDSM:
/usr/bin/sudo -n /usr/sbin/lvm pvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/360014053226a7f0077246e392eaa3a61|/dev/mapper/36001405547004ac2ada44e78a0ed8ec1|/dev/mapper/36001405b37c244cc7cd4fc9ae147faab|/dev/mapper/36001405d747b798180e4aa8bb13ccf2b|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --metadataignore n /dev/mapper/36001405b37c244cc7cd4fc9ae147faab
Version-Release number of selected component (if applicable):
lvm2-2.02.127-1.el7 (see the rpm -qa output below)

We can see that 36001405b37c244cc7cd4fc9ae147faab exists in /dev/mapper:

[root@dhcp-0-138 mapper]# ls -l
total 300
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 360014053226a7f0077246e392eaa3a61 -> ../dm-4
lrwxrwxrwx. 1 root root       8 Sep  3 09:19 36001405547004ac2ada44e78a0ed8ec1 -> ../dm-13
lrwxrwxrwx. 1 root root       8 Sep  3 09:19 36001405b37c244cc7cd4fc9ae147faab -> ../dm-14
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 36001405d747b798180e4aa8bb13ccf2b -> ../dm-3
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-aed953b8--8387--458b--b8e6--72b55272b609 -> ../dm-12
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-dc999b35--f2c6--4579--8a01--ac1caca6fbb4 -> ../dm-11
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-ids -> ../dm-8
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-inbox -> ../dm-9
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-leases -> ../dm-7
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-master -> ../dm-10
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-metadata -> ../dm-5
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-outbox -> ../dm-6
crw-------. 1 root root 10, 236 Sep  3 09:14 control
-rw-r--r--. 1 root root  303809 Sep  3 09:20 screen.log
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_home -> ../dm-2
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_swap -> ../dm-1

[root@dhcp-0-138 mapper]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 Beta (Maipo)

[root@dhcp-0-138 mapper]# rpm -qa | grep lvm
lvm2-2.02.127-1.el7.x86_64
lvm2-libs-2.02.127-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Discover the iSCSI targets with iscsiadm.
2. Log in to a discovered target.
3. Call pvcreate on the resulting multipath devices with --metadataignore y.
4. Call pvchange on the first PV with --metadataignore n (a condensed command sketch follows below).
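
A rough sketch of these steps as shell commands (the portal, target IQN and multipath WWIDs are placeholders; the pvcreate/pvchange options mirror the VDSM calls quoted in the description):

$ iscsiadm -m discovery -t sendtargets -p <portal-ip>
$ iscsiadm -m node -T <target-iqn> -p <portal-ip> --login
$ pvcreate --metadatasize 128m --metadatacopies 2 --metadataignore y /dev/mapper/<wwid1> /dev/mapper/<wwid2>
$ pvchange --metadataignore n /dev/mapper/<wwid1>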

Actual results:
pvchange fails with the following message:
"Failed to find physical volume "/dev/mapper/36001405b37c244cc7cd4fc9ae147faab"

Expected results:
pvchange should complete successfully.

Additional info:

Comment 2 Nir Soffer 2015-09-03 10:15:26 UTC
Marked as regression since this used to work before 7.2.

Comment 4 Maor 2015-09-03 10:50:17 UTC
Created attachment 1069746 [details]
rpm -qa log

Comment 5 Maor 2015-09-03 10:53:01 UTC
Created attachment 1069747 [details]
rpm log

Comment 6 Peter Rajnoha 2015-09-09 14:07:49 UTC
What does /etc/lvm/lvm.conf look like?

Is that machine accessible (beaker)? If so, I'd probably just log in and have a closer look...

Comment 7 Nir Soffer 2015-09-09 14:18:09 UTC
(In reply to Peter Rajnoha from comment #6)
> How does /etc/lvm/lvm.conf look like?
Peter, please note that vdsm overrides the lvm config from the command line using the
--config option, as seen in comment 0.

Comment 8 Peter Rajnoha 2015-09-09 14:30:27 UTC
Of course, but I'd like to make sure this doesn't coincide somehow with other LVM commands running outside, and to rule out a bug in how --config is processed (there may be a problem in overriding the configuration - not sure yet, just trying to rule it out).

Also, RHEL 7 uses lvmetad by default - so there may be a bug in how --config ... use_lvmetad=0 is processed. I don't see the message that is displayed when use_lvmetad is overridden while lvmetad is running: "WARNING: lvmetad is running but disabled.".

So I just want to get a full picture of the environment being used.
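
For reference, one way to check this on the affected host (this assumes the standard lvm2-lvmetad systemd unit; any LVM command run with use_lvmetad=0 while the daemon is active should print the warning above):

$ systemctl is-active lvm2-lvmetad
$ pvs --config 'global { use_lvmetad=0 }'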

Comment 9 Peter Rajnoha 2015-09-09 14:35:34 UTC
Based on the log, only /dev/vda2 has a PV header. If you run blkid /dev/mapper/36001405b37c244cc7cd4fc9ae147faab just before running the pvchange, is it identified as TYPE="LVM2_member"?

Is it assured that nothing else LVM-related is running in parallel between the pvcreate and the pvchange?
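
The check I have in mind, roughly (the UUID value is just illustrative; a device initialized by the pvcreate above should be reported as LVM2_member):

$ blkid /dev/mapper/36001405b37c244cc7cd4fc9ae147faab
/dev/mapper/36001405b37c244cc7cd4fc9ae147faab: UUID="..." TYPE="LVM2_member"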

Comment 10 Peter Rajnoha 2015-09-09 14:52:00 UTC
OK, I've managed to reproduce it (though not every time - it took me several tries to get there):

[0] rhel7-a/~ # pvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ "a|/dev/mapper/test1|/dev/mapper/test2|", "r|.*|" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --metadatacopies 2 --metadataignore y /dev/mapper/test1 /dev/mapper/test2
  Physical volume "/dev/mapper/test1" successfully created
  Physical volume "/dev/mapper/test2" successfully created

[0] rhel7-a/~ # pvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ "a|/dev/mapper/test1|/dev/mapper/test2|", "r|.*|" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --metadataignore n /dev/mapper/test2
  Failed to find physical volume "/dev/mapper/test2".
  0 physical volumes changed / 0 physical volumes not changed

Will inspect further...

Comment 11 Peter Rajnoha 2015-09-09 15:07:04 UTC
I'd need to know whether lvm.conf has obtain_device_list_from_udev=0 and/or use_lvmetad=0 - this affects how (and whether) the .cache file is created. A parallel LVM command could be run from within udev rules; if that command writes the .cache file, it is then read by other commands (such as your pvchange).

I've only managed to reproduce this when the .cache file was created in parallel by another LVM command.

BTW, you shouldn't need obtain_device_list_from_udev=0 in RHEL 7 anymore; all the problems we saw in RHEL 6 with udev and accessing the udev database should be resolved in recent udev versions (the udev in RHEL 6 is ancient and had bugs, which is why we advised obtain_device_list_from_udev=0 there).

Whenever possible, it's good to avoid using the .cache file at all - which is what happens when obtain_device_list_from_udev=1 is used, since the .cache file is ignored in that case.

Anyway, I'll trace further to see the exact path leading to this problem...
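
For completeness, a quick way to confirm the effective settings and whether a persistent cache file exists at all (lvmconfig accepts individual setting paths; on older builds lvm dumpconfig does the same):

$ lvmconfig devices/obtain_device_list_from_udev global/use_lvmetad
$ ls -l /etc/lvm/cache/.cache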

Comment 12 Maor 2015-09-09 15:50:11 UTC
Created attachment 1071820 [details]
lvm conf

Comment 13 Maor 2015-09-09 15:54:50 UTC
Hi Peter,

I've attached the lvm.conf file and sent you my host details by mail.
BTW, I've upgraded my host since I opened the bug,
so the LVM version is now:
lvm2-libs-2.02.129-2.el7.x86_64
lvm2-2.02.129-2.el7.x86_64

Comment 14 Peter Rajnoha 2015-09-10 08:52:24 UTC
Thanks for the access to the machine!

I've looked inside and the source of the problem is clear: the .cache file is incorrect - it contains only the vda2 device (and its symlinks) and is missing all the other block devices.
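
To illustrate, the stale cache looks roughly like this (an approximation - the symlink entries are omitted here):

persistent_filter_cache {
        valid_devices=[
                "/dev/vda2"
        ]
}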

When using obtain_device_list_from_udev=0 in the VDSM commands, this .cache file is used. Then LVM sees only that vda2 device (which also corresponds to the verbose log file you sent earlier).

We need to find out how this .cache file got created - if all the VDSM commands are run with write_cache_state=0 and LVM is globally configured with obtain_device_list_from_udev=1 (in which case the .cache is neither used nor written), there must be an extra LVM command running somewhere that generates the .cache file (and generates it incorrectly).

Now, I recall we've been fixing something very similar recently:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=9ea77b788b0822fe06e8efb92d13f80dbc62e351

But that fix has been included since lvm2 v2.02.117.

Will inspect further...

Comment 15 Peter Rajnoha 2015-09-10 14:20:09 UTC
OK, in the end, this was a regression caused by https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f6473baffc2d0b486b2aa941cf2681683026b3a5, which missed handling the case in which the configuration changes, either by providing the --config argument to an LVM command or simply by editing the lvm.conf file. In either case, if there's an existing persistent cache file (/etc/lvm/cache/.cache), it needs to be ignored, because the configuration has changed and the .cache content may no longer be valid.

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=2296999cf6922b2be0c6dc38abae0c6766996060

Comment 16 Peter Rajnoha 2015-09-10 14:31:44 UTC
To QA:

To test this issue:

- make sure you have use_lvmetad=0 and obtain_device_list_from_udev=0

$ lvmconfig --type diff
global {
	use_lvmetad=0
}
devices {
	obtain_device_list_from_udev=0
}

- create a PV

$ pvcreate /dev/sda 
  Physical volume "/dev/sda" successfully created

- edit the /etc/lvm/cache/.cache file to simulate an outdated .cache file which doesn't have a record for /dev/sda (just keep one other device there, for example /dev/sdb, or anything other than the device used in the pvcreate above)

persistent_filter_cache {
        valid_devices=[
                "/dev/sdb"
        ]
}

- then try to change /dev/sda with --config used (it doesn't matter whether you actually override anything; just using --config at all is the important part here):

$ pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda
  Failed to find physical volume "/dev/sda".
  0 physical volumes changed / 0 physical volumes not changed

- the same but with the patch applied:

$ pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda
  Physical volume "/dev/sda" changed
  1 physical volume changed / 0 physical volumes not changed

Comment 18 Nir Soffer 2015-10-15 08:36:18 UTC
Maor, we must require this fix in vdsm. Would you open a vdsm block bug
for consuming this package when it is available?

Comment 19 Nir Soffer 2015-10-15 08:37:06 UTC
Peter, we need this fix also in Fedora; should we open a Fedora bug for this?

Comment 20 Peter Rajnoha 2015-10-15 09:53:29 UTC
(In reply to Nir Soffer from comment #19)
> Peter, we need this fix also in Fedora; should we open a Fedora bug for this?

Nope, F21 and F22 do not have this regression (they ship an older version, < 2.02.127). F23 will gain the fix with the rebase.

Comment 21 Nir Soffer 2015-10-15 10:36:38 UTC
(In reply to Nir Soffer from comment #18)
Maor, ignore my needinfo - I opened bug 1272035 for this.

Comment 22 Corey Marthaler 2015-10-15 17:49:08 UTC
Fix verified in the latest rpms.

3.10.0-322.el7.x86_64
lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015



use_lvmetad = 0
obtain_device_list_from_udev = 0

[root@host-113 cache]# cat .cache
# This file is automatically maintained by lvm.

persistent_filter_cache {
        valid_devices=[
                "/dev/sdb1",
        ]
}
[root@host-113 cache]# pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda1
  Physical volume "/dev/sda1" changed
  1 physical volume changed / 0 physical volumes not changed

Comment 23 Fred Rolland 2015-11-10 14:07:07 UTC
Peter hi, do we have this fix in 7.2 ?

Comment 24 Peter Rajnoha 2015-11-10 14:10:34 UTC
(In reply to Fred Rolland from comment #23)
> Peter hi, do we have this fix in 7.2 ?

Yes, the fix is included in lvm2-2.02.130-2.el7 and higher (so 7.2).

Comment 25 Fred Rolland 2015-11-10 14:25:51 UTC
Thanks.
What about 7.1 ? Was it backported ?

Comment 26 Peter Rajnoha 2015-11-10 14:33:26 UTC
(In reply to Fred Rolland from comment #25)
> Thanks.
> What about 7.1 ? Was it backported ?

Well, looking at this in more detail - this was actually an intra-release bug. It was a regression introduced by https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f6473baffc2d0b486b2aa941cf2681683026b3a5 (that is, lvm2 v2.02.127), so it never appeared in any officially released version of lvm2 (7.1 had v2.02.115).

Comment 27 errata-xmlrpc 2015-11-19 12:47:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2147.html