Bug 1259587
Summary: pvchange fails to find physical volume

Product: Red Hat Enterprise Linux 7
Reporter: Maor <mlipchuk>
Component: lvm2
Assignee: Peter Rajnoha <prajnoha>
lvm2 sub component: Changing Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: agk, amureini, cmarthal, frolland, heinzm, jbrassow, lmiksik, mlipchuk, msnitzer, nsoffer, prajnoha, prockai, ylavi, zkabelac
Version: 7.2
Keywords: Regression, Triaged
Target Milestone: rc
Target Release: 7.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.130-2.el7
Doc Type: Bug Fix
Doc Text: Intra-release bug. No documentation needed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 12:47:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1272035
Attachments:
Marked as regression since this used to work before 7.2.

Created attachment 1069746 [details]
rpm -qa log

Created attachment 1069747 [details]
rpm log

What does /etc/lvm/lvm.conf look like? Is that machine accessible (beaker)? If yes, I'd probably just log in and have a closer look...

(In reply to Peter Rajnoha from comment #6)
> What does /etc/lvm/lvm.conf look like?

Peter, please note that vdsm overrides the lvm config from the command line using the --config option, as seen in comment 0.

Of course, but I'd like to make sure this does not somehow coincide with any other LVM commands running outside, and that there isn't a bug in how --config is processed (there may be a problem in overriding the conf, not sure now, just trying to rule it out). Also, RHEL 7 uses lvmetad by default - so there may be a bug in how --config ... use_lvmetad=0 is processed. I don't see the message which is displayed when use_lvmetad is overridden while lvmetad is running: "WARNING: lvmetad is running but disabled." So I just want to build a full picture of the environment that is used.

Based on the log, only /dev/vda2 has a PV header. If you run blkid /dev/mapper/36001405b37c244cc7cd4fc9ae147faab just before running the pvchange, is it identified as TYPE="LVM2_member"? Is it assured that nothing else is running with respect to LVM in parallel between the pvcreate and pvchange?
OK, I've managed to reproduce (though not 100% of the time - it took me several tries to get there):

[0] rhel7-a/~ # pvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ "a|/dev/mapper/test1|/dev/mapper/test2|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --metadatacopies 2 --metadataignore y /dev/mapper/test1 /dev/mapper/test2
  Physical volume "/dev/mapper/test1" successfully created
  Physical volume "/dev/mapper/test2" successfully created

[0] rhel7-a/~ # pvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ "a|/dev/mapper/test1|/dev/mapper/test2|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --metadataignore n /dev/mapper/test2
  Failed to find physical volume "/dev/mapper/test2".
  0 physical volumes changed / 0 physical volumes not changed

Will inspect further...

I'd need to know whether the lvm.conf has obtain_device_list_from_udev=0 and/or use_lvmetad=0 - this would affect how the .cache is created, if at all - a parallel command could be run from within udev rules. If such a .cache file is written by that LVM command, it's then read by the other command (just like your pvchange). I've managed to reproduce only with the .cache file created in parallel by another LVM command.

BTW, you don't need to use obtain_device_list_from_udev=0 in RHEL7 anymore, I suppose, as all the problems we've seen in RHEL6 with respect to udev and accessing the udev database should be resolved in recent udev versions (the version in RHEL6 is ancient and it had bugs - which is why we advised using obtain_device_list_from_udev=0 there).
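The stale-cache failure mode described above can be simulated without running LVM at all. This is only an illustrative sketch of the persistent filter cache format (normally /etc/lvm/cache/.cache); the device names are placeholders, not the ones from this reproducer:

```shell
# Write a fake persistent filter cache that lists only one device,
# mimicking a .cache file generated under a different configuration.
cache=$(mktemp)
cat > "$cache" <<'EOF'
persistent_filter_cache {
    valid_devices=[
        "/dev/vda2"
    ]
}
EOF

# A device absent from valid_devices is invisible to any command
# that trusts this cache instead of rescanning block devices.
if grep -q '"/dev/mapper/test2"' "$cache"; then
    result="visible"
else
    result="missing"
fi
echo "/dev/mapper/test2 is $result from the cached device list"
rm -f "$cache"
```

This mirrors why pvchange reported "Failed to find physical volume": the PV existed on disk, but the cached device list handed to the command simply did not contain it.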
Whenever possible, it's good to avoid using the .cache file - which is what happens when obtain_device_list_from_udev=1 is used: the .cache file is ignored in that case. Anyway, I'll trace further to see the exact path leading to this problem...

Created attachment 1071820 [details]
lvm conf

Hi Peter, I've attached the lvm conf file and sent you my host details by mail. BTW, I've upgraded my host since I opened the bug, so now the LVM version is:

lvm2-libs-2.02.129-2.el7.x86_64
lvm2-2.02.129-2.el7.x86_64

Thanks for the access to the machine! I've looked inside and the source of the problem is clear - the .cache file is incorrect: it contains only the vda2 device (and its symlinks) and is missing all the other block devices. When obtain_device_list_from_udev=0 is used in the VDSM commands, this .cache file is used, so LVM sees only that vda2 device (which also corresponds to the verbose log file you sent earlier).

We need to find out how this .cache file got created - if all the VDSM commands are run with write_cache_state=0 and LVM is globally configured with obtain_device_list_from_udev=1 (in which case the .cache is neither used nor written), there must be an extra LVM command run somewhere which generates the .cache file (and in an incorrect way).

Now, I recall we've been fixing something very similar recently:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=9ea77b788b0822fe06e8efb92d13f80dbc62e351
But this should be included since lvm v2.02.117. Will inspect further...

OK, in the end, this was a regression caused by
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f6473baffc2d0b486b2aa941cf2681683026b3a5
which failed to handle the case in which the configuration changes, either by providing the --config argument to an LVM command or simply by editing the lvm.conf file. In either case, if there's an existing persistent cache file (/etc/lvm/cache/.cache), it needs to be ignored, because the configuration changed and the .cache content may no longer be valid.
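The idea behind the fix can be sketched in a few lines: compare the effective configuration against the one the cache was written under, and discard the cache on any mismatch. This is a simplified illustration of that logic, not the actual lvm2 code; the config strings are placeholders:

```shell
# Illustrative sketch: invalidate the persistent cache whenever the
# effective configuration differs from the one it was built under.
conf_now="devices/obtain_device_list_from_udev=0"     # e.g. set via --config
conf_cached="devices/obtain_device_list_from_udev=1"  # config at cache-write time

if [ "$conf_now" != "$conf_cached" ]; then
    use_cache="no"    # config changed: ignore .cache and rescan devices
else
    use_cache="yes"   # config unchanged: the cached device list is valid
fi
echo "use persistent cache: $use_cache"
```

Before the fix, the cache was trusted even in the mismatch branch, which is exactly how an outdated device list from a differently-configured command leaked into pvchange.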
The fix:
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=2296999cf6922b2be0c6dc38abae0c6766996060

To QA: To test this issue:

- make sure you have use_lvmetad=0 and obtain_device_list_from_udev=0:

  $ lvmconfig --type diff
  global {
      use_lvmetad=0
  }
  devices {
      obtain_device_list_from_udev=0
  }

- create a PV:

  $ pvcreate /dev/sda
    Physical volume "/dev/sda" successfully created

- edit the /etc/lvm/cache/.cache file to simulate an outdated .cache file which doesn't have a record of /dev/sda (just keep one device there, for example /dev/sdb, or anything other than the device used in the pvcreate above):

  persistent_filter_cache {
      valid_devices=[
          "/dev/sdb"
      ]
  }

- then try to change sda with --config used (it doesn't matter that nothing is actually overridden - the use of --config itself is what matters here):

  $ pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda
    Failed to find physical volume "/dev/sda".
    0 physical volumes changed / 0 physical volumes not changed

- the same, but with the patch applied:

  $ pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda
    Physical volume "/dev/sda" changed
    1 physical volume changed / 0 physical volumes not changed

Maor, we must require this fix in vdsm. Would you open a vdsm blocker bug for consuming this package when it is available?

Peter, we need this fix also in Fedora; should we open a Fedora bug for this?

(In reply to Nir Soffer from comment #19)
> Peter, we need this fix also in Fedora; should we open a Fedora bug for this?

Nope, F21 and F22 do not have this regression (they have an older version, < 127). F23 will gain the fix with the rebase.

(In reply to Nir Soffer from comment #18)
Maor, ignore my needinfo; I opened bug 1272035 for this.

Fix verified in the latest rpms.
3.10.0-322.el7.x86_64

lvm2-2.02.130-5.el7                          BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7                     BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7                  BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7                 BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7            BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7           BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7      BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7                       BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7                          BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7                      BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7                    BUILT: Wed Oct 14 08:27:29 CDT 2015

use_lvmetad = 0
obtain_device_list_from_udev = 0

[root@host-113 cache]# cat .cache
# This file is automatically maintained by lvm.
persistent_filter_cache {
    valid_devices=[
        "/dev/sdb1",
    ]
}

[root@host-113 cache]# pvchange -u --config 'devices/obtain_device_list_from_udev=0' /dev/sda1
  Physical volume "/dev/sda1" changed
  1 physical volume changed / 0 physical volumes not changed

Peter hi, do we have this fix in 7.2?

(In reply to Fred Rolland from comment #23)
> Peter hi, do we have this fix in 7.2?

Yes, the fix is included in lvm2-2.02.130-2.el7 and higher (so 7.2).

Thanks. What about 7.1? Was it backported?

(In reply to Fred Rolland from comment #25)
> Thanks.
> What about 7.1? Was it backported?

Well, looking at this in more detail - this was actually an intra-release bug. It was a regression introduced by
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f6473baffc2d0b486b2aa941cf2681683026b3a5
(that is, lvm2 v2.02.127), so it hasn't appeared in any officially released version of lvm2 (7.1 had v2.02.115).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2147.html
Created attachment 1069624 [details]
lvm verbose log with pvcreate and pvchange

Description of problem:

As part of the process of adding an iSCSI storage domain, LVM failed to execute pvchange on a physical volume, with the following message:
"Failed to find physical volume "/dev/mapper/36001405b37c244cc7cd4fc9ae147faab""

This is how pvcreate is called from VDSM:

/usr/bin/sudo -n /usr/sbin/lvm pvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/36001405547004ac2ada44e78a0ed8ec1|/dev/mapper/36001405b37c244cc7cd4fc9ae147faab|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --metadatasize 128m --metadatacopies 2 --metadataignore y /dev/mapper/36001405b37c244cc7cd4fc9ae147faab /dev/mapper/36001405547004ac2ada44e78a0ed8ec1

This is how pvchange is called from VDSM:

/usr/bin/sudo -n /usr/sbin/lvm pvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/360014053226a7f0077246e392eaa3a61|/dev/mapper/36001405547004ac2ada44e78a0ed8ec1|/dev/mapper/36001405b37c244cc7cd4fc9ae147faab|/dev/mapper/36001405d747b798180e4aa8bb13ccf2b|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --metadataignore n /dev/mapper/36001405b37c244cc7cd4fc9ae147faab

Version-Release number of selected component (if applicable):

We can see that 36001405b37c244cc7cd4fc9ae147faab does exist in /dev/mapper:

[root@dhcp-0-138 mapper]# ls -l
total 300
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 360014053226a7f0077246e392eaa3a61 -> ../dm-4
lrwxrwxrwx. 1 root root       8 Sep  3 09:19 36001405547004ac2ada44e78a0ed8ec1 -> ../dm-13
lrwxrwxrwx. 1 root root       8 Sep  3 09:19 36001405b37c244cc7cd4fc9ae147faab -> ../dm-14
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 36001405d747b798180e4aa8bb13ccf2b -> ../dm-3
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-aed953b8--8387--458b--b8e6--72b55272b609 -> ../dm-12
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-dc999b35--f2c6--4579--8a01--ac1caca6fbb4 -> ../dm-11
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-ids -> ../dm-8
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-inbox -> ../dm-9
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-leases -> ../dm-7
lrwxrwxrwx. 1 root root       8 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-master -> ../dm-10
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-metadata -> ../dm-5
lrwxrwxrwx. 1 root root       7 Sep  3 09:17 aaea6240--4831--4807--912b--b39b8a88ccec-outbox -> ../dm-6
crw-------. 1 root root 10, 236 Sep  3 09:14 control
-rw-r--r--. 1 root root  303809 Sep  3 09:20 screen.log
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_home -> ../dm-2
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root       7 Sep  3 09:14 vg0-lv_swap -> ../dm-1

[root@dhcp-0-138 mapper]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 Beta (Maipo)

[root@dhcp-0-138 mapper]# rpm -qa | grep lvm
lvm2-2.02.127-1.el7.x86_64
lvm2-libs-2.02.127-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. discover iscsi host with iscsiadm
2. login to a discovered target
3. call pvcreate with metadataignore as true
4. call pvchange on the first PV with metadataignore as false

Actual results:
pvchange fails with the following message:
"Failed to find physical volume "/dev/mapper/36001405b37c244cc7cd4fc9ae147faab""

Expected results:
pvchange should complete successfully.

Additional info:
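A note on the '\'' sequences in the VDSM command lines above: that is the standard shell idiom for embedding a literal single quote inside a single-quoted argument (close the quote, emit an escaped quote, reopen). A minimal illustration with a placeholder device path, not one from this report:

```shell
# '\'' closes the single-quoted string, emits a literal ', then reopens it,
# so the filter patterns arrive at lvm wrapped in single quotes.
config='devices { filter = [ '\''a|/dev/mapper/example|'\'', '\''r|.*|'\'' ] }'
echo "$config"
```

The echoed value is what the lvm binary actually receives as its --config argument, with the inner quotes intact around each filter pattern.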