Bug 1244153 - Clone operation fails when use_lvmetad is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1244140
Blocks:
Reported: 2015-07-17 10:24 UTC by lourdu
Modified: 2017-11-29 08:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Duplicate PVs would not be recognized correctly with lvmetad enabled.
Consequence: LVM may use the wrong PV.
Fix: lvmetad is temporarily disabled when duplicate PVs exist, and LVM keeps better track of all the duplicate PVs.
Result: LVM can better distinguish duplicate PVs, and can make a better choice about which PV to use.
Clone Of: 1244140
Environment:
Last Closed: 2016-11-04 04:10:14 UTC
Target Upstream Version:


Links
System: Red Hat Product Errata
ID: RHBA-2016:1445
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2016-11-03 13:46:41 UTC

Internal Links: 1421941

Description lourdu 2015-07-17 10:24:36 UTC
Description of problem:
The clone operation fails when use_lvmetad is set to '1', using Snapdrive for UNIX on RHEL 7 and RHEL 7.1.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.1

How reproducible:
Perform a snapdrive snap connect operation in a RHEL 7.x environment with use_lvmetad and lvm2-lvmetad.service enabled.

Steps to Reproduce:
1. Install Snapdrive for UNIX in RHEL 7 or RHEL 7.1 OS.
2. Disable use_lvmetad (set use_lvmetad =1)
3. Disable lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)
4. Perform snapdrive snap connect operation.

Actual results:
The clone operation hangs in the cleanup step after importing the disk groups. If auto mount is working, then snap clone is not working.

Expected results:
Clone operation should succeed without any issues.

Additional info:
SnapDrive for UNIX is a host-based storage and data management solution for UNIX environments. With use_lvmetad set to '0', all snapdrive operations (snap restore, snap connect) work.

Comment 2 Ian Kent 2015-07-17 12:04:28 UTC
Are you sure you have the correct component for this?

If you think you have got the right component post your
autofs maps and a full debug log of the problem along
with a description of how autofs is involved and how
you think it should function wrt. to the map entries
you have configured.

Comment 3 lourdu 2015-07-21 13:35:41 UTC
RHEL 7 and RHEL 7.1 enter emergency mode on reboot when a SAN filespec has a persistent entry:

For Snapdrive operations to work, disable use_lvmetad and lvm2-lvmetad.service in a RHEL 7 or RHEL 7.1 environment.
When a SAN filespec has persistent entries in the /etc/fstab file, the host enters emergency mode on reboot.
Workaround:
1. Provide the root password for maintenance mode
2. Comment out the corresponding SAN filespec entry in the /etc/fstab file
3. Reboot
4. Mount the filespec

Question: Is disabling use_lvmetad recommended as a workaround for customers?

Comment 4 Peter Rajnoha 2015-07-21 14:02:13 UTC
(In reply to lourdu from comment #0)
> Steps to Reproduce:
> 1. Install Snapdrive for UNIX in RHEL 7 or RHEL 7.1 OS.
> 2. Disable use_lvmetad (set use_lvmetad =1)
> 3. Disable lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)

(I assume this was supposed to be "enable")

(In reply to lourdu from comment #3)
> Question: Is disabling use_lvmetad recommended as a workaround for
> customers?

No, all should work even with lvmetad.

However, with lvmetad, you need to be more cautious with configuration because if lvmetad is used, LVM's event-based activation takes place (in contrast to an environment where lvmetad is not used and you always need to call vgchange/lvchange --activate y to activate any VGs/LVs on any newly attached device/PV). Hence you need to choose carefully which VGs/LVs should be autoactivated and which should be ignored by setting activation/auto_activation_volume_list (to select which LVs to automatically activate) and devices/global_filter (to completely filter out devices you don't want LVM to scan).

If you already have devices/filter set, be sure to also set devices/global_filter for the devices you don't want LVM to scan at all or lvmetad to cache.

Comment 5 lourdu 2015-07-22 08:51:26 UTC
>> (I assume this was supposed to be "enable")

Yes.
1. Install Snapdrive for UNIX in RHEL 7 or RHEL 7.1 OS.
2. Enable use_lvmetad (set use_lvmetad =1)
3. Enable lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)
4. Perform snapdrive snap connect operation

Without lvmetad, all snapdrive operations work fine.
With lvmetad, snap connect operation hangs. 

Questions:
Are there any implications of disabling lvmetad, if we want to suggest that as a workaround to customers from our end?

Why does the OS go into maintenance mode in the presence of persistent SAN entries when lvmetad is disabled?

Comment 6 lourdu 2015-07-22 09:05:58 UTC
(In reply to Peter Rajnoha from comment #4)
> (In reply to lourdu from comment #3)
> > Question: Is disabling use_lvmetad recommended as a workaround for
> > customers?
>
> No, all should work even with lvmetad.

But with local boot, lvmetad is not running by default. Is it necessary to start lvmetad all the time?

> However, with lvmetad, you need to be more cautious
> with configuration because if lvmetad is used, LVM's event-based activation
> takes place (in contrast to environment where lvmetad is not used and you
> always need to call vgchange/lvchange --activate y to activate any VGs/LVs
> on any newly attached device/PV). Hence you need to choose carefully which
> VGs/LVs should be autoactivated and which should be ignored by setting the
> activation/auto_activation_volume_list (to select which LVs to automatically
> activate) and devices/global_filter (to completely filter out devices you
> don't want LVM to scan).
>
> If you already have devices/filter set, be sure to
> also have devices/global_filter set also for devices you don't want LVM to
> be scanned at all and cached by lvmetad.

Could you recommend the better of these two approaches: disabling lvmetad, or adding a global filter (which will stop activation of the LV)?

Comment 7 Peter Rajnoha 2015-07-22 10:34:59 UTC
(In reply to lourdu from comment #5)
> Questions:
> Are there any implications in disabling lvmetad, if we want to suggest that
> as a workaround to the customers from our end? 
> 

The main purpose of lvmetad is LVM metadata caching. Because cached metadata is used, devices are not rescanned on each LVM command execution, which leads to faster LVM command processing, mainly when lots of disks/physical volumes are present in the system.

Also, if lvmetad is enabled, LVM (within its rules for udev event processing) is able to automatically activate a volume group as soon as all its physical volumes are present in the system.

So by disabling lvmetad, you'll lose the advantages above.

> Why does the OS go into maintenance mode in the presence of persistent SAN
> entries when lvmetad is disabled?

We'd probably need more debug info here: please provide the log captured with systemd and udev switched into debug mode, which is done by adding "debug" to the kernel command line at boot.

But most of the time, when a line in /etc/fstab causes the system to switch into emergency mode at boot, it's because the device stated in fstab was not found. If this happens only when you disable lvmetad (and hence also disable event-based LVM autoactivation), then lvm2-activation-early.service, lvm2-activation.service and lvm2-activation-net.service are responsible for activating VGs/LVs.

The difference between using LVM autoactivation and using lvm2-activation-*.service is that lvm2-activation-*.service scans the *current state* of the system, so if any of the disks (representing PVs) are not initialized/attached at that time, any VGs/LVs on such PVs are not activated. When LVM autoactivation is used, the VG/LV is activated as soon as all PVs are present (so the activation can happen at any time, not just at a certain point during boot).
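
A toy model of the difference described above, as a sketch only and not how LVM is implemented: the PV arrival times, the scan time and the function names are all hypothetical. A one-shot scan activates a VG only if every PV is already present when the scan runs, while event-based autoactivation fires when the last PV shows up.

```python
def activated_by_one_shot_scan(pv_arrival_times, scan_time):
    """lvm2-activation-*.service style: look at the system once, at scan_time."""
    # The VG is activated only if all of its PVs have already appeared.
    return all(t <= scan_time for t in pv_arrival_times)


def autoactivation_time(pv_arrival_times):
    """Event-based style: activate as soon as the last PV has appeared."""
    return max(pv_arrival_times)


# Hypothetical VG with one local PV (present at t=1s) and one SAN PV (t=40s).
san_vg = [1, 40]
print(activated_by_one_shot_scan(san_vg, scan_time=5))   # False
print(autoactivation_time(san_vg))                       # 40
```

In this model, a filesystem on the hypothetical SAN-backed VG is still missing after the one-shot scan at t=5s, whereas autoactivation picks it up at t=40s.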

RHEL7 uses systemd and as such, it reads /etc/fstab content. For each line it finds in /etc/fstab, it creates a separate <mount_name>.mount unit with a timeout which waits for the underlying device to appear. I suppose you're dropped to the emergency shell exactly after this timeout.

We should see all of this in the debug log then (the "debug" keyword added to kernel command line).

> Could you recommend the better of these two approaches: disabling lvmetad,
> or adding a global filter (which will stop activation of the LV)?

Just to make sure - you're cloning devices which hold VGs/LVs, right? Whenever such clones are used, I strongly recommend using global_filter to filter out the clones as otherwise LVM is not able to tell which clone is the right one and it can lead to confusion.

Comment 8 Peter Rajnoha 2015-07-22 10:35:56 UTC
(In reply to Peter Rajnoha from comment #7)
> We'd probably need more debug info here: please provide the log captured
> with systemd and udev switched into debug mode, which is done by adding
> "debug" to the kernel command line at boot.

Also, please attach the /etc/lvm/lvm.conf that is used together with the content of /etc/fstab.

Comment 9 Peter Rajnoha 2015-07-22 10:44:48 UTC
(In reply to Peter Rajnoha from comment #7)
> But most of the time, when a line in /etc/fstab causes the system to switch
> into emergency mode at boot, it's because the device stated in fstab was
> not found. If this happens only when you disable lvmetad (and hence also
> disable event-based LVM autoactivation), then
> lvm2-activation-early.service, lvm2-activation.service and
> lvm2-activation-net.service are responsible for activating VGs/LVs.

You can check the state of lvm activation by using:

  - in case use_lvmetad=0:
    "systemctl status lvm2-activation-early.service lvm2-activation.service lvm2-activation-net.service"

  - in case use_lvmetad=1 there are various lvm2-pvscan@major:minor.service systemd units which are responsible for updating lvmetad state and also for the autoactivation itself (systemctl -a shows all systemd units where you can find also all the names of various lvm2-pvscan@major:minor.service units used)
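
As a small illustration of the lvm2-pvscan@major:minor.service naming mentioned above, the sketch below builds the unit name from a device number. The helper names are hypothetical, and the 8:16 example assumes the conventional Linux numbering for /dev/sdb.

```python
import os


def pvscan_unit_name(major, minor):
    """Build the lvm2-pvscan@major:minor.service name for a device number."""
    return "lvm2-pvscan@%d:%d.service" % (major, minor)


def pvscan_unit_for_device(path):
    """Look up major:minor of a block device node and build the unit name."""
    rdev = os.stat(path).st_rdev
    return pvscan_unit_name(os.major(rdev), os.minor(rdev))


# /dev/sdb conventionally has device number 8:16 on Linux.
print(pvscan_unit_name(8, 16))   # lvm2-pvscan@8:16.service
```

The resulting name can then be passed to "systemctl status" in the same way as the lvm2-activation-*.service units.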

Comment 10 lourdu 2015-07-27 10:30:44 UTC
(In reply to Peter Rajnoha from comment #7)
> > Could you recommend the better of these two approaches: disabling lvmetad,
> > or adding a global filter (which will stop activation of the LV)?
>
> Just to make sure - you're cloning devices which hold VGs/LVs, right?
> Whenever such clones are used, I strongly recommend using global_filter to
> filter out the clones as otherwise LVM is not able to tell which clone is
> the right one and it can lead to confusion.

I tried global_filter to filter out the VGs/LVs which hold the cloning devices, but I'm not sure of the syntax used for global_filter.

[root@manali ~]# lvdisplay
--- Logical volume ---
LV Path       /dev/testfs_SdDg/testfs_SdHv
LV Name       testfs_SdHv
VG Name       testfs_SdDg
...

For the above parent LV, I tried global_filter = [ "a|/dev/mapper/testfs_*|" ], [ "a|/dev/testfs_SdDg/testfs_SdHv|" ], [ "a|testfs|" ] and also [ "a|/dev/testfs_*|" ]

But clone creation still fails. Could you please let me know the correct syntax for the global_filter option?

Comment 11 Peter Rajnoha 2015-07-27 10:44:21 UTC
List all devices which you're sure are valid for scanning and exclude all the others (that is missing in your config you listed above), see also examples in lvm.conf file for the global_filter or filter setting:

    # By default we accept every block device:
    # filter = [ "a/.*/" ]

    # Exclude the cdrom drive
    # filter = [ "r|/dev/cdrom|" ]

    # When testing I like to work with just loopback devices:
    # filter = [ "a/loop/", "r/.*/" ]

    # Or maybe all loops and ide drives except hdc:
    # filter =[ "a|loop|", "r|/dev/hdc|", "a|/dev/ide|", "r|.*|" ]

    # Use anchors if you want to be really specific
    # filter = [ "a|^/dev/hda8$|", "r/.*/" ]

Comment 12 Peter Rajnoha 2015-07-27 10:48:03 UTC
(In reply to Peter Rajnoha from comment #11)
> List all devices which you're sure are valid for scanning and exclude all
> the others (that is missing in your config you listed above)

(...that's the "r|.*|" filter rule for the exclusion used at the end of examples in comment #11)
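
Why the trailing reject rule matters can be shown with a minimal sketch of the filter's first-match-wins evaluation. This is a simplified model, not LVM code: it uses Python's re module as a stand-in for LVM's own regex matcher, looks at a single path name per device, and the function and device names are hypothetical.

```python
import re


def filter_decision(patterns, device_path):
    """Return 'accept' or 'reject' for one device path name.

    Each pattern is 'a' or 'r', a delimiter character, a regular
    expression, and the same delimiter again, e.g. "a|loop|" or "r/.*/".
    """
    for pattern in patterns:
        action, delimiter = pattern[0], pattern[1]
        regex = pattern[2:pattern.rindex(delimiter)]
        if re.search(regex, device_path):
            # The first matching pattern decides.
            return "accept" if action == "a" else "reject"
    # No pattern matched: the device is accepted by default.
    return "accept"


accept_only = ["a|testfs|"]
with_reject_all = ["a|^/dev/sda2$|", "r|.*|"]
clone = "/dev/mapper/clone_lun"   # hypothetical device name

print(filter_decision(accept_only, clone))            # accept
print(filter_decision(with_reject_all, clone))        # reject
print(filter_decision(with_reject_all, "/dev/sda2"))  # accept
```

A list that contains only "a" rules therefore filters nothing out, because every device that matches no rule is accepted anyway.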

Comment 13 Peter Rajnoha 2015-07-27 10:52:49 UTC
If you're also using LVs for any system devices (e.g. the one where root filesystem is or any other part needed during bootup), be sure to include this in the filter as "allowed device" too.

Comment 14 Peter Rajnoha 2015-07-31 10:37:59 UTC
Has setting of the global_filter helped (together with the "reject all the other devs" suffix as mentioned in comment #11 and comment #12)? Do you need more assistance?

Comment 15 lourdu 2015-08-04 12:44:21 UTC
(In reply to Peter Rajnoha from comment #14)
> Has setting of the global_filter helped (together with the "reject all the
> other devs" suffix as mentioned in comment #11 and comment #12)? Do you need
> more assistance?

It didn't work, Peter. Please find the detailed steps I followed below:

1.	Created a LUN.
2.	Created an LV and added the global filter below:
 global_filter = [ "a|/dev/sda[0-9]|", "r|^/dev/sd*|", "r|^/dev/mapper/360*|" ]

3.	The global filter allows the local HDD and blocks the mapped LUN (/dev/mapper/360*) and all SCSI devices.

Question: 
Why do I have to block the underlying SCSI devices? If I block only /dev/mapper/360*, it’s not working.

4.	Then I cloned the LUN on the target and mapped the same LUN to the server. The LV automatically comes up as active, since the new LUN has a different /dev/mapper/360* number, and snap connect fails while trying to perform pvcreate -ff -y /dev/mapper/360* (of the cloned LUN).

Question:
Is it possible to rename PV when lvmetad is running?
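
A note on the global_filter from step 2: LVM filter patterns are regular expressions, not shell globs, so in "r|^/dev/sd*|" the * applies only to the preceding d. The sketch below uses Python's re and fnmatch modules as stand-ins (an assumption, since LVM has its own matcher) to show the difference on a few hypothetical device paths.

```python
import fnmatch
import re


def regex_and_glob(path, regex="^/dev/sd*", glob="/dev/sd*"):
    """Return (matches as regex, matches as shell glob) for one path."""
    return bool(re.search(regex, path)), fnmatch.fnmatchcase(path, glob)


for path in ["/dev/sdb", "/dev/sr0", "/dev/mapper/mpatha"]:
    print(path, *regex_and_glob(path))
# /dev/sdb True True
# /dev/sr0 True False
# /dev/mapper/mpatha False False
```

Read as a regex, "^/dev/sd*" means "/dev/s" followed by zero or more "d", so it also matches /dev/sr0. In the same way, "^/dev/mapper/360*" matches anything starting with /dev/mapper/36. If only sd devices are meant, a pattern such as "r|^/dev/sd|" would express that more directly.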

Comment 16 Peter Rajnoha 2015-09-11 08:56:15 UTC
(In reply to lourdu from comment #15)
> 2.	Created an LV and added the global filter below:
>  global_filter = [ "a|/dev/sda[0-9]|", "r|^/dev/sd*|",
> "r|^/dev/mapper/360*|" ]
> 
> 3.	The global filter allows the local HDD and blocks the mapped LUN
> (/dev/mapper/360*) and all SCSI devices.
> 
> Question: 
> Why do I have to block the underlying SCSI devices? If I block only
> /dev/mapper/360*, it’s not working.
> 

You should be using the top-level device always, not any of the components underneath. In your case, that's the /dev/mapper/360* device which is the top-level one, I suppose. But, please, can you send the output of "lsblk" command, just to make sure what's the exact block device layout on your machine...

> 4.	Then I cloned the LUN on the target and mapped the same LUN to the server.
> The LV automatically comes up as active, since the new LUN has a different
> /dev/mapper/360* number, and snap connect fails while trying to perform
> pvcreate -ff -y /dev/mapper/360* (of the cloned LUN).
> 
> Question:
> Is it possible to rename PV when lvmetad is running?

The PV name is the name of the device as found in /dev and that always has a unique name. What matters is the content of the device - the LVM metadata found there.

It all depends on what you're trying to achieve - why is the clone done? If it's just for backup, you need to filter it out. If it's for the purpose of making a copy and then using it independently of the original (so the original and the copy diverge), you need to do the following:

  - If the duplicate is just a PV without any VG on it yet, you can just change the PV UUID of the duplicate PV to make it unique again (pvchange -u).

  - If the duplicate PV contained VG metadata, then you will also have duplicate VG - in that case you need to use vgimportclone to make it unique.

It all depends on the use case. All of the above can be done under lvmetad.
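
The two cases above can be sketched as a helper that only builds the command line and executes nothing. The function name, the device path /dev/mapper/mpathb and the VG name testfs_SdDg_clone are hypothetical; "pvchange --uuid" is the long form of "pvchange -u", and "--basevgname" is the vgimportclone option for choosing the new VG name.

```python
def make_clone_unique_command(has_vg_metadata, devices, base_vg_name=None):
    """Return the argv for making cloned PV(s) unique; nothing is executed."""
    if not has_vg_metadata:
        # Bare PV without a VG: give each duplicate PV a new random UUID.
        return ["pvchange", "--uuid"] + list(devices)
    # PVs carrying VG metadata: vgimportclone renames the VG and changes
    # the VG and PV UUIDs of the cloned devices.
    argv = ["vgimportclone"]
    if base_vg_name:
        argv += ["--basevgname", base_vg_name]
    return argv + list(devices)


# Hypothetical cloned multipath device and new VG name.
print(" ".join(make_clone_unique_command(
    True, ["/dev/mapper/mpathb"], "testfs_SdDg_clone")))
# vgimportclone --basevgname testfs_SdDg_clone /dev/mapper/mpathb
```

When a cloned VG spans several PVs, all of the cloned devices would be passed to vgimportclone in one call.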

So I just need to clearly understand your exact use case for the clones you're creating and attaching/connecting to your system.

Comment 17 Peter Rajnoha 2015-09-11 09:00:16 UTC
If you have multipath in place, which seems to be the case given the name /dev/mapper/360*, then LVM automatically filters out multipath components if you have devices/multipath_component_detection=1 set in your lvm.conf. And this is correct as you should always be using the top-level device, not the components. But I'm not sure about your exact setup yet - the lsblk output would provide the info.

Comment 18 Jonathan Earl Brassow 2016-01-22 15:33:17 UTC
This bug is an issue dealing with (apparent) duplicate PVs.  Re-assigning to Dave since he's done most recent work on this.

Comment 19 David Teigland 2016-06-10 20:56:47 UTC
With lvmetad, duplicate PVs cannot be handled correctly.  In 7.3 lvm will have much improved duplicate handling, and lvmetad will be automatically disabled when duplicates exist.

Comment 20 Shivananda 2016-06-13 10:24:21 UTC
Thanks, David for the update that 'lvmetad' will be automatically disabled.
However, I assume that 'lvmetad' will be enabled by 'vgimportclone' after resolving duplicate PVs.

Comment 21 David Teigland 2016-06-13 14:32:57 UTC
Yes, vgimportclone should attempt to clear the lvmetad disabled state when it's done.

Comment 22 Roman Bednář 2016-07-20 12:13:53 UTC
Adding QA ack for 7.3. Sanity Only.

Comment 24 Roman Bednář 2016-09-20 12:57:28 UTC
Marking as verified. Improved lvmetad handling already tested in bug #1254393.


3.10.0-505.el7.x86_64

lvm2-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-libs-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-cluster-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-libs-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-libs-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016

Comment 26 errata-xmlrpc 2016-11-04 04:10:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

Comment 27 Shobhit Sethi 2016-12-22 11:32:44 UTC
I validated the following steps on RHEL 7.3 with the latest release of Snapdrive for Unix:

1. Installed Snapdrive for UNIX in RHEL 7.3 OS.
2. Set use_lvmetad =1
3. Start lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)
4. Perform snapdrive snap connect operation.

We are still facing the same issue with RHEL 7.3 and need to disable lvmetad and its service for cloning operation.

Comment 28 Abdul Khumani 2017-11-29 08:49:19 UTC
We tried reproducing this issue on RHEL 7.4 with latest release of Snapdrive for Unix:- 

The same issue is not reproducible with RHEL 7.4. All Snapdrive for Unix operations are working fine without applying any workaround.

