Bug 1244153
Summary: Clone operation fails when use_lvmetad is enabled
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: LVM Metadata / lvmetad
Reporter: lourdu <princy.lourdu>
Assignee: David Teigland <teigland>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Version: 7.0
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
CC: abdul.khumani, agk, ayyanar.perumal, fs-qe, heinzm, hkathed, ikent, jbrassow, lmiksik, msnitzer, prajnoha, princy.lourdu, prockai, rbednar, shobhit.sethi, sudharss, vanishri.n, vprabhu, zkabelac
Doc Type: Bug Fix
Doc Text:
  Cause: Duplicate PVs would not be recognized correctly with lvmetad enabled.
  Consequence: LVM may use the wrong PV.
  Fix: lvmetad is temporarily disabled when duplicate PVs exist, and LVM keeps better track of all the duplicate PVs.
  Result: LVM can better distinguish duplicate PVs, and can make a better choice about which PV to use.
Story Points: ---
Clone Of: 1244140
Last Closed: 2016-11-04 04:10:14 UTC
Type: Bug
Regression: ---
Bug Depends On: 1244140
Description by lourdu, 2015-07-17 10:24:36 UTC
Are you sure you have the correct component for this? If you think you have the right component, post your autofs maps and a full debug log of the problem, along with a description of how autofs is involved and how you think it should function wrt. the map entries you have configured.

RHEL 7 and RHEL 7.1 OS reboot enters emergency mode when a SAN filespec has a persistent entry. For Snapdrive operations to work, disable use_lvmetad and the lvm2-lvmetad.service in RHEL 7 and RHEL 7.1 environments. When a SAN filespec has persistent entries in the /etc/fstab file, the host enters emergency mode on reboot.

Workaround:
1. Provide the root password for maintenance mode
2. Comment out the corresponding SAN filespec entry in the /etc/fstab file
3. Reboot
4. Mount the filespec

Question: Is disabling use_lvmetad recommended as a workaround for customers?

(In reply to lourdu from comment #0)
> Steps to Reproduce:
> 1. Install Snapdrive for UNIX in RHEL 7 or RHEL 7.1 OS.
> 2. Disable use_lvmetad (set use_lvmetad =1)
> 3. Disable lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)

(I assume this was supposed to be "enable")

(In reply to lourdu from comment #3)
> Question: Is disabling use_lvmetad recommended as a workaround for customers?

No, all should work even with lvmetad. However, with lvmetad you need to be more cautious with configuration, because if lvmetad is used, LVM's event-based activation takes place (in contrast to an environment where lvmetad is not used and you always need to call vgchange/lvchange --activate y to activate any VGs/LVs on any newly attached device/PV). Hence you need to choose carefully which VGs/LVs should be autoactivated and which should be ignored, by setting activation/auto_activation_volume_list (to select which LVs to automatically activate) and devices/global_filter (to completely filter out devices you don't want LVM to scan).

If you already have devices/filter set, be sure to also set devices/global_filter for devices you don't want LVM to scan at all or cache in lvmetad.

>> (I assume this was supposed to be "enable")
Yes.
1. Install Snapdrive for UNIX in RHEL 7 or RHEL 7.1 OS.
2. Enable use_lvmetad (set use_lvmetad =1)
3. Enable lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)
4. Perform snapdrive snap connect operation
Without lvmetad, all snapdrive operations work fine.
With lvmetad, the snap connect operation hangs.
Questions:
Are there any implications to disabling lvmetad, if we want to suggest that as a workaround to customers?
Why does the OS go into maintenance mode in the presence of SAN persistent entries when lvmetad is disabled?
(In reply to Peter Rajnoha from comment #4)
> (In reply to lourdu from comment #3)
> > Question: Is disabling use_lvmetad recommended as a workaround for customers?
> No, all should work even with lvmetad.

But with local boot, lvmetad is not running by default. Is it necessary to start lvmetad all the time?

> However, with lvmetad, you need to be more cautious with configuration because if lvmetad is used, LVM's event-based activation takes place (...) Hence you need to choose carefully which VGs/LVs should be autoactivated and which should be ignored by setting activation/auto_activation_volume_list (to select which LVs to automatically activate) and devices/global_filter (to completely filter out devices you don't want LVM to scan). If you already have devices/filter set, be sure to also set devices/global_filter for devices you don't want LVM to scan at all or cache in lvmetad.

Could you propose the better approach out of these two: disabling lvmetad, or instead adding a global filter (which will stop activation of the LV)?

(In reply to lourdu from comment #5)
> Questions:
> Are there any implications to disabling lvmetad, if we want to suggest that as a workaround to customers?

The main purpose of lvmetad is LVM metadata caching. This prevents numerous scans on each LVM command execution, as the cached metadata is used instead, which leads to faster LVM command processing, mainly when there are lots of disks/physical volumes present in the system. Also, if lvmetad is enabled, LVM (within its rules for udev event processing) is able to automatically activate a volume group as soon as all its physical volumes are present in the system. So by disabling lvmetad, you'll lose the advantages above.
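As a rough sketch of the configuration Peter describes (the VG name "myvg" and the device pattern below are illustrative placeholders, not taken from this bug report):

```
# /etc/lvm/lvm.conf -- illustrative fragment, not from this report.
activation {
    # Only LVs in "myvg" are auto-activated when their PVs appear.
    auto_activation_volume_list = [ "myvg" ]
}
devices {
    # Reject clone LUNs entirely so neither the tools nor lvmetad
    # ever see them; accept everything else.
    global_filter = [ "r|^/dev/mapper/clone_|", "a|.*|" ]
}
```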
> Why does the OS go into maintenance mode in the presence of SAN persistent entries when lvmetad is disabled?

We'd probably need more debug info here; please provide the log after switching systemd and udev into debug mode by adding "debug" to the kernel command line at boot.

But most of the time, when there's a line in /etc/fstab which causes the system to switch into emergency mode at boot, it's because the device stated in fstab was not found. If this happens only when you disable lvmetad (and hence disable event-based LVM autoactivation that way too), there are lvm2-activation-early.service, lvm2-activation.service and lvm2-activation-net.service, which are responsible for activating VGs/LVs. The difference between using LVM autoactivation and using the lvm2-activation-*.service units is that lvm2-activation-*.service scans the *current state* of the system, so if any of the disks (representing PVs) are not initialized/attached at that time, any VGs/LVs on such PVs are not activated. When LVM autoactivation is used, the VG/LV is activated as soon as all PVs are present (so the activation can happen at any time, not just at a certain point during boot).

RHEL 7 uses systemd and as such it reads the /etc/fstab content. For each line it finds in /etc/fstab, it creates a separate <mount_name>.mount unit with a timeout which waits for the underlying device to appear. I suppose you're dropped to the emergency shell exactly after this timeout. We should see all of this in the debug log then (with the "debug" keyword added to the kernel command line).

> Could you propose the better approach out of these two: disabling lvmetad or instead adding a global filter (which will stop activation of the LV)?

Just to make sure: you're cloning devices which hold VGs/LVs, right? Whenever such clones are used, I strongly recommend using global_filter to filter out the clones, as otherwise LVM is not able to tell which clone is the right one and it can lead to confusion.
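If a device in fstab may legitimately be absent or appear late, the fstab entry itself can also be relaxed so that a missing device does not drop the boot into emergency mode. A hypothetical entry (the mount point is invented; the LV path follows the lvdisplay output later in this report):

```
# /etc/fstab -- hypothetical entry, adjust paths to your setup.
# "nofail" stops a missing device from failing the boot;
# "x-systemd.device-timeout" bounds how long systemd waits for it.
/dev/testfs_SdDg/testfs_SdHv  /mnt/testfs  ext4  defaults,nofail,x-systemd.device-timeout=30s  0 2
```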
(In reply to Peter Rajnoha from comment #7)
> We'd probably need more debug info here; please provide the log after switching systemd and udev into debug mode by adding "debug" to the kernel command line at boot.

Also, please attach the /etc/lvm/lvm.conf that is used, together with the content of /etc/fstab.

(In reply to Peter Rajnoha from comment #7)
> But most of the time, when there's a line in /etc/fstab which causes the system to switch into emergency mode at boot, it's because the device stated in fstab was not found. If this happens only when you disable lvmetad (...) there are lvm2-activation-early.service, lvm2-activation.service and lvm2-activation-net.service, which are responsible for activating VGs/LVs.

You can check the state of LVM activation by using:
- in case use_lvmetad=0: "systemctl status lvm2-activation-early.service lvm2-activation.service lvm2-activation-net.service"
- in case use_lvmetad=1: there are various lvm2-pvscan@major:minor.service systemd units which are responsible for updating the lvmetad state and also for the autoactivation itself ("systemctl -a" shows all systemd units, where you can also find the names of the various lvm2-pvscan@major:minor.service units used)

(In reply to Peter Rajnoha from comment #7)
> Just to make sure: you're cloning devices which hold VGs/LVs, right? Whenever such clones are used, I strongly recommend using global_filter to filter out the clones, as otherwise LVM is not able to tell which clone is the right one and it can lead to confusion.

Tried global_filter to filter out the VGs/LVs which hold the cloning devices, but I'm not sure of the syntax used for global_filter.
[root@manali ~]# lvdisplay
  --- Logical volume ---
  LV Path    /dev/testfs_SdDg/testfs_SdHv
  LV Name    testfs_SdHv
  VG Name    testfs_SdDg
  ...

For the above parent LV, tried:
  global_filter = [ "a|/dev/mapper/testfs_*|" ]
  global_filter = [ "a|/dev/testfs_SdDg/testfs_SdHv|" ]
  global_filter = [ "a|testfs|" ]
  global_filter = [ "a|/dev/testfs_*|" ]
But clone creation still fails. Could you please let me know the correct syntax for using the global_filter option?

List all devices which you're sure are valid for scanning and exclude all the others (that is what is missing in the config you listed above); see also the examples for the global_filter or filter setting in the lvm.conf file:

  # By default we accept every block device:
  # filter = [ "a/.*/" ]
  # Exclude the cdrom drive
  # filter = [ "r|/dev/cdrom|" ]
  # When testing I like to work with just loopback devices:
  # filter = [ "a/loop/", "r/.*/" ]
  # Or maybe all loops and ide drives except hdc:
  # filter = [ "a|loop|", "r|/dev/hdc|", "a|/dev/ide|", "r|.*|" ]
  # Use anchors if you want to be really specific
  # filter = [ "a|^/dev/hda8$|", "r/.*/" ]

(In reply to Peter Rajnoha from comment #11)
> List all devices which you're sure are valid for scanning and exclude all the others

(...that's the "r|.*|" filter rule for the exclusion used at the end of the examples in comment #11)

If you're also using LVs for any system devices (e.g. the one holding the root filesystem, or any other part needed during bootup), be sure to include those in the filter as "allowed devices" too.

Has setting global_filter helped (together with the "reject all the other devs" suffix as mentioned in comment #11 and comment #12)? Do you need more assistance?

(In reply to Peter Rajnoha from comment #14)
> Has setting global_filter helped (...)? Do you need more assistance?

It didn't work, Peter; please find the detailed steps followed:

1. Created a LUN.
2. Created an LV and added the global filter below:
   global_filter = [ "a|/dev/sda[0-9]|", "r|^/dev/sd*|", "r|^/dev/mapper/360*|" ]
3. The global filter will allow the local HDD and block the mapped LUN (/dev/mapper/360*) and all SCSI devices.

Question: Why do I have to block the underlying SCSI devices? If I block only /dev/mapper/360*, it's not working.

4. Then cloned the LUN on the target and mapped the same LUN to the server; the LV automatically comes up as active, since the new LUN has a different /dev/mapper/360* number, and snap connect fails while trying to perform pvcreate -ff -y /dev/mapper/360* (of the cloned LUN).

Question: Is it possible to rename a PV when lvmetad is running?

(In reply to lourdu from comment #15)
> Question: Why do I have to block the underlying SCSI devices? If I block only /dev/mapper/360* it's not working.

You should always be using the top-level device, not any of the components underneath. In your case, that's the /dev/mapper/360* device which is the top-level one, I suppose. But, please, can you send the output of the "lsblk" command, just to make sure what the exact block device layout on your machine is...

> Question: Is it possible to rename a PV when lvmetad is running?

The PV name is the name of the device as found in /dev, and that always has a unique name. What matters is the content of the device - the LVM metadata found there. It all depends on what you're trying to achieve - why is the clone done? If it's just for backup, you need to filter it out.
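One pitfall worth noting with filters like the one above: LVM filter patterns are regular expressions, not shell globs, so "sd*" means the letter "s" followed by zero or more "d" characters, not "anything starting with sd". A quick demonstration using grep -E on hypothetical device names (no LVM involved):

```shell
# Helper: print "yes" if the regex matches the path, "no" otherwise.
match() { printf '%s\n' "$2" | grep -Eq "$1" && echo yes || echo no; }

match '^/dev/sd*'        /dev/sda               # yes ("/dev/s" alone already matches)
match '^/dev/sd.*'       /dev/sda               # yes (".*" is the regex form of a glob "*")
match '^/dev/mapper/360' /dev/mapper/360a98000  # yes (an unanchored tail matches any suffix)
match '^/dev/sd.*'       /dev/mapper/360a98000  # no  (anchored at the start of the path)
```

Using ".*" and anchors ("^", "$") in the filter entries removes this ambiguity.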
If it's for the purpose of making a copy and then using it independently of the original (so the original and the copy diverge), you need to do the following:
- If the duplicate is just a PV without any VG on it yet, you can just change the PV UUID of the duplicate PV to make it unique again (pvchange -u).
- If the duplicate PV contained VG metadata, then you will also have a duplicate VG - in that case you need to use vgimportclone to make it unique.

It all depends on the use case. All of the above can be done under lvmetad. So I just need to understand clearly your exact use case for the clones you're creating and attaching/connecting to your system.

If you have multipath in place, which seems to be the case given the name /dev/mapper/360*, then LVM automatically filters out multipath components if you have devices/multipath_component_detection=1 set in your lvm.conf. And this is correct, as you should always be using the top-level device, not the components. But I'm not yet sure about your exact setup - the lsblk output would provide the info.

This bug is an issue dealing with (apparent) duplicate PVs. Re-assigning to Dave since he's done the most recent work on this.

With lvmetad, duplicate PVs cannot be handled correctly. In 7.3, lvm will have much improved duplicate handling, and lvmetad will be automatically disabled when duplicates exist.

Thanks, David, for the update that lvmetad will be automatically disabled. However, I assume that lvmetad will be enabled by vgimportclone after resolving duplicate PVs.

Yes, vgimportclone should attempt to clear the lvmetad disabled state when it's done.

Adding QA ack for 7.3. Sanity only.

Marking as verified. Improved lvmetad handling already tested in bug #1254393.
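The two divergence paths Peter describes can be sketched as a command sequence. This is illustrative only: the clone device path and the new VG name are hypothetical, and the commands require root and real block devices.

```
# Case 1: the clone is a bare PV with no VG metadata yet --
# give it a fresh UUID so it no longer duplicates the original.
pvchange -u /dev/mapper/360_clone

# Case 2: the clone carries VG metadata -- import it under a new
# VG name with new UUIDs so it no longer collides with the original.
vgimportclone --basevgname testfs_SdDg_clone /dev/mapper/360_clone

# Afterwards, the imported VG can be activated independently:
vgchange -ay testfs_SdDg_clone
```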
3.10.0-505.el7.x86_64

lvm2-2.02.165-2.el7                        BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-libs-2.02.165-2.el7                   BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-cluster-2.02.165-2.el7                BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-1.02.134-2.el7               BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-libs-1.02.134-2.el7          BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-1.02.134-2.el7         BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-libs-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7  BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-2.el7                     BUILT: Wed Sep 14 16:01:43 CEST 2016

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

I validated the following steps on RHEL 7.3 with the latest release of Snapdrive for Unix:
1. Installed Snapdrive for UNIX on RHEL 7.3.
2. Set use_lvmetad=1
3. Started lvm2-lvmetad.service (systemctl start lvm2-lvmetad.service)
4. Performed the snapdrive snap connect operation.

We are still facing the same issue with RHEL 7.3 and need to disable lvmetad and its service for the cloning operation.

We tried reproducing this issue on RHEL 7.4 with the latest release of Snapdrive for Unix: the same issue is not reproducible with RHEL 7.4. All Snapdrive for Unix operations work fine without applying any workaround.