Bug 951600
Summary: Provide error-free read-only access to clustered VG metadata visible on a non-clustered system using --readonly.
Product: Red Hat Enterprise Linux 6
Reporter: Anil Vettathu <avettath>
Component: lvm2
Assignee: Alasdair Kergon <agk>
lvm2 sub component: Displaying and Reporting (RHEL6)
QA Contact: Cluster QE <mspqa-list>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: agk, amureini, anande, avyadav, bazulay, cmarthal, cpelland, cshao, cww, dparikh, dwysocha, gouyang, hadong, hchiramm, heinzm, huiwa, iheim, jbrassow, jkt, jraju, leiwang, lpeer, lyarwood, marcobillpeter, mkalinin, msnitzer, mspqa-list, nperic, prajnoha, prockai, psubrama, rbalakri, sbhat, scohen, slevine, spanjikk, sputhenp, thornber, tvvcox, yaniwang, ycui, yeylon, zkabelac
Version: 6.5
Target Milestone: rc
Target Release: 6.6
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: lvm2-2.02.107-1.el6
Doc Type: Enhancement

Doc Text:
LVM commands that report the state of Logical Volumes, Volume Groups or Physical Volumes acquire a new command-line parameter: --readonly. This uses a special read-only mode that accesses on-disk metadata without needing locks. Uses include:
* peeking at disks inside virtual machines while they are in use;
* peeking inside Volume Groups that are marked as clustered when the necessary clustered locking is unavailable for whatever reason.
The commands are unable to report whether or not Logical Volumes are actually in use because there is no communication with any device-mapper kernel driver. The lv_attr field of the 'lvs' command shows an X where this information would normally appear.

Story Points: ---
Clones: 1116944 (view as bug list)
Last Closed: 2014-10-14 08:24:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 820991
Bug Blocks: 988951, 1116944
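The Doc Text above notes that under --readonly the lv_attr field of 'lvs' shows an X wherever activation state would normally appear, because there is no kernel driver to ask. A minimal sketch of detecting that placeholder, using attr strings hard-coded from the transcripts later in this report so it runs without a live LVM setup:

```shell
#!/bin/sh
# Sketch only: spot the 'X' placeholders that `lvs --readonly` leaves in
# lv_attr when it cannot query the device-mapper driver. The sample attr
# strings below are copied from this report; no LVM installation is needed.
lv_state_known() {
    case "$1" in
        *X*) echo "unknown" ;;   # --readonly had no access to kernel state
        *)   echo "known" ;;
    esac
}

lv_state_known '-wi-XX----'   # as printed by `lvs --readonly`
lv_state_known '-wi-ao----'   # typical attr string under normal locking
```

A consumer such as vdsm could use a check like this to avoid treating the unknown state bits as meaningful data.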
Comment 8
Ayal Baron
2013-04-28 11:21:22 UTC
Please obtain a -vvvv trace, as usual for any LVM question, so we can see what is happening. (No PVs are listed explicitly on the command line, so it expands the command line to the list of all PVs, then finds that some are in clustered VGs, so it skips them and gives the error because the requested output is incomplete. As a workaround, try listing the PVs to be reported upon explicitly on the command line.)

I can only speculate about what might be going on, but my quick attempt to replicate this without RHEV with the current upstream LVM2 code has not shown this behaviour. So it might be something already fixed upstream that could be backported, or it might be something specific to the RHEV environment.

The exit code 5 is sensible in these circumstances from lvm's point of view. Say PV1 is clustered and PV2 is not clustered:

    pvs PV1      - gives error
    pvs PV2      - gives success
    pvs PV1 PV2  - gives error
    pvs          - gives error

Now I suppose we could look into adding a global lvm.conf option to ignore clustered VGs silently - IOW providing incomplete output in these circumstances silently. That would mean that 'pvs' would hide PVs that were clustered without indicating that it had done so. (I'd be very worried about an alternative of including these PVs in the output with information in the columns that is not reliable and still returning success.)

With a 'hide clustered VG' option, perhaps:

    pvs PV1      - error
    pvs PV2      - PV2 displayed OK; success
    pvs PV1 PV2  - PV2 displayed OK; error from PV1
    pvs          - PV2 displayed; success
    pvs -a       - PV2 displayed; success

The rules here are:
* clustered PV1 is never displayed;
* if clustered PV1 is explicitly mentioned on the cmdline, you still get an error.

(In reply to comment #43)
> Now I suppose we could look into adding a global lvm.conf option to ignore
> clustered VGs silently - IOW providing incomplete output in these
> circumstances silently. That would mean that 'pvs' would hide PVs that were
> clustered without indicating that it had done so.
...this is already reported as bug #820991.

(In reply to comment #47)
> (In reply to comment #43)
> > Now I suppose we could look into adding a global lvm.conf option to ignore
> > clustered VGs silently - IOW providing incomplete output in these
> > circumstances silently. That would mean that 'pvs' would hide PVs that were
> > clustered without indicating that it had done so.
>
> ...this is already reported as bug #820991.

I cannot say that I agree with the approach, since the use case is totally valid and from the user's pov she did nothing wrong, yet things start to fail. However, if this is the approach we're going to take, then, since this has hit us multiple times already, can we increase the priority of this ability? We will then proceed to incorporate it in vdsm's default config (which is passed from the command line, not relying on lvm.conf). It would probably be necessary, though, to be able to list the clustered PVs (just names would suffice).

(In reply to comment #48)
> I cannot say that I agree with the approach since the use case is totally
> valid and from the user pov she did nothing wrong yet things start to fail.

From the lvm point of view this is what we have: a system containing some PVs in clustered VGs and some in non-clustered VGs. To obtain information about the clustered VGs you *must* run with clustered locking enabled. If you ask for information about the clustered VGs without clustered locking, lvm commands skip over them and return an error. You can run queries against the non-clustered VGs and not get an error. The current behaviour of lvm is sensible and correct.

So: what is the minimum amount of information vdsm needs to obtain about these clustered VGs without obtaining clustered locks? Is there then some safe and correct way for lvm tools to provide that information without obtaining clustered locks? For example, what does vdsm try to do with pe_alloc_count on one of these clustered VGs?
For example, could vdsm make do with just the "Physical Volume Label Fields" in respect of the PVs that are in clustered VGs?

    pvs -o help
    Physical Volume Label Fields
    ----------------------------
      pv_all       - All fields in this section.
      pv_fmt       - Type of metadata.
      pv_uuid      - Unique identifier.
      dev_size     - Size of underlying device in current units.
      pv_name      - Name.
      pv_mda_free  - Free metadata area space on this device in current units.
      pv_mda_size  - Size of smallest metadata area on this device in current units.

(pv_all there is a bug)

What we're currently retrieving is:

    uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size

However, we can split this into 2 different calls:
1. for getting physical info as you suggest (but we would need an indication of whether the PV belongs to a VG, just to know whether it's in use or not);
2. for getting VG-related info on specific PVs that we manage (these would obviously need to be non-clustered).

So that would mean:

1) Run pvs for the PV label fields - this will tell you all PVs, including clustered ones, without error, but would not tell you which are clustered and which are not.

Then either:

2a) Run pvs with the VG-based fields (the other PV ones are actually obtained from the VG metadata), specifying the PVs you are interested in that you know are not clustered. Is this possible for you and sufficient?

or:

2b) We change lvm as described in comment #45, and then you run pvs with the new option and get given details of all PVs in VGs that are not clustered, without error. You get no further information about the clustered VGs.

If that's still insufficient, then:

3) We need to look in more detail at 2b to see if there is any other way we can indicate which PVs appear to belong to clustered VGs and which PVs don't, if it's important for you to know that.

What happens if a sysadmin or someone else mistakenly sets the 'cluster bit' (maybe from another server) on ONE of the VGs (== a RHEV storage domain) that is in use?
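The two-call split discussed in this exchange can be sketched as follows. This is illustrative only: the pipe-separated sample stands in for real `pvs --noheadings --separator '|' -o pv_name,vg_name` output, and the clustered VG name "cluster" and device paths are hypothetical, so the filtering logic runs without a live LVM setup.

```shell
#!/bin/sh
# Sketch of the proposed two-call split. The hard-coded sample below stands
# in for `pvs --noheadings --separator '|' -o pv_name,vg_name` output; the
# VG name "cluster" and the device paths are hypothetical.
pvs_output='/dev/sda|cluster
/dev/sdb|cluster
/dev/vda2|vg_virt063'

# Call 1 equivalent: every PV by name (PV-label-level information only).
all_pvs=$(printf '%s\n' "$pvs_output" | cut -d'|' -f1)

# Call 2 equivalent: query VG-derived fields only for PVs whose VG is known
# (out of band) to be non-clustered -- here, everything outside "cluster".
safe_pvs=$(printf '%s\n' "$pvs_output" | awk -F'|' '$2 != "cluster" {print $1}')

echo "$all_pvs"
echo "$safe_pvs"
```

The open problem in the thread is precisely the comment in call 2: with label fields alone there is no way to learn which PVs sit in clustered VGs, so the "known out of band" step is the missing piece.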
If vdsm skips the 'cluster bit' on it and proceeds, that could cause issues in future, couldn't it? Is there a way to get a list of the VGs skipped because the cluster bit is set from 'pvs'? If yes, couldn't vdsm proceed and then check whether it really cares about those VGs, considering that the storage domains (== VGs) under RHEV's control always carry the 'RHAT_storage_domain' tag? I may be missing something, but I thought I'd share these bits in case they help.

(In reply to Alasdair Kergon from comment #54)
> So that would mean:
>
> 1) run pvs for the PV label fields
> - this will tell you all PVs including clustered ones without error, but
> would not tell you which are clustered and which are not

Why not specify which PVs are clustered so we'd know to skip them?

> Then either
>
> 2a) run pvs with the VG-based fields (the other PV ones are actually
> obtained from the VG metadata) specifying the PVs you are interested in that
> you know are not clustered. Is this possible for you and sufficient?

How would we differentiate?

> or
>
> 2b) We change lvm as described in comment #45, and then you run pvs with the
> new option and get given details of all PVs in VGs that are not clustered,
> without error.
>
> You get no further information about the clustered VG.
>
> -----
>
> If that's still insufficient, then
>
> 3) We need to look in more detail at 2b to see if there is any other way we
> can indicate which PVs appeared to belong to clustered VGs and which PVs
> don't, if it's important for you to know that.

It is, since we need to report this to the user. There are several flows here:

1. creating a new VG - we need to be able to report a list of PVs/potential PVs and what they have on them (it could be that a clustered VG is an old remnant that needs to be overwritten);
2. ongoing work - we need to ignore clustered PVs, since clearly we don't need them.

For number 2 I'm fine with LVM ops just ignoring these PVs (we'll pass the required config param).
For number 1 we need a way to make the distinction.

(In reply to Ayal Baron from comment #56)
> Why not specify which PVs are clustered so we'd know to skip them?

'Clustered' is a VG property and is not available with PV label fields.

> 2. on going work - need to ignore clustered PVs since clearly we don't need
> them.
> For number 2 I'm fine with LVM ops just ignoring these PVs (we'll pass the
> required config param).

OK - so something like comment #45 could cover case 2.

> 1. creating a new VG - we need to be able to report a list of PVs/potential
> PVs and what they have on them (could be that clustered vg is an old remnant
> that needs to be overwritten).
> For number 1 we need a way to make the distinction.

At the point of "creating a new VG" - you *know* that the volumes are not in use and the lvm cluster locking is not active (as you're contemplating overwriting it), and therefore that it is safe to access the VG while not holding the clustered lock that would normally be required? Otherwise, on what basis are you trying to access the clustered VG information? If this metadata was added by a guest that is running, shouldn't you query lvm inside that running guest to obtain this information definitively?

Or is what you're really asking for here some mechanism to peer inside volumes owned/shared by a guest from outside, *while they might be in use*, and without any co-operation from the guest?

(In reply to Alasdair Kergon from comment #58)
> > 1. creating a new VG - we need to be able to report a list of PVs/potential
> > PVs and what they have on them (could be that clustered vg is an old remnant
> > that needs to be overwritten).
> >
> > For number 1 we need a way to make the distinction.
> At the point of "creating a new VG" - you *know* that the volumes are not in
> use and the lvm cluster locking is not active (as you're contemplating
> overwriting it) and therefore it is safe to access the VG while not holding
> the clustered lock that would normally be required? Otherwise on what basis
> are you trying to access the clustered VG information? If this metadata was
> added by a guest that is running, shouldn't you query lvm inside that
> running guest to obtain this information definitively?
>
> Or is what you're really asking for here some mechanism to peer inside
> volumes owned/shared by a guest from outside *while they might be in use*
> and without any co-operation from the guest?

This is exactly what I'm asking for here: the ability to determine whether a PV is part of a clustered VG so that:
1. we can know not to touch it;
2. we can tell the user 'there be danger'.

Under bug 820991, I have added an --ignoreskippedcluster option that allows commands to ignore clustered objects that they cannot properly read, which may form part of the solution to this problem.

"2. on going work - need to ignore clustered PVs since clearly we don't need them. For number 2 I'm fine with LVM ops just ignoring these PVs (we'll pass the required config param)."

So far the new parameter is accepted by: pvs, vgs, lvs, pvdisplay, vgdisplay, lvdisplay, vgchange, lvchange.

Is it needed by any other commands at this stage?

(In reply to Alasdair Kergon from comment #68)
> So far the new parameter is accepted by:
>
> pvs, vgs, lvs, pvdisplay, vgdisplay, lvdisplay, vgchange, lvchange
>
> Is it needed by any other commands at this stage?
None that I can think of.

So we need to be able to report upon the state of any Volume Group metadata regardless of whether it is being used, whether it is consistent, or whether it is being changed at the time it is being accessed.

* Any locks we might take during these operations would be meaningless, because we are running in a different domain from the one that holds the locks. We cannot lock the objects. This is like global/locking_type 0 (no locking).

* The metadata cannot be cached. This means lvmetad must not be used. This is like global/use_lvmetad = 0.

* We must not perform any action that attempts to change the metadata. This is like locking/metadata_read_only = 1:

      # If set to 1, no operations that change on-disk metadata will be permitted.
      # Additionally, read-only commands that encounter metadata in need of repair
      # will still be allowed to proceed exactly as if the repair had been
      # performed (except for the unchanged vg_seqno).

* We must not activate the LV nor probe any activation state (because we have no direct access to the domain where it might be active). This is like --driverloaded n:

      Whether or not the device-mapper kernel driver is loaded. If you set
      this to n, no attempt will be made to contact the driver.

* We must not attempt to back up any metadata, because no local metadata backup is seen. This is like --autobackup n.

Commands needing this support would be: pvs, vgs, lvs, pvdisplay, vgdisplay, lvdisplay, vgcfgbackup; plus lvmdiskscan, lvscan, pvscan, vgscan for completeness; plus built-ins that don't use metadata. This amounts to all commands flagged PERMITTED_READ_ONLY, with the exception of the xxchange commands, which could do nothing useful under these restrictions.

I'm experimenting with adding a new hybrid command-line option --readonly to the 11 commands I mentioned above that will set up the particular configuration described in comment 88.
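Taken together, the restrictions above correspond roughly to the following lvm.conf fragment. This is a sketch for orientation only: --readonly sets the equivalent state internally rather than reading it from lvm.conf, and the placement of all three settings in the global section is an assumption based on the RHEL 6 lvm.conf layout (the comment above names locking/metadata_read_only).

```
global {
    locking_type = 0          # no locking at all
    use_lvmetad = 0           # do not consult the lvmetad metadata cache
    metadata_read_only = 1    # refuse operations that change on-disk metadata
}
```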
    --readonly
        Run the command in a special read-only mode which will read on-disk
        metadata without needing to take any locks. This can be used to peek
        inside metadata used by a virtual machine image while the virtual
        machine is running. It can also be used to peek inside the metadata
        of clustered Volume Groups when clustered locking is not configured
        or running. No attempt will be made to communicate with the
        device-mapper kernel driver, so this option is unable to report
        whether or not Logical Volumes are actually in use.

https://www.redhat.com/archives/lvm-devel/2014-April/msg00098.html

Also changed the lv_attr char to X when we don't know the right value (because the --readonly flag has no access to the LV's lock domain) and removed 'LV Status' from lvdisplay output.

      LV    VG  Attr       LSize  Pool Origin Data% Move Log Cpy%Sync Convert
      lvol1 vg2 twi-XXtz-- 52.00m

Before:

    # vgs
      Skipping clustered volume group vg3

After:

    # vgs --readonly
      VG  #PV #LV #SN Attr   VSize  VFree
      vg3   2   1   0 wz--nc 56.00m 44.00m

Didn't add it to vgscan as that's pointless.

This option should work on a machine that can see disks belonging to a cluster it is not a member of. It should also work when pointed at disks inside a running VM. It should also cope if the cluster or VM is changing the metadata while the command is run. (In some cases it will issue some warnings if it detects this happening, but it should still report the metadata.)

    [root@virt-063 yum.repos.d]# vgs
      connect() failed on local socket: No such file or directory
      Internal cluster locking initialisation failed.
      WARNING: Falling back to local file-based locking.
      Volume Groups with the clustered attribute will be inaccessible.
      Skipping clustered volume group cluster
      Skipping volume group cluster
      VG         #PV #LV #SN Attr   VSize VFree
      vg_virt063   1   2   0 wz--n- 7.51g    0

    [root@virt-063 yum.repos.d]# vgs --readonly
      VG         #PV #LV #SN Attr   VSize  VFree
      cluster      5   2   0 wz--nc 74.98g 18.75g
      vg_virt063   1   2   0 wz--n-  7.51g     0

    [root@virt-063 yum.repos.d]# lvs --readonly
      LV      VG         Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert
      biglv1  cluster    -wi-XX----  18.75g
      biglv2  cluster    -wi-XX----  37.49g
      lv_root vg_virt063 -wi-XX----   6.71g
      lv_swap vg_virt063 -wi-XX---- 816.00m

    [root@virt-063 yum.repos.d]# pvs
      connect() failed on local socket: No such file or directory
      Internal cluster locking initialisation failed.
      WARNING: Falling back to local file-based locking.
      Volume Groups with the clustered attribute will be inaccessible.
      Skipping clustered volume group cluster
      Skipping volume group cluster
      Skipping clustered volume group cluster
      Skipping volume group cluster
      Skipping clustered volume group cluster
      Skipping volume group cluster
      Skipping clustered volume group cluster
      Skipping volume group cluster
      Skipping clustered volume group cluster
      Skipping volume group cluster
      PV        VG         Fmt  Attr PSize PFree
      /dev/vda2 vg_virt063 lvm2 a--  7.51g    0

    [root@virt-063 yum.repos.d]# pvs --readonly
      PV        VG         Fmt  Attr PSize  PFree
      /dev/sda  cluster    lvm2 a--  15.00g     0
      /dev/sdb  cluster    lvm2 a--  15.00g     0
      /dev/sdc  cluster    lvm2 a--  15.00g  3.75g
      /dev/sdd  cluster    lvm2 a--  15.00g     0
      /dev/sdh  cluster    lvm2 a--  15.00g 15.00g
      /dev/vda2 vg_virt063 lvm2 a--   7.51g     0

Marking VERIFIED with:

    lvm2-2.02.107-2.el6               BUILT: Fri Jul 11 15:47:33 CEST 2014
    lvm2-libs-2.02.107-2.el6          BUILT: Fri Jul 11 15:47:33 CEST 2014
    lvm2-cluster-2.02.107-2.el6       BUILT: Fri Jul 11 15:47:33 CEST 2014
    udev-147-2.56.el6                 BUILT: Fri Jul 11 16:53:07 CEST 2014
    device-mapper-1.02.86-2.el6       BUILT: Fri Jul 11 15:47:33 CEST 2014
    device-mapper-libs-1.02.86-2.el6  BUILT: Fri Jul 11 15:47:33 CEST 2014
    device-mapper-event-1.02.86-2.el6 BUILT: Fri Jul 11 15:47:33 CEST 2014
    device-mapper-event-libs-1.02.86-2.el6    BUILT: Fri Jul 11 15:47:33 CEST 2014
    device-mapper-persistent-data-0.3.2-1.el6 BUILT: Fri Apr  4 15:43:06 CEST 2014
    cmirror-2.02.107-2.el6                    BUILT: Fri Jul 11 15:47:33 CEST 2014

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html