1571885 – use a local file to optimize device scanning

Bug 1571885 - use a local file to optimize device scanning

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	7.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	LVM and device-mapper development team
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-25 15:19 UTC by David Teigland
Modified:	2021-09-03 12:41 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-09 16:00:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	229560	0	medium	CLOSED	RFE: lvm2 cache - write vg name in the .cache file	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	586791	0	low	CLOSED	Add support for persistent cache	2021-02-22 00:41:40 UTC

Description David Teigland 2018-04-25 15:19:38 UTC

Description of problem:

In the past lvm used a local cache file to give hints that an lvm command could use to optimize device scanning.  We should bring back something like this.

Comment 2 Zdenek Kabelac 2018-04-25 19:15:58 UTC

One of the reasons why the cache file was dropped was the use of the udev device list.

The cache file with the list of block devices itself became obsolete, since all known block devices in the system are now listed from udev.

So naturally udev could be seen as a good place to store information about VG/PV UUIDs: while getting the list of devices from udev, we could also obtain the list of VGs/PVs.

The catch, however, seems to come from the problematic use of udev with duplicates and other unpredictable error behavior.

That is the current reason why lvm2 still relies on its own scanning in almost all cases.

Adding support for maintaining cache files (probably stored somewhere in a /run/lvm/cache directory) clearly means adding code to keep the directory consistent, but it also means new udev machinery to keep the caches in sync, and we have not managed to get that right for lvmetad.

So IMHO we would inevitably hit the same issues with cache files that we are observing with lvmetad. My best impression here is that udev needs to be fixed, and then almost all of our problems will magically disappear.

The clear advantage would be that there would be no 'daemon' uselessly consuming memory most of the time, and of course we could drop a lot of the code that sits in lvmetad and duplicates/replicates lvm2 command handling, and testing would be MUCH simpler ;) since we do care about having fewer daemons!

The minor disadvantage is that it could be a bit slower and would consume ramdisk space, and commands would gain new 'locks' for updating the cache dir that were previously hidden in lvmetad.

Comment 3 Peter Rajnoha 2018-04-26 08:36:58 UTC

(In reply to Zdenek Kabelac from comment #2)
> One of the reasons why the cache file was dropped was the use of the udev
> device list.
> 

I think the reason was rather that, when we used the /etc/lvm/cache/.cache file, we needed to run vgscan after a new device was added to the system (also documented for our users here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Cluster_Logical_Volume_Manager/vgscan.html).

With obtain_device_list_from_udev=1, we had the opportunity to stop using the .cache file. That was not because it wasn't useful (iirc, it contained the result of the last filter-chain execution from the last LVM command, so some of the filters didn't need to be re-executed), but because there were various other bugs reported related to its use. So obtain_device_list_from_udev=1 was just an "excuse" to stop using the .cache file. However, we could still use it even with obtain_device_list_from_udev=1: this setting only gets the list of block devices from udev, and we could still apply a persistent filter to it so that we don't need to re-execute the whole filter chain.
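
To illustrate the idea of a persistent filter, here is a rough Python sketch. It is not lvm2 code: `apply_persistent_filter`, the `cache` dictionary, and the filter callables are hypothetical names, and the cache simply remembers the accept/reject verdict of the last filter-chain run for each device, so that the chain is only executed for devices not seen before.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical filter type: takes a device name, returns True to accept it.
Filter = Callable[[str], bool]


def apply_persistent_filter(
    devices: Iterable[str],
    cache: Dict[str, bool],
    filter_chain: List[Filter],
) -> List[str]:
    """Return accepted devices, running the filter chain only on cache misses."""
    accepted = []
    for dev in devices:
        if dev not in cache:
            # Cache miss: run the whole chain once and remember the verdict.
            cache[dev] = all(f(dev) for f in filter_chain)
        if cache[dev]:
            accepted.append(dev)
    return accepted
```

Note that this sketch never invalidates an entry, so a device whose role changes keeps its old verdict until the cache is rebuilt. Keeping such a cache current is where the old .cache file needed help, e.g. running vgscan after adding a device.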

The issue is that the .cache file was problematic (various bugs were reported against it), so we needed to get rid of it and chose to apply the filter chain again instead.

As for filtering, we can reuse information that is already available in udev by using external_device_info_source="udev" (e.g. that way we can read info about MD components, mpath components and similar, so we don't need to re-execute the filters that access the device directly ourselves).

Comment 4 David Teigland 2018-04-26 14:52:11 UTC

I suggest that we narrow the scope of this hint file to one or two simple, common cases.  Start with a command like 'vgs foo', which generally should only need to read the PVs for the named VG.  In this case, if the hint file contained:

device vgname seqno

sda foo 5
sdb foo 5
sdc foo 5

then the vgs command could begin by just scanning those three devices, which usually entails a single read I/O to each one.  After this scan, we would check that the seqno read from each of those devices matched the seqno in the hint file.  If so, we'd continue to vg_read and command processing, without any further reading.  Can anyone think of problematic cases with this example?
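
The check described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the three-column layout is the one from the example, `parse_hints`, `hinted_scan`, and the `read_seqno` callback are hypothetical names, and a real implementation would obtain the seqno by reading the metadata on each PV.

```python
from typing import Callable, List, NamedTuple, Optional


class Hint(NamedTuple):
    device: str
    vgname: str
    seqno: int


def parse_hints(text: str) -> List[Hint]:
    """Parse 'device vgname seqno' lines, skipping blanks and the header row."""
    hints = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # blank line, header, or malformed entry
        hints.append(Hint(fields[0], fields[1], int(fields[2])))
    return hints


def hinted_scan(
    hints: List[Hint],
    vgname: str,
    read_seqno: Callable[[str], Optional[int]],
) -> Optional[List[str]]:
    """Return the devices to use for vgname, or None if the hints cannot be
    trusted and the caller should fall back to a full scan.

    read_seqno is a hypothetical callback that scans one device and returns
    the VG seqno found there (None if the device is missing or not a PV).
    """
    wanted = [h for h in hints if h.vgname == vgname]
    if not wanted:
        return None  # no hints for this VG
    for h in wanted:
        if read_seqno(h.device) != h.seqno:
            return None  # stale hint
    return [h.device for h in wanted]
```

With the hint file from the example, `hinted_scan(parse_hints(text), "foo", read_seqno)` would read only sda, sdb and sdc, and would return None as soon as one of them reports a different seqno.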

Comment 5 Zdenek Kabelac 2018-04-26 15:23:10 UTC

A single file is likely a problem, since we would now need to keep it updated, and possibly serialize all unrelated VGs on access to that one file.
So IMHO that excludes a single file.

Also, the info we need to store should probably include

PV, LV, VG  - all names + all UUIDs

So it can speed up access without scanning for them.
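
One way to avoid serializing unrelated VGs on a single file would be one hint file per VG, replaced atomically on update. The Python sketch below is only an illustration under assumptions: the JSON layout, the `.json` suffix, and the function names `write_vg_hints` / `read_vg_hints` are all hypothetical, and the record is just a dictionary that could hold the PV, LV and VG names and UUIDs.

```python
import json
import os
import tempfile
from typing import Optional


def write_vg_hints(cache_dir: str, vgname: str, record: dict) -> str:
    """Write the hint record for one VG to its own file, atomically."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, vgname + ".json")
    # Write to a temporary file in the same directory, then rename it over
    # the final name so readers never see a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix="." + vgname + ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


def read_vg_hints(cache_dir: str, vgname: str) -> Optional[dict]:
    """Return the hint record for vgname, or None if missing or unreadable."""
    try:
        with open(os.path.join(cache_dir, vgname + ".json")) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

The atomic rename only protects readers from partial files. Two commands updating the same VG at once would still need a lock, otherwise the last writer wins.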

One feature lost over time: lvm2 used to do a 'lite-scan' based on the content of the device cache, and when a device was not there, it fell back to a full scan.
I'm unsure how this would now fit together.
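
For reference, the lite-scan-with-fallback behaviour could be sketched in Python like this. It is a simplified illustration, not the old lvm2 logic: `devices_to_scan`, `list_all`, and the `exists` callback are hypothetical names, and "device not there" is reduced to a path-existence check.

```python
import os
from typing import Callable, Iterable, List


def devices_to_scan(
    cached: Iterable[str],
    list_all: Callable[[], List[str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return only the cached devices if all of them are present (lite scan),
    otherwise return every device from list_all() (full scan)."""
    cached = list(cached)
    if cached and all(exists(path) for path in cached):
        return cached
    # Empty cache or a cached device is missing: fall back to a full scan.
    return list_all()
```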
