Bug 1067350

Summary:	Reuse existing blkid/udev information to filter devices in lvm2
Product:	[Fedora] Fedora	Reporter:	Peter Rajnoha <prajnoha>
Component:	lvm2	Assignee:	Peter Rajnoha <prajnoha>
Status:	CLOSED RAWHIDE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	low	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, zkabelac
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	lvm2-2.02.116-3.fc22	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-01-30 17:29:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	862085, 1067362

Description Peter Rajnoha 2014-02-20 10:08:01 UTC

There's already various information stored in udev db and provided by blkid that we can use to filter devices and provide an alternative to existing direct scanning of devices. This can aid in supporting MD component and mpath component filter as well as any other subsystem identification.

This is feature request to reuse this existing information instead of scanning the devices multiple times which can aid performance.

Besides foreign subsystem identification, we could also reuse existing ID_FS_TYPE=="LVM1_member|LVM2_member" udev db record (also provided by blkid) to improve scanning on lvm2 side and scan only devices that are already identified as PVs, hence saving some time by not do any extra scanning.

This direct scanning is still done for any devices that had appeared before lvmetad was run. Then we're updating lvmetad based on events via pvscan call in udev rules that is only called when ID_FS_TYPE="LVM1_member|LVM2_member". But we can use exactly the same logic when we need to fill lvmetad with *existing* devices plugged in before lvmetad is run.

(...and if not using lvmetad, the direct scanning can be always done only on devices already marked as LVM1_member|LVM2_member)

Let's make this feature optional so people can still fallback to the old way of device identification (the direct scans) if needed.

Comment 1 Peter Rajnoha 2014-02-20 10:15:25 UTC

Also, we can avoid artefacts like these with this feature in:

[0] raw/~ # systemctl stop lvm2-lvmetad
Warning: Stopping lvm2-lvmetad, but it can still be activated by:
  lvm2-lvmetad.socket

[0] raw/~ # pvs
  /dev/cdrom: open failed: No medium found
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.50g    0


In this case lvmetad is activated on first access to lvmetad socket (which is fired by pvs in the example above) and since lvmetad has no information yet, the direct device scan happens on lvm command side that fills lvmetad with metadata then. During this scan, some devices should be avoided as it does not make sense to scan them (like the /dev/cdrom in the example above which has no medium attached).

Comment 2 Peter Rajnoha 2014-02-20 10:39:50 UTC

Also, we can make sure that when the "pvscan --cache" is run from udev rule, we don't do any extra filtering that is already done in udev rules - when the pvscan is run in udev, it's always run for LVM1_member|LVM2_member so doing the same filter in the pvscan command is meaningless.

Comment 3 Peter Rajnoha 2014-02-20 10:48:32 UTC

Side note: the use of sysfs filter when we have devices/obtain_device_list_from_udev=1 set in lvm.conf is also questionable. The sysfs filter makes block devices to pass and any other devices to fail the filter. But we already get only block devices via libudev. So make sure we really don't do any extra work that is not necessary and and identify places wehre we just slow down the device processing unnecessarily.

So this is also a request to reaudit the whole filtering and make a cleanup.

Comment 4 Peter Rajnoha 2014-04-08 12:32:39 UTC

Another piece of information we could reuse from udev db is the number of active paths for mpath devices and if this number reaches 0, we can avoid accessing such a device as that may end up with IO error or the IO request queued. As for queueing and accessing the devices, this can cause some troubles like mentioned in bug #580869.

So it's worth exploring whether this information would be of any help for us... I think it may be worthwile.

Comment 5 Peter Rajnoha 2014-05-05 09:25:38 UTC

...this would also help with detecting GPT partitions - bug #1094162.

Comment 6 Peter Rajnoha 2014-09-03 13:56:28 UTC

This is first iteration on this topic (dev-prajnoha-aux-device-status branch):

https://git.fedorahosted.org/cgit/lvm2.git/log/?h=dev-prajnoha-aux-device-status

It can reuse information stored in udev db to identify MD and mpath components as well as partitioned devices.

Comment 7 Zdenek Kabelac 2014-09-03 14:16:37 UTC

So far I'm rather against adding new 'complicated' configurable pluggable way how to detect things externally for lvm2.

We already have 'obtain_device_list_from_udev' - and we should continue and enhance its use case.

With each such new option we just multiply number of combination we should test - and there is not a big added value -  either  udev DB is correct - and we use it - or it's not and we stick with old internal implementation.

Using  udev/non-udev  makes the case simple to support and test, every else horribly scales amount of testing...

Comment 8 Peter Rajnoha 2014-09-04 06:58:09 UTC

The reason here is that I still believe that once there'll be a daemon/library/database better than udev db that will provide information for the whole stack of (block) devices and various relations between them and it will also provide information about the structure as a whole by gatherining this information from all various other tools/subsystems (there are such movements already I think, though still not complete).

So "udev" is just only one of those sources. Another (existing) one is libblkid If we want to override udev for some reason, e.g. possible problems with out-of-data info stored there - I personally would prefer using libblkid instead of native LVM scanning code here since this kind of scanning is the primary role of blkid which is maintained and cared of (while in LVM, this kind of scanning is something 'peripheral' since we're focusing on our things mainly - and if not, we should :) ).

To sum it up, I want this to be prepared for any future updates which are highly possible - simply, "udev" is just one of such sources. If we added this feature without thinking about other possible sources, we could end up with a messy and unreadable confiuration later on. This one is simple and clean - it sets the source of information to use for auxiliary device status.

The "obtain_device_list_from_udev" obtains the device list from udev and nothing else - not the additional info which is the new setting about. And it should stay this way I think. And as I said earlier, it's not just udev that can provide this additional information - if there was new library/daemon that provides more precise, better synced and quicker access to this information, I would prefer that and, of course, recommend to users as well.

As for testing - well, the only thing we need to test on our own is the binding with the library/external source. The scanning itself is a domain of that external source, we just rely on that (we relay on udev and blkid in a way already and we're not testing whether blkid detects all of the signatures properly for us - we just rely on that external source!).

As for the value added - with using udev as external source of this information, I noticed the speedup is about 2x. And I believe much better interface can come up later which can make this speed up even better.

It's not just MD, multipath or partition detection (the filters), but there's also the information about mount status (for which we use our own code - but that doesn't mean it performs the best). And there may be more places in the code where we could reuse this "external source of device status". The aim is to reuse as much information as possible if it's applicable and once the information is not covered with that source, we'll just fallback to the "native" LVM code that does this. People can then decide what's the best source to use in their scenario which is dependent on system configuration and availability of those sources of course.

Comment 9 Zdenek Kabelac 2014-09-04 14:24:15 UTC

(In reply to Peter Rajnoha from comment #8)

> It's not just MD, multipath or partition detection (the filters), but
> there's also the information about mount status (for which we use our own
> code - but that doesn't mean it performs the best). And there may be more
> places in the code where we could reuse this "external source of device
> status". The aim is to reuse as much information as possible if it's
> applicable and once the information is not covered with that source, we'll
> just fallback to the "native" LVM code that does this. People can then
> decide what's the best source to use in their scenario which is dependent on
> system configuration and availability of those sources of course.

We need to  carefully weight pros and cons.

i.e. does linkage of external library really help us to do things better - when we already have the code to detect mount info (which is in fact pretty simple) and proved to be usable pretty well.

Is such library doing something else/better then we do ?

I don't see much point to replace own our code - with someone else code to do the exactly same stuff (reading/parsing /proc or /sys content)

Also udev  is IMHO supposed to maintain all relevant device info - there is no reason to reinvent 10 new daemons for queering the info that should be in udev  to duplicate all the effort.

IMHO the simplicity and performance should be our primary focus.

Introduction of new set of options (and frankly I've already lost track of at least of 50% of our existing ones) leads to nightmare for support people - you get way to many combination - and you should not forget how much you scale the effort on QA and devel side as well...

I'm convinced that simple selection:
  use  'udev' 
or use 'internal scan' 

is just about right level where it should stop. When the udev works - all is fine - if it doesn't - then let the user disable it and stay with old version.

Comment 10 Peter Rajnoha 2014-09-05 07:39:57 UTC

(In reply to Zdenek Kabelac from comment #9)
> (In reply to Peter Rajnoha from comment #8)
> 
> > It's not just MD, multipath or partition detection (the filters), but
> > there's also the information about mount status (for which we use our own
> > code - but that doesn't mean it performs the best). And there may be more
> > places in the code where we could reuse this "external source of device
> > status". The aim is to reuse as much information as possible if it's
> > applicable and once the information is not covered with that source, we'll
> > just fallback to the "native" LVM code that does this. People can then
> > decide what's the best source to use in their scenario which is dependent on
> > system configuration and availability of those sources of course.
> 
> We need to  carefully weight pros and cons.
> 
> i.e. does linkage of external library really help us to do things better -
> when we already have the code to detect mount info (which is in fact pretty
> simple) and proved to be usable pretty well.
>

We already link libudev and libblkid... And if there's a library to access the information we need in a very efficient way (I mean the representation that the library uses makes it possible to speed things up), then I think it's even worth linking that new library that will appear in the future possibly.
 
> Is such library doing something else/better then we do ?

Most of the time, such libraries are well maintained and it's also the source of information for others. Why should we spent more time on supporting new stuff when others already deal with that in projects that are directly aimed at providing such information.

> 
> I don't see much point to replace own our code - with someone else code to
> do the exactly same stuff (reading/parsing /proc or /sys content)
> 

...I'm also suggesting to stop adding any new native detection code and start relying on this external libraries more (e.g. see bug #862085 for example of request to add detection for something we don't detect yet with our native code).

> Also udev  is IMHO supposed to maintain all relevant device info - there is
> no reason to reinvent 10 new daemons for queering the info that should be in
> udev  to duplicate all the effort.
> 

I agree - udev should keep this info, but for years there's no movement to make udev db better. At least not with respect to handling block devices and their configuration/setup in a stack. It would need to be redesigned for this to keep this information in a more sane way.

> IMHO the simplicity and performance should be our primary focus.
> 

Simplicity == not adding new native code we should bother then about maintaining when other libraries already do that and it's their primary focus!

> Introduction of new set of options (and frankly I've already lost track of
> at least of 50% of our existing ones) leads to nightmare for support people
> - you get way to many combination - and you should not forget how much you
> scale the effort on QA and devel side as well...
> 

People don't need to touch this. We'll make a default for them that we find fits the best. If they're interested, they may read the documentation and change that at their will then. What we need is more documentation for people to make it easier to understand the concepts. Also, we can just prepare profiles for them that will select sane combination of config options.

> I'm convinced that simple selection:
>   use  'udev' 
> or use 'internal scan' 
> 
> is just about right level where it should stop. When the udev works - all is
> fine - if it doesn't - then let the user disable it and stay with old
> version.

And after a year there'll be something new we find that perfectly covers our needs and we'll be forced to add the setting anyway. However, at that time, it will become messy as we would already have "udev" source bound to something else. So let's have this setting since beginning for people to get accustomed to this and once there's something new, it will be just a new source.

Udev is right level, yes, but it's not perfect and things are not changing for years there with respect to how the information is stored - yes, it can speedup things for us due to the existing blkid information stored there - but information about stacks and various relations between levels in this stack is just out of udev possibilites (and blkid doesn't care about stacks too). And this is kind of information that can help us a bit.

I don't think there'll be lots of these new sources to use, maybe we add one or two more in the future, but that'll be all I think.

Comment 11 Peter Rajnoha 2014-09-05 07:43:43 UTC

(In reply to Peter Rajnoha from comment #10)
> (In reply to Zdenek Kabelac from comment #9)
> > I'm convinced that simple selection:
> >   use  'udev' 
> > or use 'internal scan' 
> > 
> > is just about right level where it should stop. When the udev works - all is
> > fine - if it doesn't - then let the user disable it and stay with old
> > version.
> 

Also, the information about block devices is just too domain-specific for udev at the moment until udev considerably changes. Which would mean rewriting that completely - thye won't do that (I wouldn't myself do that if I was the one who maintained udev).

Comment 12 Peter Rajnoha 2014-09-05 07:45:43 UTC

People don't need to know about this setting if they're not interested. Once we find that "udev" source is stable enough for our needs, we'll change the default to that source. People will use that transparently. And if there's the new, even better source, we'll apply the same transition later on.

Comment 13 Peter Rajnoha 2015-01-30 13:01:48 UTC

Patchset is upstream now (lvm2 v2.02.116):

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=fbfde21e7cdc770f0a1189d40f983aa046d605bb
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=c50a90c9e6084350682a315ed1732c2a54a3a828
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=bf8943b0f63336272f3434455b906e0167e27b52
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=590fbd89618fa8fb2556c89c1d12b5deb5b5f1a4
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=9c030e81a4e2d1e6b7024991c29114c59337553e
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=2fc126b00d83991c6a2f3ed9aa61457294a4c45e
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=787f6ce04ab2d2420cda199c34fdd795153b593f
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=1c7a509bed900e27932df585f70c0c2ad994d1a9
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=28ba0450e93db30ee4dd815cbe85478b3c6a719a
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=c99cb207153b5c103889a1ac67f799f6af4687d2

Currently, this is used for md, mpath filtering and filtering of devices with partition table as well as fw raid filtering and pv_min_size filtering.

Comment 14 Peter Rajnoha 2015-01-30 13:03:50 UTC

There's a new lvm.conf setting:

  external_device_info_source = "none"

By default, the "none" source is used which means the original LVM2 behaviour - all scans are done by LVM2 native code.

This setting also accepts "udev" source in which case records from udev database and libudev interface are used for filtering.