Bug 1020150 - [RFE] Use udev monitoring to enhance synchronization
Status: NEW
Product: Fedora
Classification: Fedora
Component: lvm2
Assigned To: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
Keywords: FutureFeature
Reported: 2013-10-17 03:20 EDT by Zdenek Kabelac
Modified: 2017-10-31 10:50 EDT (History)
Description Zdenek Kabelac 2013-10-17 03:20:41 EDT
Description of problem:

lvm2 should start using udev monitoring.

This would simplify udev rule processing a little, since at least one dmsetup call (for the semaphore counter) could be eliminated.

A further advantage is that no limited IPC resources (SysV semaphores) would be consumed.

We would also gain the option to eliminate many of the retry steps for closing a device, which we added because of the randomly triggered watch rule. We would just 'settle' when the first attempt fails.

Other distributions would stop cluttering the lvm2 code with their incorrect patches.
Comment 1 Alasdair Kergon 2013-10-17 09:21:00 EDT
Settle was ruled out of any solution a long time ago because it waits for unrelated changes (which could be blocked).

What is proposed here was ruled out before the current solution was written, and I don't believe udev has been enhanced sufficiently since then for this to work.
Comment 2 Zdenek Kabelac 2013-10-18 04:17:43 EDT
Settle could be used when we hit a race between close and the watch rule, i.e.

when I use 'sleep xxx < /dev/vg/lv' to keep a device open, we could either retry the close for ~3 seconds, or just wait until all udev events queued since the starting point have been processed, so we can be sure that no relevant watch rule, triggered e.g. as a result of an umount, is still in progress.

There are cases where 'settle' is a clear win, and there could be cases where settling takes much longer. We could possibly combine the best of both worlds: retry for at most 3 seconds, and if we detect that udev has 'settled' earlier, stop retrying immediately and report the device-open error.
IMHO there is now a 30-second timeout in udev, so even a plain settle should not block things for very long.
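
A minimal sketch of the combined approach, in Python for brevity rather than lvm2's C. The names `close_with_retry`, `try_close` and `udev_settled` are hypothetical and not part of lvm2 or udev; `udev_settled` stands in for whatever "is the udev event queue empty?" check is available (for example a wrapper around `udevadm settle` with a zero timeout, which is an assumption here).

```python
import time

def close_with_retry(try_close, udev_settled, timeout=3.0, interval=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry try_close() for at most `timeout` seconds.

    try_close and udev_settled are caller-supplied callables (hypothetical).
    The loop stops early once udev_settled() reports an empty event queue,
    because no pending watch rule can then still be holding the device open.
    Returns True if the close succeeded, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if try_close():
            return True
        if udev_settled():
            # The queue may have drained between the two checks above,
            # so make one final attempt before reporting the error.
            return try_close()
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would treat a False result as a genuine "device is open" error, since either the time limit expired or udev had nothing left to process.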

Anyway, settle is just one minor point here. The primary goal would be to eliminate the semaphore altogether, since a monitor could easily filter for disk events only and we would just wait for our own rules.

As a major bonus, I could then see the pvscan handling for lvmetad being embedded and executed in the proper context of the lvm command instead of in the over-complex udev/systemd/service chain, since I believe CPU time should be accounted to the command that caused the device manipulation and not hidden somewhere deep in systemd services.

Another point to consider is that tracing and debugging would become more relevant.
Comment 3 Zdenek Kabelac 2013-10-18 06:57:36 EDT
Another thing to consider here (with patches like this:
is the complexity of building and creating a correctly configured lvm2 package for a user. I think we are already past the point where an average user can find the proper options, and it seems even distro maintainers will now find it hard to tune the right set of options.

Compared with that, built-in support for monitoring, where lvmetad is just a udev monitor with the ability to fork lvchange for auto-activation, makes this considerably easier.
Comment 4 Zdenek Kabelac 2013-10-23 05:48:48 EDT
From a discussion with a udev developer: we may get the same cookie-ID behavior by using the 'TAG' feature and udev_monitor_filter_add_match_tag().

While the man page for TAG support states "Excessive use might result in inefficient event handling", that refers to attaching many different TAGs to a device; using a single TAG is perfectly OK.

So we should be able to monitor for exactly one device, just as we now decrement a semaphore ID (of which SysV provides only a limited number).
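
To illustrate the idea, here is a rough Python sketch of waiting for one tagged event instead of decrementing a semaphore. It does not use libudev: the `queue.Queue` stands in for a udev monitor socket, and the dict event shape, the `wait_for_tag` name and the `dm-cookie-…` tag format are all hypothetical. With libudev the tag match would be installed with udev_monitor_filter_add_match_tag() as mentioned above.

```python
import queue
import time

def wait_for_tag(events, tag, timeout=3.0):
    """Block until an event carrying `tag` arrives on `events`, or time out.

    `events` is a queue.Queue of dicts with a "tags" list (a stand-in for a
    udev monitor). Events without the tag are skipped. Returns the matching
    event, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            event = events.get(timeout=remaining)
        except queue.Empty:
            return None
        if tag in event.get("tags", ()):
            return event
```

The command that created the device would pick a unique tag for the operation, have its udev rule attach that tag, and then call something like `wait_for_tag(monitor_events, "dm-cookie-42")` before continuing.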

Udev will also likely implement a much simpler 'CANCEL' message for when a udev worker is killed.
Comment 5 Zdenek Kabelac 2013-10-24 05:01:48 EDT
Just to add a few more words about the limitation of 'TAGS': all the TAGS associated with a single device are hashed into a 64-bit bitmask used for quick in-kernel socket filtering. The udev server broadcasts the event to every monitor that listens for a given mask, so there can be some false positives (since all tags are OR-ed together), which are filtered out at the end on the udev monitor 'client' side. So our cookie ID should probably be designed and checked for good bit spreading.
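
The effect can be shown with a small Python model. This is only an illustration under stated assumptions: the hash used here (SHA-256, one bit per tag) is not udev's actual hash or bit count, and all function names and the `dm-cookie-…` tag format are hypothetical.

```python
import hashlib

def tag_bits(tag):
    """Map a tag to a 64-bit mask with one bit set (illustrative hash, not udev's)."""
    digest = hashlib.sha256(tag.encode()).digest()
    return 1 << (digest[0] % 64)

def device_mask(tags):
    """OR together the bits of every tag attached to a device."""
    mask = 0
    for tag in tags:
        mask |= tag_bits(tag)
    return mask

def socket_filter_passes(event_tags, wanted_tag):
    """Coarse pre-filter: may pass events that do not carry wanted_tag."""
    return bool(device_mask(event_tags) & tag_bits(wanted_tag))

def client_accepts(event_tags, wanted_tag):
    """Exact check done on the monitor client side after the coarse filter."""
    return socket_filter_passes(event_tags, wanted_tag) and wanted_tag in event_tags

def find_false_positive(wanted_tag, limit=10000):
    """Search for a different tag that lands on the same bit as wanted_tag."""
    for i in range(limit):
        other = f"other-{i}"
        if other != wanted_tag and tag_bits(other) == tag_bits(wanted_tag):
            return other
    return None
```

With only 64 bits available, `find_false_positive("dm-cookie-1234")` quickly finds an unrelated tag that passes the coarse filter but is rejected by `client_accepts`, which is why the spread of cookie IDs across the mask bits matters.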
Comment 6 Zdenek Kabelac 2014-06-23 05:43:50 EDT
Another thing not to forget is synchronization against device removal.

Currently we are not able to synchronize the removal of a volume group with its recreation, since the check for the VG name in /dev may still return true when we do not wait for udev to actually remove the link.

Unfortunately, at present more tasks are executed, e.g. via systemd, which makes the synchronization nearly impossible: we simply have no source of information about scheduled or in-flight operations on a device. Enhancing the kernel API with some flags for suspend/remove operations may therefore be necessary, or we will have to fight hard with other processes that manipulate a PV/VG/LV.