Bug 1063813 - No pvscan call for lvmetad update if the PV is gone by other means than the REMOVE event - fix special cases like directly erasing the PV signature
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1067422
Blocks:
 
Reported: 2014-02-11 13:26 UTC by Milos Malik
Modified: 2023-03-08 07:26 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-11 12:43:06 UTC
Target Upstream Version:
Embargoed:



Description Milos Malik 2014-02-11 13:26:34 UTC
Description of problem:
 * the "Incorrect metadata area header checksum" error appears if vgcreate is called at least twice and the /etc/lvm/lvm.conf file contains "use_lvmetad = 1"
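
For reference, the relevant lvm.conf setting (a minimal excerpt; the rest of the file is omitted):

  # /etc/lvm/lvm.conf
  global {
      # route device scanning through the lvmetad caching daemon
      use_lvmetad = 1
  }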

Version-Release number of selected component (if applicable):
lvm2-2.02.105-3.el7.x86_64
lvm2-cluster-2.02.105-3.el7.x86_64
lvm2-libs-2.02.105-3.el7.x86_64
lvm2-python-libs-2.02.105-3.el7.x86_64

How reproducible:
always

First run of vgcreate is successful:
# dd if=/dev/zero of=zeros.img bs=1MB count=128
128+0 records in
128+0 records out
128000000 bytes (128 MB) copied, 0.0770224 s, 1.7 GB/s
# losetup /dev/loop0 zeros.img 
# vgcreate vg-targetd /dev/loop0
  Physical volume "/dev/loop0" successfully created
  Volume group "vg-targetd" successfully created
# vgremove vg-targetd
  Volume group "vg-targetd" successfully removed
# losetup -d /dev/loop0
# rm -f zeros.img 

Other runs of vgcreate always fail:
# dd if=/dev/zero of=zeros.img bs=1MB count=128
128+0 records in
128+0 records out
128000000 bytes (128 MB) copied, 0.0992774 s, 1.3 GB/s
# losetup /dev/loop0 zeros.img 
# vgcreate vg-targetd /dev/loop0
  Incorrect metadata area header checksum on /dev/loop0 at offset 4096
# vgremove vg-targetd
  Volume group "vg-targetd" not found
  Skipping volume group vg-targetd
# losetup -d /dev/loop0
# rm -f zeros.img 
#

Actual results:
 * repeated vgcreate calls fail

Expected results:
 * repeated vgcreate calls are successful

Comment 1 Peter Rajnoha 2014-02-11 13:39:35 UTC
As discussed in situ, this is a problem with the "detach" event, for which the "change" udev event is used; lvmetad does not know about the PV that is gone and hence still caches the old info...

Comment 2 Peter Rajnoha 2014-02-20 11:45:41 UTC
When a loop device is detached, it has no file attached to it anymore. This is not a REMOVE event - the /dev/loopX still stays in the system, it's just blank, without any file attached to it. This detachment is signalled by the CHANGE event.

During this CHANGE event we move from a state with ID_FS_TYPE="LVM2_member" to a state with ID_FS_TYPE="". We can properly detect this situation in 69-dm-lvm-metad.rules and we do properly try to trigger a pvscan in this situation, which is supposed to let lvmetad know about the PV being gone.
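
For illustration, the detection in 69-dm-lvm-metad.rules works roughly like this (a paraphrased sketch, not the verbatim rules):

  # save the freshly probed value, then import the previous one from the udev db
  ACTION=="change", ENV{.ID_FS_TYPE_NEW}="$env{ID_FS_TYPE}", IMPORT{db}="ID_FS_TYPE"
  # the value changed from "LVM2_member" to empty - the PV label is gone
  ENV{ID_FS_TYPE}=="LVM2_member", ENV{.ID_FS_TYPE_NEW}!="LVM2_member", ENV{LVM_PV_GONE}="1"
  # restore the fresh value so later rules see the current probe result
  ENV{ID_FS_TYPE}="$env{.ID_FS_TYPE_NEW}"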

However, when using

  ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service"

...to instantiate the pvscan, systemd does nothing in this case: the service is already running, so systemd concludes that the service does not need to be rerun. This ends up with pvscan missing the "PV GONE" event.

This is the exact sequence:

(assuming a.img contains a PV signature)

  -> losetup -f a.img
    --> ADD event generated (/dev/loop0 created)
    --> CHANGE event generated (the file is attached)
      ---> ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service" assigned
         ----> systemd instantiating the lvm2-pvscan@$major:$minor.service
            -----> pvscan --cache -aay $major:$minor executed
                ------> lvmetad updated about the new PV on loop dev

  -> losetup -d /dev/loop0
    --> CHANGE event generated (the file is detached)
      ---> ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service" assigned
         ----> systemd *not* instantiating the lvm2-pvscan@$major:$minor.service as it's already running!
           ------> lvmetad *not* updated about the PV on loop dev gone!

This works fine if I replace:

  ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service"

with:

  RUN+="/usr/sbin/pvscan --cache -aay $major:$minor"

As then the pvscan is always run unconditionally. BUT we can't use the RUN rule to trigger the pvscan because of other problems we may hit:
  A: udev would kill the pvscan if the system is overloaded and pvscan does not return within the default 30s timeout
  B: if pvscan is backgrounded to avoid A, systemd would kill the backgrounded pvscan since it's in the same cgroup as the udev process


To summarize:
  - IFF the device gets a proper REMOVE event, the pvscan is always run to signal lvmetad about the PV being gone (since the lvm2-pvscan@major:minor.service is bound to the systemd device unit that would be removed, and hence the lvm2-pvscan@major:minor.service's ExecStop is executed, which calls the pvscan) - see the unit sketch after this list

  - IF the PV can be gone by any means other than the REMOVE event (e.g. by doing a dd on the device and erasing the PV signature, or by the loop device file detachment described above), we need to run pvscan by other means than SYSTEMD_WANTS="lvm2-pvscan@$major:$minor.service", since this won't be executed if the service is still running. And the device unit that the lvm2-pvscan service relies on is still there, which is why the loop dev requires special treatment.
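
For illustration, the binding mentioned in the first point looks roughly like this in lvm2-pvscan@.service (a sketch, not the verbatim unit file):

  [Unit]
  # the service lives and dies with the device unit...
  BindsTo=dev-block-%i.device

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i
  # ...and ExecStop notifies lvmetad that the PV is gone
  ExecStop=/usr/sbin/lvm pvscan --cache %i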


So:

  - we need to fix the case where the PV being gone is signalled by other means than the REMOVE event. This is not working at the moment!

Comment 3 Peter Rajnoha 2014-02-20 11:51:20 UTC
I can imagine running "systemctl stop <the PV device unit>" directly in the udev rule. Though I'm not sure how clean that is - the big question is whether it can block under some circumstances. I'll give it a try though...

This is *just another* example where systemd/udev does not account for virtual devices that may be created/removed by just erasing signatures. So we need to provide a workaround. The "systemctl stop <the PV device unit>" approach seems best to me at the moment...

Comment 4 Peter Rajnoha 2014-02-20 11:55:42 UTC
I'm afraid I can't stop <the PV device unit> as that would also stop <the loop dev unit> - the PV device unit is just a synonym for the loop dev unit.
So we need some other hack.

Comment 5 Peter Rajnoha 2014-02-20 11:57:26 UTC
I can call "systemctl stop lvm2-pvscan@major:minor" directly though!

Comment 6 Peter Rajnoha 2014-02-20 12:43:23 UTC
A simple "SYSTEMD_READY=0" in case there's no file attached to the loop device is enough to fix this - bug #1067422.

Comment 7 Peter Rajnoha 2014-02-20 12:47:21 UTC
The bug #1067422 covers a situation with the loop dev.

Still it would be fine to cover a situation in which the PV signature gets erased/is gone in general (and on other devs) and we need to stop and remove the lvm2-pvscan@major:minor.service so any subsequent use of SYSTEMD_WANTS="lvm2-pvscan@$major:$minor.service" on the same device works.

Comment 8 Marius Vollmer 2014-03-05 07:51:00 UTC
(In reply to Peter Rajnoha from comment #7)
> The bug #1067422 covers a situation with the loop dev.
> 
> Still it would be fine to cover a situation in which the PV signature gets
> erased/is gone in general

I'd say it would not just be fine, but essential.  I consider this to be a severe bug.

I propose to just use RUN and let Fedora/RHEL as a whole worry about the possibility that a udev timeout drives the system into inconsistency.

Comment 9 Peter Rajnoha 2014-03-05 08:22:38 UTC
(In reply to Marius Vollmer from comment #8)
> (In reply to Peter Rajnoha from comment #7)
> > The bug #1067422 covers a situation with the loop dev.
> > 
> > Still it would be fine to cover a situation in which the PV signature gets
> > erased/is gone in general
> 
> I'd say it would not just be fine, but essential.  I consider this to be a
> severe bug.
> 
> I propose to just use RUN and let Fedora/RHEL as a whole worry about the
> possibility that a udev timeout drives the system into inconsistency.

That wouldn't work - that's exactly the reason we used the SYSTEMD_WANTS thing (per advice from the systemd/udev people). If udev is under pressure processing lots of devices, it just makes the system fail with these timeouts, losing the event-based processing - tools are simply not called since the event processing stopped, making the udev db practically inconsistent and also *unable* to recognize this state when reading it. I've already argued about this state in bug #918511. (Also, they haven't commented on bug #1067422 yet, which would solve exactly the problem reported here.)

We use "pvscan --background" in RHEL6 which makes the pvscan to detach from the process processing the event which prevents this. Unfortunately, this is unusable in RHEL7 since such detached process is still in same cgroup under systemd managemnt which can kill that if udev processing stops. So that's the reason we had to use the SYSTEMD_WANTS logic.

However, we could probably use the new "systemd-run" command here to instantiate a one-time systemd service directly from a udev rule. But that's not tested, and it's really late for such a change now in RHEL7 - it could bring other unexpected regressions.

As for the PV being gone - well, normally, people should use "pvremove" - that's the correct way of removing the PV label, not calling "dd" directly or erasing the label on the fly with anything else. There's still the possibility of calling "pvscan --cache" manually to update lvmetad with the actual state.

Sure, I see the pain here, but it's the least pain possible we could come up with. I'll think about an alternative solution that would be better here, but it's really late for RHEL7.

It's even possible that we'll just end up with *another* LVM helper daemon monitoring events directly. But that's for 7.1, definitely not for 7.0.

Comment 10 Peter Rajnoha 2014-03-05 08:24:59 UTC
For now, I could probably just add "systemd-run pvscan --cache ..." for any PV that is detected as immediately gone in udev rules (still keeping the SYSTEMD_WANTS for the normal operation) - it can't cause any worse situation than the one we have.

Comment 11 Peter Rajnoha 2014-03-05 08:57:13 UTC
(In reply to Peter Rajnoha from comment #10)
> For now, I could probably just add "systemd-run pvscan --cache ..." for any
> PV that is detected as immediately gone in udev rules (still keeping the
> SYSTEMD_WANTS for the normal operation) - it can't cause any worse situation
> than the one we have.

(Well, that would leave the lvm2-pvscan@... and dev-block-<major>:<minor> units behind in the system - they should be cleaned up. So this won't work well.)

Comment 12 Peter Rajnoha 2014-03-05 09:45:32 UTC
(In reply to Peter Rajnoha from comment #9)
> However, we could probably use the new "systemd-run" command here to
> instantiate a one-time systemd service directly from udev rule.

Well, I've tried that - it seems that systemd-run does not work at such an early stage, when systemd-udev-trigger.service runs to process all devices that already existed from the time before the trigger (initramfs etc.).

So I'm afraid the only option now is to create a specific daemon that would listen to events and run pvscan from its context directly (or have this integrated in lvmetad, but the initial decision was for lvmetad not to integrate this, and there's strong resistance from the creator of lvmetad).

As for bug #1067422 - I asked a systemd developer to look at that - he will try to squeeze it into RHEL7.

I'm moving this bug to 7.1 as there's nothing else we can do at this late time for RHEL7.

Comment 13 Marius Vollmer 2014-03-05 09:47:30 UTC
(In reply to Peter Rajnoha from comment #9)

> That wouldn't work - that's exactly the reason we used the SYSTEMD_WANTS
> thing (per advice from the systemd/udev people). If udev is under pressure
> processing lots of devices, it just makes the system fail with these
> timeouts, losing the event-based processing - tools are simply not called
> since the event processing stopped, making the udev db practically
> inconsistent and also *unable* to recognize this state when reading it.

Yes.  Honestly, I would blame that on udev and walk away.  But if pvscan really is not appropriate for a udev rule, then lvmetad needs to listen to udev events directly (or indirectly via some other entity as you mention).

> I've already argued about this state in bug #918511.

I don't have access to that bug, unfortunately.

Comment 14 Peter Rajnoha 2014-03-05 10:07:59 UTC
(In reply to Marius Vollmer from comment #13)
> Yes.  Honestly, I would blame that on udev and walk away.

In that case we'd end up with non-functional LVM, and udev would tell us it's our problem. I know this scenario very well :) It's not worth spending time here.

> 
> > I've already argued about this state in bug #918511.
> 
> I don't have access to that bug, unfortunately.

Oh, sorry! I've asked the one who made it private to make it public (there are some customer cases attached...).

Comment 15 Marius Vollmer 2014-03-05 10:53:52 UTC
> > > I've already argued about this state in bug #918511.
> > 
> > I don't have access to that bug, unfortunately.
> 
> Oh, sorry! I've asked the one who made it private to make it public (there
> are some customer cases attached...).

I don't know of any reason why I shouldn't have access to bugs like that, and I probably just screwed up my Bugzilla registration.  I know nothing about these things.  I'll ask around.

Comment 19 Peter Rajnoha 2014-03-05 13:55:48 UTC
A quick workaround is to add this line to /usr/lib/udev/rules.d/69-dm-lvm-metad.rules:


ACTION!="remove", ENV{LVM_PV_GONE}=="1", RUN+="/usr/bin/systemd-run /usr/sbin/lvm pvscan --cache $major:$minor", GOTO="lvm_end"

This will cause a transient one-time service to be instantiated by systemd to run the command directly. (It needs to be such a service because running it directly in udev could end up with udev killing it on timeout if the system is loaded, and running it in the background with pvscan --background would cause systemd to kill it, since it would be in the same cgroup as the udev process.)

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=2c42f608904a853328341eac3cfd3c48033a3238

However, I'm not adding this to RHEL 7 at this moment - I'd like to see this working for a while...

Comment 20 Peter Rajnoha 2014-03-05 14:21:55 UTC
Also, we can optimize here a bit - for example, if I run "pvremove /dev/sda", then pvremove itself updates lvmetad. Since pvremove erases the PV header, it opens the dev RW, then closes it - this fires a WATCH event. In turn, this fires the pvscan (since we detect the PV as gone).
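
(For reference, the WATCH event comes from udev's inotify-based watch mechanism; the block rules mark devices roughly like this - paraphrased, not verbatim:

  # any close() after a write to the device fires a synthetic "change" event
  ACTION!="remove", SUBSYSTEM=="block", OPTIONS+="watch"
)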

In the dd case this is OK, but in the pvremove case it would be nice not to call pvscan --cache and update lvmetad uselessly, since pvremove has already done that.

So that's something to think of when we're fixing this.

Comment 21 Peter Rajnoha 2014-03-06 15:21:19 UTC
(In reply to Peter Rajnoha from comment #1)
> As discussed in-situ, this is a problem with the "detach" event for which
> the "change" udev event is used and lvmetad does not know about the PV that
> is gone and hence still caches the old info...

The loop file detachment should work now with the fix requested in bug #1038524.

Comment 25 Peter Rajnoha 2014-09-16 13:42:41 UTC
So, let's get back to this. Current state and facts:

  A) the problem being discussed here is about a PV header that is erased and then reappears by means other than official pvcreate/pvremove calls (e.g. using dd)

  B) there was a special case with loop devices that are detached - in this case the loop device is still in the system, though it has no file attached to it - this is already resolved by bug #1067422 that I reported - the loop device is marked as "SYSTEMD_READY=0", which also means the lvm2-pvscan@$major:$minor.service that is bound to such a device is stopped (and so lvmetad is notified about this via lvm2-pvscan's ExecStop action).

  C) when the PV header is erased by a pvremove call, lvmetad knows about this directly as pvremove notifies lvmetad itself - there's nothing to solve here - this worked even before, of course

  D) when the PV device is gone as a whole, not just the PV header (device disconnected/unplugged from the system), lvmetad knows about this since the lvm2-pvscan@$major:$minor.service is bound to device existence (this has the same effect as setting SYSTEMD_READY=0 as already described in B)). Once the device no longer exists in the system, the lvm2-pvscan service's ExecStop action is called (that is, "pvscan --cache" is called, which sees the device is gone and notifies lvmetad properly)

  E) when the PV header is erased by any other means BUT the device itself stays in the system (which means that lvm2-pvscan@$major:$minor.service is *not stopped*), lvmetad knows about this too, as the udev rule detects that the PV header is gone and calls "systemd-run /usr/sbin/lvm pvscan --cache $major $minor" in this case - the pvscan will notice there's no PV header anymore and notify lvmetad properly

  F) the problem that still remains is that after E), the lvm2-pvscan@$major:$minor.service stays in the system though it should have been stopped. If this exact device with $major:$minor now gains a PV header again (by means other than pvcreate!), the pvscan in lvm2-pvscan@$major:$minor.service won't fire, as the service is already running. And lvmetad doesn't know about the PV header that just appeared in the system. (See the illustration below.)
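
For illustration, the E)/F) sequence on a plain device (a hypothetical sketch; the device name and numbers are examples only):

  # erase the PV header in place - the device itself stays in the system
  dd if=/dev/zero of=/dev/sdb bs=1M count=1
  # E) the udev rule fires "systemd-run ... pvscan --cache" and lvmetad is
  # updated, but the pvscan service instance keeps running:
  systemctl is-active lvm2-pvscan@8:16.service    # -> active
  # F) if a PV header now reappears on 8:16 (e.g. written back with dd),
  # SYSTEMD_WANTS won't re-trigger the already-running service, so lvmetad
  # never learns about the new PV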

===

To sum it up and to state the problem found in E) more exactly:

The lvm2-pvscan@$major:$minor.service should be bound to PV header existence, not PV device existence as it is today. Either we find a way to tell systemd to stop the service directly, or we work out with the systemd team which systemd resource we can use to achieve this. That's what we still need to solve here!

Comment 27 Peter Rajnoha 2015-05-11 12:43:06 UTC
(In reply to Peter Rajnoha from comment #25)
> To sum it up and to state the problem found in E) more exactly:
> 
> The lvm2-pvscan@$major:$minor.service should be bound to PV header
> existence, not PV device existence as it is today. Either we find a way to
> tell systemd to stop the service directly, or we work out with the systemd
> team which systemd resource we can use to achieve this. That's what we
> still need to solve here!

So, to sum it up - the pvscan needs to be instantiated as a service because otherwise there's a possibility of systemd/udev killing the process. There's the "SYSTEMD_WANTS" variable that may be used to instantiate a systemd service from within udev. But there's no way to do the opposite - to stop the service reliably.

As for this bug - we can't solve this with current systemd resources. We can bind a service to device existence easily, but not to signature existence only (while the device stays in the system). If someone removes the (PV) signature by zeroing it out with, for example, dd if=/dev/zero of=the_PV, the lvm2-pvscan@major:minor.service stays in the system. I've discussed this with the systemd team and we shouldn't be calling "systemctl stop lvm2-pvscan@major:minor.service" directly from udev rules (it may cause problems if some of the dependencies fail - the udev process calling the systemctl stop may hang). So this leaves us with no solution then.

A proper solution here would be for systemd to provide something like "SYSTEMD_UNWANTS", the opposite of the existing "SYSTEMD_WANTS". Currently, when it comes to services, these can only be bound to device existence, not to metadata changes on that device.
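
For illustration, such a hypothetical "SYSTEMD_UNWANTS" could then be used from the udev rule like this (not implemented by systemd; sketch only):

  # hypothetical: stop the pvscan service once the PV signature is gone,
  # mirroring how SYSTEMD_WANTS starts it when the signature appears
  ACTION=="change", ENV{LVM_PV_GONE}=="1", ENV{SYSTEMD_UNWANTS}="lvm2-pvscan@$major:$minor.service"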


Another solution here would be to stop using lvm2-pvscan@major:minor.service to call pvscan to update lvmetad, and instead listen to udev events directly via a udev monitor and update lvmetad based on this direct monitoring (so we'd avoid calling tools to update the lvmetad state and switch to a "udev listener" mode instead).
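
For illustration, the kind of event stream such a listener would consume can be previewed with udevadm (the listener itself would use libudev):

  # print block-device events as udev finishes processing them, with properties
  udevadm monitor --udev --subsystem-match=block --property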

I'm closing this one as WONTFIX for now (but I'll still keep this problem on my TODO list - if there's ever a resource in systemd to make this possible easily, I'll use it, or if we switch to udev monitoring, I'll use that to resolve the problem stated here).

Comment 28 Peter Rajnoha 2015-05-11 12:49:57 UTC
(In reply to Peter Rajnoha from comment #27)
> A proper solution here would be for systemd to provide something like
> "SYSTEMD_UNWANTS", the opposite of the existing "SYSTEMD_WANTS".

(The systemd team is not willing to implement the "SYSTEMD_UNWANTS" functionality because of the inherent problems that may arise when stopping a service that may still be in use, hence leading to the hangs.)

So that actually leaves us with the other solution - to start using monitoring instead of calling pvscan directly to update lvmetad.

