Bug 1706154 - LVM metadata and dmeventd stop jobs fail at reboot on VDO backed VG/LV
Summary: LVM metadata and dmeventd stop jobs fail at reboot on VDO backed VG/LV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: vdo
Version: 7.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Andy Walsh
QA Contact: Filip Suba
Docs Contact: Marek Suchánek
URL:
Whiteboard:
Duplicates: 1664662 (view as bug list)
Depends On:
Blocks: 1784876
 
Reported: 2019-05-03 16:42 UTC by John Pittman
Modified: 2022-12-22 05:08 UTC
CC List: 19 users

Fixed In Version: 6.1.3.23
Doc Type: Bug Fix
Doc Text:
.LVM volumes on VDO now shut down correctly
Previously, the stacking of block layers on VDO was limited by the configuration of the VDO systemd units. As a result, the system shutdown sequence waited for 90 seconds when it tried to stop LVM volumes stored on VDO. After 90 seconds, the system uncleanly stopped the LVM and VDO volumes. With this update, the VDO systemd units have been improved, and as a result, the system shuts down cleanly with LVM on VDO. Additionally, the VDO startup configuration is now more flexible. You no longer have to add special mount options in the `/etc/fstab` file for most VDO configurations.
Clone Of:
Environment:
Last Closed: 2020-09-29 20:10:53 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshots of failure (28.91 KB, application/gzip)
2019-05-03 16:42 UTC, John Pittman
no flags Details
debug logs (10.16 MB, text/plain)
2019-05-03 18:38 UTC, John Pittman
no flags Details
blkdeactivate with VDO support (15.84 KB, text/plain)
2019-10-22 12:58 UTC, Peter Rajnoha
no flags Details
blkdeactivate with VDO support (patch) (1.67 KB, patch)
2019-10-22 12:59 UTC, Peter Rajnoha
no flags Details | Diff
vdo.generator script (3.39 KB, text/plain)
2019-10-30 14:57 UTC, Andy Walsh
no flags Details
vdo.service unit (199 bytes, text/plain)
2019-10-30 14:57 UTC, Andy Walsh
no flags Details
All scripts for systemd/udev to test VDO startup and shutdown (5.77 KB, application/gzip)
2019-11-15 11:39 UTC, Peter Rajnoha
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:3965 0 None None None 2020-09-29 20:10:58 UTC

Description John Pittman 2019-05-03 16:42:31 UTC
Created attachment 1562605 [details]
screenshots of failure

Description of problem:

LVM metadata and dmeventd stop jobs fail at reboot on VDO backed VG/LV

Version-Release number of selected component (if applicable):

vdo-6.1.1.125-3.el7.x86_64
kmod-kvdo-6.1.1.125-5.el7.x86_64
kernel-3.10.0-957.12.1.el7.x86_64
device-mapper-1.02.149-10.el7_6.7.x86_64
lvm2-2.02.180-10.el7_6.7.x86_64

Steps to Reproduce:

# vdo create --name=vdo1 --device=/dev/sda --vdoLogicalSize=12G
# vgcreate test /dev/mapper/vdo1
# lvcreate -n lv1 -l 50%FREE test

[root@localhost ~]# grep -r 'use_lvmetad =' /etc/lvm/lvm.conf
	use_lvmetad = 1

[root@localhost ~]# grep test /etc/fstab 
/dev/test/lv1 /lv1 xfs defaults,x-systemd.requires=vdo.service 0 0

[root@localhost ~]# grep lv1 /proc/mounts 
/dev/mapper/test-lv1 /lv1 xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0

Reboot to test

Actual results:

When issuing a reboot, the system does not shut down cleanly. Instead, it hangs on two stop jobs: the LVM metadata daemon (lvmetad) and dmeventd.

Expected results:

System should shut down cleanly.

Additional info:

Screenshots attached.

Comment 2 John Pittman 2019-05-03 17:06:33 UTC
Missed one:  systemd-219-62.el7_6.6.x86_64

Comment 3 John Pittman 2019-05-03 18:38:38 UTC
Created attachment 1562646 [details]
debug logs

Comment 4 Bryan Gurney 2019-05-03 18:52:33 UTC
In the RHEL 7 Storage Administration Guide, there's a template for a systemd unit file to be created for proper systemd sequencing (since the "x-systemd.requires=vdo.service" hint in /etc/fstab may not be enough).

Example: for a VDO volume /dev/mapper/vdo1 with an XFS filesystem mounted on /mnt/vdo1:

unit file name: /etc/systemd/system/mnt-vdo1.mount

(start of mnt-vdo1.mount)
[Unit]
Description = VDO unit file to mount file system
name = vdo1.mount
Requires = vdo.service
After = multi-user.target
Conflicts = umount.target

[Mount]
What = /dev/mapper/vdo1
Where = /mnt/vdo1
Type = xfs

[Install]
WantedBy = multi-user.target
(end of mnt-vdo1.mount)

Comment 5 Bryan Gurney 2019-05-03 19:04:26 UTC
I found these events from the attached logs:

[  105.025349] systemd[1]: About to execute: /usr/bin/vdo stop --all --confFile /etc/vdoconf.yml
[  105.025515] systemd[1]: Forked /usr/bin/vdo as 4250
[  105.025654] systemd[1]: vdo.service changed exited -> stop
[  105.025681] systemd[1]: Stopping VDO volume services...

...

[  105.026831] systemd[4250]: Executing: /usr/bin/vdo stop --all --confFile /etc/vdoconf.yml
...
[  105.146497] vdo[4250]: ERROR - cannot stop VDO volume vdo1: in use

There was something still using /dev/mapper/vdo1.  Was it the LVM logical volume?

Comment 8 Peter Rajnoha 2019-07-18 12:28:25 UTC
The vdo.service has wrong ordering; there are also variations of these cycles detected (I can reproduce this as well):

  [    6.290854] systemd[1]: Found ordering cycle on basic.target/start
  [    6.291822] systemd[1]: Found dependency on paths.target/start
  [    6.293027] systemd[1]: Found dependency on brandbot.path/start
  [    6.294214] systemd[1]: Found dependency on sysinit.target/start                                                                                                                                                                            
  [    6.295074] systemd[1]: Found dependency on rhel-autorelabel.service/start
  [    6.296020] systemd[1]: Found dependency on local-fs.target/start
  [    6.296985] systemd[1]: Found dependency on lv1.mount/start
  [    6.297963] systemd[1]: Found dependency on vdo.service/start
  [    6.298995] systemd[1]: Found dependency on basic.target/start
  [    6.299975] systemd[1]: Breaking ordering cycle by deleting job paths.target/start 

Then it causes problems at shutdown too.

Looking quickly at the vdo.service, I see it has:

  [Install]
  WantedBy=multi-user.target

This is surely not correct (...what if I'm not booting into "multi-user.target" but "basic.target" or any other?).

The vdo.service should be arranged with respect to one of the fs targets and/or sysinit.target which are common for all final boot targets. I'll see which one is suitable... I think that should fix it then.

Comment 9 Peter Rajnoha 2019-07-18 12:29:47 UTC
(In reply to Peter Rajnoha from comment #8)
> The vdo.service should be arranged with respect to one of the fs targets
> and/or sysinit.target which are common for all final boot targets. I'll see
> which one is suitable... I think that should fix it then.

See also: https://www.freedesktop.org/software/systemd/man/bootup.html

Comment 10 Peter Rajnoha 2019-07-18 12:34:42 UTC
(In reply to John Pittman from comment #0)
> [root@localhost ~]# grep test /etc/fstab 
> /dev/test/lv1 /lv1 xfs defaults,x-systemd.requires=vdo.service 0 0

...and once we have the proper ordering with respect to one of the fs targets, the x-systemd.requires=vdo.service won't be necessary then.

Comment 12 Peter Rajnoha 2019-07-19 10:57:45 UTC
There's a proposal already here from Marius dating back to last year: https://github.com/mvollmer/vdo-udev

I think we should revisit this.

Comment 13 Marius Vollmer 2019-07-22 06:15:46 UTC
(In reply to Peter Rajnoha from comment #12)
> There's a proposal already here from Marius dating back to last year:
> https://github.com/mvollmer/vdo-udev
> 
> I think we should revisit this.

There have also been other proposals by the systemd people, which are better thought out.  The VDO people should be aware of them, I think.

Comment 14 Peter Rajnoha 2019-07-22 07:35:26 UTC
(In reply to Marius Vollmer from comment #13)
> There have also been other proposals by the systemd people, which are better
> thought out.  The VDO people should be aware of them, I think.

Please, do you or anybody from VDO have any reference to those proposals (...was that on mailing list)?

Comment 15 Marius Vollmer 2019-07-22 12:15:35 UTC
(In reply to Peter Rajnoha from comment #14)
> (In reply to Marius Vollmer from comment #13)
> > There have also been other proposals by the systemd people, which are better
> > thought out.  The VDO people should be aware of them, I think.
> 
> Please, do you or anybody from VDO have any reference to those proposals
> (...was that on mailing list)?

They came from Lukas Nykryn, now in CC.

Comment 16 Peter Rajnoha 2019-09-04 12:53:19 UTC
Corwin, we need to fix the vdo.service systemd unit - currently, it can cause unit dependency cycles - see comment #8. Have any fixes been planned in this area, or has there been any more discussion around this? See also related comment #11, comment #12 and comment #13.

Comment 17 corwin 2019-09-12 19:37:11 UTC
Hi Peter,

We're investigating how we can best resolve these issues. We will let you know when we have a timeframe for getting them resolved.

Comment 18 Andy Walsh 2019-09-13 20:29:28 UTC
Hi, I have been working to understand what's going on with the original problem reported here.  Note that there may be other systemd-related issues that VDO needs to address, but I wanted to put some information in this ticket about the original issue.  I found that the issue isn't necessarily that VDO has the wrong dependencies set, but that vdo.service is being stopped before the filesystem is unmounted and the VG on top of the VDO device is deactivated.

If I start with the same setup as in the description and then before trying to reboot run `umount /lv1` followed by `vgchange -an test`, then a reboot immediately happens without any delay for the lvmetad or dmeventd services.

The reason for this problem is that the volume group doesn't get deactivated, so when vdo.service tries to run `vdo stop --all`, it fails for the volume because it is "in use" by the volume group holding it open.  Because the VDO volume is never actually stopped, the dmeventd and lvmetad daemons remain running unexpectedly and hold up the shutdown for 90 seconds.

I'm working through reviewing all of the details posted above to see if any of that will directly resolve this issue in a generic sense.
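
For illustration, a minimal sketch of that manual workaround, using the mount point and VG name from the description:

  # unmount the filesystem and deactivate the VG first, so that
  # `vdo stop --all` no longer finds the VDO volume "in use"
  umount /lv1
  vgchange -an test
  reboot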

Comment 19 Peter Rajnoha 2019-09-16 13:59:10 UTC
Well, we have two issues here which we need to resolve:

1) The dependency cycle in vdo.service.

The vdo.service needs to replace WantedBy=multi-user.target with something else which makes the vdo.service start earlier during bootup and stop later during shutdown (e.g. sysinit.target). To have the service executed this way, we also need to use DefaultDependencies=no for the service unit and then we need to define when the service is stopped/conflicting - e.g. Conflicts=shutdown.target. So I'd suggest:

  # systemctl cat vdo.service
  # /usr/lib/systemd/system/vdo.service
  [Unit]
  Description=VDO volume services
  Before=shutdown.target
  DefaultDependencies=no
  Conflicts=shutdown.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/bin/vdo start --all --confFile /etc/vdoconf.yml
  ExecStop=/usr/bin/vdo stop --all --confFile /etc/vdoconf.yml

  [Install]
  WantedBy=sysinit.target

The only dependency here is on the /etc/vdoconf.yml file. But that should already be available from the root FS (just like /etc/lvm/lvm.conf). The /etc directory is never on a separate drive.


2) Possible device stack on top of the VDO device.

There is already a service to deal with storage device stacks called blk-availability.service, which calls the blkdeactivate script (/usr/sbin/blkdeactivate). This script and service currently come with the device-mapper package. It reads sysfs for the device stack and tries to deactivate the stack properly from top to bottom. It deals with device-mapper-based devices (including subsystems like LVM, dm-crypt, dm-mpath...) and MD devices, where it can call the external tools to deactivate whole groups instead of only doing per-device deactivation on its own. If a device at the very top of the stack is mounted, blkdeactivate first unmounts the device.

If VDO is represented as a DM device, this is already deactivated by blkdeactivate script with generic "dmsetup remove" call (which in turn is the DM_DEVICE_REMOVE ioctl).

The blk-availability.service is disabled by default in RHEL 7, so users need to enable it explicitly. If it's not enabled, then the only system component that does a kind of device stack deactivation is systemd itself inside the shutdown initramfs, where it simply iterates over the list of devices and tries to deactivate each one. However, it does that in a simple manner, not looking at the stack from top to bottom. If the deactivation fails for any of the devices (e.g. because it's still used by another device), the deactivation is retried in the next iteration. The number of iterations is limited, so we can end up shutting down the machine without properly deactivating the complete device stack.
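
As a rough sketch of what using this looks like on RHEL 7 (the blkdeactivate invocation shown here is just one example; it mirrors the one used later in this bug):

  # blk-availability.service ships with device-mapper but is disabled by default
  systemctl enable blk-availability.service
  systemctl start blk-availability.service

  # blkdeactivate unmounts and deactivates the stack from top to bottom;
  # -u unmounts mounted devices, -l wholevg deactivates whole volume groups
  blkdeactivate -u -l wholevg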



So my suggestion here would probably be:

  - let's fix the vdo.service to avoid the dependency problems (as suggested in 1))

  - if there is any special handling needed for the DM-VDO devices, we can include that inside blkdeactivate script (otherwise, the "dmsetup remove" is called here). Is there anything special required for deactivation of DM-VDO devices?

Comment 20 Peter Rajnoha 2019-09-16 14:04:54 UTC
(In reply to Peter Rajnoha from comment #19)
> So my suggestion here would probably be:
> 
>   - let's fix the vdo.service to avoid the dependency problems (as suggested
> in 1))
> 
>   - if there is any special handling needed for the DM-VDO devices, we can
> include that inside blkdeactivate script (otherwise, the "dmsetup remove" is
> called here). Is there anything special required for deactivation of DM-VDO
> devices?

   - also, we should then add Before=blk-deactivate.service in vdo.service, so that if the blk-availability.service is enabled and it deactivates the stack, the vdo.service stops after the blk-deactivate.service on shutdown (the service order is simply reversed on shutdown). If blk-availability.service was disabled, the vdo.service would deactivate the VDO devices (...doing its best; if there's still a stack on top, well, it'll fail to deactivate of course).

Comment 22 Jakub Krysl 2019-10-15 14:38:54 UTC
Mass migration to Filip.

Comment 23 Andy Walsh 2019-10-16 21:51:46 UTC
When I try to take the vdo.service from comment#19 and also add the "Before=blk-deactivate.service" on a freshly installed RHEL-7.7 system without any modifications, I still end up with the condition that delays shutdown for 90 seconds trying to shut down the services for LVM.

Note that I'm talking about this with a RAW->VDO->PV/VG/LV->XFS(fstab) stack.

I don't see a blk-deactivate.service on the machine, but there is the blk-availability.service.  When I reference that in the unit as "Before=blk-availability.service" it doesn't quite work yet.  I have to enable and start that service before it actually does anything.  I have confirmed, however, that once I have enabled and started that service, shutting down the system with the above stack works as expected.

How should I go about making sure that blk-availability.service is enabled by default?  I tried Requires= and Before=, but that had no effect.  I don't know that adding something to %post in the package is the right way to go about making sure a particular service is enabled.

Right now, this is my vdo.service file:
[vagrant@localhost ~]$ cat /etc/systemd/system/vdo.service
[Unit]
Description=VDO volume services
Before=shutdown.target
DefaultDependencies=no
Conflicts=shutdown.target
Before=blk-availability.service
Requires=blk-availability.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/vdo start --all --confFile /etc/vdoconf.yml
ExecStop=/usr/bin/vdo stop --all --confFile /etc/vdoconf.yml



If I change the stack to be RAW->PV/VG/LV->VDO->PV/VG/LV->XFS(fstab) with the same configuration, the shutdown hangs again for 90 seconds, but this time only for 'LVM2 metadata daemon'.  Startup ends up dropping me to an emergency shell because the vdo service does not start.  If I run 'vdo start --all' and then 'systemctl isolate default.target', things come up fine.

Comment 25 Peter Rajnoha 2019-10-17 08:40:24 UTC
(In reply to Andy Walsh from comment #23)
> When I try to take the vdo.service from comment#19 and also add the
> "Before=blk-deactivate.service" on a freshly installed RHEL-7.7 system
> without any modifications, I still end up with the condition that delays
> shutdown for 90 seconds trying to shut down the services for LVM.
> 
> Note that I'm talking about this with a RAW->VDO->PV/VG/LV->XFS(fstab) stack.
> 
> I don't see a blk-deactivate.service on the machine, but there is the
> blk-availability.service.

Yes, sorry, it's blk-availability.service, not blk-deactivate.service (...it's blk-availability.service which in turn calls blkdeactivate script so that's why I misspelled the service name).


>  When I reference that in the unit as
> "Before=blk-availability.service" it doesn't quite work yet.  I have to
> enable and start that service before it actually does anything.  I have
> confirmed, however, that once I have enabled and started that service,
> shutting down the system with the above stack works as expected.
> 
> How should I go about making sure that blk-availability.service is enabled
> by default?  I tried Requires= and Before=, but that had no effect.  I don't
> know that adding something to %post in the package is the right way to go
> about making sure a particular service is enabled.
> 

We'd need a different way of making the service to start automatically once the vdo service is available - the "Before" and "Requires" don't play well with each other ("Before" says order before the referenced service while "Requires" says the referenced service is required - so already started for vdo service to start). I'll have a look if there's a way.... but....

> 
> If I change the stack to be RAW->PV/VG/LV->VDO->PV/VG/LV->XFS(fstab) with
> the same configuration, the shutdown hangs again for 90 seconds, but this
> time only for 'LVM2 metadata daemon'.  Startup ends up dropping me to an
> emergency shell because the vdo service does not start.  If I run 'vdo start
> --all' and then 'systemctl isolate default.target', things come up fine.

Yes, because blk-availability.service will traverse the stack from top to bottom, so it unmounts the XFS, then deactivates the PV/VG/LV layer, then it comes to the VDO layer, but since blk-availability currently doesn't handle that one, it's skipped and so the rest of the stack down to bottom is skipped... And then the blk-availability is not executed second time after the VDO has done any deactivations...

For this to really work completely, we'd need to move the "vdo stop" for the exact VDO device that is encountered while the blk-availability traverses the stack from top to bottom. Only in this case the blk-availability can deactivate it (...just like it deactivates DM/LVM or MD devices on the way if they're hit). I think this would be much cleaner solution where one script is traversing the whole stack from top to bottom and just calling appropriate deactivation hooks based on device types. Honestly, I don't think it's feasible to have separate services here for the deactivation part - it's almost impossible to order them among each other then. But for that, we'd really need to make the blk-availability.service to be firmly enabled by default on the system.

As for the proposed solution with systemd generator - I can't say yet - can you share more details on this please?

Comment 26 Andy Walsh 2019-10-18 16:39:41 UTC
(In reply to Peter Rajnoha from comment #25)
> (In reply to Andy Walsh from comment #23)
> >  When I reference that in the unit as
> > "Before=blk-availability.service" it doesn't quite work yet.  I have to
> > enable and start that service before it actually does anything.  I have
> > confirmed, however, that once I have enabled and started that service,
> > shutting down the system with the above stack works as expected.
> > 
> > How should I go about making sure that blk-availability.service is enabled
> > by default?  I tried Requires= and Before=, but that had no effect.  I don't
> > know that adding something to %post in the package is the right way to go
> > about making sure a particular service is enabled.
> > 
> 
> We'd need a different way of making the service to start automatically once
> the vdo service is available - the "Before" and "Requires" don't play well
> with each other ("Before" says order before the referenced service while
> "Requires" says the referenced service is required - so already started for
> vdo service to start). I'll have a look if there's a way.... but....

Yeah, I was hoping that Before and Requires would give us some sort of "automatically enabled" kind of setting.  Before ends up being the opposite on shutdown, and therefore we would call it before trying to shut down the VDO service.  Does LVM have issues tearing complex stacks down automatically?  I've not noticed any before, so I can't refer to how those go about tearing down.

> 
> > 
> > If I change the stack to be RAW->PV/VG/LV->VDO->PV/VG/LV->XFS(fstab) with
> > the same configuration, the shutdown hangs again for 90 seconds, but this
> > time only for 'LVM2 metadata daemon'.  Startup ends up dropping me to an
> > emergency shell because the vdo service does not start.  If I run 'vdo start
> > --all' and then 'systemctl isolate default.target', things come up fine.
> 
> Yes, because blk-availability.service will traverse the stack from top to
> bottom, so it unmounts the XFS, then deactivates the PV/VG/LV layer, then it
> comes to the VDO layer, but since blk-availability currently doesn't handle
> that one, it's skipped and so the rest of the stack down to bottom is
> skipped... And then the blk-availability is not executed second time after
> the VDO has done any deactivations...

Perhaps we should look at contributing to blkdeactivate script to possibly handle VDO volumes as well, then.

> 
> For this to really work completely, we'd need to move the "vdo stop" for the
> exact VDO device that is encountered while the blk-availability traverses
> the stack from top to bottom. Only in this case the blk-availability can
> deactivate it (...just like it deactivates DM/LVM or MD devices on the way
> if they're hit). I think this would be much cleaner solution where one
> script is traversing the whole stack from top to bottom and just calling
> appropriate deactivation hooks based on device types. Honestly, I don't
> think it's feasible to have separate services here for the deactivation part
> - it's almost impossible to order them among each other then. But for that,
> we'd really need to make the blk-availability.service to be firmly enabled
> by default on the system.

Yeah, this sounds like an affirmation that we should look to contribute to the script for blkdeactivate to handle VDO volumes.

> 
> As for the proposed solution with systemd generator - I can't say yet - can
> you share more details on this please?

Well right now, the vdo.service runs 'vdo stop --all', which is a really big hammer and never works in every possible scenario (worse yet, with multiple scenarios on the same system).  Introducing the systemd generator functionality enables us to have per-vdo volume control via systemd.  We don't have a functional generator yet, and it doesn't look like it's quite the "one solution for all" kind of approach that I was hoping for, but it should hopefully be more flexible than what we have today.
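
For anyone unfamiliar with the mechanism, a minimal, purely hypothetical generator skeleton (nothing here reflects the actual vdo.generator attached below; names and paths are assumptions):

  #!/bin/sh
  # systemd runs every executable in /usr/lib/systemd/system-generators/
  # early in boot with three output directories; units written into the
  # first ("normal") directory are picked up as if they were regular units
  unitdir="$1"
  mkdir -p "$unitdir"
  # a real generator would write one vdo-<name>.service per configured VDO
  # volume here, giving systemd per-volume start/stop control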

Comment 27 Peter Rajnoha 2019-10-22 08:45:07 UTC
(In reply to Andy Walsh from comment #26)
> Yeah, this sounds like an affirmation that we should look to contribute to
> the script for blkdeactivate to handle VDO volumes.
> 

Yes, I'd go with the blkdeactivate because we can cover many more variations of the stack then. What blkdeactivate does is traverse the sysfs structure from top to bottom (through "lsblk -s" output), take the type field and call the appropriate shutdown hook for this type, including the unmount operation if needed. It can also handle grouped devices - for example, in case of LVM, if we hit one device, we can deactivate the whole group; then next time another device from that group is hit, it's simply skipped (because we already deactivated the whole group with all its devices).

So with VDO, I'd add the blkdeactivate hook which, if it encounters a VDO volume, it calls "vdo stop --name=<volume>".

The blkdeactivate looks primarily at the lsblk's "TYPE" field to identify device type, but we can use a different type of check if this is not visible in lsblk yet - the only requirement here is that the check should be quick (if there are numerous devices on the system, we don't want to run any expensive checks for each device).
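
As a rough sketch of the idea only (the function name and surrounding plumbing are assumptions, not the actual blkdeactivate internals):

  # called when lsblk reports TYPE "vdo" for a device on the way down the stack
  deactivate_vdo() {
    local name="$1"   # VDO volume name, e.g. "vdo1"
    vdo stop --name="$name" --confFile /etc/vdoconf.yml
  }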

> > 
> > As for the proposed solution with systemd generator - I can't say yet - can
> > you share more details on this please?
> 
> Well right now, the vdo.service runs 'vdo stop --all', which is a really big
> hammer and never works in every possible scenario (worse yet, with multiple
> scenarios on the same system).  Introducing the systemd generator
> functionality enables us to have per-vdo volume control via systemd.  We
> don't have a functional generator yet, and it doesn't look like it's quite
> the "one solution for all" kind of approach that I was hoping for, but it
> should hopefully be more flexible than what we have today.

Well, I think this would be covered exactly by that blkdeactivate... I can try to work up a prototype "VDO hook" for blkdeactivate and we can test that. Then once we have a working blkdeactivate patch, we could create a simple "virtual" service for VDO which just requires the blk-availability.service for it to depend on the deactivation.

The blk-availability.service/blkdeactivate is now part of "device-mapper" package, but I see that vdo already depends on lvm2 (which in turn depends on device-mapper), so we have the deps already covered then.

Comment 28 Peter Rajnoha 2019-10-22 12:58:17 UTC
Created attachment 1627977 [details]
blkdeactivate with VDO support

Andy, this is a first shot at VDO support in blkdeactivate - please try it out. Since VDO volumes are mapped 1:1 to underlying devices (there's no grouping like in LVM or MD), things are much easier.

You also need to enable the blk-availability.service - eventually, we can create that simple service in the vdo package that will just rely on blk-availability.service, so that it gets enabled once vdo is installed.

Also, I see that the vdo.service has:

  ExecStart=/usr/bin/vdo start --all --confFile /etc/vdoconf.yml

...we should convert this to "event-based" activation - so each time we detect a device with a VDO signature (based on blkid output), we'd start the VDO volume on top of it.

Otherwise, without event-based activation, if there's a VDO defined on top of any other volumes which are event-activated (like LVM or MD), it's possible that when we start vdo.service, vdo won't see the underlying device because it hasn't been activated yet. But that's a different task... Let's have the deactivation part working for now.

Comment 29 Peter Rajnoha 2019-10-22 12:59:20 UTC
Created attachment 1627978 [details]
blkdeactivate with VDO support (patch)

Also the patch, to see the actual changes for blkdeactivate.

Comment 30 Andy Walsh 2019-10-30 14:48:48 UTC
I've been working through our generator that I mentioned previously.  Through this testing, I tried out the patch in comment#29.  It tended to work, but I can't say that I noticed a difference in behavior between the unmodified and modified versions of the blkdeactivate script.  I'm not sure if that's a good thing or a bad thing.  I did modify the blk-availability.service to contain '-o configfile=/etc/vdoconf.yml'.

I still need to test this on RHEL8, but here is a snapshot of where we are with the generator:

Works with default mount options and nothing else in fstab (i.e. '<device> <mount> xfs defaults 0 0'): 
  RAW->VDO->FSTAB
  RAW->LVM->VDO->FSTAB
Requires additional mount option(s):
  RAW->VDO->LVM->FSTAB
    Requires x-systemd.requires=dev-mapper-<name>.device or x-systemd.requires=vdo-<name>.service
  RAW->LVM->VDO->LVM->FSTAB
    Requires x-systemd.requires=dev-mapper-<name>.device or x-systemd.requires=vdo-<name>.service
  RAW->NBDE->VDO->LVM->FSTAB
    Requires _netdev in the mount options, and also requires x-systemd.requires=dev-mapper-<name>.device or x-systemd.requires=vdo-<name>.service
  iSCSI->LVM->VDO->FSTAB
    Requires _netdev.  I saw some warnings in the bootup logs, but the volume does end up getting mounted automatically.
  iSCSI->VDO->FSTAB
    Requires _netdev.
Other:
  RAW->VDO->iSCSI-><initiator>
    This requires a drop-in unit at, say, /etc/systemd/system/target.service.d/vdo.conf that sets BindsTo=dev-mapper-<name>.device.
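    A hypothetical example of such a drop-in (the device name is an assumption):

      # /etc/systemd/system/target.service.d/vdo.conf
      [Unit]
      BindsTo=dev-mapper-vdo1.device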

I will attach the proposed generator script and the modified version of the vdo.service unit to this ticket so that it can be checked out.

We do have a few questions related to the systemd configuration:
  - Is it reasonable to have a systemd-generator generate both a service and a device unit?
  - Is it reasonable to have a .device unit depend on a .service unit?
  - Is it reasonable to have an fstab entry depend on a .device unit?  Or should we typically point at .service units?

Comment 31 Andy Walsh 2019-10-30 14:57:03 UTC
Created attachment 1630623 [details]
vdo.generator script

Comment 32 Andy Walsh 2019-10-30 14:57:43 UTC
Created attachment 1630624 [details]
vdo.service unit

Comment 33 Peter Rajnoha 2019-10-31 10:44:08 UTC
(In reply to Andy Walsh from comment #30)
> I've been working through our generator that I mentioned previously. 
> Through this testing, I tried out the patch in comment#29.  It tended to
> work, but I can't say that I noticed a difference in behavior between the
> unmodified and modified versions of the blkdeactivate script.  I'm not sure
> if that's a good thing or a bad thing.  I did modify the
> blk-availability.service to contain '-o configfile=/etc/vdoconf.yml'.

Well, did it deactivate the whole arbitrary stack from top to bottom, including any variation of DM/LVM/MD/VDO/FSTAB layers?


> We do have a few questions related to the systemd configuration:
>   - Is it reasonable to have a systemd-generator generate both a service and
> a device unit?

Surely it can generate the service unit. But I thought the device units should always reflect the actual device as found in sysfs - and so the device unit is created automatically by systemd based on incoming udev events for the device (...and after it has switched from SYSTEMD_READY=0 to SYSTEMD_READY=1 within uevent environment).


>   - Is it reasonable to have a .device unit depend on a .service unit?

Usually, it's the other way round - service waiting for a device so the service can start up and since I think device units are created automatically to reflect the sysfs state based on incoming uevents, I'm not quite sure this is reasonable. But let's ask systemd guys to confirm for sure (adding Michal Sekletar to CC).


>   - Is it reasonable to have an fstab entry depend on a .device unit?  Or
> should we typically point at .service units?

Actually, this should be already done automatically by systemd - there's systemd-fstab-generator which reads the fstab content and creates the mount unit for each line which in turn depends on a device unit (...so once the device unit appears/is started == device appears, the mounting happens).

Comment 34 Sweet Tea Dorminy 2019-11-01 17:59:03 UTC
(In reply to Peter Rajnoha from comment #33)
> (In reply to Andy Walsh from comment #30)
> > I've been working through our generator that I mentioned previously. 
> > Through this testing, I tried out the patch in comment#29.  It tended to
> > work, but I can't say that I noticed a difference in behavior between the
> unmodified and modified versions of the blkdeactivate script.  I'm not sure
> > if that's a good thing or a bad thing.  I did modify the
> > blk-availability.service to contain '-o configfile=/etc/vdoconf.yml'.
> 
> Well, did it deactivate whole arbitrary stack from top to bottom, including
> any variation of DM/LVM/MD/VDO/FSTAB layers?
> 
> 
> > We do have a few questions related to the systemd configuration:
> >   - Is it reasonable to have a systemd-generator generate both a service and
> > a device unit?
> 
> Surely it can generate the service unit. But I thought the device units
> should always reflect the actual device as found in sysfs - and so the
> device unit is created automatically by systemd based on incoming udev
> events for the device (...and after it has switched from SYSTEMD_READY=0 to
> SYSTEMD_READY=1 within uevent environment).

Yes... I figure that the generated device unit will add on to the automatically-generated-by-systemd device unit. All we really need to do is add some dependencies so things that need the device will wait till the device exists by action of its dependencies...

> 
> 
> >   - Is it reasonable to have a .device unit depend on a .service unit?
> 
> Usually, it's the other way round - service waiting for a device so the
> service can start up and since I think device units are created
> automatically to reflect the sysfs state based on incoming uevents, I'm not
> quite sure this is reasonable. But let's ask systemd guys to confirm for
> sure (adding Michal Sekletar to CC).
> 
> 
> >   - Is it reasonable to have an fstab entry depend on a .device unit?  Or
> > should we typically point at .service units?
> 
> Actually, this should be already done automatically by systemd - there's
> systemd-fstab-generator which reads the fstab content and creates the mount
> unit for each line which in turn depends on a device unit (...so once the
> device unit appears/is started == device appears, the mounting happens).

The idea is that the fstab line says something like "/dir ... /dev/mapper/something ... x-systemd.requires=/dev/under/something" -- if /dev/mapper/something is a LVM or so forth volume, it's not going to innately depend on /dev/under/something, so we need to add the x-systemd.requires= line to make sure /dev/under/something exists, at which point LVM will create /dev/mapper/something which is the actual dependency of /dir. It might be better to append to the device unit for /dev/mapper/something the information that it depends on /dev/under/something, but I don't think we can do that in fstab, and LVM doesn't currently have the dependency information between devices encoded into unit files so systemd knows how to do stuff...
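
For example (hypothetical device paths), such an fstab line would look like:

  /dev/mapper/something  /dir  xfs  defaults,x-systemd.requires=/dev/under/something  0 0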

Comment 35 Peter Rajnoha 2019-11-04 12:20:35 UTC
(In reply to Sweet Tea Dorminy from comment #34)
> The idea is that the fstab line says something like "/dir ...
> /dev/mapper/something ... x-systemd.requires=/dev/under/something" -- if
> /dev/mapper/something is a LVM or so forth volume, it's not going to
> innately depend on /dev/under/something, so we need to add the
> x-systemd.requires= line to make sure /dev/under/something exists, at which
> point LVM will create /dev/mapper/something which is the actual dependency
> of /dir. It might be better to append to the device unit for
> /dev/mapper/something the information that it depends on
> /dev/under/something, but I don't think we can do that in fstab, and LVM
> doesn't currently have the dependency information between devices encoded
> into unit files so systemd knows how to do stuff...

I think it should be sufficient for systemd to know about the top of the stack only. For example, if we have a stack like "scsi -> lvm -> vdo -> fs", then fstab has information about fs - vdo binding and so it sets up the mount unit with the dependency on vdo device and waiting for it to appear (with certain timeout). Then the rest of the stack is simply instantiated based on incoming udev events:

  1) scsi device appears on the system, blkid is executed in udev (finds lvm layer)
  2) lvm layer activated then (based on scsi device's blkid result in step 1), blkid is executed in udev (finds vdo layer)
  3) vdo layer activated then (based on lvm device's blkid result in step 2), blkid is executed in udev (finds fs layer)
  4) systemd sees the device activated in 3) as well as the fs identified and so it mounts it...

I mean, the direct information about binding of layer 1 - 2 - 3 is not necessary for systemd as that is already known from the blkid results as each layer is scanned gradually. Systemd only waits for the device on top (which can be mounted) to appear. The rest is udev event-based activation in each layer.
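
As an illustration of that chain (device names are just examples, and this assumes a blkid that recognizes the VDO signature):

  blkid -o value -s TYPE /dev/sda           # "LVM2_member" -> udev activates the LVM layer
  blkid -o value -s TYPE /dev/vg/lvol0      # "vdo"         -> the VDO volume gets started
  blkid -o value -s TYPE /dev/mapper/vdo1   # "xfs"         -> systemd mounts it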

Comment 36 Andy Walsh 2019-11-04 15:42:14 UTC
(In reply to Peter Rajnoha from comment #35)
> I think it should be sufficient for systemd to know about the top of the
> stack only. For example, if we have a stack like "scsi -> lvm -> vdo -> fs",
> then fstab has information about fs - vdo binding and so it sets up the
> mount unit with the dependency on vdo device and waiting for it to appear
> (with certain timeout). Then the rest of the stack is simply instantiated
> based on incoming udev events:
> 
>   1) scsi device appears on the system, blkid is executed in udev (finds lvm
> layer)
>   2) lvm layer activated then (based on scsi device's blkid result in step
> 1), blkid is executed in udev (finds vdo layer)
>   3) vdo layer activated then (based on lvm device's blkid result in step
> 2), blkid is executed in udev (finds fs layer)
>   4) systemd sees the device activated in 3) as well as the fs identified
> and so it mounts it...
> 
> I mean, the direct information about binding of layer 1 - 2 - 3 is not
> necessary for systemd as that is already known from the blkid results as
> each layer is scanned gradually. Systemd only waits for the device on top
> (which can be mounted) to appear. The rest is udev event-based activation in
> each layer.

The situation where I've found (using the generator attached in comment#31) that you need to specify some form of 'x-systemd.requires=<something>' is where VDO is under the other layers.  something like (scsi)->vdo->lvm->fs, because the system doesn't identify that the fourth layer depends on the VDO device.

Comment 37 Peter Rajnoha 2019-11-04 15:57:20 UTC
(In reply to Andy Walsh from comment #36)
> The situation where I've found (using the generator attached in comment#31)
> that you need to specify some form of 'x-systemd.requires=<something>' is
> where VDO is under the other layers.  something like (scsi)->vdo->lvm->fs,
> because the system doesn't identify that the fourth layer depends on the VDO
> device.

Hmm, that shouldn't be necessary:

  - (systemd waits for the LVM device from fstab)
  - scsi device appears, blkid scan is executed in udev (finds vdo layer)
  - vdo is activated, blkid scan is executed in udev (finds lvm layer)
  - lvm is activated, blkid scan is executed in udev (finds fs)
  - systemd sees the fs identified, mounts it


But I'll play with the generator a little bit more and see...

Comment 38 Peter Rajnoha 2019-11-15 11:39:26 UTC
Created attachment 1636449 [details]
All scripts for systemd/udev to test VDO startup and shutdown

Well, honestly, I don't quite follow why we need the generator - it seems too complex to me.


Instead, I've tried (and fixed a little so it works) Marius' udev rule and systemd instantiated service from comment #12 and then blkdeactivate from comment #28. Then I tried that with a scsi --> vdo --> lvm --> fs layout:

[0] rhel7-a/~ # lsblk /dev/sdb
NAME         MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb            8:16   0  10G  0 disk 
└─vdo1       253:2    0   1T  0 vdo  
  └─vg-lvol0 253:3    0   1G  0 lvm  /mnt/temp


[0] rhel7-a/~ # cat /etc/fstab | grep lvol0
/dev/vg/lvol0                             /mnt/temp               xfs     defaults        0 0


[0] rhel7-a/~ # systemctl status vdo-start-by-dev
● vdo-start-by-dev - Start VDO volume backed by /dev/sdb
   Loaded: loaded (/etc/systemd/system/vdo-start-by-dev@.service; static; vendor preset: disabled)
   Active: active (exited) since Fri 2019-11-15 12:19:50 CET; 3min 13s ago
  Process: 671 ExecStart=/usr/bin/vdo-by-dev %I start (code=exited, status=0/SUCCESS)
 Main PID: 671 (code=exited, status=0/SUCCESS)

Nov 15 12:19:49 rhel7-a.virt systemd[1]: Starting Start VDO volume backed by /dev/sdb...
Nov 15 12:19:50 rhel7-a.virt vdo-by-dev[671]: Starting VDO vdo1
Nov 15 12:19:50 rhel7-a.virt vdo-by-dev[671]: Starting compression on VDO vdo1
Nov 15 12:19:50 rhel7-a.virt vdo-by-dev[671]: VDO instance 0 volume is ready at /dev/mapper/vdo1
Nov 15 12:19:50 rhel7-a.virt systemd[1]: Started Start VDO volume backed by /dev/sdb.


[0] rhel7-a/~ # blkdeactivate -u -l wholevg
Deactivating block devices:
  [UMOUNT]: unmounting vg-lvol0 (dm-3) mounted on /mnt/temp... done
  [LVM]: deactivating Volume Group vg... done
  [VDO]: deactivating VDO volume vdo1... done


(Or alternatively, just shut down and see from the logs if the whole stack is deactivated properly, or break at the shutdown initramfs hook to check if all devs are deactivated there...)


Could you please try?

Comment 39 Peter Rajnoha 2019-11-15 11:46:47 UTC
This issue is both about activating and deactivating VDO devices. For activation, I'd use that "vdo-start-by-dev@.service" (the original one from Marius) and for deactivation, I'd lean towards the blk-availability/blkdeactivate approach so the whole stack is considered on deactivation.

This bug is currently assigned to me, so I'm resetting that to you guys. The startup part is on your shoulders to include in the vdo package (whatever the selected solution would be there). As for the shutdown - if you take the blkdeactivate solution there, please open a separate BZ for the device-mapper package so we can include that part too.

Comment 40 Zdenek Kabelac 2019-11-15 14:52:09 UTC
I do not want to sidetrack this discussion - but here are a couple of things that might be helpful.

1st. note:
Whenever tracking a 'shutdown' delay - it might be worth enabling the systemd debug shell (Alt+F9) and quickly checking what is blocking progress during shutdown.
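
For reference, one way to get that shell (assuming systemd's standard debug-shell.service):

  systemctl enable debug-shell.service   # root shell on tty9 on the next boot
  # during a hung shutdown, switch to that console and inspect what is still queued:
  systemctl list-jobs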

2nd. note: 
For 7.8 we've been fixing a racing timeout on shutdown:
https://www.redhat.com/archives/lvm-devel/2019-September/msg00106.html
likely not related to the VDO issue - but it might cause a blocking shutdown on lvmetad.service stopping...

Comment 41 Andy Walsh 2020-01-24 23:39:52 UTC
I started testing the suggested udev rule, startup python script, and updated blkdeactivate script in comment#38, and with some modifications to the udev rule I was able to get the system started with default mount options on a couple of configurations so far.  It's looking promising.

First, the changes make the udev rule look like this now:
$ cat /etc/udev/rules.d/69-vdo-start-by-dev.rules
ENV{ID_FS_TYPE}=="vdo", PROGRAM="/usr/bin/systemd-escape -p $env{DEVNAME}", ENV{SYSTEMD_WANTS}+="vdo-start-by-dev@%c"

The change required that I call the systemd-escape program and provide the output of that to the vdo-start-by-dev unit, since the %N was providing a device name of 'dm/3' which the python script was unable to relate to a block device (should have been dm-3).
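
For example (the path here is just an illustration), systemd-escape turns the device path into a name that is valid as a unit instance:

  $ systemd-escape -p /dev/sdb
  dev-sdb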

I have so far tested this on:
  RAW->VDO->FSTAB
  RAW->LVM->VDO->FSTAB

I will be testing on further configurations when I have a little bit more time.  I will report on the other configurations tested in comment#30 when I've done the testing.

Comment 42 Andy Walsh 2020-01-25 22:05:09 UTC
I have now tested these configurations and they have also passed with no or minimal (required and noted) changes:

RAW->LVM->VDO->LVM->FSTAB
RAW->NBDE->VDO->LVM->FSTAB: Requires _netdev in the mount options
RAW->NBDE->LVM->VDO->FSTAB: Requires _netdev in the mount options
RAW->NBDE->LVM->VDO->LVM->FSTAB: Requires _netdev in the mount options

I will be trying it out with iSCSI next.

Comment 43 Andy Walsh 2020-01-26 21:00:15 UTC
I've now tested these configurations (all of these require _netdev in the mount options):

RAW->iSCSI->-/ /->VDO->FSTAB: rebooting the initiator re-mounts the vdo with no manual intervention
RAW->VDO->iSCSI->-/  /->FSTAB: rebooting the target re-builds the stack as configured
RAW->VDO->iSCSI->-/ /->LVM->VDO->FSTAB: This is unsupported since it's VDO on VDO, but I wanted to try a complicated stack.  Rebooting the target and/or initiator re-builds the stack and mounts the fs with no manual intervention.
RAW->VDO->iSCSI->-/ /->LVM->VDO->LVM->FSTAB: This is unsupported since it's VDO on VDO, but I wanted to try a complicated stack.  Rebooting the target and/or initiator re-builds the stack and mounts the fs with no manual intervention.

I have not been able to find a configuration where this solution does not simply work with minimal (expected) modifications.  Every attempt I have made has just worked quite reliably.

I will be next looking to see what is needed for the systemd unit to work on RHEL8, since the provided script specifically uses /usr/bin/python2 in the shebang.  It looks pretty simple, so I'll be trying to just change that out to use python3 first.

Comment 44 Andy Walsh 2020-01-29 12:30:29 UTC
I'm pretty confident that this new configuration is solid.  Are there any other stacks that someone is interested in before I decide whether we move forward with this?

I would like to see how the existing vdo.service implementation interacts with this.  All of my tests in the last few comments have been with the vdo.service disabled.  If customers are using x-systemd.requires=vdo.service or some other kind of dependency on the vdo.service being around, then I'd like to be sure we're not breaking their configurations by changing a default behavior.

Comment 45 Andy Walsh 2020-01-29 18:53:56 UTC
(In reply to Andy Walsh from comment #43)
> I will be next looking to see what is needed for the systemd unit to work on
> RHEL8, since the provided script specifically uses /usr/bin/python2 in the
> shebang.  It looks pretty simple, so I'll be trying to just change that out
> to use python3 first.

I just tried the same configuration on a RHEL 8 host but with the shebang in /usr/bin/vdo-by-dev changed from #!/usr/bin/python2 to #!/usr/libexec/platform-python (like what is in the /usr/bin/vdo script), and it works with a simple stack with no other changes necessary.
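
A hypothetical one-liner for that swap (path as in the comment above):

  sed -i '1s|^#!/usr/bin/python2|#!/usr/libexec/platform-python|' /usr/bin/vdo-by-dev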

Comment 49 corwin 2020-05-12 19:19:55 UTC
*** Bug 1664662 has been marked as a duplicate of this bug. ***

Comment 50 Filip Suba 2020-07-10 13:20:26 UTC
Verified with vdo-6.1.3.23-5.el7, kmod-kvdo-6.1.3.23-5.el7. Regression testing passed.

Comment 55 errata-xmlrpc 2020-09-29 20:10:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (kmod-kvdo bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3965

