Commit:

commit 816197aaabbd41dd85d7728a6fd2992c148e124f
Author: Petr Rockai <me>
Date: Fri Mar 14 03:04:09 2014 +0100

    lvmetad: Indicate whether pv_found caused the VG to change.

This commit causes systemd to time out on a 'swap device', although my LVM-backed swap device is just sitting there (happily). This is on ARCH. I had to file it under a 'product'; I hope that's okay.

Odd thing: here is a pastebin of systemctl output under lvm2 *.105 vs *.106.

http://pastebin.com/raw.php?i=LBHEFNfD

I don't know why all the other devices are displayed (maybe it's because one failed). I see this on two wildly different boxes: UP vs single hyperthreading core, i686 vs x86_64, stock kernel vs custom kernel.
Created attachment 886226 [details] Output of systemctl -a under a working (v105) and a broken (v106) state after booting.
kabi_ on IRC suggested refiling this bug under Fedora Rawhide, adding the pastebin as an attachment, and providing the output of the following command. I'm posting the diff here:

lvs -a -o+devices
--- v105 2014-04-14 20:16:31.567032305 +0200
+++ v106 2014-04-14 20:23:16.623546918 +0200
@@ -12,6 +12,6 @@
 home main -wi-ao---- 63.46g /dev/sda3(138)
 linux-src main -wi-ao---- 3.00g /dev/sda3(3085)
 opt main -wi-ao---- 500.00m /dev/sda3(13)
- swap main -wi-ao---- 2.00g /dev/sda2(0)
+ swap main -wi-a----- 2.00g /dev/sda2(0)
 usr main -wi-ao---- 3.42g /dev/sda3(1293)
 var main -wi-ao---- 2.00g /dev/sda3(3853)
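The only difference in the lvs diff is the sixth character of the swap LV's attribute string. As a minimal sketch for reading it: in the lvs attribute field, position 5 is the activation state (`a` for active) and position 6 is the "device open" flag (`o`). The function name `lv_state` is made up for illustration, and every state other than `a` is lumped together as "not active" for brevity.

```shell
#!/bin/sh
# Decode the two lv_attr characters that matter for the diff above.
# Position 5: activation state ("a" = active).
# Position 6: device open flag ("o" = open, e.g. mounted or in use as swap).
# lv_state is a hypothetical helper, not part of lvm2.
lv_state() {
    attr=$1
    state=$(printf '%s' "$attr" | cut -c5)
    open=$(printf '%s' "$attr" | cut -c6)
    if [ "$state" = "a" ] && [ "$open" = "o" ]; then
        echo "active, open"
    elif [ "$state" = "a" ]; then
        echo "active, not open"
    else
        echo "not active"
    fi
}

lv_state "-wi-ao----"   # swap under v105
lv_state "-wi-a-----"   # swap under v106
```

Read this way, the swap LV under v106 is activated but nothing has opened it, which fits a swap unit that timed out rather than an LV that failed to activate.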
As per the suggestion by kabi_, I have reinstalled lvm2 and then rebuilt initrd.gz with mkinitcpio (ARCH does not have dracut). It does not work; it fails the same way as above.
What's the device layout? Is there any stack underneath LVM? (MD, encrypted devices...)
Can you get the status of all "lvm2-pvscan@<major>:<minor>.service" units? (replace major:minor with exact numbers). You can find the list by issuing "systemctl -a | grep lvm2-pvscan".
(In reply to Peter Rajnoha from comment #7) > Can you get the status of all "lvm2-pvscan@<major>:<minor>.service" units? systemctl status lvm2-pvscan@<major>:<minor>.service
Below LVM is an old partition table. I was too lazy to fuse it all into one partition (I forgot why I did not do that). There are 3 or 4 primary (not extended) partitions below LVM.

I only see entries corresponding to devices in the systemctl -a output. Furthermore, lvmetad is mentioned twice: once for the service and once for the socket.

[gebruiker@Golf ~]$ find /usr/lib/systemd /etc/systemd -iname \*lvm\*
/usr/lib/systemd/system/sysinit.target.wants/lvmetad.socket
/usr/lib/systemd/system/lvm-monitoring.service
/usr/lib/systemd/system/lvmetad.service
/usr/lib/systemd/system/lvmetad.socket

[gebruiker@Golf ~]$ systemctl status lvm2-pvscan
● lvm2-pvscan.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

[gebruiker@Golf ~]$ systemctl status lvm2-pvscan@254:0
● lvm2-pvscan@254:0.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
(In reply to Ronald from comment #9)
> Below LVM is an old partition table. I was too lazy to fuse it all into one
> partition (forgot why I did not do that). There are like 3 or 4 primary (not
> extended partitions below LVM).
>
> I only see entry's corresponding to devices in systemctl -a output.
> Furthermore, lvmetad is mentioned twice. One for the service and one for the
> socket.
>
> [gebruiker@Golf ~]$ find /usr/lib/systemd /etc/systemd -iname \*lvm\*
> /usr/lib/systemd/system/sysinit.target.wants/lvmetad.socket
> /usr/lib/systemd/system/lvm-monitoring.service
> /usr/lib/systemd/system/lvmetad.service
> /usr/lib/systemd/system/lvmetad.socket
>

So "/usr/lib/systemd/system/lvm2-pvscan@.service" is missing. This is the template from which "lvm2-pvscan@<major>:<minor>.service" is instantiated for each PV found in udev, based on incoming events. It's instantiated via /lib/udev/rules.d/69-dm-lvm-metad.rules, by exactly this line:

ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service"

Please make sure you have:

- /usr/lib/systemd/system/lvm2-pvscan@.service installed
- /lib/udev/rules.d/69-dm-lvm-metad.rules installed
- use_lvmetad=1 in /etc/lvm/lvm.conf
- lvm2-lvmetad.socket enabled (check the status by calling "systemctl status lvm2-lvmetad.socket")
- lvmetad binary installed (check "which lvmetad")

All of this is needed for event-based LVM autoactivation to work. If anything is missing, you need to report that against the distribution you use (this is Arch Linux, right?)
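The file-based items in the checklist above can be sketched as a small shell function. This is only an illustration, not an official lvm2 tool: the function name `check_lvm_autoactivation` and its optional root-directory argument are assumptions added here so the same check can be pointed at an unpacked package or initramfs tree. The socket status and the `which lvmetad` check are left out because they depend on the running system.

```shell
#!/bin/sh
# Check the file-based checklist items under a given root (default "/").
# check_lvm_autoactivation and its ROOT argument are hypothetical helpers.
# Returns 0 if everything is present, 1 otherwise.
check_lvm_autoactivation() {
    root=${1:-/}
    root=${root%/}          # "/" becomes "", "/tmp/x/" becomes "/tmp/x"
    missing=0
    for f in \
        /usr/lib/systemd/system/lvm2-pvscan@.service \
        /lib/udev/rules.d/69-dm-lvm-metad.rules
    do
        if [ -e "$root$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    # Accept "use_lvmetad=1" with or without spaces around "=".
    if grep -Eq '^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*1' \
        "$root/etc/lvm/lvm.conf" 2>/dev/null; then
        echo "ok       use_lvmetad=1"
    else
        echo "MISSING  use_lvmetad=1"
        missing=1
    fi
    return "$missing"
}
```

Calling `check_lvm_autoactivation` with no argument inspects the live system; passing a directory inspects that tree instead.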
The sources should also be configured with:

configure ... --enable-udev-systemd-background-jobs ...

Without this, there could be a problem with systemd prematurely killing the udev process that handles LVM autoactivation. Please make sure this is used (I should probably change that to be used by default in upstream).

Also, when installing, this should be used:

make install_systemd_units
make install_systemd_generators
make install_tmpfiles_configuration
> So "/usr/lib/systemd/system/lvm2-pvscan@.service" is missing. This one
> causes the "lvm2-pvscan@<major>:<minor>.service" to be instantiated for each
> PV found in udev based on incoming events. It's instantiated via
> /lib/udev/rules.d/69-dm-lvm-metad.rules and this line exactly:
>
> ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service"

The file in ARCH only contains this:

RUN+="/usr/bin/lvm pvscan --background --cache --activate ay --major $major --minor $minor", ENV{LVM_SCANNED}="1"

>
> Please, make sure you have:
>
> - /usr/lib/systemd/system/lvm2-pvscan@.service installed

Not there.

[root@delta rules.d]# find /usr/lib/systemd -iname \*lvm\*
/usr/lib/systemd/system/sysinit.target.wants/lvmetad.socket
/usr/lib/systemd/system/lvmetad.socket
/usr/lib/systemd/system/lvmetad.service
/usr/lib/systemd/system/lvm-monitoring.service

> - /lib/udev/rules.d/69-dm-lvm-metad.rules installed

It's there.

> - use_lvmetad=1 in /etc/lvm/lvm.conf

It's there.

> - lvm2-lvmetad.socket enabled (check the status by calling "systemctl
> status lvm2-lvmetad.socket")

lvmetad.service loaded active running LVM2 metadata daemon
lvmetad.socket loaded active running LVM2 metadata daemon socket

> - lvmetad binary installed (check "which lvmetad")

[root@delta rules.d]# which lvmetad
/usr/bin/lvmetad

>
> This is important to make event-based LVM autoactivation to work.

CONFIG_DM_UEVENT was disabled in my custom kernel. But the stock kernel has it enabled and it failed there too.

> If anything is missing, you need to report that against the distribution you
> use (this is Arch Linux, right?)

Well ... you mention something below.

> The sources should also be configured with:
>
> configure ... --enable-udev-systemd-background-jobs ...
>

In ARCH, PKGBUILD files are used to define how binary packages are generated. The one for lvm2 is below.
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/lvm2

./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --sbindir=/usr/bin \
  --with-udev-prefix=/usr --with-systemdsystemunitdir=/usr/lib/systemd/system \
  --with-default-pid-dir=/run --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm \
  --enable-pkgconfig --enable-readline --enable-dmeventd --enable-cmdlib --enable-applib \
  --enable-udev_sync --enable-udev_rules --with-default-locking-dir=/run/lock/lvm \
  --enable-lvmetad --with-thin=internal

I don't see --enable-udev-systemd-background-jobs in there.

> Since without this, there could be a problem with systemd killing the udev
> process that handles LVM autoactivation prematurely. Please, make sure this
> is used (I should probably change that to be used by default in upstream).

Okay.

> Also, when installing, this should be used:
>
> make install_systemd_units
> make install_systemd_generators
> make install_tmpfiles_configuration

These are not visible in the PKGBUILD used to generate the packages. I'll open a ticket @ arch. Thank you for your time and help.
(In reply to Peter Rajnoha from comment #11) > The sources should also be configured with: > > configure ... --enable-udev-systemd-background-jobs ... I've made it default now (will be in lvm2 v2.02.107 and later): https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=702180b30c83183722d36500d05da20966759ecf
(In reply to Ronald from comment #12)
> > So "/usr/lib/systemd/system/lvm2-pvscan@.service" is missing. This one
> > causes the "lvm2-pvscan@<major>:<minor>.service" to be instantiated for each
> > PV found in udev based on incoming events. It's instantiated via
> > /lib/udev/rules.d/69-dm-lvm-metad.rules and this line exactly:
> >
> > ENV{SYSTEMD_WANTS}="lvm2-pvscan@$major:$minor.service"
>
> The file in ARCH only contains this:
>
> RUN+="/usr/bin/lvm pvscan --background --cache --activate ay --major $major
> --minor $minor", ENV{LVM_SCANNED}="1"
>

Yeah, that's the older way of running pvscan. However, we found out later that systemd can cause the pvscan process to be killed.

At first, we used a plain pvscan call in the udev rule, but this caused it to time out after about 30s (the timeout for a process run from udev). On systems with lots and lots of devices, this was hit very easily.

So we made it run in the background. However, this does not play well with systemd, as systemd controls the cgroup, udev included. Once the udev process finished, systemd killed any processes running in the background (and detached from the main udev process).

So we ended up with SYSTEMD_WANTS="lvm2-pvscan@$major:$minor". This causes the systemd service to be instantiated, and it is not prematurely killed the way it is with the "pvscan --background" approach.

Anyway, if you encounter any problems even with this change (using "configure --enable-udev-systemd-background-jobs ..."), please let me know. I can help if needed (this udev-systemd part can become tricky sometimes).
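To tell which of the two mechanisms an installed rules file uses, a simple pattern match against the two rule lines quoted in this thread is enough. The sketch below is an illustration only: `pvscan_rule_style` and its output labels are made-up names, and if a file contained both forms it would report the systemd one.

```shell
#!/bin/sh
# Report which pvscan mechanism a udev rules file uses, matching only the
# two rule forms quoted in this thread. pvscan_rule_style and its labels
# ("systemd-service", "background-run", "unknown") are hypothetical.
pvscan_rule_style() {
    rules=$1
    if grep -q 'SYSTEMD_WANTS.*lvm2-pvscan@' "$rules"; then
        echo "systemd-service"      # new style: instantiates lvm2-pvscan@.service
    elif grep -q 'RUN+=.*pvscan.*--background' "$rules"; then
        echo "background-run"       # old style: detached process started by udev
    else
        echo "unknown"
    fi
}
```

For example, `pvscan_rule_style /lib/udev/rules.d/69-dm-lvm-metad.rules` checks the rules file named above.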
(In reply to Ronald from comment #12)
> CONFIG_DM_UEVENT was disabled in my custom kernel. But the stock kernel has
> it enabled and it failed there too.
>

(Not necessary: the kernel's CONFIG_DM_UEVENT enables events for multipath. General events for device-mapper devices are not bound to this kernel configuration setting, and they will work even with this config disabled. It's not quite the best name for the config, I have to admit...)
(In reply to Peter Rajnoha from comment #14) > At first, we used even the pure pvscan call in the udev rule, but this > caused it to timeout after about 30s (a timeout for process run in udev). In > systemd with lots and lots of devices, this was hit very easily. > > So we made it backgrounded. However, this does not play well with systemd as > systemd controlls the cgroup, udev included. And once the udev process is > finished, systemd killed any processes run in the background (and detached > from the main udev process). (We had exact bugs reported for this - I'd add a reference here but I can't find the bug reports quickly at the moment...)
(In reply to Ronald from comment #12) > I'll open a ticker @ arch. (...if possible, please, add me to CC list)
Closing this one as this will be tracked by external bug report.
I tried to rebuild the binary package, but that seems to do more harm than good. It turns out ARCH has a lot of distribution-specific changes to these configuration files. This probably explains why things went south here. For example:

[gebruiker@delta ~]$ diff -u /home/gebruiker/lvm2-install/usr/lib/systemd/system/dm-event.socket /usr/lib/systemd/system/dmeventd.service
--- /home/gebruiker/lvm2-install/usr/lib/systemd/system/dm-event.socket 2014-04-16 11:41:43.253056684 +0200
+++ /usr/lib/systemd/system/dmeventd.service 2014-04-11 23:51:03.000000000 +0200
@@ -1,12 +1,14 @@
 [Unit]
-Description=Device-mapper event daemon FIFOs
+Description=Device-mapper event daemon
 Documentation=man:dmeventd(8)
+Requires=dmeventd.socket
+After=dmeventd.socket
 DefaultDependencies=no
 
-[Socket]
-ListenFIFO=/run/dmeventd-server
-ListenFIFO=/run/dmeventd-client
-SocketMode=0600
-
-[Install]
-WantedBy=sockets.target
+[Service]
+Type=forking
+ExecStart=/usr/bin/dmeventd
+ExecReload=/usr/bin/dmeventd -R
+Environment=SD_ACTIVATION=1
+PIDFile=/run/dmeventd.pid
+OOMScoreAdjust=-1000

I have filed an ARCH bug. Thank you for your time.
(In reply to Peter Rajnoha from comment #17) > (In reply to Ronald from comment #12) > > I'll open a ticker @ arch. > > (...if possible, please, add me to CC list) That does not appear to be possible. The bug is located here: https://bugs.archlinux.org/task/39896
Well, yes, all these udev/systemd things are very sensitive to changes. We've spent tons of time debugging this so that it finally works as expected (in Fedora/RHEL). It would be best if the upstream versions of these udev and systemd parts were used directly across distros without any changes. And if anything needs changing, the distribution should consult with us (so we can fix it in a way that keeps the upstream version usable across different distros).
(In reply to Ronald from comment #20)
> The bug is located here:
>
> https://bugs.archlinux.org/task/39896

Closing this report here then, as it tracks bugs in packages released in Fedora/RHEL only.
Downstream was kind enough to adapt the configuration files from upstream. The released package did not alleviate the bug.

https://bugs.archlinux.org/task/39896

Current build information is located here:

https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/lvm2

(In reply to Peter Rajnoha from comment #10)
> (In reply to Ronald from comment #9)
> Please, make sure you have:
>
> - /usr/lib/systemd/system/lvm2-pvscan@.service installed

yes, it's there now

> - /lib/udev/rules.d/69-dm-lvm-metad.rules installed

yes, it's there now

> - use_lvmetad=1 in /etc/lvm/lvm.conf

yes, it's there now

> - lvm2-lvmetad.socket enabled (check the status by calling "systemctl
> status lvm2-lvmetad.socket")

yes, it's active and running

> - lvmetad binary installed (check "which lvmetad")

lvmetad exists

(In reply to Peter Rajnoha from comment #11)
> The sources should also be configured with:
>
> configure ... --enable-udev-systemd-background-jobs ...

It's there for the main build, not for initramfs

> Since without this, there could be a problem with systemd killing the udev
> process that handles LVM autoactivation prematurely. Please, make sure this
> is used (I should probably change that to be used by default in upstream).
>
> Also, when installing, this should be used:
>
> make install_systemd_units

yes, it's being used

> make install_systemd_generators

yes, it's being used

> make install_tmpfiles_configuration

no, that is not used. /etc/tmpfiles.d is empty. Maybe that is intended?

So, all of your requirements have been implemented. All I can see is the small legacy part for initramfs. I guess that could be the issue, right?
It seems I forgot the installation of the tmpfiles configuration, I will correct that. Anyway, the necessary paths still get created and lvmetad and pvscan work. As for the initramfs, the "legacy" initramfs (which is still the default) doesn't use systemd and thus needs the old-style udev rule. It is possible to use systemd in initramfs, in which case the new-style udev rule is used. Peter, if this tracker only considers RHEL+Fedora bugs, where should we report lvm2 bugs then? I can't see this being a packaging problem anymore.
(In reply to Thomas Bächler from comment #24)
> It seems I forgot the installation of the tmpfiles configuration, I will
> correct that. Anyway, the necessary paths still get created and lvmetad and
> pvscan work.
>
> As for the initramfs, the "legacy" initramfs (which is still the default)
> doesn't use systemd and thus needs the old-style udev rule. It is possible
> to use systemd in initramfs, in which case the new-style udev rule is used.
>
> Peter, if this tracker only considers RHEL+Fedora bugs, where should we
> report lvm2 bugs then? I can't see this being a packaging problem anymore.

For upstream bugs, the best place is probably the linux-lvm or lvm2-devel mailing list. We could track such problems in a Fedora/rawhide lvm2 bug report, but the thing is that the versions used might differ in each distro (including a different set of additional patches), and then it can become unclear what exactly is being tracked. So it's better to solve this individually on the mailing list or directly on the bug tracker of the distro in question (and CC someone from the team).

I've just responded to the mail you sent me. I'm just having a look at the problematic part you described...
So based on the email from Thomas, the problem is:

- running rescue.target
- rescue.target *does not* pull sockets.target
- lvmetad is socket-activated and hence it does not start if sockets.target is not pulled
- the swap is on LV
- the LV is not activated
- timeout for swap LV
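The second step in the chain above can be checked by looking for sockets.target in a target's dependency listing. The sketch below is an assumption-laden illustration: `pulls_in_sockets_target` is a made-up helper that reads a listing on stdin, so it works on saved output as well as on a live `systemctl list-dependencies rescue.target`. Note that by default systemctl only expands target units recursively, so a sockets.target pulled in through a service may only show up with `--all`.

```shell
#!/bin/sh
# Read a dependency listing on stdin (for example the output of
# "systemctl list-dependencies rescue.target") and report whether
# sockets.target appears in it. pulls_in_sockets_target is hypothetical.
# Returns 0 if sockets.target is listed, 1 otherwise.
pulls_in_sockets_target() {
    if grep -q 'sockets\.target'; then
        echo "sockets.target is pulled in"
    else
        echo "sockets.target is NOT pulled in"
        return 1
    fi
}
```

Usage on a live system would be `systemctl list-dependencies rescue.target | pulls_in_sockets_target`.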
(btw, the reason rescue.target works in Fedora/rawhide is that sockets.target is pulled in by some other service that is part of rescue.target... so it actually works by chance in rawhide)
So the timeout would happen for each fs on an LV that is mentioned in fstab: systemd tries to mount the fs and waits for the underlying device, which never appears because the LV is not activated.
Patched: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=81b096af34283cf91205b0e63f598ca92d625051
That does not seem to be Ronald's problem. In his case, some LVs are activated and some aren't (on the same PV). I am still waiting for some debug information from him. Anyway, as per your recommendation, we should move this to the mailing list.
Thomas pointed out that I failed to mention something important: I'm using a custom build script to populate the /sysroot equivalent of the old-style initrd. This script did not handle swap, which was the only LV that was handled by systemd. Currently, the initrd is built using systemd. This new setup works around this bug. However, there are quite a few other users who are suffering from the 105->106 upgrade. They appear in the bug and on the forums. So I hope this bug is still useful as a pointer to other breakage.