Description of problem:
during shutdown/reboot the following msgs appear amongst many msgs:

Series of normal shutdown/reboot msgs, then:
...
lvm2-monitor.service: control process exited, code=exited status=1
...
lvm[pid]: There are still devices being monitored
lvm[pid]: Refusing to exit
...

Version-Release number of selected component (if applicable):
lvm2-2.02.84-1.fc15.x86_64
systemd-19-1.fc15.x86_64

How reproducible:
every boot/reboot

Steps to Reproduce:
1. selected reboot or boot

Actual results:
after the refusing to exit msg, some other unrelated normal msgs appear and then the machine hangs for up to one minute plus.

Expected results:
normal fast shutdown/reboot

Additional info:
could this be a systemd issue?
only one snapshot of /home LV created using system-config-lvm

$ sudo lvscan
  ACTIVE            '/dev/VolGroup00/rawhide' [15.00 GiB] inherit
  ACTIVE            '/dev/VolGroup00/fedora14' [13.67 GiB] inherit
  ACTIVE            '/dev/VolGroup00/fedora13' [11.72 GiB] inherit
  ACTIVE            '/dev/VolGroup00/fedora12' [11.72 GiB] inherit
  ACTIVE            '/dev/VolGroup00/fedora15' [14.65 GiB] inherit
  ACTIVE            '/dev/VolGroup04/fed14btrfs' [11.72 GiB] inherit
  ACTIVE            '/dev/VolGroup03/debian60' [13.97 GiB] inherit
  ACTIVE   Original '/dev/VolGroup01/clydehome' [19.53 GiB] inherit
  ACTIVE            '/dev/VolGroup01/secondhome' [20.00 GiB] inherit
  ACTIVE   Snapshot '/dev/VolGroup01/clydesnap' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/omega11' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/centos' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/suse' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/fedora13' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/ubuntu' [15.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/rhel6' [20.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/mandriva' [12.00 GiB] inherit
  ACTIVE            '/dev/VolGroup02/downloads' [40.00 GiB] inherit
> lvm[pid]: There are still devices being monitored
> lvm[pid]: Refusing to exit

It means that some devices were not deactivated, here probably the ones with a snapshot. I guess systemd should deactivate all devices before stopping this service, and if it is the root volume (possibly with a snapshot, which is being monitored) it should handle this situation specially? (Ignore it?)

Reassigning to systemd, but it is possible that something is still missing on the lvm side; in that case we need to discuss how it should work.

Monitoring should do two things:
- for lvm mirrors it handles device failures etc.
- for lvm snapshots it monitors free space in the snapshot and allows automatic extension at some threshold.

So blindly stopping monitoring for an active device is not an option. But for the system root device it is a problem (another one in the shutdown mix).
Just tried re-enabling lvm2-monitor via systemctl enable lvm2-monitor.service. On reboot, system hung with same msgs. snapshot is of home directory. / is on lv fedora15. Now disabling lvm2-monitor. BTW, works ok on fedora 14 without systemd.
I think I explained that on various occasions, but LVM is really broken here. If the monitor needs to run until after all FS are unmounted, then we'd need to unmount/remount read-only first, and kill the monitor after. That would require, however, that the monitor does not reside on the disk itself, because otherwise the monitor will block unmounting and remounting (the latter in case an upgrade was done and the monitor binary itself or a library it used, like libc, was replaced: a Unix filesystem holding a deleted file that is still being accessed cannot be remounted read-only until the access ends). However, the monitor resides on the root fs, hence it is not always possible to cleanly unmount/remount /.

Hence, the monitor really should be killed at shutdown like any other process, because otherwise it is impossible to unmount/remount everything cleanly. If the monitor needs to sync metadata to disk after unmount, then it should place a tiny binary in /lib/systemd/system-shutdown which is executed at the very end of everything, after unmounting and detaching all devices, right before we call reboot(). That binary should synchronously sync all metadata to disk and exit. systemd will wait for it to terminate.

If such a tiny binary is used, then we can be sure that during the unmount/remount step not a single process is referencing any data on any fs anymore, and hence we can reliably unmount/remount all file systems.
I could imagine a side step: restart the monitor before doing such a shutdown. It should then be possible to remount the filesystem 'ro', as the updated libs/execs would be loaded from existing inodes? I don't see a big difference in making another special small daemon, as it would need all the wisdom from the lvm library anyway....
Well, that tiny binary could be a shell script that just starts the monitor, waits for it to sync and kills it again then. Would that work?
Should this be a blocker for Fedora 15?
(In reply to comment #6)
> Should this be a blocker for Fedora 15?

I think this qualifies as a NTH for Fedora 15 Final. We have no existing criteria that cover the use of LVM snapshots.
Discussed at the 2011-04-15 blocker bug review meeting. While this doesn't hit any release criteria, it is a pain and could cause a user to think that their system was broken - accepted as NTH for F15.
I'm also seeing the 5-7 minute hangs on f15 shutdown a few lines after the lvm monitor complains as above. In my case (and perhaps most?) the snapshots are just for making backups. If we are rebooting the snapshots might as well be deleted and the backup restarted from the beginning. I'll just hack that into my /etc/rc.local .
+1 I have a mirrored LVM volume and root on LVM. After upgrading to F15, my shutdown "appeared" to be broken. I'm not sure what the best workaround is for this. I don't see this issue in the Release Notes or CommonBugs. I suggest it needs to have CommonBugs keyword.
Took me several hours to find this bug report, as it's not in the release notes or anything, and I did not know it was LVM + systemd related. This is a severe bug for us, as effectively, systems can neither be shutdown nor booted. Even as I've researched this issue for the last hour, my system has not shut down. Now I don't know what I must do to get it to boot past the LVM mounts. Can someone please post a clear workaround so that a system with LVM can start and stop cleanly? Thanks!
The system should eventually shut down and can then reboot. The only workaround I know of until this is fixed is to: 1) reboot; 2) systemctl disable lvm2-monitor.service; 3) systemctl stop lvm2-monitor.service. Now, unfortunately, wait again a long time. But, with the service disabled, it will not be started on subsequent boots. Downside is no monitoring of snapshots and lvs. It looks like there is a peeing contest about who or what is the culprit.
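For anyone who wants to script the workaround above, here is a minimal sketch. It assumes the unit to stop is lvm2-monitor.service (the name used elsewhere in this report); the function name disable_lvm2_monitor is made up for illustration. Run it as root, and expect the stop step to take a long time on affected systems.

```shell
#!/bin/sh
# Sketch of the workaround: keep lvm2-monitor from starting on later boots,
# then stop the running instance. The function name is hypothetical.
# Downside (as noted above): snapshots and mirrors are no longer monitored.
disable_lvm2_monitor() {
    systemctl disable lvm2-monitor.service &&
    systemctl stop lvm2-monitor.service
}
```

Call disable_lvm2_monitor once after a reboot. To undo it later, "systemctl enable lvm2-monitor.service" re-enables the service.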
This problem is still present in rawhide. It really needs to be fixed. Thanks.
$ time sudo systemctl stop lvm2-monitor.service

real 5m0.167s
user 0m0.003s
sys 0m0.034s

seems excessive for a single snapshot
*** Bug 716737 has been marked as a duplicate of this bug. ***
I am experiencing the same problem with long soft reboots and power-offs. I have no snapshots, but I do use mirroring and striping via lvm's internal capabilities.
20 minutes to boot? More than 45 minutes to shutdown? This should be a HIGH severity bug. Who has 20 minutes to get their laptop to boot? Who has over 45 minutes of reserve power on their server's UPS? 45 min to power off means LOSS OF DATA, which is the criteria for a HIGH severity bug. Nobody can sell anything that takes 20 min of doing nothing before it can start, and I cannot imagine anyone who would want it. The fact that this bug leaves a system in a completely unusable state, for extended periods of time, should make this a HIGH priority bug.
I agree with the previous poster. I don't have a server or a UPS, but waiting 5 minutes to reboot is annoying, and sometimes I just do a hard reset, risking my data. This is really bad and needs to be fixed asap.
We'll provide native systemd unit files soon that will disable monitoring at the latest possible point during shutdown. This should also prevent this bug, which happens when the legacy SysV init scripts are run by systemd. In review.
We have a rawhide package now (bug #714698). If we don't hit any problem, we'll port the patch for F15 as well.
Peter, if you are going to fix this bug by migrating to a native systemd service, please note that in general it is not acceptable:

  Packages are strictly forbidden from migrating to systemd within updates to a Fedora release. The migration is only allowed between Fedora releases.
  -- http://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Packages_migrating_to_a_systemd_unit_file_from_a_SysV_initscript

It would be nice to have this bug fixed in F15 though, and if no other fix is possible, you may be able to get an exception from FESCo in order to ignore this guideline.
The other solution would be to keep the old SysV init script and call force-stop instead of stop (alternatively, we could patch it so the "stop" functionality would be equal to "force-stop"). An important thing to mention here is that lvm2-monitor will stop too early this way. But this will get resolved eventually with F16 and late shutdown ramfs that is in plan there.
lvm2-2.02.86-2.fc16.x86_64 fixes this problem in rawhide. However, since it still exists in F15, will defer closing this bz.
lvm2 >= 2.02.86-2 fixes the problem for me, however, on my two real machines, i can no longer use lv* commands to create/remove snapshots or mirrors (or possibly anything at all). It works fine in a VM with a fresh PV set up by the installer, but in my two 'real world' examples, using PVs/VGs that were originally created in F12, the toolstack from rawhide is nonfunctional. Reverting to F15 lvm2 packages works fine. Was there a change (for example, in the metadata format) that may have occurred since F12 that would prevent me from using F16+ lvm tools?
Nothing should have broken format-wise. Run the simplest failing command you can find with -vvvv and see what the errors really are? (Probably best on a different bugzilla to avoid making this one drift too far off track.)
(In reply to comment #25)
> Nothing should have broken format-wise. Run the simplest failing command you
> can find with -vvvv and see what the errors really are? (Probably best on a
> different bugzilla to avoid making this one drift too far off track.)

Posted as bug 727925
Well, I've just noticed that the $runlevel variable is not set at all by systemd during shutdown/reboot. This variable is checked to differentiate between a proper shutdown/reboot and a user calling the 'stop' script directly. I think that variable should be set if systemd runs the scripts in SysV compatibility mode. So I hope this will be fixed in systemd directly (filed as bug #728955). If that's going to be fixed in systemd, I'll close this bug then.
*** Bug 736620 has been marked as a duplicate of this bug. ***
To summarise: The root cause of this turned out to be the withdrawal of the $runlevel variable in F15. This is not going to get reinstated. (bug #728955) The best fix in our opinion was to switch from a SysV initscript to the systemd equivalent immediately. However, this option had to be ruled out because Fedora strictly forbids it: http://fedoraproject.org/wiki/Packaging:ScriptletSnippets#Packages_migrating_to_a_systemd_unit_file_from_a_SysV_initscript The problem is resolved in F16 as far as we know, so people wanting to deal with it should choose a 'least bad' workaround until they upgrade. If you decide your 'least bad' option is to break the rules and switch to systemd early, we will attach a package that uses systemd to this bugzilla for you shortly.
Please make an update for F15. These packaging rules are useless if broken components are released as GA and a rule then forbids fixing them. In this case the conversion had to be done BEFORE GA, or it has to be fixed during the release's life cycle.
(In reply to comment #29)
> The best fix in our opinion was to switch from a SysV initscript to the
> systemd equivalent immediately. However, this option had to be ruled out
> because Fedora strictly forbids it

I recommend filing a request for an exception from the strict rule with FESCo:
https://fedorahosted.org/fesco/
Fixing this bug should be a sufficient rationale for getting the exception.

> If you decide your 'least bad' option is to break the rules and switch to
> systemd early, we will attach a package that uses systemd to this bugzilla for
> you shortly.

Yes, please attach the package, so that the reporters can verify it works as expected.
The policy is quite clear - strictly forbidden. That doesn't allow for exceptions, surely? You'd need to lobby to get that bit of the policy changed first.
Anyway, let's get a possible package sorted out first and see if breaking this rule causes any technical problems for anyone or not.
I would be happy to test a potential fix for this F15 problem while the bureaucrats resolve the policy issues.
(In reply to comment #32)
> The policy is quite clear - strictly forbidden. That doesn't allow for
> exceptions, surely?

Surely FESCo has the authority to grant an exception to any rule.

Actually, there is a generic exception clause at the beginning of the Guidelines, http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Guidelines:

  If you think that your package should be exempt from part of the Guidelines, please bring the issue to the Fedora Packaging Committee.

So it seems the correct body to ask is FPC, not FESCo.
https://fedorahosted.org/fpc/
You can find unofficial builds with the systemd support backport here:

x86_64: http://koji.fedoraproject.org/koji/taskinfo?taskID=3335365
i686: http://koji.fedoraproject.org/koji/taskinfo?taskID=3335366
Created attachment 522142 [details] F15 lvm2 build with systemd support Attaching x86_64 build here as well.
Created attachment 522143 [details] F15 lvm2 build with systemd support (i686) Attaching i686 build here as well.
Please, try these unofficial packages and if you hit any problems while doing an update, report them here in this bug report. Thanks.
Created attachment 522145 [details] F15 lvm2 with systemd support - src rpm
$ rpm -qa | grep lvm2
lvm2-2.02.84-4.fc15.x86_64
lvm2-libs-2.02.84-4.fc15.x86_64
lvm2-sysvinit-2.02.84-4.fc15.x86_64

$ systemctl status lvm2-monitor.service
lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
          Loaded: loaded (/lib/systemd/system/lvm2-monitor.service)
          Active: active (exited) since Thu, 08 Sep 2011 13:07:19 -0400; 6min ago
         Process: 1250 ExecStart=/sbin/lvm vgchange --monitor y (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/lvm2-monitor.service

Working here now on F15. Boot/reboot as fast as before.

Suggest this bz be left open in case an official release to F15 is denied so anyone running into this will find more easily. Even tho F15 will be EOL first half of next year anyway, there will be folks who continue to use it and may run into this.
(In reply to comment #41)
> $ rpm -qa | grep lvm2
> lvm2-2.02.84-4.fc15.x86_64
> lvm2-libs-2.02.84-4.fc15.x86_64
> lvm2-sysvinit-2.02.84-4.fc15.x86_64

Just a note: you don't need to install the sysvinit subpackage. It is only there for those who still want to use the old init script for some reason. Anyway, if you install the lvm2 package together with the lvm2-sysvinit subpackage, the systemd unit installed with lvm2 takes precedence over the SysV init script (systemd takes care of this).
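To see which of the two variants systemd actually picked up, one can look at the path on the "Loaded:" line of the status output. Below is a rough sketch; the helper name unit_kind is made up for illustration, and the two paths it matches are the ones that appear in the status outputs posted in this report (/lib/systemd/system/ for the native unit, /etc/rc.d/init.d/ for the SysV script).

```shell
#!/bin/sh
# Hypothetical helper: classify a unit from `systemctl status` output on stdin
# by the path shown on its "Loaded:" line.
unit_kind() {
    # Take the first line mentioning "Loaded:" (empty if there is none).
    loaded=$(sed -n '/Loaded:/{p;q;}')
    case "$loaded" in
        *"(/lib/systemd/system/"*) echo native ;;   # native systemd unit file
        *"(/etc/rc.d/init.d/"*)    echo sysv ;;     # SysV init script wrapped by systemd
        *)                         echo unknown ;;
    esac
}
```

Usage would be something like: systemctl status lvm2-monitor.service | unit_kind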
*** Bug 739867 has been marked as a duplicate of this bug. ***
Peter, please can we build workaround package for F15? It is impossible to switch to systemd in F15 and this problem hits many users.
And that is the reason for my repeated question: why damned were the users forced to use systemd with F15 before the distribution is ready and why in the world it is forbidden to fix such things in the life-cycle of F15? I have rebuilt the F16 package for F15 and it is running well and solves the problem; only an idiotic policy prevents fixing this for everyone. Since Fedora defaults to LVM, this is a core package and must not have been pushed without being converted BEFORE the release. Such things have to be blockers when making large changes like introducing systemd.
(In reply to comment #45)
> is ready and why in the world it is forbidden to fix such things in the
> life-cycle of F15?

I'll create a FESCo ticket if needed. But in this case we have a simple, non-intrusive workaround, so I would prefer to use that for now.
A very very simple update would be the following (however, if people call "lvm2-monitor stop" directly, it won't deny the action as it does now in F15; it'll just do it. Normally, you would need to use "lvm2-monitor force-stop" instead):

 scripts/lvm2_monitoring_init_red_hat.in |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/scripts/lvm2_monitoring_init_red_hat.in b/scripts/lvm2_monitoring_init_red_hat.in
index 0988511..087f36c 100644
--- a/scripts/lvm2_monitoring_init_red_hat.in
+++ b/scripts/lvm2_monitoring_init_red_hat.in
@@ -92,8 +92,7 @@ case "$1" in
 	;;
 
   stop)
-	test "$runlevel" = "0" && WARN=0
-	test "$runlevel" = "6" && WARN=0
+	WARN=0
 	stop
 	rtrn=$?
 	[ $rtrn = 0 ] && rm -f $LOCK_FILE
The stop mechanism was allowed only during restart and shutdown, but we can't test it anymore since systemd does not provide the runlevel info in the $runlevel variable. Also the "runlevel" command is not deterministic, bug #728955 ... If we're going to do an update, I'm voting for this simple patch instead.
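To illustrate why the old check fails under systemd, here is a small standalone sketch of the logic in the "stop" branch. It assumes WARN starts out as 1 (warn and refuse) before the two tests run; the function name warn_for_runlevel is made up for this example and is not part of the init script.

```shell
#!/bin/sh
# Hypothetical stand-in for the runlevel test in the "stop" branch.
# Prints 0 if stop would proceed quietly, 1 if the script would warn/refuse.
# Assumption: WARN defaults to 1 before the tests are applied.
warn_for_runlevel() {
    WARN=1
    test "$1" = "0" && WARN=0   # runlevel 0: halt
    test "$1" = "6" && WARN=0   # runlevel 6: reboot
    echo "$WARN"
}
```

With a real runlevel of 0 or 6, warn_for_runlevel prints 0. With an empty value, which is what the script sees when systemd does not set $runlevel, it prints 1, so a plain "stop" during shutdown is treated like a manual stop by the user.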
lvm2-2.02.84-4.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/lvm2-2.02.84-4.fc15
I submitted update to F15 testing with workaround mentioned in comment #47, I hope it helps for now.
(In reply to comment #49)
> lvm2-2.02.84-4.fc15 has been submitted as an update for Fedora 15.
> https://admin.fedoraproject.org/updates/lvm2-2.02.84-4.fc15

Will this cause a problem if we installed the fix in comment #37? TIA
I would say: if you installed the fix from comment #37, stay with it. (It has the same n-v-r, so the update should not automatically reinstall it. A reinstall should just switch the initscripts back to SysV mode; it should not cause big problems, but this is not tested.)
Package lvm2-2.02.84-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.84-4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/lvm2-2.02.84-4.fc15
then log in and leave karma (feedback).
How about adding to the inline comment something like:

# This means that in Fedora 15, as a side-effect of facilitating a clean
# system shutdown, the script will NOT warn you of the problems you may
# cause if you stop this service while the system still requires it to be
# running.
lvm2-2.02.84-4.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
Don't works for me.

# lvscan
  ACTIVE            '/dev/VG_Data/LV_Data' [648,00 GiB] inherit
  ACTIVE            '/dev/VG_System/LV_Root' [50,00 GiB] inherit
  ACTIVE            '/dev/VG_System/LV_Home' [30,00 GiB] inherit
  ACTIVE            '/dev/VG_System/LV_Swap' [3,00 GiB] inherit

# lvs
  LV      VG        Attr   LSize   Origin Snap%  Move Log          Copy%  Convert
  LV_Data VG_Data   -wi-ao 648,00g
  LV_Home VG_System mwi-ao  30,00g                    LV_Home_mlog 100,00
  LV_Root VG_System -wi-ao  50,00g
  LV_Swap VG_System -wi-ao   3,00g
(In reply to comment #56)
> Don't works for me.

Please, check the content of /etc/rc.d/init.d and see if these lines are commented out:

  #
  # Because systemd doesn't support setting of $runlevel
  # https://bugzilla.redhat.com/show_bug.cgi?id=728955
  # we have to use workaround here.
  #
  # test "$runlevel" = "0" && WARN=0
  # test "$runlevel" = "6" && WARN=0
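A quick way to do this check without reading the script by eye is sketched below. The function name runlevel_check_active is made up, and the script path used in the usage note (/etc/rc.d/init.d/lvm2-monitor) is taken from the systemctl status output elsewhere in this report; adjust it if your script lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if FILE still contains an active,
# uncommented `test "$runlevel" ...` line, i.e. the workaround is NOT applied.
# Fails (exit 1) if every such line is commented out or absent.
runlevel_check_active() {
    grep -Eq '^[[:space:]]*test[[:space:]]+"\$runlevel"' "$1"
}
```

For example: runlevel_check_active /etc/rc.d/init.d/lvm2-monitor && echo "old runlevel test still active"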
(In reply to comment #57)
> (In reply to comment #56)
> > Don't works for me.
>
> Please, check the content of /etc/rc.d/init.d and see if these lines are
> commented out:
>
> #
> # Because systemd doesn't support setting of $runlevel
> # https://bugzilla.redhat.com/show_bug.cgi?id=728955
> # we have to use workaround here.
> #
> # test "$runlevel" = "0" && WARN=0
> # test "$runlevel" = "6" && WARN=0

Yes, it is.
(In reply to comment #58)
> (In reply to comment #57)
> > (In reply to comment #56)
> > > Don't works for me.
> >
> > Please, check the content of /etc/rc.d/init.d and see if these lines are
> > commented out:
> >
> > #
> > # Because systemd doesn't support setting of $runlevel
> > # https://bugzilla.redhat.com/show_bug.cgi?id=728955
> > # we have to use workaround here.
> > #
> > # test "$runlevel" = "0" && WARN=0
> > # test "$runlevel" = "6" && WARN=0
>
> Yes, it is.

Having the same problem. The lines are commented out here as well.

lvm2-2.02.84-4.fc15.x86_64
lvm2-libs-2.02.84-4.fc15.x86_64
(In reply to comment #59)
> Having the same problem. The lines are commented out here as well.

Does it hang if you call "systemctl stop lvm2-monitor.service" on command line directly?
# systemctl status lvm2-monitor.service
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
          Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
          Active: active (running) since Thu, 20 Oct 2011 11:36:36 +0200; 2s ago
         Process: 12706 ExecStop=/etc/rc.d/init.d/lvm2-monitor stop (code=exited, status=0/SUCCESS)
         Process: 12797 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS)
        Main PID: 12805 (dmeventd)
          CGroup: name=systemd:/system/lvm2-monitor.service
                  └ 12805 /sbin/dmeventd

# time systemctl stop lvm2-monitor.service

real 5m0.525s
user 0m0.000s
sys 0m0.004s

# systemctl status lvm2-monitor.service
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
          Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
          Active: failed since Thu, 20 Oct 2011 11:41:44 +0200; 6min ago
         Process: 12812 ExecStop=/etc/rc.d/init.d/lvm2-monitor stop (code=exited, status=0/SUCCESS)
         Process: 12797 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS)
        Main PID: 12805 (code=killed, signal=KILL)
          CGroup: name=systemd:/system/lvm2-monitor.service
I have to say it sometimes shutdowns fine and sometimes I have to use SysRq to turn off the computer at all (in reasonable time, I never tried waiting 5 minutes). So it's not 100% reproducible. But stopping the service manually took very long in 2 out of 2 attempts, so the reproducibility should be high enough.
(In reply to comment #62)
> I have to say it sometimes shutdowns fine and sometimes I have to use SysRq to
> turn off the computer at all (in reasonable time, I never tried waiting 5
> minutes). So it's not 100% reproducible. But stopping the service manually took
> very long in 2 out of 2 attempts, so the reproducibility should be high enough.

I've looked at this with Kamil on his machine and it seems the script is executed properly. Debug log of the lvm2-monitor sysv script execution - "systemctl stop lvm2-monitor.service":

+ echo -n 'Stopping monitoring for VG encvg: '
Stopping monitoring for VG encvg: + shift
+ /sbin/vgchange --monitor n encvg
  3 logical volume(s) in volume group "encvg" unmonitored
+ success 'Stopping monitoring for VG encvg:'
...
+ echo -n ' OK '
...
+ return 0
...
+ echo -n 'Stopping monitoring for VG vg: '
Stopping monitoring for VG vg: + shift
+ /sbin/vgchange --monitor n vg
  12 logical volume(s) in volume group "vg" unmonitored
+ success 'Stopping monitoring for VG vg:'
...
+ echo -n ' OK '
...
+ return 0

So it seems the hang occurs somewhere in systemd/systemctl. I'll open a separate bug for systemd.
(In reply to comment #63)
> So it seems the hang occurs somewhere in systemd/systemctl. I'll open a
> separate bug for systemd.

bug #747582
(In reply to comment #60)
> (In reply to comment #59)
> > Having the same problem. The lines are commented out here as well.
>
> Does it hang if you call "systemctl stop lvm2-monitor.service" on command line
> directly?

Yes, it takes forever. I just stop it with Ctrl+C. If I call force-stop, it stops fine.

I don't know if it is related, but look at this piece of my mount list:
.
.
.
/dev/mapper/VG_Data-LV_Data on /Data type ext3 (rw,nosuid,relatime,seclabel,errors=continue,barrier=0,data=ordered)
/dev/mapper/VG_System-LV_Home on /home type ext3 (rw,relatime,seclabel,errors=continue,user_xattr,barrier=0,data=ordered)
/dev/mapper/VG_System-LV_Root on /tmp type ext3 (rw,relatime,seclabel,errors=continue,barrier=0,data=ordered)
/dev/mapper/VG_System-LV_Root on /var/tmp type ext3 (rw,relatime,seclabel,errors=continue,barrier=0,data=ordered)
/dev/mapper/VG_System-LV_Home on /home type ext3 (rw,relatime,seclabel,errors=continue,user_xattr,barrier=0,data=ordered)
.
.
.

It doesn't look right.
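If it helps anyone compare, here is a small sketch for spotting mount entries that are listed more than once, like the /home line above. The helper name duplicate_lines is made up; it only reports exact duplicate lines and says nothing about whether they are related to the hang.

```shell
#!/bin/sh
# Hypothetical helper: print each line that occurs more than once on stdin
# (exact duplicates only), one copy per duplicated line.
duplicate_lines() {
    sort | uniq -d
}
```

Usage would be: mount | duplicate_lines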