Bug 1026119 - fails to unmount encrypted filesystem (/dev/mapper/luks-partition) containing /var/log on every shutdown
Summary: fails to unmount encrypted filesystem (/dev/mapper/luks-partition) containing...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 31
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker
Depends On:
Blocks: 1055570
TreeView+ depends on / blocked
 
Reported: 2013-11-03 22:56 UTC by Christian Stadelmann
Modified: 2019-09-13 11:23 UTC (History)
28 users (show)

Fixed In Version: systemd-243~rc1-1.fc31
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-12 14:26:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
$ journalctl --full --all (29.16 KB, text/plain)
2013-11-03 22:56 UTC, Christian Stadelmann
no flags Details
journalctl -a (shutdown + reboot part) (191.33 KB, text/x-log)
2014-02-02 16:33 UTC, moon300web
no flags Details
output of $ journalctl (14.19 KB, text/plain)
2014-02-16 15:23 UTC, Christian Stadelmann
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github systemd systemd issues 867 0 None closed /var stays busy at shutdown due to journald 2021-11-05 17:09:27 UTC

Description Christian Stadelmann 2013-11-03 22:56:41 UTC
Created attachment 818832 [details]
$ journalctl --full --all

Description of problem:
Every time I shut down Fedora 20 (all updates from @fedora, @updates and @updates-testing installed), I get some warnings about unmounting failed on shutdown.

Version-Release number of selected component (if applicable): Fedora 20, I don't exactly know which component to blame

How reproducible:
On this PC: always

Steps to Reproduce:
shutdown Fedora

Additional info:
I am not sure whether systemd is the right component to report this bug against.
partition layout:
/dev/sda2 mounted as /boot
/dev/sda3 LUKS container with /dev/mapper/luks-27d6d376-a841-4804-9d08-c04cb959a223 mounted as /home
/dev/sda4 extended partition containing all other partitions
/dev/sda5 LUKS container with /dev/mapper/luks-6ba90457-13bf-4a36-acfb-355e0eed063d mounted as /

On a second machine I have a slightly different layout (no separate /home partiton) with the same problem

As a result to this bug fsck runs on every boot. As far as I can tell I had no data loss so far.

This bug may be related with https://bugzilla.redhat.com/show_bug.cgi?id=996475 (different layout: RAID with mdadm)

Comment 1 tim 2014-01-25 15:13:51 UTC
I have two Thinkpads, the first with LUKS and LVM (like your setting) and the second with an "classic" partition scheme (/ , /home, /swap). On both, I installed Archlinux (1. Gnome 3, 2. KDE).

$ journalctl -b -1 -r

shows me, that /home and /tmp failed to unmount. But a few lines above i can find "Reached target Unmount All Filesystems", so I think this is ok. It seems to be a systemd-bug.

Archlinux runs fsck on every boot. Perhaps Fedora shows the same behaviour and it isn't related to your problem.

Comment 2 Christian Stadelmann 2014-02-01 18:05:59 UTC
Still having the same issue. I noticed that when logging in (tty) right after boot I get this warning (also in dmesg):

[<time>] systemd-journald[<pid>]: File /var/log/journal/<id>/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.

Comment 3 Christian Stadelmann 2014-02-01 19:01:29 UTC
I think that there is a ordering/dependency issue causing this problem:
In order to work fine (and provide logging at late shutdown) systemd-journald keeps logging for a long time. As long as systemd-journald keeps logging, we can't unmount the partition containing /var/log/journal/. Since I keep getting logs from after this partition should be unmounted, something is wrong here. Maybe the shutdown.target should work more like this:

1. shut down everything you can (except unmounting / and everything above /var/log/journal/<id> and stopping journal)
2. stop journal
3. unmount /
4. do the "real" shutdown

As of now it does this when running `$ systemctl poweroff`:
1. Stop network.target
2. stop default.target
3. stop multi-user.target
4. stop basic.target
5. stop slices.target
6. stop paths.target
7. stop timers.target
8. stop sockets.target
9. stop sysinit.target
10. stop cryptsetup.target
11. stop swap
12. stop remote-fs.target
13. stop auditd.service
14. stop local-fs.target
15. stop local-fs-pre.target
16. reach umount.target
17. stop systemd-readahead-replay.service
18. stop systemd-readahead-collect.service
19. reach shutdown.target
20. reach final.target
21. stop systemd-journald

Which must break in any case since there is no local FS available when systemd-journald stops (and it wants to log that).

Comment 4 moon300web 2014-02-02 16:31:56 UTC
I have the same problem: system shutdown hangs with last line on screen:
unmounting /sys/fs/fuse/connections

Looking at the shutdown messages, system seems to fail to unmount several partitions:
Feb 02 15:42:30  umount[6150]: umount: /var: target is busy
Feb 02 15:42:30  umount[6151]: umount: /tmp: target is busy
Feb 02 15:42:30  umount[6152]: umount: /boot: target is busy
Feb 02 15:42:30  umount[6158]: umount: /home: target is busy
Feb 02 15:42:30  systemd[1]: Failed unmounting /var.
Feb 02 15:42:30  systemd[1]: Failed unmounting /tmp.
Feb 02 15:42:30  systemd[1]: Failed unmounting /boot.
Feb 02 15:42:30  systemd[1]: Failed unmounting /home.

I will try to attach some output of journalctl -a (shutdown + reboot).

Adding 'rd.dm=0' to 'kernel: Command line: BOOT_IMAGE=...' did not help (was thinking that perhaps MD raid and DM raid were both taking control of the RAID).

Anyone an idea of what might be wrong?

Comment 5 moon300web 2014-02-02 16:33:44 UTC
Created attachment 858282 [details]
journalctl -a (shutdown + reboot part)

Comment 6 Christian Stadelmann 2014-02-16 15:23:20 UTC
Created attachment 863749 [details]
output of $ journalctl

More detailled journal. Bug still present with 3.13.3-200.fc20 kernel.

Comment 7 Christian Stadelmann 2014-07-04 23:26:42 UTC
I can still reproduce with latest F20 including all updates.
I can not reproduce #4, my system shutdown does not hang (at least not noticeably long, i.e. less than ~5 seconds)

I saw some changes (at least a month ago):
systemd now retries to unmount those partitions several times. As a result, it successes for every partition except the luks-container with / (including the /var folder).

It seems like the readahead services need to be stopped before unmounting the root filesystem.

Comment 8 Christian Stadelmann 2014-10-01 10:29:59 UTC
Present on F21 alpha. Only affects the filesystem containing /var/log (journal location).

Comment 9 Diego 2015-02-10 22:31:01 UTC
Any update on this?

I'm having the same problem.

I have /var/log mounted in a partition.

I'm using systemd 211 in an embedded system generated with Yocto Project.

Comment 10 Christian Stadelmann 2015-03-24 22:15:29 UTC
Present on F22 alpha. Still the same as in comment #8.

Comment 11 Christian Stadelmann 2015-09-03 15:51:12 UTC
Present in F23 Alpha.

Comment 12 Christian Stadelmann 2016-01-26 13:04:20 UTC
Upstream bug: https://github.com/systemd/systemd/issues/867

Comment 13 Christian Stadelmann 2016-08-29 08:09:07 UTC
Proposing as beta blocker bug because this bug violates the beta release criterion "Shutdown, reboot, logout":¹
> Similar to the Alpha criterion for shutting down, shutdown and reboot mechanisms must take storage volumes down cleanly and correctly request a shutdown or reboot from the system firmware.

¹ https://fedoraproject.org/wiki/Fedora_25_Beta_Release_Criteria#Shutdown.2C_reboot.2C_logout

Comment 14 Adam Williamson 2016-08-29 15:25:37 UTC
It looks like this only happens if you have /var as a separate partition and it's encrypted (or, according to the upstream bug, a bind mount, which...why would you even). So I'm not sure if it's worth blocking on that basis. Especially since it's been around for two years; why would it suddenly be a blocker now? Also, Lennart claims "This specific issue is pretty cosmetic actually, as all you see is the EBUSY error messages during shutdown, but the file system will be unmounted during the final killing spree, so all should be good." in the upstream issue, also. I'm probably -1.

Comment 15 Christian Stadelmann 2016-08-29 16:46:13 UTC
(In reply to Adam Williamson from comment #14)
> It looks like this only happens if you have /var as a separate partition and
> it's encrypted

No, this issue is present even if /var is just on the same partition as /, as long as / is encrypted. So this issue should affect any sane full-disk-encryption setup.

> (or, according to the upstream bug, a bind mount, which...why would you even).

No bind mounts or other fancy stuff required. Just a simple / on a LUKS container. No LVM required though I guess LVM partitions should have the same issue.

> So I'm not sure if it's worth blocking on that basis.

You've got a point on that.

> Also, Lennart claims "This specific issue is pretty cosmetic
> actually, as all you see is the EBUSY error messages during shutdown, but
> the file system will be unmounted during the final killing spree, so all
> should be good." in the upstream issue, also.

Thanks for the hint on this.

Comment 16 Geoffrey Marr 2016-08-29 19:46:36 UTC
Discussed during the 2016-08-29 blocker review meeting: [1]

The decision to classify this bug as a RejectedBlocker was made due to the fact that this bug only affects encrypted and separate /var partitions and even when it does appear, it does not have serious consequences.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-29/f25-blocker-review.2016-08-29-16.00.txt

Comment 17 Bojan Smojver 2016-11-26 07:27:27 UTC
(In reply to Adam Williamson from comment #14)
> It looks like this only happens if you have /var as a separate partition and
> it's encrypted (or, according to the upstream bug, a bind mount, which...why
> would you even).

My /var is on a separate partition, but not encrypted and I'm seeing unmount failures on shutdown as well. Currently on F25, but was a problem in F24 as well.

Comment 18 Dominik 'Rathann' Mierzejewski 2017-01-18 08:52:47 UTC
My /var is a separate volume on an encrypted LVM VG and I get this as well.

Comment 19 Chris McCabe 2017-01-18 17:28:28 UTC
I suspect some people may actually be experiencing this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1294415

which appears to be a problem with unmounting, but is actually a problem with firewalld not shutting down, and the partitions wait for it indefinitely before unmounting.

A workaround is to set CleanupOnExit=no in /etc/firewalld/firewalld.conf

Comment 20 Fedora End Of Life 2017-11-16 19:53:18 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 21 Christian Stadelmann 2018-06-14 22:14:37 UTC
Nominating as PrioritizedBug because this issue causes fatal errors on every shutdown on a simple setup.

Comment 22 Ben Cotton 2018-08-01 20:42:02 UTC
Christian, before this is accepted as a PrioritizedBug, the team would like you to try to reproduce it on a fresh Fedora 28 install. Several of us have /var/log on an encrypted partition and haven't observed this behavior. Can you also confirm that you tried the firewalld setting in comment #19?

Comment 23 Christian Stadelmann 2018-08-01 21:03:11 UTC
(In reply to Ben Cotton from comment #22)
> Christian, before this is accepted as a PrioritizedBug, the team would like
> you to try to reproduce it on a fresh Fedora 28 install. Several of us have
> /var/log on an encrypted partition and haven't observed this behavior. Can
> you also confirm that you tried the firewalld setting in comment #19?

I'm sorry I won't be able to test on a fresh installation for at least two weeks.

The situation changed a bit, I've updated both affected systems to Fedora 28 and some more updates have been installed:

On one of the systems previously affected, the issue is gone. Due to unrelated stuff spamming my log files, I can't tell when or why this happens, but it was at least 4 weeks ago. This system is running firewalld *without* the setting changed, i.e. with defaults. This system has /var on a separate mountpoint ("subvolume") as /, both in the same btrfs partition.

On another system previously affected, the issue is still present. It is not affected by the "workaround" proposed in comment #19. (Yes, I did reboot twice). This system has only a mount point for `/` containing `/var` and everything else except `/home`.

So: comment #19 seems completely unrelated to me.

Comment 24 Christian Stadelmann 2018-08-01 21:09:17 UTC
(In reply to Christian Stadelmann from comment #23)
> (In reply to Ben Cotton from comment #22)
> > Can you also confirm that you tried the firewalld setting in comment #19?
> 
> […]
> So: comment #19 seems completely unrelated to me.

Also, I cannot reproduce bug 1294415 comment 46 and I haven't had any of the call traces or rmmod issues in my syslog or nf_conntrack stack trace.

Comment 25 Zbigniew Jędrzejewski-Szmek 2018-08-22 13:36:08 UTC
I set up a F30 VM with /var/log on an encrypted partition.
Indeed, the partition is not unmounted successfully during shutdown (the main phase), but it is unmounted successfully after transitioning to the initramfs.

[  OK  ] Stopped Flush Journal to Persistent Storage.
[  OK  ] Stopped target Local File Systems.
         Unmounting /boot...
         Unmounting /var/log...
         Unmounting /home...
         Unmounting Temporary Directory (/tmp)...
[FAILED] Failed unmounting /var/log.
[  OK  ] Unmounted /boot.
[  OK  ] Unmounted /home.
[  OK  ] Unmounted Temporary Directory (/tmp).
[  OK  ] Stopped target Swap.
         Deactivating swap Swap Partition...
[  OK  ] Stopped File System Check on /dev/d…d534d-7a0b-4a77-aa08-032f7a74b911.
[  OK  ] Removed slice system-systemd\x2dfsck.slice.
[  OK  ] Stopped target Local File Systems (Pre).
         Stopping Monitoring of LVM2 mirrors…ng dmeventd or progress polling...
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Reached target Shutdown.
[  OK  ] Deactivated swap Swap Partition.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Reached target Final Step.
         Starting Power-Off...
[  101.801603] systemd-shutdow: 814 output lines suppressed due to ratelimiting
[  101.826750] systemd-shutdown[1]: Syncing filesystems and block devices.
[  101.971341] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  102.091616] systemd-journal: 6892 output lines suppressed due to ratelimiting
[  102.096947] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  102.134252] systemd-shutdown[1]: Unmounting file systems.
[  102.144014] [1119]: Remounting '/var/log' read-only in with options '(null)'.
[  102.188583] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  102.205987] [1120]: Unmounting '/var/log'.
[  102.261118] [1121]: Remounting '/' read-only in with options 'space_cache,subvolid=258,subvol=/root'.
[  102.291689] BTRFS info (device vdb3): disk space caching is enabled
[  102.339081] systemd-shutdown[1]: All filesystems unmounted.
[  102.755480] ACPI: Preparing to enter system sleep state S5
[  102.758827] reboot: Power down

That last part of the log is available because I have a serial console hooked up. It would not be easy to capture on real machine and would scroll by very quickly.

So the comment from https://bugzilla.redhat.com/show_bug.cgi?id=1026119#c14 appears to be true — this is mostly a cosmetic issue that does not result in any file systems being left mounted rw. That said, there is some bug, since *something* is blocking the file system, even though it shouldn't be.

Comment 26 Zbigniew Jędrzejewski-Szmek 2018-08-22 14:17:27 UTC
So the reason for the failed unmount is that systemd does not "know" that journald requires /var/log/journal. If RequiresMountsFor=/var/log/journal is added to systemd-journald.service, the shutdown is clean. The main reason we don't add RequiresMountsFor=/var/log/journal like that is that the journal is started early, and then the logs are moved from /run to /var/log once that's available. RMF= is not appropriate because it introduces a strict ordering dependency.

The second reason why we don't add RequiresMountsFor=/var/log/journal is that it is not needed in the general case, because permanent logging might not be enabled. Permanent logging can be enabled in two ways: a switch in /etc/systemd/journald.conf, or just by creating /var/log/journal. This second method is more popular, but has the issue that we don't know if it is enabled until after /var/log has been mounted. Once we know that /var/log/journal is needed, we could dynamically tell systemd that systemd-journald.service should be stopped before /var/log/journal.

So to solve this bug, we would need an ordering similar to RequiresMountsFor=/var/log/journal, but only one way — a stop ordering but not a start ordering. Currently systemd does not have a mechanism like that. The second step would be a mechanism to dynamically use it for systemd-journald.service if permanent logging is enabled. Doable, but not trivial ;) Such an ordering would be useful for other things of course.

Comment 27 Ben Cotton 2018-08-29 15:06:17 UTC
At today's Prioritized Bugs and Issues meeting, we agreed to reject this as a Prioritized Bug. The effects are cosmetic, but it's still a real bug. Unfortunately, the fix is non-trivial. systemd developers recommend leaving bug open so it can be tracked and eventually addressed.

https://meetbot.fedoraproject.org/fedora-meeting-2/2018-08-29/fedora_prioritized_bugs_and_issues.2018-08-29-14.00.log.html#l-70

Comment 28 Ben Cotton 2019-05-02 21:00:48 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 29 Ben Cotton 2019-08-13 17:09:05 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 30 Ben Cotton 2019-08-13 19:23:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 31 Zbigniew Jędrzejewski-Szmek 2019-09-12 14:26:38 UTC
Fixed in https://github.com/systemd/systemd/commit/1e187d2dd5.

Comment 32 udo 2019-09-12 15:45:38 UTC
(from the linik in commet31) The effects of BZ 1026119 are just cosmetic, but it's still a real bug. Unfortunately, the fix is non-trivial. systemd developers recommend leaving bug open so it can be tracked and eventually addressed, but we're removing this from the prioritized bug list.

- Of course *we* decide what is cosmetic. At the time and now still systemd is still in need of getting used to. At such non-pid1-deserving software (it is way more than waht init really did and what needs to go there?) one should expect a bit more leeway
- Trivial: trivial bugs take 5 years? Then what about the `real` bugs? (do not start about volunteers).
- Fixed? Can we test without special circumstances? If it is so trivial why do we not get chance to agree on fixedness beforehand? Or does one now know how my PC's were installed?
- Much much more available on request. (!!!)


Note You need to log in before you can comment on or make changes to this bug.