Bug 1026119 - fails to unmount encrypted filesystem (/dev/mapper/luks-partition) containing /var/log on every shutdown
Status: NEW
Product: Fedora
Classification: Fedora
Component: systemd
Version: 25
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedBlocker
Reported: 2013-11-03 17:56 EST by Christian Stadelmann
Modified: 2017-06-01 19:29 EDT (History)

Doc Type: Bug Fix
Type: Bug


Attachments
$ journalctl --full --all (29.16 KB, text/plain)
2013-11-03 17:56 EST, Christian Stadelmann
journalctl -a (shutdown + reboot part) (191.33 KB, text/x-log)
2014-02-02 11:33 EST, moon300web
output of $ journalctl (14.19 KB, text/plain)
2014-02-16 10:23 EST, Christian Stadelmann

Description Christian Stadelmann 2013-11-03 17:56:41 EST
Created attachment 818832
$ journalctl --full --all

Description of problem:
Every time I shut down Fedora 20 (all updates from @fedora, @updates and @updates-testing installed), I get warnings that unmounting failed.

Version-Release number of selected component (if applicable): Fedora 20, I don't exactly know which component to blame

How reproducible:
On this PC: always

Steps to Reproduce:
Shut down Fedora

Additional info:
I am not sure whether systemd is the right component to report this bug against.
partition layout:
/dev/sda2 mounted as /boot
/dev/sda3 LUKS container with /dev/mapper/luks-27d6d376-a841-4804-9d08-c04cb959a223 mounted as /home
/dev/sda4 extended partition containing all other partitions
/dev/sda5 LUKS container with /dev/mapper/luks-6ba90457-13bf-4a36-acfb-355e0eed063d mounted as /

On a second machine I have a slightly different layout (no separate /home partition) with the same problem.

As a result of this bug, fsck runs on every boot. As far as I can tell, I have had no data loss so far.

This bug may be related to https://bugzilla.redhat.com/show_bug.cgi?id=996475 (different layout: RAID with mdadm)
Comment 1 tim 2014-01-25 10:13:51 EST
I have two ThinkPads: the first with LUKS and LVM (like your setup), the second with a "classic" partition scheme (/, /home, /swap). On both, I installed Arch Linux (1. GNOME 3, 2. KDE).

$ journalctl -b -1 -r

shows me that /home and /tmp failed to unmount. But a few lines above I can find "Reached target Unmount All Filesystems", so I think this is OK. It seems to be a systemd bug.

Arch Linux runs fsck on every boot. Perhaps Fedora behaves the same way, in which case the fsck runs are not related to your problem.
Comment 2 Christian Stadelmann 2014-02-01 13:05:59 EST
Still having the same issue. I noticed that when logging in (tty) right after boot I get this warning (also in dmesg):

[<time>] systemd-journald[<pid>]: File /var/log/journal/<id>/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
Comment 3 Christian Stadelmann 2014-02-01 14:01:29 EST
I think that there is an ordering/dependency issue causing this problem:
To provide logging late into shutdown, systemd-journald keeps running for a long time. As long as systemd-journald keeps logging, the partition containing /var/log/journal/ cannot be unmounted. Since I keep getting log entries from after the point where this partition should have been unmounted, something is wrong here. Maybe shutdown.target should work more like this:

1. shut down everything you can (except unmounting / and any mount point on the path to /var/log/journal/<id>, and except stopping the journal)
2. stop journal
3. unmount /
4. do the "real" shutdown

As of now it does this when running `$ systemctl poweroff`:
1. stop network.target
2. stop default.target
3. stop multi-user.target
4. stop basic.target
5. stop slices.target
6. stop paths.target
7. stop timers.target
8. stop sockets.target
9. stop sysinit.target
10. stop cryptsetup.target
11. stop swap
12. stop remote-fs.target
13. stop auditd.service
14. stop local-fs.target
15. stop local-fs-pre.target
16. reach umount.target
17. stop systemd-readahead-replay.service
18. stop systemd-readahead-collect.service
19. reach shutdown.target
20. reach final.target
21. stop systemd-journald

This must break in any case, since no local filesystem is available any more when systemd-journald stops (and it wants to log that).
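
The ordering argument above can be checked mechanically. The following is a minimal sketch, not systemd code: it models each shutdown sequence as a plain list of step labels and tests whether one step comes before another. The helper name `stops_before` and the paraphrased labels in `PROPOSED` are assumptions made for illustration; `OBSERVED` is an abridged copy of the list in the comment.

```python
def stops_before(steps, first, second):
    """Return True if `first` appears earlier than `second` in the step list."""
    return steps.index(first) < steps.index(second)


# Abridged stop order as observed in the comment above.
OBSERVED = [
    "stop cryptsetup.target",
    "stop local-fs.target",
    "reach umount.target",
    "reach shutdown.target",
    "reach final.target",
    "stop systemd-journald",
]

# Order proposed in the comment above (labels paraphrased for this sketch).
PROPOSED = [
    "shut down everything else",
    "stop journal",
    "unmount /",
    "real shutdown",
]

# In the observed order the journal is stopped only after local filesystems.
print(stops_before(OBSERVED, "stop systemd-journald", "stop local-fs.target"))  # False
# In the proposed order the journal is stopped before / is unmounted.
print(stops_before(PROPOSED, "stop journal", "unmount /"))  # True
```

In the observed order, journald is still running when local-fs.target is stopped, which matches the "target is busy" symptom described in this report.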
Comment 4 moon300web 2014-02-02 11:31:56 EST
I have the same problem: the system shutdown hangs, with this as the last line on screen:
unmounting /sys/fs/fuse/connections

Looking at the shutdown messages, the system seems to fail to unmount several partitions:
Feb 02 15:42:30  umount[6150]: umount: /var: target is busy
Feb 02 15:42:30  umount[6151]: umount: /tmp: target is busy
Feb 02 15:42:30  umount[6152]: umount: /boot: target is busy
Feb 02 15:42:30  umount[6158]: umount: /home: target is busy
Feb 02 15:42:30  systemd[1]: Failed unmounting /var.
Feb 02 15:42:30  systemd[1]: Failed unmounting /tmp.
Feb 02 15:42:30  systemd[1]: Failed unmounting /boot.
Feb 02 15:42:30  systemd[1]: Failed unmounting /home.

I will try to attach some output of journalctl -a (shutdown + reboot).
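
To pull the affected mount points out of a longer journal dump, the "target is busy" lines quoted above can be filtered with a small script. This is only a sketch: the regular expression is written against the message format shown in this comment, and the function name `busy_mount_points` is made up for illustration.

```python
import re

# Matches the umount error text as it appears in the log excerpt above.
BUSY = re.compile(r"umount: (?P<path>\S+): target is busy")


def busy_mount_points(journal_lines):
    """Return mount points reported as busy, in order of first appearance."""
    found = []
    for line in journal_lines:
        match = BUSY.search(line)
        if match and match.group("path") not in found:
            found.append(match.group("path"))
    return found


# Lines copied from the excerpt above.
SAMPLE = [
    "Feb 02 15:42:30  umount[6150]: umount: /var: target is busy",
    "Feb 02 15:42:30  umount[6151]: umount: /tmp: target is busy",
    "Feb 02 15:42:30  umount[6152]: umount: /boot: target is busy",
    "Feb 02 15:42:30  umount[6158]: umount: /home: target is busy",
    "Feb 02 15:42:30  systemd[1]: Failed unmounting /var.",
]

print(busy_mount_points(SAMPLE))  # ['/var', '/tmp', '/boot', '/home']
```

The same function can be fed the lines of a saved `journalctl -b -1` dump to list every mount point that was busy during the previous shutdown.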

Adding 'rd.dm=0' to 'kernel: Command line: BOOT_IMAGE=...' did not help (was thinking that perhaps MD raid and DM raid were both taking control of the RAID).

Does anyone have an idea of what might be wrong?
Comment 5 moon300web 2014-02-02 11:33:44 EST
Created attachment 858282
journalctl -a (shutdown + reboot part)
Comment 6 Christian Stadelmann 2014-02-16 10:23:20 EST
Created attachment 863749
output of $ journalctl

More detailed journal. Bug still present with 3.13.3-200.fc20 kernel.
Comment 7 Christian Stadelmann 2014-07-04 19:26:42 EDT
I can still reproduce with latest F20 including all updates.
I cannot reproduce comment #4: my system shutdown does not hang (at least not noticeably long, i.e. less than ~5 seconds).

I saw some changes (at least a month ago):
systemd now retries unmounting those partitions several times. As a result, it succeeds for every partition except the LUKS container with / (including the /var folder).
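
The retry behaviour described above can be illustrated with a small simulation. This is not systemd's implementation, only a sketch of the general idea: repeat unmount passes until a pass makes no progress. The names `unmount_with_retries` and `try_unmount` and the `max_passes` limit are assumptions for illustration.

```python
def unmount_with_retries(mount_points, try_unmount, max_passes=5):
    """Retry unmounting until everything is gone or a pass makes no progress.

    `try_unmount(mount_point)` must return True on success.
    Returns the mount points that are still busy at the end.
    """
    remaining = list(mount_points)
    for _ in range(max_passes):
        still_busy = [mp for mp in remaining if not try_unmount(mp)]
        if not still_busy or len(still_busy) == len(remaining):
            return still_busy  # all done, or no progress in this pass
        remaining = still_busy
    return remaining


# Simulated system: "/" stays busy (e.g. the journal is still open there),
# and a mount point is busy while anything is mounted beneath it.
mounted = {"/", "/home", "/home/data"}


def fake_unmount(mount_point):
    if mount_point == "/":
        return False
    if any(other.startswith(mount_point + "/") for other in mounted):
        return False
    mounted.discard(mount_point)
    return True


print(unmount_with_retries(["/", "/home", "/home/data"], fake_unmount))  # ['/']
```

In the simulation, /home only becomes unmountable on the second pass, while / never does, which mirrors the observation that retries help for every partition except the one that is still in use.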

It seems like the readahead services need to be stopped before unmounting the root filesystem.
Comment 8 Christian Stadelmann 2014-10-01 06:29:59 EDT
Present on F21 alpha. Only affects the filesystem containing /var/log (journal location).
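
To find out which filesystem contains /var/log on a given machine, one can look for the mount point that is the longest prefix of the path. The sketch below does this over text in /proc/mounts format. The function names are made up for illustration, the sample text is an assumption modeled on the partition layout from the description (the ext4 type and mount options are invented), and escaped characters in mount points are not handled.

```python
import posixpath


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def mount_for_path(path, mounts):
    """Return the (device, mount_point) whose mount point is the longest prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for device, mount_point in mounts:
        mp = mount_point.rstrip("/") or "/"
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best[1]):
                best = (device, mp)
    return best


# Hypothetical sample modeled on the layout in the description.
SAMPLE_MOUNTS = """\
/dev/mapper/luks-6ba90457-13bf-4a36-acfb-355e0eed063d / ext4 rw 0 0
/dev/sda2 /boot ext4 rw 0 0
/dev/mapper/luks-27d6d376-a841-4804-9d08-c04cb959a223 /home ext4 rw 0 0
"""

print(mount_for_path("/var/log/journal", parse_mounts(SAMPLE_MOUNTS)))
```

On a live system the input would be the contents of /proc/mounts. For the sample above, /var/log/journal resolves to the LUKS container mounted at /.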
Comment 9 Diego 2015-02-10 17:31:01 EST
Any update on this?

I'm having the same problem.

I have /var/log mounted in a partition.

I'm using systemd 211 in an embedded system generated with Yocto Project.
Comment 10 Christian Stadelmann 2015-03-24 18:15:29 EDT
Present on F22 alpha. Still the same as in comment #8.
Comment 11 Christian Stadelmann 2015-09-03 11:51:12 EDT
Present in F23 Alpha.
Comment 12 Christian Stadelmann 2016-01-26 08:04:20 EST
Upstream bug: https://github.com/systemd/systemd/issues/867
Comment 13 Christian Stadelmann 2016-08-29 04:09:07 EDT
Proposing as beta blocker bug because this bug violates the beta release criterion "Shutdown, reboot, logout":¹
> Similar to the Alpha criterion for shutting down, shutdown and reboot mechanisms must take storage volumes down cleanly and correctly request a shutdown or reboot from the system firmware.

¹ https://fedoraproject.org/wiki/Fedora_25_Beta_Release_Criteria#Shutdown.2C_reboot.2C_logout
Comment 14 Adam Williamson 2016-08-29 11:25:37 EDT
It looks like this only happens if you have /var as a separate partition and it's encrypted (or, according to the upstream bug, a bind mount, which...why would you even). So I'm not sure if it's worth blocking on that basis. Especially since it's been around for two years; why would it suddenly be a blocker now? Also, Lennart claims "This specific issue is pretty cosmetic actually, as all you see is the EBUSY error messages during shutdown, but the file system will be unmounted during the final killing spree, so all should be good." in the upstream issue, also. I'm probably -1.
Comment 15 Christian Stadelmann 2016-08-29 12:46:13 EDT
(In reply to Adam Williamson from comment #14)
> It looks like this only happens if you have /var as a separate partition and
> it's encrypted

No, this issue is present even if /var is just on the same partition as /, as long as / is encrypted. So this issue should affect any sane full-disk-encryption setup.

> (or, according to the upstream bug, a bind mount, which...why would you even).

No bind mounts or other fancy stuff required. Just a simple / on a LUKS container. No LVM required though I guess LVM partitions should have the same issue.

> So I'm not sure if it's worth blocking on that basis.

You've got a point on that.

> Also, Lennart claims "This specific issue is pretty cosmetic
> actually, as all you see is the EBUSY error messages during shutdown, but
> the file system will be unmounted during the final killing spree, so all
> should be good." in the upstream issue, also.

Thanks for the hint on this.
Comment 16 Geoffrey Marr 2016-08-29 15:46:36 EDT
Discussed during the 2016-08-29 blocker review meeting: [1]

The decision to classify this bug as a RejectedBlocker was made due to the fact that this bug only affects encrypted and separate /var partitions and even when it does appear, it does not have serious consequences.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-29/f25-blocker-review.2016-08-29-16.00.txt
Comment 17 Bojan Smojver 2016-11-26 02:27:27 EST
(In reply to Adam Williamson from comment #14)
> It looks like this only happens if you have /var as a separate partition and
> it's encrypted (or, according to the upstream bug, a bind mount, which...why
> would you even).

My /var is on a separate partition, but not encrypted and I'm seeing unmount failures on shutdown as well. Currently on F25, but was a problem in F24 as well.
Comment 18 Dominik 'Rathann' Mierzejewski 2017-01-18 03:52:47 EST
My /var is a separate volume on an encrypted LVM VG and I get this as well.
Comment 19 Chris McCabe 2017-01-18 12:28:28 EST
I suspect some people may actually be experiencing this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1294415

which appears to be a problem with unmounting, but is actually a problem with firewalld not shutting down, and the partitions wait for it indefinitely before unmounting.

A workaround is to set CleanupOnExit=no in /etc/firewalld/firewalld.conf
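
For anyone who wants to script that workaround, the following sketch rewrites a `key=value` option in configuration text. It is a generic helper, not part of firewalld; the function name `set_option` and the sample contents are assumptions for illustration, and applying it to the real /etc/firewalld/firewalld.conf would require root privileges and is left to the reader.

```python
def set_option(conf_text, key, value):
    """Return conf_text with `key=value` set, replacing an existing line or appending one."""
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of a firewalld.conf file.
SAMPLE_CONF = "# firewalld config\nDefaultZone=public\nCleanupOnExit=yes\n"

print(set_option(SAMPLE_CONF, "CleanupOnExit", "no"), end="")
```

The helper leaves comment lines and unrelated options as they are, and appends the option if the file does not set it yet.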
