Bug 1691589 - transient hang when starting cryptography setup for swap
Summary: transient hang when starting cryptography setup for swap
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 30
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-22 01:15 UTC by Chris Murphy
Modified: 2020-05-26 16:29 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 16:29:10 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
journal full (265.26 KB, text/plain)
2019-03-22 01:18 UTC, Chris Murphy
no flags Details
systemctl (36.76 KB, text/plain)
2019-03-22 01:21 UTC, Chris Murphy
no flags Details
ps aux (8.85 KB, text/plain)
2019-03-22 01:21 UTC, Chris Murphy
no flags Details
journal full with debug (3.34 MB, text/plain)
2019-03-27 01:01 UTC, Chris Murphy
no flags Details
systemctl list-jobs (5.30 KB, text/plain)
2019-03-27 05:24 UTC, Chris Murphy
no flags Details
journal more debug (4.10 MB, text/plain)
2019-03-28 23:26 UTC, Chris Murphy
no flags Details
journal (6.86 MB, text/plain)
2019-06-08 19:09 UTC, Chris Murphy
no flags Details
systemctl list-jobs (4.92 KB, text/plain)
2019-06-08 19:10 UTC, Chris Murphy
no flags Details
systemctl status dev-mapper-cryptswap.device (544 bytes, text/plain)
2019-06-08 19:11 UTC, Chris Murphy
no flags Details

Description Chris Murphy 2019-03-22 01:15:42 UTC
Description of problem:


Version-Release number of selected component (if applicable):
systemd-241-2.gita09c170.fc30.x86_64
systemd-239-12.git8bca462.fc29.x86_64

How reproducible:
non-deterministic, but so far never happens if systemd.log_level=debug; with fc29 it happened perhaps once per week, with fc30 it's happened three times today or roughly 50% of boots

Steps to Reproduce:
1. Boot *shrug*
2.
3.

Actual results:

- the dm-crypt device has been created
- the dm-crypt volume has had mkswap run on it
- seems to be hanging on swapon but I can't tell why
- if in debug shell I manually `swapon /dev/mapper/eswap` the hang is immediately cleared up

Expected results:

should always boot; or at the least should not wait indefinitely for swap to enable

Additional info:


[chris@flap ~]$ sudo cat /etc/crypttab
# <name>       <device>         <password>              <options>
eswap      /dev/disk/by-partuuid/688b193f-3b38-4ca5-8b65-2ef61f27ec83        /dev/urandom            swap,cipher=aes-xts-plain64,size=256


[chris@flap ~]$ sudo cat /etc/fstab
...
/dev/mapper/eswap none swap defaults 0 0

Comment 1 Chris Murphy 2019-03-22 01:18:44 UTC
Created attachment 1546734 [details]
journal full

[    5.745133] flap.local mkswap[736]: Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
[    5.745133] flap.local mkswap[736]: no label, UUID=ca8696d9-144f-4f5c-8ea2-1c9ef43ba1e6
[    5.745352] flap.local systemd[1]: Started Cryptography Setup for eswap.
[    5.745635] flap.local kernel: audit: type=1130 audit(1553215322.956:74): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-cryptsetup@eswap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[    5.751618] flap.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-cryptsetup@eswap comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
...
[    5.758140] flap.local systemd[1]: Reached target Local Encrypted Volumes.

And then a large 1200+ second gap while I just waited doing nothing; then in early debug shell I ran `swapon /dev/mapper/eswap` and then journal resumes from here

[ 1237.811834] flap.local kernel: Adding 8388604k swap on /dev/mapper/eswap.  Priority:-2 extents:1 across:8388604k SSFS
[ 1237.820584] flap.local systemd[1]: Condition check resulted in /dev/mapper/eswap being skipped.
[ 1237.820890] flap.local systemd[1]: Unnecessary job for /dev/mapper/eswap was removed.
[ 1237.820930] flap.local systemd[1]: Reached target Swap.


But I can't tell why the no longer necessary job was hanging in the first place.

Comment 2 Chris Murphy 2019-03-22 01:21:11 UTC
Created attachment 1546735 [details]
systemctl

# systemctl

This shows
dev-mapper-eswap.device                                                                loaded inactive dead      start /dev/mapper/eswap   

And so:

# systemctl status dev-mapper-eswap.device
● dev-mapper-eswap.device - /dev/mapper/eswap
   Loaded: loaded
  Drop-In: /run/systemd/generator/dev-mapper-eswap.device.d
           └─90-device-timeout.conf
   Active: inactive (dead)


I don't understand why it ends in .d or why it's dead.

Comment 3 Chris Murphy 2019-03-22 01:21:41 UTC
Created attachment 1546736 [details]
ps aux

Comment 4 Chris Murphy 2019-03-27 01:01:42 UTC
Created attachment 1548333 [details]
journal full with debug

Ahha! GOTCHA!
This time with systemd.log_level=debug

This includes boot, startup, hang, a delay, and then manual `swapon` using early debug shell.

Comment 5 Chris Murphy 2019-03-27 01:07:23 UTC
Everything prior to this is where the hand and problem is at... This is the manual invocation.
[  205.059710] flap.local kernel: Adding 8388604k swap on /dev/mapper/eswap.  Priority:-2 extents:1 across:8388604k SSFS

Comment 6 Chris Murphy 2019-03-27 01:39:22 UTC
[   26.889728] flap.local mkswap[716]: Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
[   26.889728] flap.local mkswap[716]: no label, UUID=f500a932-bccc-4efb-9237-f4e245616bea
[   26.890608] flap.local systemd[1]: Received SIGCHLD from PID 716 (mkswap).
[   26.890671] flap.local systemd[1]: Child 716 (mkswap) died (code=exited, status=0/SUCCESS)


The mkswap succeeds. But I don't see that the swapon ever happens.

Comment 7 Chris Murphy 2019-03-27 05:01:34 UTC
BTW starting at comment 4, I'm using systemd-241-3.gitc1f8ff8.fc30.x86_64.

Comment 8 Chris Murphy 2019-03-27 05:24:15 UTC
Created attachment 1548363 [details]
systemctl list-jobs

Did this in early debug shell during hang.

126 dev-mapper-eswap.device                start running


[chris@flap ~]$ sudo systemctl status dev-mapper-eswap.device
● dev-mapper-eswap.device - /dev/mapper/eswap
   Loaded: loaded
  Drop-In: /run/systemd/generator/dev-mapper-eswap.device.d
           └─90-device-timeout.conf
   Active: inactive (dead)

Mar 26 22:56:44 flap.local systemd[1]: dev-mapper-eswap.device: Merged dev-mapper-eswap.device/start into installed job dev-mapper-eswap.device/start as 126
Mar 26 22:56:44 flap.local systemd[1]: dev-mapper-eswap.device: Merged dev-mapper-eswap.device/start into installed job dev-mapper-eswap.device/start as 126
Mar 26 22:59:27 flap.local systemd[1]: dev-mapper-eswap.device: Collecting job.
Mar 26 22:59:27 flap.local systemd[1]: dev-mapper-eswap.device: Job 126 dev-mapper-eswap.device/start finished, result=collected
Mar 26 22:59:27 flap.local systemd[1]: Unnecessary job for /dev/mapper/eswap was removed.
[chris@flap ~]$ 


There's more than one delay in the startup. I'm not sure if it's related. I do briefly see the stalled eswap job early startup possibly as soon as 10 seconds, with the red cylon eye.

[    9.981358] flap.local systemd[1]: Added job system-getty.slice/start to transaction.
[   14.889369] flap.local systemd[1]: var-tmp.mount: Collecting.

[   15.453500] flap.local systemd[568]: Operating on architecture: x86-64
[   25.807480] flap.local systemd-vconsole-setup[573]: /usr/bin/loadkeys succeeded.

Comment 9 Chris Murphy 2019-03-28 23:26:10 UTC
Created attachment 1549243 [details]
journal more debug

systemd.log_level=debug systemd.debug-shell=1 rd.udev.debug rd.debug

Comment 10 Chris Murphy 2019-03-29 05:01:39 UTC
In both working and non-working cases, mkswap exits with success.
In only the working case:

[   19.930420] flap.local systemd[1]: dev-disk-by\x2did-dm\x2dname\x2deswap.device: Changed dead -> plugged
[   19.930529] flap.local systemd[1]: dev-disk-by\x2duuid-7143e798\x2d105e\x2d4dd2\x2d8e8f\x2d214ffeb54977.device: Changed dead -> plugged
[   19.930582] flap.local systemd[1]: dev-disk-by\x2did-dm\x2duuid\x2dCRYPT\x2dPLAIN\x2deswap.device: Changed dead -> plugged
[   19.930642] flap.local systemd[1]: dev-mapper-eswap.device: Changed dead -> plugged
[   19.930691] flap.local systemd[1]: dev-mapper-eswap.device: Job 131 dev-mapper-eswap.device/start finished, result=done
[   19.930749] flap.local systemd[1]: Found device /dev/mapper/eswap.
[   19.930801] flap.local systemd[1]: dev-dm\x2d0.device: Changed dead -> plugged
[   19.930850] flap.local systemd[1]: sys-devices-virtual-block-dm\x2d0.device: Changed dead -> plugged
[   19.933800] flap.local systemd[1]: dev-mapper-eswap.swap: About to execute: /sbin/swapon /dev/mapper/eswap

The failure case, eswap.device never changes from dead to plugged, and we never get to execute swapon. No idea why it's stuck in between mkswap and swapon.

Comment 11 Chris Murphy 2019-06-08 19:09:59 UTC
Created attachment 1578655 [details]
journal

systemd.log_level=debug rd.udev.debug

Comment 12 Chris Murphy 2019-06-08 19:10:41 UTC
Created attachment 1578656 [details]
systemctl list-jobs

In early debug shell: systemctl list-jobs

Comment 13 Chris Murphy 2019-06-08 19:11:41 UTC
Created attachment 1578658 [details]
systemctl status dev-mapper-cryptswap.device

Comment 14 Chris Murphy 2019-06-08 19:14:44 UTC
Updated all the attachments. One change is the name of the crypto device for swap is 'cryptswap' rather than 'eswap' (so anywhere comment 10 and older where it says eswap, it's the same as cryptswap)

kernel 5.1.7-300.fc30.x86_64
systemd-241-8.git9ef65cb.fc30.x86_64

Comment 15 Chris Murphy 2019-06-08 21:35:35 UTC
In the working case systemd sees the swap device appear:

[   23.310555] flap.local systemd[1]: dev-disk-by\x2duuid-dcae3053\x2d1cc2\x2d4890\x2da33b\x2d6d71b3dc97df.device: Changed dead -> plugged
[   23.310639] flap.local systemd[1]: dev-disk-by\x2did-dm\x2dname\x2dcryptswap.device: Changed dead -> plugged
[   23.310681] flap.local systemd[1]: dev-mapper-cryptswap.device: Changed dead -> plugged
[   23.310724] flap.local systemd[1]: dev-mapper-cryptswap.device: Job 165 dev-mapper-cryptswap.device/start finished, result=done
[   23.310765] flap.local systemd[1]: Found device /dev/mapper/cryptswap.
[   23.312699] flap.local systemd[1]: dev-disk-by\x2did-dm\x2duuid\x2dCRYPT\x2dPLAIN\x2dcryptswap.device: Changed dead -> plugged
[   23.312769] flap.local systemd[1]: dev-dm\x2d0.device: Changed dead -> plugged
[   23.312814] flap.local systemd[1]: sys-devices-virtual-block-dm\x2d0.device: Changed dead -> plugged
[   23.313594] flap.local systemd[1]: dev-mapper-cryptswap.swap: About to execute: /sbin/swapon -o nofail /dev/mapper/cryptswap
[   23.314123] flap.local systemd[1]: dev-mapper-cryptswap.swap: Forked /sbin/swapon as 710
[   23.314457] flap.local systemd[1]: dev-mapper-cryptswap.swap: Changed dead -> activating
[   23.314565] flap.local systemd[1]: Activating swap /dev/mapper/cryptswap...
[   23.314831] flap.local systemd[710]: dev-mapper-cryptswap.swap: Executing: /sbin/swapon -o nofail /dev/mapper/cryptswap
[   23.322581] flap.local kernel: Adding 8388604k swap on /dev/mapper/cryptswap.  Priority:-2 extents:1 across:8388604k SSFS


Whereas in the non-working case, that doesn't happen, and systemd hangs while waiting for it to appear. And yet in an early debug shell, 'blkid' sees the swap device. I don't know why systemd doesn't see it.

Comment 16 Chris Murphy 2019-06-18 14:49:44 UTC
# udevadm info /dev/mapper/cryptswap
P: /devices/virtual/block/dm-0
N: dm-0
L: 0
S: disk/by-id/dm-name-cryptswap
S: mapper/cryptswap
S: disk/by-id/dm-uuid-CRYPT-PLAIN-cryptswap
E: DEVPATH=/devices/virtual/block/dm-0
E: DEVNAME=/dev/dm-0
E: DEVTYPE=disk
E: MAJOR=253
E: MINOR=0
E: SUBSYSTEM=block
E: USEC_INITIALIZED=23080972
E: DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_ACTIVATION=1
E: DM_NAME=cryptswap
E: DM_UUID=CRYPT-PLAIN-cryptswap
E: DM_SUSPENDED=0
E: DM_UDEV_RULES_VSN=2
E: SYSTEMD_READY=0
E: DEVLINKS=/dev/disk/by-id/dm-name-cryptswap /dev/mapper/cryptswap
/dev/disk/by-id/dm-uuid-CRYPT-PLAIN-cryptswap
E: TAGS=:systemd:

systemctl list-jobs says dev-mapper-cryptswap.device is running,
everything else is waiting, but when I get a systemctl status on
dev-mapper-cryptswap.device I get

● dev-mapper-cryptswap.device - /dev/mapper/cryptswap
   Loaded: loaded
  Drop-In: /run/systemd/generator/dev-mapper-cryptswap.device.d
           └─90-device-timeout.conf
   Active: inactive (dead)

Jun 18 08:19:41 flap.local systemd[1]: dev-mapper-cryptswap.device:
Merged dev-mapper-cryptswap.device/start into installed job
dev-mapper-cryptswap.device/start as 158
Jun 18 08:19:41 flap.local systemd[1]: dev-mapper-cryptswap.device:
Merged dev-mapper-cryptswap.device/start into installed job
dev-mapper-cryptswap.device/start as 158


I can either 'swapon /dev/mapper/cryptsetup' or 'systemctl stop dev-mapper-cryptswap.device' and startup will continue.

Comment 17 Zbigniew Jędrzejewski-Szmek 2019-08-06 18:55:23 UTC
Impressive debugging!

E: SYSTEMD_READY=0
→ so this is not an issue with udev↔systemd communication. udev thinks that the device
is not ready.

The relevant rules are most likely this:
# /usr/lib/udev/rules.d/99-systemd.rules
# Ignore encrypted devices with no identified superblock on it, since
# we are probably still calling mke2fs or mkswap on it.
SUBSYSTEM=="block", ENV{DM_UUID}=="CRYPT-*", ENV{ID_PART_TABLE_TYPE}=="", ENV{ID_FS_USAGE}=="", ENV{SYSTEMD_READY}="0"

Comment 18 Chris Murphy 2019-08-06 23:46:18 UTC
The transient behavior does make me suspect a race; maybe somtimes the mkswap is not finished, or the kernel or libblkid haven't discovered it before the udev rule to ignore comes into play?

Comment 19 Ben Cotton 2020-04-30 21:24:26 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 20 Ben Cotton 2020-05-26 16:29:10 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.