Bug 759402 - systemd does not activate encrypted swap anymore
Summary: systemd does not activate encrypted swap anymore
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: udev
Version: 18
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: udev-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-12-02 10:14 UTC by udo
Modified: 2013-07-23 20:40 UTC (History)
14 users (show)

Fixed In Version: systemd-37-11.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-12 12:22:33 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
dmesg etc (132.51 KB, text/plain)
2012-01-31 14:23 UTC, udo
no flags Details
dmesg (243.90 KB, text/plain)
2012-07-17 15:58 UTC, udo
no flags Details

Description udo 2011-12-02 10:14:55 UTC
Description of problem:
systemd does not activate encrypted swap anymore since upgrading to F16

Version-Release number of selected component (if applicable):
systemd-37-3.fc16.x86_64

How reproducible:
do preupgrade upgrade to f16 from f15

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
systemd[1]: Found ordering cycle on local-fs.target/start
systemd[1]: Walked on cycle path to cryptsetup/start
systemd[1]: Walked on cycle path to systemd-random-seed-load.service/start
systemd[1]: Walked on cycle path to local-fs.target/start
systemd[1]: Breaking ordering cycle by deleting job cryptsetup/start

Comment 1 udo 2011-12-02 10:16:21 UTC
[root@recorder LiveTV]# mkswap /dev/sda2
/dev/sda2: Device or resource busy
[root@recorder LiveTV]# fdisk -l /dev/sda

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000eb881

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      409662      204800   83  Linux
/dev/sda2          409663     4603966     2097152   82  Linux swap / Solaris
/dev/sda3         4603967  3907024064  1951210049   8e  Linux LVM
[root@recorder LiveTV]# lsof|grep /dev/sda2
[root@recorder LiveTV]# fuser /dev/sda2
[root@recorder LiveTV]# free
             total       used       free     shared    buffers     cached
Mem:       3797928    1697672    2100256          0      64032    1263844
-/+ buffers/cache:     369796    3428132
Swap:            0          0          0

Comment 2 Michal Schmidt 2011-12-05 13:18:08 UTC
This ordering loop is fixed upstream and the fix should go into the next systemd update in both F15 and F16.

Comment 3 Fedora Update System 2012-01-11 15:02:41 UTC
systemd-37-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/systemd-37-6.fc16

Comment 4 Fedora Update System 2012-01-11 20:58:13 UTC
Package systemd-37-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-6.fc16
then log in and leave karma (feedback).

Comment 5 Bert DeKnuydt 2012-01-12 09:36:02 UTC
Solves the problem, but only when udisks is installed.

Comment 6 Michal Schmidt 2012-01-12 10:11:07 UTC
If udisks makes a difference, you're seeing bug 711394.

Comment 7 Fedora Update System 2012-01-16 02:25:47 UTC
Package systemd-37-7.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-7.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-7.fc16
then log in and leave karma (feedback).

Comment 8 Fedora Update System 2012-01-17 20:23:45 UTC
Package systemd-37-8.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-8.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-8.fc16
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2012-01-22 22:55:01 UTC
Package systemd-37-10.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-10.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-10.fc16
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2012-01-26 22:58:47 UTC
Package systemd-37-11.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-11.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-11.fc16
then log in and leave karma (feedback).

Comment 11 udo 2012-01-29 09:39:09 UTC
(...)
EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
fedora-storage-init[2745]: Setting up Logical Volume Management:   7 logical volume(s) in volume group "myvg" now active
fedora-storage-init[2745]: [  OK  ]
lvm[2790]: 7 logical volume(s) in volume group "myvg" monitored
systemd-cryptsetup[2798]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sdb3.
systemd-cryptsetup[2797]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sdc3.
systemd-cryptsetup[2799]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda3.
mkswap[2853]: mkswap: /dev/mapper/swapc: warning: don't erase bootbits sectors
mkswap[2853]: on whole disk. Use -f to force.
mkswap[2871]: mkswap: /dev/mapper/swapa: warning: don't erase bootbits sectors
mkswap[2871]: on whole disk. Use -f to force.
mkswap[2867]: mkswap: /dev/mapper/swapb: warning: don't erase bootbits sectors
mkswap[2867]: on whole disk. Use -f to force.
mkswap[2853]: Setting up swapspace version 1, size = 1036188 KiB
mkswap[2853]: no label, UUID=22062731-57a5-4436-af25-cfcc0085e0e6
mkswap[2871]: Setting up swapspace version 1, size = 1036188 KiB
mkswap[2871]: no label, UUID=c5a8366e-4a78-4bc6-b4da-1ebf16f29857
mkswap[2867]: Setting up swapspace version 1, size = 1036188 KiB
mkswap[2867]: no label, UUID=d5d2a009-8ff2-4bf9-b9c1-3ea57b028379
systemd[1]: Job dev-mapper-swapc.device/start timed out.
systemd[1]: Job dev-mapper-swapc.swap/start failed with result 'dependency'.
systemd[1]: Job dev-mapper-swapc.device/start failed with result 'timeout'.
systemd[1]: Job dev-mapper-swapb.device/start timed out.
systemd[1]: Job dev-mapper-swapb.swap/start failed with result 'dependency'.
systemd[1]: Job dev-mapper-swapb.device/start failed with result 'timeout'.
systemd[1]: Job dev-mapper-swapa.device/start timed out.
systemd[1]: Job dev-mapper-swapa.swap/start failed with result 'dependency'.
systemd[1]: Job dev-mapper-swapa.device/start failed with result 'timeout'.
it87: Found IT8716F chip at 0x290, revision 3
it87: Beeping is supported
(......)
forcedeth 0000:00:06.0: irq 44 for MSI/MSI-X
# free
             total       used       free     shared    buffers     cached
Mem:       4054444    1141932    2912512          0      63728     835544
-/+ buffers/cache:     242660    3811784
Swap:            0          0          0
# rpm -q systemd
systemd-37-11.fc16.x86_64
# swapon -a
# free
             total       used       free     shared    buffers     cached
Mem:       4054444    1148168    2906276          0      63796     841500
-/+ buffers/cache:     242872    3811572
Swap:      3108564          0    3108564

Comment 12 Jóhann B. Guðmundsson 2012-01-29 12:45:31 UTC
I need you to boot with 'log_buf_len=1M systemd.log_level=debug
systemd.log_target=kmsg' then save the output of the dmesg command and attach
it here.

You will also need to attach /etc/crypttab along with /etc/fstab so we can properly diagnose this issue.

Thanks.

Comment 13 Fedora Update System 2012-01-30 20:59:35 UTC
systemd-37-11.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 14 udo 2012-01-31 14:23:25 UTC
Created attachment 558630 [details]
dmesg etc

Comment 15 Michal Schmidt 2012-01-31 15:39:22 UTC
It's what was suggested in comment #5 and #6.
systemd is not getting notified about the /dev/mapper/swap* devices properly. You can check with "udevadm info -q all -n /dev/mapper/swapa" that they are probably marked with SYSTEMD_READY=0.
As a workaround, installing the udisks package still works.

*** This bug has been marked as a duplicate of bug 711394 ***

Comment 16 udo 2012-02-01 14:37:58 UTC
# udevadm info -q all -n /dev/mapper/swapa
P: /devices/virtual/block/dm-9
N: dm-9
S: mapper/swapa
S: disk/by-id/dm-name-swapa
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa
E: UDEV_LOG=3
E: DEVPATH=/devices/virtual/block/dm-9
E: MAJOR=253
E: MINOR=9
E: DEVNAME=/dev/dm-9
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: DM_SBIN_PATH=/sbin
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_NAME=swapa
E: DM_UUID=CRYPT-PLAIN-swapa
E: DM_SUSPENDED=0
E: DM_UDEV_RULES_VSN=2
E: MPATH_SBIN_PATH=/sbin
E: UDISKS_PRESENTATION_NOPOLICY=1
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_LENGTH=2072385
E: SYSTEMD_READY=0
E: DEVLINKS=/dev/mapper/swapa /dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa
E: TAGS=:systemd:

# yum install udisks
(....)
Setting up Install Process
Package udisks-1.0.4-3.fc16.x86_64 already installed and latest version
Nothing to do
# free
             total       used       free     shared    buffers     cached
Mem:       4054584     455300    3599284          0      28704     241924
-/+ buffers/cache:     184672    3869912
Swap:            0          0          0
# swapon -a
# free
             total       used       free     shared    buffers     cached
Mem:       4054584     456588    3597996          0      28720     241972
-/+ buffers/cache:     185896    3868688
Swap:      3108564          0    3108564


So the workaround does NOT work.
Now what?
That other bug does not really show progress.

Comment 17 udo 2012-02-01 14:39:41 UTC
Even worse: the disks in my other box, this one, do not have the SYSTEMD_READY=0.
So what is happening?

Comment 18 Michal Schmidt 2012-02-01 14:57:18 UTC
(In reply to comment #16)
> # udevadm info -q all -n /dev/mapper/swapa
...
> E: SYSTEMD_READY=0

So yes, this one is bug 711394.

> # yum install udisks
> (....)
> Setting up Install Process
> Package udisks-1.0.4-3.fc16.x86_64 already installed and latest version
> Nothing to do
...
> So the workaround does NOT work.

Well, that's unfortunate. On my system it really helps. Apparently there is some other interaction going on in the udev rules.

> Now what?
> That other bug does not really show progress.

Nevertheless, the other bug is what needs to be resolved. There's not much to do on the systemd side until udev gives correct and up-to-date information to systemd.

(In reply to comment #17)
> Even worse: the disks in my other box, this one, do not have the
> SYSTEMD_READY=0.
> So what is happening?

I'd need the same logs from the other box.

Comment 19 udo 2012-04-10 14:49:02 UTC
# free
             total       used       free     shared    buffers     cached
Mem:       4053584    3869076     184508          0      32740    1666724
-/+ buffers/cache:    2169612    1883972
Swap:      2072376      38192    2034184
# swapon -a
# free
             total       used       free     shared    buffers     cached
Mem:       4053584    3870420     183164          0      32780    1667572
-/+ buffers/cache:    2170068    1883516
Swap:      4144752      38192    4106560

Comment 20 Michal Schmidt 2012-04-17 16:45:56 UTC
[Un-duplicating, because the conclusion of bug 711394 did not fix this one bug.]

Please provide the output of "udevadm info -q all -n /dev/mapper/swapa" after booting with systemd-37-19.fc16.
A new dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" would be useful too.

Comment 21 udo 2012-04-18 02:37:19 UTC
# udevadm info -q all -n /dev/mapper/swapa
P: /devices/virtual/block/dm-9
N: dm-9
S: mapper/swapa
S: disk/by-id/dm-name-swapa
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa
S: disk/by-uuid/5cc66212-279a-47eb-91b2-186d9ba89511
E: UDEV_LOG=3
E: DEVPATH=/devices/virtual/block/dm-9
E: MAJOR=253
E: MINOR=9
E: DEVNAME=/dev/dm-9
E: DEVTYPE=disk
E: SUBSYSTEM=block
E: DM_SBIN_PATH=/sbin
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_NAME=swapa
E: DM_UUID=CRYPT-PLAIN-swapa
E: DM_SUSPENDED=0
E: ID_FS_UUID=5cc66212-279a-47eb-91b2-186d9ba89511
E: ID_FS_UUID_ENC=5cc66212-279a-47eb-91b2-186d9ba89511
E: ID_FS_VERSION=2
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: MPATH_SBIN_PATH=/sbin
E: UDISKS_PRESENTATION_NOPOLICY=1
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_LENGTH=2072385
E: DEVLINKS=/dev/mapper/swapa /dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa /dev/disk/by-uuid/5cc66212-279a-47eb-91b2-186d9ba89511
E: TAGS=:systemd:

Comment 22 Michal Schmidt 2012-04-18 12:31:49 UTC
This is an improvement. udev now has the expected information about the device.

To find out more, I will need the dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg".

Comment 23 Stefan Krüger 2012-04-23 06:28:31 UTC
If you want to do some housekeeping, please make bug 773089 a duplicate of this one.

Comment 24 Michal Schmidt 2012-04-23 11:26:10 UTC
Bug 773089 showed symptoms of more than one bug. It's already closed as a duplicate of one of them. For me it does not really matter which one.

A more important thing here is that we need the information described in comment #22 after booting with systemd-37-19.fc16.

Comment 25 udo 2012-04-28 13:21:32 UTC
Booting 3.3.4 did not have the swap problem. Either with or without debugging.

Comment 26 Stefan Krüger 2012-04-28 15:29:36 UTC
Linux kernel version 3.3.4 solves this problem?!

Comment 27 udo 2012-04-28 16:35:17 UTC
Well, yes, no.
I just booted into 3.3.4 and noticed that swap was on and that there was no delay during the boot.
So I disabled the debug stuff and rebooted into 3.3.4.
Same thing without debugging.
# rpm -q systemd
systemd-37-19.fc16.x86_64

So that didn't change.
I didn't reconfigure much, I only uninstalled some unrelated rpms.

So I am not 100% sure but so far it works for me.
Swaps are on partitions here.
Enabled in /etc/crypttab as encrypted.

Comment 28 Stefan Krüger 2012-04-29 10:44:23 UTC
I can confirm that systemd-37-19 fixes this issue.
Encrypted swap is enabled on boot, as it should be.

Thanks.

Comment 29 udo 2012-04-30 12:16:21 UTC
Except for when e.g. my /home is fsck'd, then I see swap-repated timeouts and no swap. `swapon -a` afterwards is the workaround.
In the old days this was not a dealbreaker.

The fsck info (oprphaned inodes etc) is printed twice, semi-interleaved, at least for part of the fsck info. 

Also fedora-storage-init-late.service failed to start for some weird reason.

Comment 30 udo 2012-04-30 12:18:21 UTC
swap-reLated timeouts, ouf course.

Comment 31 udo 2012-06-24 15:40:46 UTC
The issue of comment 29 does still happen.
I see timeouts on the swap device and crypsetup etc.
How can this be when swaps are on separate partitions?
Why then do swaps last in row if they are timeout-sensitive?
Why at all are they timeout sensitive?
In the old SysV style startup we did not have issues like this. (!)
If we need timouts because logic can' be depended upon, how did SysV init work?

Now everytime my box didn't shut down ultraclean all filesystems are recovered without issue yet swap isn't activated for some bogus reason or rather cause.

Comment 32 udo 2012-07-01 12:33:24 UTC
OTOH, on another box, I see weird stuff like:

May 13 03:20:44 epia systemd-cryptsetup[28493]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:20:44 epia systemd-cryptsetup[28493]: Failed to activate: Device or resource busy
May 13 03:20:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:20:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:20:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:25:44 epia systemd-cryptsetup[28603]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:25:44 epia systemd-cryptsetup[28603]: Failed to activate: Device or resource busy
May 13 03:25:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:25:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:25:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:30:44 epia systemd-cryptsetup[28732]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:30:44 epia systemd-cryptsetup[28732]: Failed to activate: Device or resource busy
May 13 03:30:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:30:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:30:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:35:44 epia systemd-cryptsetup[28891]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:35:44 epia systemd-cryptsetup[28891]: Failed to activate: Device or resource busy
May 13 03:35:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:35:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:35:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:40:44 epia systemd-cryptsetup[29018]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:40:44 epia systemd-cryptsetup[29018]: Failed to activate: Device or resource busy
May 13 03:40:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:40:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:40:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:45:44 epia systemd-cryptsetup[29130]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:45:44 epia systemd-cryptsetup[29130]: Failed to activate: Device or resource busy
May 13 03:45:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:45:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:45:44 epia systemd[1]: Unit cryptsetup entered failed state.
May 13 03:50:44 epia systemd-cryptsetup[29430]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9.
May 13 03:50:44 epia systemd-cryptsetup[29430]: Failed to activate: Device or resource busy
May 13 03:50:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1
May 13 03:50:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'.
May 13 03:50:44 epia systemd[1]: Unit cryptsetup entered failed state.


I.e.: every five minutes it attempts to activate swap that is already active.
How come? Why?
And why does systemd simply quit the swap job on the other box? (this very one)
When can we see a solid design that does not `forget` to fix the swap?

Comment 33 udo 2012-07-12 16:08:16 UTC
Slightly changed behaviour noticed.
The system was shut down cleanly.
System comes up.
One or two swaps are activated OK.
Then the system sits for a while after printing the line:
Started Recreate Volatile Files and Directories (with no real disk activity)

Then the screen starts scrolling again and I see some errors flash by about the other swaps.
When I can open a root terminal I can simply do `swapon -a`.

So why is doing this basic task still an issue?
It has been half a year and it appears the basic design for activating swap is fatally flawed.
Please fix so I can boot my systems without having to think about swap anymore (like it used to be).

Comment 34 udo 2012-07-17 15:58:44 UTC
Created attachment 598678 [details]
dmesg

Comment 35 udo 2012-07-17 16:04:50 UTC
Essential lines from the dmesg:

systemd[1]: cryptsetup: main process exited, code=exited, status=0
(...)
systemd[1]: Job cryptsetup/start finished, result=done
(...)

systemd[1]: Job dev-mapper-swapb.device/start timed out.
systemd[1]: Job dev-mapper-swapb.device/start finished, result=timeout
systemd[1]: Job dev-mapper-swapb.swap/start finished, result=dependency
systemd[1]: Job dev-mapper-swapb.swap/start failed with result 'dependency'.
systemd[1]: Job dev-mapper-swapb.device/start failed with result 'timeout'.
systemd[1]: swap.target changed dead -> active
systemd[1]: Job swap.target/start finished, result=done

- How can it time out? The swap was not busy or anything
- Cryptsetup did exit OK so why didn't it work?
- How can swap target be done when we have a failure?
- How can I be sure this doesn't happen when there's only one swap?
- How can I be sure the fsck that was running during part of this boot was not a dependency?

Comment 36 udo 2012-07-17 16:20:51 UTC
systemd[1]: dev-mapper-swapa.device changed dead -> plugged
systemd[1]: Job dev-mapper-swapa.device/start finished, result=done
systemd[1]: dev-disk-by\x2duuid-889d6286\x2dacfe\x2d478e\x2db3e4\x2d410170b65ed2.device changed dead -> plugged
systemd[1]: dev-disk-by\x2did-dm\x2duuid\x2dCRYPT\x2dPLAIN\x2dswapa.device changed dead -> plugged
systemd[1]: dev-disk-by\x2did-dm\x2dname\x2dswapa.device changed dead -> plugged
systemd[1]: dev-dm\x2d12.device changed dead -> plugged
systemd[1]: sys-devices-virtual-block-dm\x2d12.device changed dead -> plugged
systemd[1]: About to execute: /sbin/swapon -p 0 /dev/mapper/swapa

This phase does not happen for swapb in the attached dmesg of 3.4.5.
This explains why the workaround of `swapon -a` works.

Comment 37 udo 2012-08-04 15:00:47 UTC
One again, we boot into 3.4.7, systemd fails to fulfill one of its main promises: quick boot.
We had a clean shutdown, clean filesystems.
We see two out of four encrypted swaps activated.
More or less as in the comments 34, 35 and 36.
And then the system sits.
It waits for some timeout for some reason instead of doing the work.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=821254 for another bug that makes one wonder what is going on. (why does systemd ever need to kill a process when not shutting down?)

Same here. Why does this simple task time out and still not work well?
The old init was reliable and did what it needed to do.
And perhaps even quicker than the current waiting period due to this bug.

Should I sponsor you for a quicker solution?
Do you need more logging?
Or can I `downgrade` to something that works?

Comment 38 udo 2012-09-27 12:44:31 UTC
Any progress here?
What else besides the log, mentioned in comment 34, do you need to actually fix this issue?

Comment 39 udo 2012-09-30 09:26:43 UTC
And it happened again!

[udo@surfplank2 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       4053436     958284    3095152          0      74536     368872
-/+ buffers/cache:     514876    3538560
Swap:      3108564          0    3108564
[udo@surfplank2 ~]$ su -
Password: 
[root@surfplank2 ~]# swapon -a
[root@surfplank2 ~]# free
             total       used       free     shared    buffers     cached
Mem:       4053436    2240060    1813376          0     101140     922924
-/+ buffers/cache:    1215996    2837440
Swap:      4144752          0    4144752

Same happenings as in the previous log.
When even virtual memory is not started properly anymore, a basic function, then what my friend is more important?

Comment 40 udo 2012-10-08 07:45:45 UTC
To be clear:
This issue happens on one box with 4 disks and thus four encrypted swaps but also on another box with three disks and thus three encrypted swaps.
Each time we get N minus one swaps activated.
Can we fix this issue?

Comment 41 Michal Schmidt 2012-10-16 12:34:49 UTC
Possibly related patch:
http://lists.freedesktop.org/archives/systemd-devel/2012-October/006895.html

Comment 42 udo 2012-10-16 14:54:59 UTC
3.6.x?

Comment 43 udo 2012-10-22 09:16:01 UTC
After booting into 3.6.2 the swap-related delay is gone.
All swaps appear to be mounted OK.
I need to test some more but the signs are favourable.

Comment 44 udo 2012-10-25 09:37:49 UTC
After rebooting this morning to activate 3.6.3 patched for the `recent` ext4 corruption bug only 3 out of 4 swaps were activated.

Comment 45 Lennart Poettering 2013-01-14 21:33:04 UTC
Does this happen only on LVM systems? Or also non-LVM systems?

Comment 46 udo 2013-01-15 13:57:07 UTC
Over here the swaps are on partitions. /boot is also a partition. The rest is lvm or more complex.

Comment 47 udo 2013-04-03 14:15:54 UTC
Any updates?
Sometimes it works. Most often not.
In the past few days, while coping with mesa git bug(s) I had a few chances to observe the problem of not activating all encrypted swaps.

Comment 48 Fedora End Of Life 2013-04-03 19:45:32 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 49 udo 2013-04-09 15:01:17 UTC
[root@surfplank2 ~]# free
             total       used       free     shared    buffers     cached
Mem:       7289328    1222888    6066440          0      57372     620624
-/+ buffers/cache:     544892    6744436
Swap:      1023996          0    1023996
[root@surfplank2 ~]# swapon -a
[root@surfplank2 ~]# free
             total       used       free     shared    buffers     cached
Mem:       7289328    1272928    6016400          0      57408     620776
-/+ buffers/cache:     594744    6694584
Swap:      4095984          0    4095984


So 3 out of 4 swaps failed.
This caused a considerable delay when compared against a 'normal', speedy boot.
This situation negates any progress made by systemd.
Please fix!

Comment 50 Harald Hoyer 2013-04-10 15:06:30 UTC
can you provide your /etc/crypttab and /etc/fstab?

Comment 51 udo 2013-04-10 15:10:25 UTC
# cat /etc/fstab 
#
# /etc/fstab
# Created by anaconda on Tue Dec  2 15:33:03 2008
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or vol_id(8) for more info
#
/dev/myvg/rootlv     /                       ext4    defaults,noatime,nodiratime        1 1
/dev/myvg/usrlv         /usr                    ext4    defaults,noatime,nodiratime,nodev        1 2
/dev/myvg/varlv         /var                    ext4    defaults,noatime,nodiratime,nodev,nosuid        1 2
/dev/myvg/homelv		/home		ext4	defaults,noatime,nodiratime,nodev	1 2
UUID="1b97d8cb-9875-49cd-b2be-0ce334c1b125"	/boot	ext2    defaults,noatime,nodiratime,nodev,nosuid,noacl        1 2
tmpfs                   /dev/shm                tmpfs   rw,nodev,nosuid,noexec        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/mapper/swapa	swap                    swap    defaults        0 0
/dev/mapper/swapb	swap                    swap    defaults        0 0
/dev/mapper/swapc	swap                    swap    defaults        0 0
/dev/mapper/swapd	swap                    swap    defaults        0 0
/dev/mapper/myvg-datalv	/usr/local		ext4	defaults,noatime,nodiratime,nodev	1 2
/dev/mapper/myvg-srclv	/usr/src		ext4	defaults,noatime,nodiratime,nodev	1 2
/dev/mapper/myvg-wwwlv	/var/www		ext4	defaults,noatime,nodiratime,nodev	1 2
/dev/mapper/myvg-optlv	/opt			ext4	defaults,noatime,nodiratime,nodev	1 2
tmpfs			/tmp                    tmpfs   defaults        0 0

# cat /etc/crypttab 
crypto UUID=05832112-77c2-4f9a-8215-c6b454c81422 none
swapa    /dev/sda4       /dev/urandom    swap,cipher=aes-cbc-essiv:sha256,size=256
swapb    /dev/sdb4       /dev/urandom    swap,cipher=aes-cbc-essiv:sha256,size=256
swapc    /dev/sdc4       /dev/urandom    swap,cipher=aes-cbc-essiv:sha256,size=256
swapd    /dev/sdd4       /dev/urandom    swap,cipher=aes-cbc-essiv:sha256,size=256

Comment 52 udo 2013-04-26 17:07:51 UTC
Any progress? Anything that I can help with?

Comment 53 Michal Schmidt 2013-05-09 14:48:31 UTC
Reading this BZ can be quite confusing, so I'll summarize:

- There were more bugs than one, conflating the observed issue. The original
  comment was about an ordering loop (resolved). Later comments were about udev
  and SYSTEMD_READY=0 (resolved in a different BZ). Subsequent comments are
  about the device units timing out for some other reason.

- The comments for the current issue start at about comment #29.

- There are debug logs in comment #34. Unfortunately they are from a run on
  a custom kernel without printk timestamps.

- I've been unable to reproduce the problem, until I focused on the mentions
  of fsck. I replaced /usr/sbin/fsck with a shell script that sleeps long
  enough to let the default device unit timeout trigger (90 s). I was able
  to reproduce the timing out of dev-mapper-swap*.device units when I let the
  fake fsck sleep if it was asked to check /var. There is a relevant ordering:
  var.mount, systemd-random-seed-load.service, systemd-cryptsetup@*.service.

Since upstream commit 6839500 "cryptsetup-generator: add JobTimeoutSec=0 for the decrypted crypt devices" (in v201) the timeout for the cryptsetup-created devices is infinite, so I cannot reproduce the problem anymore.

Are you still able to reproduce the problem with systemd >= 201? Current Rawhide and F19 have new enough systemd. For F18 systemd 201 is in updates-testing.

If you can still reproduce it, please attach a new dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg". Please make sure you use a Fedora kernel while producing this log. Based on past experience, I am not spending any efforts to investigate bugs with custom kernels.

Comment 54 udo 2013-05-09 15:08:27 UTC
We are using systemd-44-24.fc17.x86_64.
Fedora 19 does not appear on http://fedoraproject.org/get-fedora.
So I assume that Fedora 18 is current and that Fedora 17 is supported.
See https://fedoraproject.org/wiki/Fedora_Release_Life_Cycle?rd=LifeCycle#Maintenance_Schedule for explanation; see https://fedoraproject.org/wiki/Schedule for planning.
Latest systemd is from Fri Feb 15 2013. 
Fedora 19 is going in July 2013 if not later.
Now is May 9th, 2013. 
Please give support by producing a systemd for F17 with said commit so I can verify the issue is gone.

I generally do not use Fedora kernels. I build my own with source from kernel.org with very minor patching.

This morning I booted from 3.8.10 into 3.9.1 and saw no fsck; still only two swaps were activated.
I had to do a swapon -a to get them all activated.

Comment 55 Jóhann B. Guðmundsson 2013-05-09 22:34:40 UTC
(In reply to comment #54)
> We are using systemd-44-24.fc17.x86_64.
> Fedora 19 does not appear on http://fedoraproject.org/get-fedora.
> So I assume that Fedora 18 is current and that Fedora 17 is supported.
> See
> https://fedoraproject.org/wiki/
> Fedora_Release_Life_Cycle?rd=LifeCycle#Maintenance_Schedule for explanation;
> see https://fedoraproject.org/wiki/Schedule for planning.
> Latest systemd is from Fri Feb 15 2013. 
> Fedora 19 is going in July 2013 if not later.
> Now is May 9th, 2013. 
> Please give support by producing a systemd for F17 with said commit so I can
> verify the issue is gone.
> 
> I generally do not use Fedora kernels. I build my own with source from
> kernel.org with very minor patching.
> 
> This morning I booted from 3.8.10 into 3.9.1 and saw no fsck; still only two
> swaps were activated.
> I had to do a swapon -a to get them all activated.

Using your own custom kernel is not "supported" and there is not such thing as something being "supported" in Fedora it always just boils down to "best effort" which is dictated by the "free" time and the amount of effort it takes to maintain given component since this is an community driven distribution. If you want "support" buy RHEL or some other "supported" distribution so please provide the information using official fedora kernel as Michal requests or backport the relevant patch your self and test.

Comment 56 Michal Schmidt 2013-05-10 23:15:55 UTC
(In reply to comment #54)
Why did you move this BZ to Rawhide (twice!) if you're only willing to test on F17?

I'm not having the "support" discussion with you again.


Anyway, backporting the patch is not trivial, because it has dependencies on other patches (e.g., it relies on the functionality of *.d drop-in directories).

But as a quick&dirty hack for testing it would suffice to comment out this line from F17's src/core/device.c:device_init() :
        UNIT(d)->job_timeout = DEFAULT_TIMEOUT_USEC;
This would result in infinite job timeouts for device units by default. It is not ideal, but it would only be troublesome in the case of missing devices, which is an error case anyway.

You can try rebuilding the package locally with the change. I cannot build in Koji right now.

Comment 57 udo 2013-05-11 03:58:13 UTC
This bug has been open for a while.
The normal routine is that bugs are closed when the previous Fedora version get no love anymore.
I do think that process is sub optimal. Rawhide is the workaround for the auto-erase function of bugzilla.

Fedora 17 mostly works for me. (tm)
Fedora 18 got some bad reviews.
Fedora 19 is not out yet.

Thanks for the quick hack.
I will have a look into that after this weekend. (I will be AFK)

Comment 58 udo 2013-05-12 06:57:00 UTC
(In reply to comment #56)
> But as a quick&dirty hack for testing it would suffice to comment out this
> line from F17's src/core/device.c:device_init() :
>         UNIT(d)->job_timeout = DEFAULT_TIMEOUT_USEC;
> This would result in infinite job timeouts for device units by default.

How would this help in case of a clean shutdown, no fsck on the bootup?
I've seen swaps failing to be enabled in such conditions.

Comment 59 Michal Schmidt 2013-05-13 07:38:56 UTC
(In reply to comment #58)

I do not know for sure what causes the devices to timeout. There are two possibilities:
 1. The devices would be eventually discovered. In this case the unlimited
    timeout would help.
 2. For some reason the swap devices will not be discovered by systemd. In this
    case the unlimited timeout will likely cause the boot to get stuck.
    However, this situation would be ideal for further investigation.
    It would help to enable debug-shell.service before starting the test.
    (http://freedesktop.org/wiki/Software/systemd/Debugging#Early_Debug_Shell)
    If the boot gets stuck, you could then switch to tty9 and use
    "udevadm info ..." on the decrypted swap devices to look for any hints about
    why some of them are visible to systemd and the others are not.

Comment 60 udo 2013-05-18 16:50:04 UTC
1 might not be valid as a simple `swapon -a` turns on the swaps after a 'failed' action to do so. This implies that the partitions were there, that the encryption was active and that mkswap did something.
2 harder to say something about that. as the devices are all treated by mkswap I can not see why this would happen.
Below is a dump of what udev thinks of the swaps in a normal (perhaps corrected) state.

# for a in 9 10 11 12; do udevadm info --query=all --name=dm-$a; done
P: /devices/virtual/block/dm-9
N: dm-9
S: disk/by-id/dm-name-swapb
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapb
S: disk/by-uuid/b23ebd24-ec54-4389-8cbc-8dcf92f86295
S: mapper/swapb
E: DEVLINKS=/dev/disk/by-id/dm-name-swapb /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapb /dev/disk/by-uuid/b23ebd24-ec54-4389-8cbc-8dcf92f86295 /dev/mapper/swapb
E: DEVNAME=/dev/dm-9
E: DEVPATH=/devices/virtual/block/dm-9
E: DEVTYPE=disk
E: DM_NAME=swapb
E: DM_SBIN_PATH=/sbin
E: DM_SUSPENDED=0
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_UUID=CRYPT-PLAIN-swapb
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=b23ebd24-ec54-4389-8cbc-8dcf92f86295
E: ID_FS_UUID_ENC=b23ebd24-ec54-4389-8cbc-8dcf92f86295
E: ID_FS_VERSION=2
E: MAJOR=253
E: MINOR=9
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_LENGTH=2048000
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=13925606

P: /devices/virtual/block/dm-10
N: dm-10
S: disk/by-id/dm-name-swapd
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapd
S: disk/by-uuid/40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b
S: mapper/swapd
E: DEVLINKS=/dev/disk/by-id/dm-name-swapd /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapd /dev/disk/by-uuid/40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b /dev/mapper/swapd
E: DEVNAME=/dev/dm-10
E: DEVPATH=/devices/virtual/block/dm-10
E: DEVTYPE=disk
E: DM_NAME=swapd
E: DM_SBIN_PATH=/sbin
E: DM_SUSPENDED=0
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_UUID=CRYPT-PLAIN-swapd
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b
E: ID_FS_UUID_ENC=40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b
E: ID_FS_VERSION=2
E: MAJOR=253
E: MINOR=10
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_LENGTH=2048000
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=13926599

P: /devices/virtual/block/dm-11
N: dm-11
S: disk/by-id/dm-name-swapc
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapc
S: mapper/swapc
E: DEVLINKS=/dev/disk/by-id/dm-name-swapc /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapc /dev/mapper/swapc
E: DEVNAME=/dev/dm-11
E: DEVPATH=/devices/virtual/block/dm-11
E: DEVTYPE=disk
E: DM_NAME=swapc
E: DM_SBIN_PATH=/sbin
E: DM_SUSPENDED=0
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_UUID=CRYPT-PLAIN-swapc
E: MAJOR=253
E: MINOR=11
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: SYSTEMD_READY=0
E: TAGS=:systemd:
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_LENGTH=2048000
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=13929348

P: /devices/virtual/block/dm-12
N: dm-12
S: disk/by-id/dm-name-swapa
S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa
S: disk/by-uuid/e70e4e23-c7a8-477d-97af-cd8d5139510b
S: mapper/swapa
E: DEVLINKS=/dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa /dev/disk/by-uuid/e70e4e23-c7a8-477d-97af-cd8d5139510b /dev/mapper/swapa
E: DEVNAME=/dev/dm-12
E: DEVPATH=/devices/virtual/block/dm-12
E: DEVTYPE=disk
E: DM_NAME=swapa
E: DM_SBIN_PATH=/sbin
E: DM_SUSPENDED=0
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_UUID=CRYPT-PLAIN-swapa
E: ID_FS_TYPE=swap
E: ID_FS_USAGE=other
E: ID_FS_UUID=e70e4e23-c7a8-477d-97af-cd8d5139510b
E: ID_FS_UUID_ENC=e70e4e23-c7a8-477d-97af-cd8d5139510b
E: ID_FS_VERSION=2
E: MAJOR=253
E: MINOR=12
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: UDISKS_DM_TARGETS_COUNT=1
E: UDISKS_DM_TARGETS_LENGTH=2048000
E: UDISKS_DM_TARGETS_START=0
E: UDISKS_DM_TARGETS_TYPE=crypt
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=13925982

Comment 61 udo 2013-05-21 15:33:05 UTC
This bug even occurs on systems with a single encrypted swap partition.

Comment 62 Michal Schmidt 2013-05-22 11:19:53 UTC
(In reply to udo from comment #60)
> ...
> P: /devices/virtual/block/dm-11
> ...
> E: SYSTEMD_READY=0
> E: TAGS=:systemd:
> ...

The output for this device is different from the others. It is missing the ID_FS_* values and is marked as not ready for systemd.
It's as if the udev watch was not working reliably.

I wonder what these commands would show for the device while it's in this state:
blkid /dev/dm-11
blkid -c /dev/null /dev/dm-11

Comment 63 udo 2013-05-22 13:35:39 UTC
# blkid /dev/dm-11
/dev/dm-11: UUID="a178f079-6b55-44b9-a4ca-cd55676a37f3" TYPE="swap" 
# blkid -c /dev/null /dev/dm-11
/dev/dm-11: UUID="a178f079-6b55-44b9-a4ca-cd55676a37f3" TYPE="swap" 
# 

(as in the previous output I gave, with all up and running)

Comment 64 Milan Broz 2013-05-22 14:38:17 UTC
Michal, just hint here :)

mkswap is run from systemd (see crypttab) every time device is activated (see random key attribute). The problem will be somewhere between udev running too early or just using cached data. (Before mkswap it contains just random data - this explains missing ID_FS_*.)

(mkswap should force udev change event but it depends if it is not too late, on the udev rules for DM device, phase of Moon and other variables.)

This cannot happen with LUKS devices because they had proper on-disk signature all the time, only with plain crypt devices.

Comment 65 udo 2013-06-06 17:12:39 UTC
Another and even worse case after booting into 3.9.4:

# free
             total       used       free     shared    buffers     cached
Mem:       7289228    1505216    5784012          0      63740     897908
-/+ buffers/cache:     543568    6745660
Swap:            0          0          0
# swapon -a
# free
             total       used       free     shared    buffers     cached
Mem:       7289228    1583184    5706044          0      70800     942668
-/+ buffers/cache:     569716    6719512
Swap:      4095984          0    4095984
# cat /proc/swaps 
Filename				Type		Size	Used	Priority
/dev/dm-11                              partition	1023996	0	-1
/dev/dm-9                               partition	1023996	0	-2
/dev/dm-10                              partition	1023996	0	-3
/dev/dm-12                              partition	1023996	0	-4

Yes, 4 out of 4 swaps were NOT activated but were prepared correctly. WHY oh WHY?

Comment 66 udo 2013-06-19 14:15:24 UTC
Any updates on stuff we can try to confirm the comment 64 items?

Comment 67 Fedora End Of Life 2013-07-04 06:35:55 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 68 udo 2013-07-04 13:10:27 UTC
Loads of bugzilla emails won't help.

Comment 69 Michal Schmidt 2013-07-09 13:22:45 UTC
I suspect that this is due to a missing kernel config option, just like bug 979695.
A request to test using a Fedora kernel has been posted here in comment #53. I haven't seen a proof of this being followed in the following comments.

Comment 70 udo 2013-07-09 13:53:48 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18.
It did not help in that case. (if that is the option you mean; help appreciated here)
I do have to test a bit more to see if it helps in the Fedora 17 cases I still have.

Comment 71 Michal Schmidt 2013-07-09 14:06:51 UTC
(In reply to udo from comment #70)
> See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18.

As I said in https://bugzilla.redhat.com/show_bug.cgi?id=979695#c15, CONFIG_DM_UEVENT was just a _guess_. The guess turned out to be wrong.

And you're again avoiding to do what I specifically asked for: To test with a Fedora kernel

Comment 72 udo 2013-07-09 14:14:31 UTC
I tested with a Fedora kernel in bug 979695.
So why can't anybody explain clearly the relevant features to make a certai feature work 100%?
It works, just not all the time.
I fear for migrating to Fedora 18 because of the other bug.

Comment 73 Michal Schmidt 2013-07-09 14:24:00 UTC
(In reply to udo from comment #72)
> I tested with a Fedora kernel in bug 979695.

Yes, and it helped. So let's do it for this bug as well.

> So why can't anybody explain clearly the relevant features to make a certai
> feature work 100%?

Because we usually take the kernel features for granted and rarely think worry about cut-down kernel configs.

Comment 74 udo 2013-07-12 10:27:23 UTC
Also for this bug, on F17, I did:

-CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_UEVENT_HELPER_PATH=""

And 3.9.9 appears to reliably allow the swaps to be enabled. (rebooted today a few times while checking mesa bisecting)

Comment 75 Michal Schmidt 2013-07-12 12:22:33 UTC
(In reply to udo from comment #74)

Thanks for letting us know.

BTW, Fedora kernels have had CONFIG_UEVENT_HELPER_PATH="" since April 2009 (the request was in bug 496296).
Closing as NOTABUG then.

I was curious how exactly the mis-configured CONFIG_UEVENT_HELPER_PATH causes the bug, so I tested it in a KVM guest. Some observations:
 1. The bug is easy to reproduce with the altered kernel config.
 2. systemd-udevd reports an error:

    conflicting device node '/dev/mapper/swap' found, link to '/dev/dm-0' will
    not be created

 3. Indeed /dev/mapper/swap was not a symlink, but a block device node, with
    major:minor numbers the same as /dev/dm-0. This is unlike the normal
    working situation, where /dev/mapper/swap is a symlink to /dev/dm-0.
 4. udev never noticed an inotify event on /dev/mapper/swap. This can be
    explained by udev only having set an inotify watch on /dev/dm-0, which,
    though representing the same block device, is a different inode.
 5. Clearly udev is not the one who created /dev/mapper/swap in this scenario.
    Running systemd-cryptsetup under gdb proved that it did call mknod() by
    itself, from a code somewhere in libdevmapper.

I think the "udev sync" feature of libdevmapper somehow does not work as expected when /proc/sys/kernel/hotplug (CONFIG_UEVENT_HELPER_PATH) is a non-empty string. libdevmapper then apparently falls back to its own mknod, causing udev confusion. I am not sure why "udev sync" fails in this case, as I do not understand how it's implemented.

Milane, is this something that you think should be investigated more?

Comment 76 Milan Broz 2013-07-12 12:55:56 UTC
(In reply to Michal Schmidt from comment #75)
> I think the "udev sync" feature of libdevmapper somehow does not work as
> expected when /proc/sys/kernel/hotplug (CONFIG_UEVENT_HELPER_PATH) is a
> non-empty string. libdevmapper then apparently falls back to its own mknod,
> causing udev confusion. I am not sure why "udev sync" fails in this case, as
> I do not understand how it's implemented.
> 
> Milane, is this something that you think should be investigated more?

I think Peter will know better (added to cc), it is inside devicemapper library. My opinion is that this non-udev code should be disabled in Fedora completely.

Comment 77 Michal Schmidt 2013-07-12 16:41:39 UTC
Maybe it is libcryptsetup misusing devicemapper somehow. These steps demonstrate that dmsetup is at least capable of correct operation, while cryptsetup is affected by the problem:

1. Boot F18 with a standard Fedora kernel, login as root.
2. Introduce a similar breakage that udo had in the custom kernel:
   echo '/some/nonexistent/path' > /proc/sys/kernel/hotplug
3. Let there be a block device for testing:
   dd if=/dev/zero of=/nothing bs=1M count=10
   losetup /dev/loop0 /nothing
4. Have cryptsetup create a mapping:
   cryptsetup open --type plain /dev/loop0 test
   (Any random password will do when asked.)
5. And have dmsetup create one too:
   dmsetup create test2 --table '0 1000 zero'
6. See what happened:
   ls -l /dev/mapper

For me it shows 'test' is a block device (incorrectly) and 'test2' is a symlink (correctly).

Comment 78 Milan Broz 2013-07-14 22:15:15 UTC
I think that dmsetup should misbehave as well... Anyway I added issue to cryptsetup to check it later.
http://code.google.com/p/cryptsetup/issues/detail?id=170

Comment 79 Peter Rajnoha 2013-07-16 11:53:59 UTC
Hi Milan! :)

Well, yes, the hotplug path being set is a known issue - the kernel fn responsible for sending the uevent returns non-zero status ("uevent not sent") if the path is set to invalid value despite the fact the uevent *is* actually sent even if the hotplug path is invalid).

The libdevmapper checks this status in userpace and to prevent a hang in case there was some error in sending the event, it cancels the udev synchronization itself (otherwise, it would wait indefinitely for the non-existent uevent processing in userspace udev to be finished). So that's what libdevmapper does automatically... Unfortunately, we discovered the problem with the hotplug path later - the exact source is the kernel fn that returns odd status if the hotplug path is not found...

Now, as for the libdevmapper fallback to creating nodes itself under /dev - that is configurable. At the moment, by default, the libdevmapper *does* the fallback if the node/symlink is not found or found incorrect under /dev/mapper. Any libdevmapper user must disable this fallback directly if needed...

To disable the fallback, libdevmapper users must provide the extra DM_UDEV_DISABLE_LIBRARY_FALLBACK flag when creating the udev sync cookie (the dm_task_set_cookie fn call).

The dmsetup and LVM tools already disabled this fallback (that's the reason why the dmsetup worked and cryptsetup not in comment #77).

N.B. When the kernel returns a status that the uevent has not been sent (even incorrectly, like in the case of the hotplug path being used), libdevmapper always *cancels* any udev synchronization. So even with the libdm fallback disabled for the node creation, when the command is finished, the /dev could be incomplete because of the uevent still being processed by udev!

IOW, please, the use of CONFIG_UEVENT_HELPER_PATH is highly unsupported (this setting is a piece of history that makes confusion everywhere anyway).


So, from my side (libdevmapper) probably two things I should consider here:

  - totally disable the fallback in libdevmapper (iow, no need for an extra DM_UDEV_DISABLE_LIBRARY_FALLBACK flag) - but this does not depend on me only, unfortunately :)

  - document the bug with CONFIG_UEVENT_HELPER_PATH somewhere!

Comment 80 Peter Rajnoha 2013-07-16 12:03:43 UTC
(In reply to Michal Schmidt from comment #71)
> (In reply to udo from comment #70)
> > See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18.
> 
> As I said in https://bugzilla.redhat.com/show_bug.cgi?id=979695#c15,
> CONFIG_DM_UEVENT was just a _guess_. The guess turned out to be wrong.
> 

(The CONFIG_DM_UEVENT affects only multipath-specific events that are generated in addition to general device-mapper events. So this setting has no effect on general device-mapper uevents. It's a very bad and undescriptive config name as a matter of fact!)

Comment 81 Kay Sievers 2013-07-16 12:10:29 UTC
Keying off, making assumptions about the state of hotplug, changing behavior
of tools by looking at CONFIG_UEVENT_HELPER_PATH, or /proc/sys/kernel/hotplug
makes no sense.

The sending and creation of uevents have nothing to do with the actual state of
CONFIG_UEVENT_HELPER_PATH. They are not related to each other.

Anybody is free to add whatever useless tool there and it will slow down
the system, but it must not change the behavior of tools using uevents.

Granted, /sbin/hotplug was a broken idea, and it should be disabled, but reading
its value from today's tools and make assumptions is even more broken, just
ignore it, and please entirely remove the logic that reads it.

Comment 82 Michal Schmidt 2013-07-16 12:22:59 UTC
(In reply to Kay Sievers from comment #81)
> Keying off, making assumptions about the state of hotplug, changing behavior
> of tools by looking at CONFIG_UEVENT_HELPER_PATH, or /proc/sys/kernel/hotplug
> makes no sense.

Kay,
I think you misunderstood slightly. They are not explicitly reading /proc/sys/kernel/hotplug. They're merely checking an error code (I see that lib/kobject_uevent.c:kobject_uevent_env() returns non-zero in this case).

Comment 83 Kay Sievers 2013-07-16 12:37:22 UTC
Oh, really, that's how it works? It's still broken.

It's a nasty kernel behavior. It was introduced by:
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0ad1d6f37cc3bb234c6e7ae30e40d1d40b9aa258

Which is just not right to do that, and not how uevents are supposed to work.

If anybody puts "/does-not-exist" here, the uevent sending code should not fail
in any tools! I could understand that idea, if there was also a check that there
is no single netlink client connected, but doing that unconditionally is wrong:

Making the return code of the primarily netlink-based interface depend on
the legacy/debugging usermodehelper facility breaks things which are expected
to just work, as we see here.

Anyway, I can't really see why checking the return code of a *broadcast* message
would be beneficial at all.

Comment 84 Peter Rajnoha 2013-07-16 13:01:58 UTC
Would it be possible to add a big fat warning to not use the uevent helper anymore? Really, what's the use case these days anyway? The option should be deprecated.

Comment 85 Kay Sievers 2013-07-16 13:15:40 UTC
Right, ideally we would just delete the entire thing from the kernel sources.
It can be used as a debugging tool, but it's of limited value, I think.

I guess the first thing would be to turn the code in the kernel into an option
to disable the entire facility and the proc file, and set the default of the
kernel option to =n.

This beast really creates more problem than it solves, and it's out of business
for years now.

Comment 87 Milan Broz 2013-07-23 20:40:20 UTC
Btw I tested reproducer in comment #77 and despite it sometimes prints some strange errors, I always get proper symlink.

Let's fix it on the correct side - I agree with Kay that this hotplug nonsense should be deleted from recent kernels.


Note You need to log in before you can comment on or make changes to this bug.