Description of problem: systemd does not activate encrypted swap anymore since upgrading to F16 Version-Release number of selected component (if applicable): systemd-37-3.fc16.x86_64 How reproducible: do preupgrade upgrade to f16 from f15 Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: systemd[1]: Found ordering cycle on local-fs.target/start systemd[1]: Walked on cycle path to cryptsetup/start systemd[1]: Walked on cycle path to systemd-random-seed-load.service/start systemd[1]: Walked on cycle path to local-fs.target/start systemd[1]: Breaking ordering cycle by deleting job cryptsetup/start
[root@recorder LiveTV]# mkswap /dev/sda2 /dev/sda2: Device or resource busy [root@recorder LiveTV]# fdisk -l /dev/sda Disk /dev/sda: 2000.4 GB, 2000398934016 bytes 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000eb881 Device Boot Start End Blocks Id System /dev/sda1 * 63 409662 204800 83 Linux /dev/sda2 409663 4603966 2097152 82 Linux swap / Solaris /dev/sda3 4603967 3907024064 1951210049 8e Linux LVM [root@recorder LiveTV]# lsof|grep /dev/sda2 [root@recorder LiveTV]# fuser /dev/sda2 [root@recorder LiveTV]# free total used free shared buffers cached Mem: 3797928 1697672 2100256 0 64032 1263844 -/+ buffers/cache: 369796 3428132 Swap: 0 0 0
This ordering loop is fixed upstream and the fix should go into the next systemd update in both F15 and F16.
systemd-37-6.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/systemd-37-6.fc16
Package systemd-37-6.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-37-6.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-6.fc16 then log in and leave karma (feedback).
Solves the problem, but only when udisks is installed.
If udisks makes a difference, you're seeing bug 711394.
Package systemd-37-7.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-37-7.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-7.fc16 then log in and leave karma (feedback).
Package systemd-37-8.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-37-8.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-8.fc16 then log in and leave karma (feedback).
Package systemd-37-10.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-37-10.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-10.fc16 then log in and leave karma (feedback).
Package systemd-37-11.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing systemd-37-11.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-11.fc16 then log in and leave karma (feedback).
(...) EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null) RPC: Registered named UNIX socket transport module. RPC: Registered udp transport module. RPC: Registered tcp transport module. RPC: Registered tcp NFSv4.1 backchannel transport module. EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null) EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null) fedora-storage-init[2745]: Setting up Logical Volume Management: 7 logical volume(s) in volume group "myvg" now active fedora-storage-init[2745]: [ OK ] lvm[2790]: 7 logical volume(s) in volume group "myvg" monitored systemd-cryptsetup[2798]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sdb3. systemd-cryptsetup[2797]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sdc3. systemd-cryptsetup[2799]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda3. mkswap[2853]: mkswap: /dev/mapper/swapc: warning: don't erase bootbits sectors mkswap[2853]: on whole disk. Use -f to force. mkswap[2871]: mkswap: /dev/mapper/swapa: warning: don't erase bootbits sectors mkswap[2871]: on whole disk. Use -f to force. mkswap[2867]: mkswap: /dev/mapper/swapb: warning: don't erase bootbits sectors mkswap[2867]: on whole disk. Use -f to force. mkswap[2853]: Setting up swapspace version 1, size = 1036188 KiB mkswap[2853]: no label, UUID=22062731-57a5-4436-af25-cfcc0085e0e6 mkswap[2871]: Setting up swapspace version 1, size = 1036188 KiB mkswap[2871]: no label, UUID=c5a8366e-4a78-4bc6-b4da-1ebf16f29857 mkswap[2867]: Setting up swapspace version 1, size = 1036188 KiB mkswap[2867]: no label, UUID=d5d2a009-8ff2-4bf9-b9c1-3ea57b028379 systemd[1]: Job dev-mapper-swapc.device/start timed out. systemd[1]: Job dev-mapper-swapc.swap/start failed with result 'dependency'. systemd[1]: Job dev-mapper-swapc.device/start failed with result 'timeout'. systemd[1]: Job dev-mapper-swapb.device/start timed out. systemd[1]: Job dev-mapper-swapb.swap/start failed with result 'dependency'. systemd[1]: Job dev-mapper-swapb.device/start failed with result 'timeout'. systemd[1]: Job dev-mapper-swapa.device/start timed out. systemd[1]: Job dev-mapper-swapa.swap/start failed with result 'dependency'. systemd[1]: Job dev-mapper-swapa.device/start failed with result 'timeout'. it87: Found IT8716F chip at 0x290, revision 3 it87: Beeping is supported (......) forcedeth 0000:00:06.0: irq 44 for MSI/MSI-X # free total used free shared buffers cached Mem: 4054444 1141932 2912512 0 63728 835544 -/+ buffers/cache: 242660 3811784 Swap: 0 0 0 # rpm -q systemd systemd-37-11.fc16.x86_64 # swapon -a # free total used free shared buffers cached Mem: 4054444 1148168 2906276 0 63796 841500 -/+ buffers/cache: 242872 3811572 Swap: 3108564 0 3108564
I need you to boot with 'log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg' then save the output of the dmesg command and attach it here. You will also need to attach /etc/crypttab along with /etc/fstab so we can properly diagnose this issue. Thanks.
systemd-37-11.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 558630 [details] dmesg etc
It's what was suggested in comment #5 and #6. systemd is not getting notified about the /dev/mapper/swap* devices properly. You can check with "udevadm info -q all -n /dev/mapper/swapa" that they are probably marked with SYSTEMD_READY=0. As a workaround, installing the udisks package still works. *** This bug has been marked as a duplicate of bug 711394 ***
# udevadm info -q all -n /dev/mapper/swapa P: /devices/virtual/block/dm-9 N: dm-9 S: mapper/swapa S: disk/by-id/dm-name-swapa S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa E: UDEV_LOG=3 E: DEVPATH=/devices/virtual/block/dm-9 E: MAJOR=253 E: MINOR=9 E: DEVNAME=/dev/dm-9 E: DEVTYPE=disk E: SUBSYSTEM=block E: DM_SBIN_PATH=/sbin E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_NAME=swapa E: DM_UUID=CRYPT-PLAIN-swapa E: DM_SUSPENDED=0 E: DM_UDEV_RULES_VSN=2 E: MPATH_SBIN_PATH=/sbin E: UDISKS_PRESENTATION_NOPOLICY=1 E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_LENGTH=2072385 E: SYSTEMD_READY=0 E: DEVLINKS=/dev/mapper/swapa /dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa E: TAGS=:systemd: # yum install udisks (....) Setting up Install Process Package udisks-1.0.4-3.fc16.x86_64 already installed and latest version Nothing to do # free total used free shared buffers cached Mem: 4054584 455300 3599284 0 28704 241924 -/+ buffers/cache: 184672 3869912 Swap: 0 0 0 # swapon -a # free total used free shared buffers cached Mem: 4054584 456588 3597996 0 28720 241972 -/+ buffers/cache: 185896 3868688 Swap: 3108564 0 3108564 So the workaround does NOT work. Now what? That other bug does not really show progress.
Even worse: the disks in my other box, this one, do not have the SYSTEMD_READY=0. So what is happening?
(In reply to comment #16) > # udevadm info -q all -n /dev/mapper/swapa ... > E: SYSTEMD_READY=0 So yes, this one is bug 711394. > # yum install udisks > (....) > Setting up Install Process > Package udisks-1.0.4-3.fc16.x86_64 already installed and latest version > Nothing to do ... > So the workaround does NOT work. Well, that's unfortunate. On my system it really helps. Apparently there is some other interaction going on in the udev rules. > Now what? > That other bug does not really show progress. Nevertheless, the other bug is what needs to be resolved. There's not much to do on the systemd side until udev gives correct and up-to-date information to systemd. (In reply to comment #17) > Even worse: the disks in my other box, this one, do not have the > SYSTEMD_READY=0. > So what is happening? I'd need the same logs from the other box.
# free total used free shared buffers cached Mem: 4053584 3869076 184508 0 32740 1666724 -/+ buffers/cache: 2169612 1883972 Swap: 2072376 38192 2034184 # swapon -a # free total used free shared buffers cached Mem: 4053584 3870420 183164 0 32780 1667572 -/+ buffers/cache: 2170068 1883516 Swap: 4144752 38192 4106560
[Un-duplicating, because the conclusion of bug 711394 did not fix this one bug.] Please provide the output of "udevadm info -q all -n /dev/mapper/swapa" after booting with systemd-37-19.fc16. A new dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" would be useful too.
# udevadm info -q all -n /dev/mapper/swapa P: /devices/virtual/block/dm-9 N: dm-9 S: mapper/swapa S: disk/by-id/dm-name-swapa S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa S: disk/by-uuid/5cc66212-279a-47eb-91b2-186d9ba89511 E: UDEV_LOG=3 E: DEVPATH=/devices/virtual/block/dm-9 E: MAJOR=253 E: MINOR=9 E: DEVNAME=/dev/dm-9 E: DEVTYPE=disk E: SUBSYSTEM=block E: DM_SBIN_PATH=/sbin E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_NAME=swapa E: DM_UUID=CRYPT-PLAIN-swapa E: DM_SUSPENDED=0 E: ID_FS_UUID=5cc66212-279a-47eb-91b2-186d9ba89511 E: ID_FS_UUID_ENC=5cc66212-279a-47eb-91b2-186d9ba89511 E: ID_FS_VERSION=2 E: ID_FS_TYPE=swap E: ID_FS_USAGE=other E: MPATH_SBIN_PATH=/sbin E: UDISKS_PRESENTATION_NOPOLICY=1 E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_LENGTH=2072385 E: DEVLINKS=/dev/mapper/swapa /dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa /dev/disk/by-uuid/5cc66212-279a-47eb-91b2-186d9ba89511 E: TAGS=:systemd:
This is an improvement. udev now has the expected information about the device. To find out more, I will need the dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg".
If you want to do some housekeeping, please make bug 773089 a duplicate of this one.
Bug 773089 showed symptoms of more than one bug. It's already closed as a duplicate of one of them. For me it does not really matter which one. A more important thing here is that we need the information described in comment #22 after booting with systemd-37-19.fc16.
Booting 3.3.4 did not have the swap problem. Either with or without debugging.
Linux kernel version 3.3.4 solves this problem?!
Well, yes, no. I just booted into 3.3.4 and noticed that swap was on and that there was no delay during the boot. So I disabled the debug stuff and rebooted into 3.3.4. Same thing without debugging. # rpm -q systemd systemd-37-19.fc16.x86_64 So that didn't change. I didn't reconfigure much, I only uninstalled some unrelated rpms. So I am not 100% sure but so far it works for me. Swaps are on partitions here. Enabled in /etc/crypttab as encrypted.
I can confirm that systemd-37-19 fixes this issue. Encrypted swap is enabled on boot, as it should be. Thanks.
Except for when e.g. my /home is fsck'd, then I see swap-repated timeouts and no swap. `swapon -a` afterwards is the workaround. In the old days this was not a dealbreaker. The fsck info (oprphaned inodes etc) is printed twice, semi-interleaved, at least for part of the fsck info. Also fedora-storage-init-late.service failed to start for some weird reason.
swap-reLated timeouts, ouf course.
The issue of comment 29 does still happen. I see timeouts on the swap device and crypsetup etc. How can this be when swaps are on separate partitions? Why then do swaps last in row if they are timeout-sensitive? Why at all are they timeout sensitive? In the old SysV style startup we did not have issues like this. (!) If we need timouts because logic can' be depended upon, how did SysV init work? Now everytime my box didn't shut down ultraclean all filesystems are recovered without issue yet swap isn't activated for some bogus reason or rather cause.
OTOH, on another box, I see weird stuff like: May 13 03:20:44 epia systemd-cryptsetup[28493]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:20:44 epia systemd-cryptsetup[28493]: Failed to activate: Device or resource busy May 13 03:20:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:20:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:20:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:25:44 epia systemd-cryptsetup[28603]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:25:44 epia systemd-cryptsetup[28603]: Failed to activate: Device or resource busy May 13 03:25:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:25:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:25:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:30:44 epia systemd-cryptsetup[28732]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:30:44 epia systemd-cryptsetup[28732]: Failed to activate: Device or resource busy May 13 03:30:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:30:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:30:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:35:44 epia systemd-cryptsetup[28891]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:35:44 epia systemd-cryptsetup[28891]: Failed to activate: Device or resource busy May 13 03:35:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:35:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:35:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:40:44 epia systemd-cryptsetup[29018]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:40:44 epia systemd-cryptsetup[29018]: Failed to activate: Device or resource busy May 13 03:40:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:40:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:40:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:45:44 epia systemd-cryptsetup[29130]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:45:44 epia systemd-cryptsetup[29130]: Failed to activate: Device or resource busy May 13 03:45:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:45:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:45:44 epia systemd[1]: Unit cryptsetup entered failed state. May 13 03:50:44 epia systemd-cryptsetup[29430]: Set cipher aes, mode cbc-essiv:sha256, key size 256 bits for device /dev/sda9. May 13 03:50:44 epia systemd-cryptsetup[29430]: Failed to activate: Device or resource busy May 13 03:50:44 epia systemd[1]: cryptsetup: main process exited, code=exited, status=1 May 13 03:50:44 epia systemd[1]: Job dev-mapper-swap.device/start failed with result 'dependency'. May 13 03:50:44 epia systemd[1]: Unit cryptsetup entered failed state. I.e.: every five minutes it attempts to activate swap that is already active. How come? Why? And why does systemd simply quit the swap job on the other box? (this very one) When can we see a solid design that does not `forget` to fix the swap?
Slightly changed behaviour noticed. The system was shut down cleanly. System comes up. One or two swaps are activated OK. Then the system sits for a while after printing the line: Started Recreate Volatile Files and Directories (with no real disk activity) Then the screen starts scrolling again and I see some errors flash by about the other swaps. When I can open a root terminal I can simply do `swapon -a`. So why is doing this basic task still an issue? It has been half a year and it appears the basic design for activating swap is fatally flawed. Please fix so I can boot my systems without having to think about swap anymore (like it used to be).
Created attachment 598678 [details] dmesg
Essential lines from the dmesg: systemd[1]: cryptsetup: main process exited, code=exited, status=0 (...) systemd[1]: Job cryptsetup/start finished, result=done (...) systemd[1]: Job dev-mapper-swapb.device/start timed out. systemd[1]: Job dev-mapper-swapb.device/start finished, result=timeout systemd[1]: Job dev-mapper-swapb.swap/start finished, result=dependency systemd[1]: Job dev-mapper-swapb.swap/start failed with result 'dependency'. systemd[1]: Job dev-mapper-swapb.device/start failed with result 'timeout'. systemd[1]: swap.target changed dead -> active systemd[1]: Job swap.target/start finished, result=done - How can it time out? The swap was not busy or anything - Cryptsetup did exit OK so why didn't it work? - How can swap target be done when we have a failure? - How can I be sure this doesn't happen when there's only one swap? - How can I be sure the fsck that was running during part of this boot was not a dependency?
systemd[1]: dev-mapper-swapa.device changed dead -> plugged systemd[1]: Job dev-mapper-swapa.device/start finished, result=done systemd[1]: dev-disk-by\x2duuid-889d6286\x2dacfe\x2d478e\x2db3e4\x2d410170b65ed2.device changed dead -> plugged systemd[1]: dev-disk-by\x2did-dm\x2duuid\x2dCRYPT\x2dPLAIN\x2dswapa.device changed dead -> plugged systemd[1]: dev-disk-by\x2did-dm\x2dname\x2dswapa.device changed dead -> plugged systemd[1]: dev-dm\x2d12.device changed dead -> plugged systemd[1]: sys-devices-virtual-block-dm\x2d12.device changed dead -> plugged systemd[1]: About to execute: /sbin/swapon -p 0 /dev/mapper/swapa This phase does not happen for swapb in the attached dmesg of 3.4.5. This explains why the workaround of `swapon -a` works.
One again, we boot into 3.4.7, systemd fails to fulfill one of its main promises: quick boot. We had a clean shutdown, clean filesystems. We see two out of four encrypted swaps activated. More or less as in the comments 34, 35 and 36. And then the system sits. It waits for some timeout for some reason instead of doing the work. Please see https://bugzilla.redhat.com/show_bug.cgi?id=821254 for another bug that makes one wonder what is going on. (why does systemd ever need to kill a process when not shutting down?) Same here. Why does this simple task time out and still not work well? The old init was reliable and did what it needed to do. And perhaps even quicker than the current waiting period due to this bug. Should I sponsor you for a quicker solution? Do you need more logging? Or can I `downgrade` to something that works?
Any progress here? What else besides the log, mentioned in comment 34, do you need to actually fix this issue?
And it happened again! [udo@surfplank2 ~]$ free total used free shared buffers cached Mem: 4053436 958284 3095152 0 74536 368872 -/+ buffers/cache: 514876 3538560 Swap: 3108564 0 3108564 [udo@surfplank2 ~]$ su - Password: [root@surfplank2 ~]# swapon -a [root@surfplank2 ~]# free total used free shared buffers cached Mem: 4053436 2240060 1813376 0 101140 922924 -/+ buffers/cache: 1215996 2837440 Swap: 4144752 0 4144752 Same happenings as in the previous log. When even virtual memory is not started properly anymore, a basic function, then what my friend is more important?
To be clear: This issue happens on one box with 4 disks and thus four encrypted swaps but also on another box with three disks and thus three encrypted swaps. Each time we get N minus one swaps activated. Can we fix this issue?
Possibly related patch: http://lists.freedesktop.org/archives/systemd-devel/2012-October/006895.html
3.6.x?
After booting into 3.6.2 the swap-related delay is gone. All swaps appear to be mounted OK. I need to test some more but the signs are favourable.
After rebooting this morning to activate 3.6.3 patched for the `recent` ext4 corruption bug only 3 out of 4 swaps were activated.
Does this happen only on LVM systems? Or also non-LVM systems?
Over here the swaps are on partitions. /boot is also a partition. The rest is lvm or more complex.
Any updates? Sometimes it works. Most often not. In the past few days, while coping with mesa git bug(s) I had a few chances to observe the problem of not activating all encrypted swaps.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
[root@surfplank2 ~]# free total used free shared buffers cached Mem: 7289328 1222888 6066440 0 57372 620624 -/+ buffers/cache: 544892 6744436 Swap: 1023996 0 1023996 [root@surfplank2 ~]# swapon -a [root@surfplank2 ~]# free total used free shared buffers cached Mem: 7289328 1272928 6016400 0 57408 620776 -/+ buffers/cache: 594744 6694584 Swap: 4095984 0 4095984 So 3 out of 4 swaps failed. This caused a considerable delay when compared against a 'normal', speedy boot. This situation negates any progress made by systemd. Please fix!
can you provide your /etc/crypttab and /etc/fstab?
# cat /etc/fstab # # /etc/fstab # Created by anaconda on Tue Dec 2 15:33:03 2008 # # Accessible filesystems, by reference, are maintained under '/dev/disk' # See man pages fstab(5), findfs(8), mount(8) and/or vol_id(8) for more info # /dev/myvg/rootlv / ext4 defaults,noatime,nodiratime 1 1 /dev/myvg/usrlv /usr ext4 defaults,noatime,nodiratime,nodev 1 2 /dev/myvg/varlv /var ext4 defaults,noatime,nodiratime,nodev,nosuid 1 2 /dev/myvg/homelv /home ext4 defaults,noatime,nodiratime,nodev 1 2 UUID="1b97d8cb-9875-49cd-b2be-0ce334c1b125" /boot ext2 defaults,noatime,nodiratime,nodev,nosuid,noacl 1 2 tmpfs /dev/shm tmpfs rw,nodev,nosuid,noexec 0 0 devpts /dev/pts devpts gid=5,mode=620 0 0 sysfs /sys sysfs defaults 0 0 proc /proc proc defaults 0 0 /dev/mapper/swapa swap swap defaults 0 0 /dev/mapper/swapb swap swap defaults 0 0 /dev/mapper/swapc swap swap defaults 0 0 /dev/mapper/swapd swap swap defaults 0 0 /dev/mapper/myvg-datalv /usr/local ext4 defaults,noatime,nodiratime,nodev 1 2 /dev/mapper/myvg-srclv /usr/src ext4 defaults,noatime,nodiratime,nodev 1 2 /dev/mapper/myvg-wwwlv /var/www ext4 defaults,noatime,nodiratime,nodev 1 2 /dev/mapper/myvg-optlv /opt ext4 defaults,noatime,nodiratime,nodev 1 2 tmpfs /tmp tmpfs defaults 0 0 # cat /etc/crypttab crypto UUID=05832112-77c2-4f9a-8215-c6b454c81422 none swapa /dev/sda4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256 swapb /dev/sdb4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256 swapc /dev/sdc4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256 swapd /dev/sdd4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256
Any progress? Anything that I can help with?
Reading this BZ can be quite confusing, so I'll summarize: - There were more bugs than one, conflating the observed issue. The original comment was about an ordering loop (resolved). Later comments were about udev and SYSTEMD_READY=0 (resolved in a different BZ). Subsequent comments are about the device units timing out for some other reason. - The comments for the current issue start at about comment #29. - There are debug logs in comment #34. Unfortunately they are from a run on a custom kernel without printk timestamps. - I've been unable to reproduce the problem, until I focused on the mentions of fsck. I replaced /usr/sbin/fsck with a shell script that sleeps long enough to let the default device unit timeout trigger (90 s). I was able to reproduce the timing out of dev-mapper-swap*.device units when I let the fake fsck sleep if it was asked to check /var. There is a relevant ordering: var.mount, systemd-random-seed-load.service, systemd-cryptsetup@*.service. Since upstream commit 6839500 "cryptsetup-generator: add JobTimeoutSec=0 for the decrypted crypt devices" (in v201) the timeout for the cryptsetup-created devices is infinite, so I cannot reproduce the problem anymore. Are you still able to reproduce the problem with systemd >= 201? Current Rawhide and F19 have new enough systemd. For F18 systemd 201 is in updates-testing. If you can still reproduce it, please attach a new dmesg output after booting with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg". Please make sure you use a Fedora kernel while producing this log. Based on past experience, I am not spending any efforts to investigate bugs with custom kernels.
We are using systemd-44-24.fc17.x86_64. Fedora 19 does not appear on http://fedoraproject.org/get-fedora. So I assume that Fedora 18 is current and that Fedora 17 is supported. See https://fedoraproject.org/wiki/Fedora_Release_Life_Cycle?rd=LifeCycle#Maintenance_Schedule for explanation; see https://fedoraproject.org/wiki/Schedule for planning. Latest systemd is from Fri Feb 15 2013. Fedora 19 is going in July 2013 if not later. Now is May 9th, 2013. Please give support by producing a systemd for F17 with said commit so I can verify the issue is gone. I generally do not use Fedora kernels. I build my own with source from kernel.org with very minor patching. This morning I booted from 3.8.10 into 3.9.1 and saw no fsck; still only two swaps were activated. I had to do a swapon -a to get them all activated.
(In reply to comment #54) > We are using systemd-44-24.fc17.x86_64. > Fedora 19 does not appear on http://fedoraproject.org/get-fedora. > So I assume that Fedora 18 is current and that Fedora 17 is supported. > See > https://fedoraproject.org/wiki/ > Fedora_Release_Life_Cycle?rd=LifeCycle#Maintenance_Schedule for explanation; > see https://fedoraproject.org/wiki/Schedule for planning. > Latest systemd is from Fri Feb 15 2013. > Fedora 19 is going in July 2013 if not later. > Now is May 9th, 2013. > Please give support by producing a systemd for F17 with said commit so I can > verify the issue is gone. > > I generally do not use Fedora kernels. I build my own with source from > kernel.org with very minor patching. > > This morning I booted from 3.8.10 into 3.9.1 and saw no fsck; still only two > swaps were activated. > I had to do a swapon -a to get them all activated. Using your own custom kernel is not "supported" and there is not such thing as something being "supported" in Fedora it always just boils down to "best effort" which is dictated by the "free" time and the amount of effort it takes to maintain given component since this is an community driven distribution. If you want "support" buy RHEL or some other "supported" distribution so please provide the information using official fedora kernel as Michal requests or backport the relevant patch your self and test.
(In reply to comment #54) Why did you move this BZ to Rawhide (twice!) if you're only willing to test on F17? I'm not having the "support" discussion with you again. Anyway, backporting the patch is not trivial, because it has dependencies on other patches (e.g., it relies on the functionality of *.d drop-in directories). But as a quick&dirty hack for testing it would suffice to comment out this line from F17's src/core/device.c:device_init() : UNIT(d)->job_timeout = DEFAULT_TIMEOUT_USEC; This would result in infinite job timeouts for device units by default. It is not ideal, but it would only be troublesome in the case of missing devices, which is an error case anyway. You can try rebuilding the package locally with the change. I cannot build in Koji right now.
This bug has been open for a while. The normal routine is that bugs are closed when the previous Fedora version get no love anymore. I do think that process is sub optimal. Rawhide is the workaround for the auto-erase function of bugzilla. Fedora 17 mostly works for me. (tm) Fedora 18 got some bad reviews. Fedora 19 is not out yet. Thanks for the quick hack. I will have a look into that after this weekend. (I will be AFK)
(In reply to comment #56) > But as a quick&dirty hack for testing it would suffice to comment out this > line from F17's src/core/device.c:device_init() : > UNIT(d)->job_timeout = DEFAULT_TIMEOUT_USEC; > This would result in infinite job timeouts for device units by default. How would this help in case of a clean shutdown, no fsck on the bootup? I've seen swaps failing to be enabled in such conditions.
(In reply to comment #58) I do not know for sure what causes the devices to timeout. There are two possibilities: 1. The devices would be eventually discovered. In this case the unlimited timeout would help. 2. For some reason the swap devices will not be discovered by systemd. In this case the unlimited timeout will likely cause the boot to get stuck. However, this situation would be ideal for further investigation. It would help to enable debug-shell.service before starting the test. (http://freedesktop.org/wiki/Software/systemd/Debugging#Early_Debug_Shell) If the boot gets stuck, you could then switch to tty9 and use "udevadm info ..." on the decrypted swap devices to look for any hints about why some of them are visible to systemd and the others are not.
1 might not be valid as a simple `swapon -a` turns on the swaps after a 'failed' action to do so. This implies that the partitions were there, that the encryption was active and that mkswap did something. 2 harder to say something about that. as the devices are all treated by mkswap I can not see why this would happen. Below is a dump of what udev thinks of the swaps in a normal (perhaps corrected) state. # for a in 9 10 11 12; do udevadm info --query=all --name=dm-$a; done P: /devices/virtual/block/dm-9 N: dm-9 S: disk/by-id/dm-name-swapb S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapb S: disk/by-uuid/b23ebd24-ec54-4389-8cbc-8dcf92f86295 S: mapper/swapb E: DEVLINKS=/dev/disk/by-id/dm-name-swapb /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapb /dev/disk/by-uuid/b23ebd24-ec54-4389-8cbc-8dcf92f86295 /dev/mapper/swapb E: DEVNAME=/dev/dm-9 E: DEVPATH=/devices/virtual/block/dm-9 E: DEVTYPE=disk E: DM_NAME=swapb E: DM_SBIN_PATH=/sbin E: DM_SUSPENDED=0 E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_UUID=CRYPT-PLAIN-swapb E: ID_FS_TYPE=swap E: ID_FS_USAGE=other E: ID_FS_UUID=b23ebd24-ec54-4389-8cbc-8dcf92f86295 E: ID_FS_UUID_ENC=b23ebd24-ec54-4389-8cbc-8dcf92f86295 E: ID_FS_VERSION=2 E: MAJOR=253 E: MINOR=9 E: MPATH_SBIN_PATH=/sbin E: SUBSYSTEM=block E: TAGS=:systemd: E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_LENGTH=2048000 E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_PRESENTATION_NOPOLICY=1 E: USEC_INITIALIZED=13925606 P: /devices/virtual/block/dm-10 N: dm-10 S: disk/by-id/dm-name-swapd S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapd S: disk/by-uuid/40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b S: mapper/swapd E: DEVLINKS=/dev/disk/by-id/dm-name-swapd /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapd /dev/disk/by-uuid/40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b /dev/mapper/swapd E: DEVNAME=/dev/dm-10 E: DEVPATH=/devices/virtual/block/dm-10 E: DEVTYPE=disk E: DM_NAME=swapd E: DM_SBIN_PATH=/sbin E: DM_SUSPENDED=0 E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_UUID=CRYPT-PLAIN-swapd E: ID_FS_TYPE=swap E: ID_FS_USAGE=other E: ID_FS_UUID=40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b E: ID_FS_UUID_ENC=40aa2cda-c96e-4ed8-add0-e02ce7cb2d1b E: ID_FS_VERSION=2 E: MAJOR=253 E: MINOR=10 E: MPATH_SBIN_PATH=/sbin E: SUBSYSTEM=block E: TAGS=:systemd: E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_LENGTH=2048000 E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_PRESENTATION_NOPOLICY=1 E: USEC_INITIALIZED=13926599 P: /devices/virtual/block/dm-11 N: dm-11 S: disk/by-id/dm-name-swapc S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapc S: mapper/swapc E: DEVLINKS=/dev/disk/by-id/dm-name-swapc /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapc /dev/mapper/swapc E: DEVNAME=/dev/dm-11 E: DEVPATH=/devices/virtual/block/dm-11 E: DEVTYPE=disk E: DM_NAME=swapc E: DM_SBIN_PATH=/sbin E: DM_SUSPENDED=0 E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_UUID=CRYPT-PLAIN-swapc E: MAJOR=253 E: MINOR=11 E: MPATH_SBIN_PATH=/sbin E: SUBSYSTEM=block E: SYSTEMD_READY=0 E: TAGS=:systemd: E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_LENGTH=2048000 E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_PRESENTATION_NOPOLICY=1 E: USEC_INITIALIZED=13929348 P: /devices/virtual/block/dm-12 N: dm-12 S: disk/by-id/dm-name-swapa S: disk/by-id/dm-uuid-CRYPT-PLAIN-swapa S: disk/by-uuid/e70e4e23-c7a8-477d-97af-cd8d5139510b S: mapper/swapa E: DEVLINKS=/dev/disk/by-id/dm-name-swapa /dev/disk/by-id/dm-uuid-CRYPT-PLAIN-swapa /dev/disk/by-uuid/e70e4e23-c7a8-477d-97af-cd8d5139510b /dev/mapper/swapa E: DEVNAME=/dev/dm-12 E: DEVPATH=/devices/virtual/block/dm-12 E: DEVTYPE=disk E: DM_NAME=swapa E: DM_SBIN_PATH=/sbin E: DM_SUSPENDED=0 E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_UUID=CRYPT-PLAIN-swapa E: ID_FS_TYPE=swap E: ID_FS_USAGE=other E: ID_FS_UUID=e70e4e23-c7a8-477d-97af-cd8d5139510b E: ID_FS_UUID_ENC=e70e4e23-c7a8-477d-97af-cd8d5139510b E: ID_FS_VERSION=2 E: MAJOR=253 E: MINOR=12 E: MPATH_SBIN_PATH=/sbin E: SUBSYSTEM=block E: TAGS=:systemd: E: UDISKS_DM_TARGETS_COUNT=1 E: UDISKS_DM_TARGETS_LENGTH=2048000 E: UDISKS_DM_TARGETS_START=0 E: UDISKS_DM_TARGETS_TYPE=crypt E: UDISKS_PRESENTATION_NOPOLICY=1 E: USEC_INITIALIZED=13925982
This bug even occurs on systems with a single encrypted swap partition.
(In reply to udo from comment #60) > ... > P: /devices/virtual/block/dm-11 > ... > E: SYSTEMD_READY=0 > E: TAGS=:systemd: > ... The output for this device is different from the others. It is missing the ID_FS_* values and is marked as not ready for systemd. It's as if the udev watch was not working reliably. I wonder what these commands would show for the device while it's in this state: blkid /dev/dm-11 blkid -c /dev/null /dev/dm-11
# blkid /dev/dm-11 /dev/dm-11: UUID="a178f079-6b55-44b9-a4ca-cd55676a37f3" TYPE="swap" # blkid -c /dev/null /dev/dm-11 /dev/dm-11: UUID="a178f079-6b55-44b9-a4ca-cd55676a37f3" TYPE="swap" # (as in the previous output I gave, with all up and running)
Michal, just hint here :) mkswap is run from systemd (see crypttab) every time device is activated (see random key attribute). The problem will be somewhere between udev running too early or just using cached data. (Before mkswap it contains just random data - this explains missing ID_FS_*.) (mkswap should force udev change event but it depends if it is not too late, on the udev rules for DM device, phase of Moon and other variables.) This cannot happen with LUKS devices because they had proper on-disk signature all the time, only with plain crypt devices.
Another and even worse case after booting into 3.9.4: # free total used free shared buffers cached Mem: 7289228 1505216 5784012 0 63740 897908 -/+ buffers/cache: 543568 6745660 Swap: 0 0 0 # swapon -a # free total used free shared buffers cached Mem: 7289228 1583184 5706044 0 70800 942668 -/+ buffers/cache: 569716 6719512 Swap: 4095984 0 4095984 # cat /proc/swaps Filename Type Size Used Priority /dev/dm-11 partition 1023996 0 -1 /dev/dm-9 partition 1023996 0 -2 /dev/dm-10 partition 1023996 0 -3 /dev/dm-12 partition 1023996 0 -4 Yes, 4 out of 4 swaps were NOT activated but were prepared correctly. WHY oh WHY?
Any updates on stuff we can try to confirm the comment 64 items?
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Loads of bugzilla emails won't help.
I suspect that this is due to a missing kernel config option, just like bug 979695. A request to test using a Fedora kernel has been posted here in comment #53. I haven't seen a proof of this being followed in the following comments.
See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18. It did not help in that case. (if that is the option you mean; help appreciated here) I do have to test a bit more to see if it helps in the Fedora 17 cases I still have.
(In reply to udo from comment #70) > See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18. As I said in https://bugzilla.redhat.com/show_bug.cgi?id=979695#c15, CONFIG_DM_UEVENT was just a _guess_. The guess turned out to be wrong. And you're again avoiding to do what I specifically asked for: To test with a Fedora kernel
I tested with a Fedora kernel in bug 979695. So why can't anybody explain clearly the relevant features to make a certai feature work 100%? It works, just not all the time. I fear for migrating to Fedora 18 because of the other bug.
(In reply to udo from comment #72) > I tested with a Fedora kernel in bug 979695. Yes, and it helped. So let's do it for this bug as well. > So why can't anybody explain clearly the relevant features to make a certai > feature work 100%? Because we usually take the kernel features for granted and rarely think worry about cut-down kernel configs.
Also for this bug, on F17, I did: -CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" +CONFIG_UEVENT_HELPER_PATH="" And 3.9.9 appears to reliably allow the swaps to be enabled. (rebooted today a few times while checking mesa bisecting)
(In reply to udo from comment #74) Thanks for letting us know. BTW, Fedora kernels have had CONFIG_UEVENT_HELPER_PATH="" since April 2009 (the request was in bug 496296). Closing as NOTABUG then. I was curious how exactly the mis-configured CONFIG_UEVENT_HELPER_PATH causes the bug, so I tested it in a KVM guest. Some observations: 1. The bug is easy to reproduce with the altered kernel config. 2. systemd-udevd reports an error: conflicting device node '/dev/mapper/swap' found, link to '/dev/dm-0' will not be created 3. Indeed /dev/mapper/swap was not a symlink, but a block device node, with major:minor numbers the same as /dev/dm-0. This is unlike the normal working situation, where /dev/mapper/swap is a symlink to /dev/dm-0. 4. udev never noticed an inotify event on /dev/mapper/swap. This can be explained by udev only having set an inotify watch on /dev/dm-0, which, though representing the same block device, is a different inode. 5. Clearly udev is not the one who created /dev/mapper/swap in this scenario. Running systemd-cryptsetup under gdb proved that it did call mknod() by itself, from a code somewhere in libdevmapper. I think the "udev sync" feature of libdevmapper somehow does not work as expected when /proc/sys/kernel/hotplug (CONFIG_UEVENT_HELPER_PATH) is a non-empty string. libdevmapper then apparently falls back to its own mknod, causing udev confusion. I am not sure why "udev sync" fails in this case, as I do not understand how it's implemented. Milane, is this something that you think should be investigated more?
(In reply to Michal Schmidt from comment #75) > I think the "udev sync" feature of libdevmapper somehow does not work as > expected when /proc/sys/kernel/hotplug (CONFIG_UEVENT_HELPER_PATH) is a > non-empty string. libdevmapper then apparently falls back to its own mknod, > causing udev confusion. I am not sure why "udev sync" fails in this case, as > I do not understand how it's implemented. > > Milane, is this something that you think should be investigated more? I think Peter will know better (added to cc), it is inside devicemapper library. My opinion is that this non-udev code should be disabled in Fedora completely.
Maybe it is libcryptsetup misusing devicemapper somehow. These steps demonstrate that dmsetup is at least capable of correct operation, while cryptsetup is affected by the problem: 1. Boot F18 with a standard Fedora kernel, login as root. 2. Introduce a similar breakage that udo had in the custom kernel: echo '/some/nonexistent/path' > /proc/sys/kernel/hotplug 3. Let there be a block device for testing: dd if=/dev/zero of=/nothing bs=1M count=10 losetup /dev/loop0 /nothing 4. Have cryptsetup create a mapping: cryptsetup open --type plain /dev/loop0 test (Any random password will do when asked.) 5. And have dmsetup create one too: dmsetup create test2 --table '0 1000 zero' 6. See what happened: ls -l /dev/mapper For me it shows 'test' is a block device (incorrectly) and 'test2' is a symlink (correctly).
I think that dmsetup should misbehave as well... Anyway I added issue to cryptsetup to check it later. http://code.google.com/p/cryptsetup/issues/detail?id=170
Hi Milan! :) Well, yes, the hotplug path being set is a known issue - the kernel fn responsible for sending the uevent returns non-zero status ("uevent not sent") if the path is set to invalid value despite the fact the uevent *is* actually sent even if the hotplug path is invalid). The libdevmapper checks this status in userpace and to prevent a hang in case there was some error in sending the event, it cancels the udev synchronization itself (otherwise, it would wait indefinitely for the non-existent uevent processing in userspace udev to be finished). So that's what libdevmapper does automatically... Unfortunately, we discovered the problem with the hotplug path later - the exact source is the kernel fn that returns odd status if the hotplug path is not found... Now, as for the libdevmapper fallback to creating nodes itself under /dev - that is configurable. At the moment, by default, the libdevmapper *does* the fallback if the node/symlink is not found or found incorrect under /dev/mapper. Any libdevmapper user must disable this fallback directly if needed... To disable the fallback, libdevmapper users must provide the extra DM_UDEV_DISABLE_LIBRARY_FALLBACK flag when creating the udev sync cookie (the dm_task_set_cookie fn call). The dmsetup and LVM tools already disabled this fallback (that's the reason why the dmsetup worked and cryptsetup not in comment #77). N.B. When the kernel returns a status that the uevent has not been sent (even incorrectly, like in the case of the hotplug path being used), libdevmapper always *cancels* any udev synchronization. So even with the libdm fallback disabled for the node creation, when the command is finished, the /dev could be incomplete because of the uevent still being processed by udev! IOW, please, the use of CONFIG_UEVENT_HELPER_PATH is highly unsupported (this setting is a piece of history that makes confusion everywhere anyway). So, from my side (libdevmapper) probably two things I should consider here: - totally disable the fallback in libdevmapper (iow, no need for an extra DM_UDEV_DISABLE_LIBRARY_FALLBACK flag) - but this does not depend on me only, unfortunately :) - document the bug with CONFIG_UEVENT_HELPER_PATH somewhere!
(In reply to Michal Schmidt from comment #71) > (In reply to udo from comment #70) > > See https://bugzilla.redhat.com/show_bug.cgi?id=979695, comment 18. > > As I said in https://bugzilla.redhat.com/show_bug.cgi?id=979695#c15, > CONFIG_DM_UEVENT was just a _guess_. The guess turned out to be wrong. > (The CONFIG_DM_UEVENT affects only multipath-specific events that are generated in addition to general device-mapper events. So this setting has no effect on general device-mapper uevents. It's a very bad and undescriptive config name as a matter of fact!)
Keying off, making assumptions about the state of hotplug, changing behavior of tools by looking at CONFIG_UEVENT_HELPER_PATH, or /proc/sys/kernel/hotplug makes no sense. The sending and creation of uevents have nothing to do with the actual state of CONFIG_UEVENT_HELPER_PATH. They are not related to each other. Anybody is free to add whatever useless tool there and it will slow down the system, but it must not change the behavior of tools using uevents. Granted, /sbin/hotplug was a broken idea, and it should be disabled, but reading its value from today's tools and make assumptions is even more broken, just ignore it, and please entirely remove the logic that reads it.
(In reply to Kay Sievers from comment #81) > Keying off, making assumptions about the state of hotplug, changing behavior > of tools by looking at CONFIG_UEVENT_HELPER_PATH, or /proc/sys/kernel/hotplug > makes no sense. Kay, I think you misunderstood slightly. They are not explicitly reading /proc/sys/kernel/hotplug. They're merely checking an error code (I see that lib/kobject_uevent.c:kobject_uevent_env() returns non-zero in this case).
Oh, really, that's how it works? It's still broken. It's a nasty kernel behavior. It was introduced by: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0ad1d6f37cc3bb234c6e7ae30e40d1d40b9aa258 Which is just not right to do that, and not how uevents are supposed to work. If anybody puts "/does-not-exist" here, the uevent sending code should not fail in any tools! I could understand that idea, if there was also a check that there is no single netlink client connected, but doing that unconditionally is wrong: Making the return code of the primarily netlink-based interface depend on the legacy/debugging usermodehelper facility breaks things which are expected to just work, as we see here. Anyway, I can't really see why checking the return code of a *broadcast* message would be beneficial at all.
Would it be possible to add a big fat warning to not use the uevent helper anymore? Really, what's the use case these days anyway? The option should be deprecated.
Right, ideally we would just delete the entire thing from the kernel sources. It can be used as a debugging tool, but it's of limited value, I think. I guess the first thing would be to turn the code in the kernel into an option to disable the entire facility and the proc file, and set the default of the kernel option to =n. This beast really creates more problem than it solves, and it's out of business for years now.
Btw I tested reproducer in comment #77 and despite it sometimes prints some strange errors, I always get proper symlink. Let's fix it on the correct side - I agree with Kay that this hotplug nonsense should be deleted from recent kernels.