Description of problem:
After upgrading from kernel 3.17.3 to 3.18, systemd fails to mount a volume and drops to the maintenance shell.

Version-Release number of selected component (if applicable):
Don't know if this is a systemd problem or what. systemd-208-28.fc20.x86_64

How reproducible:
Boot kernel 3.17.3: OK. Boot kernel 3.18(.0): fails.

Steps to Reproduce:
1. MD RAID0 on sda1 and sdb1
2. Boot kernel 3.18

Actual results:
GPT partitions on sda and sdb appear to be discovered by the kernel:

Dec 11 21:00:42 bang.int.primordial.ca kernel: sdc: sdc1 sdc2 sdc3 sdc4 sdc5
Dec 11 21:00:42 bang.int.primordial.ca kernel: sdb: sdb1
Dec 11 21:00:42 bang.int.primordial.ca kernel: sda: sda1

The root filesystem (sdc4) is found and systemd is started. Then:

Dec 11 21:00:43 bang.int.primordial.ca systemd-udevd[304]: inotify_add_watch(7, /dev/sda1, 10) failed: No such file or directory
Dec 11 21:00:43 bang.int.primordial.ca systemd-udevd[308]: inotify_add_watch(7, /dev/sdb1, 10) failed: No such file or directory
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Job dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device/start timed out.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device.
-- Subject: Unit dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device has failed.
--
-- The result is timeout.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Dependency failed for /mnt/export/r0.
-- Subject: Unit mnt-export-r0.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mnt-export-r0.mount has failed.
--
-- The result is dependency.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Dependency failed for Local File Systems.
-- Subject: Unit local-fs.target has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit local-fs.target has failed.
--
-- The result is dependency.

I found this:

# cat /proc/partitions
major minor  #blocks  name

   8       16 2930266584 sdb
   8        0 2930266584 sda
   8       32  117220824 sdc
   8       33     204800 sdc1
   8       34     512000 sdc2
   8       35   32768000 sdc3
   8       36   20480000 sdc4
   8       37   63254528 sdc5

Why would there be no partitions listed for sda and sdb when I just saw the kernel find and report them?

Next I force a re-read of the partition tables:

# partprobe
md: bind<sda1>
md: bind<sdb1>
md/raid0:md0: md_size is 11720536064 sectors.
md: RAID0 configuration for md0 - 1 zone
md: zone0=[sda1/sdb1] zone-offset= 0KB, device-offset= 0KB, size=5860268032KB
md0: detected capacity change from 0 to 6000914464768
md0: unknown partition table
bcache: register_bdev() registered backing device md0
bcache: bch_cached_dev_attach() Caching md0 as bcache0 on set 2172a0ff-749a-4e02-b23e-fcfa05ae9805
BTRFS: device fsid 0f0a9437-3bb8-4b30-ab5b-e46dc749f88a devid 1 transid 115586 /dev/bcache0
BTRFS info (device bcache0): disk space caching is enabled
BTRFS: detected SSD devices, enabling SSD mode

# mount
...
/dev/bcache0 on /mnt/export/r0 type btrfs (rw,relatime,ssd,space_cache)

# cat /proc/partitions
major minor  #blocks  name

   8       16 2930266584 sdb
   8       17 2930265543 sdb1
   8        0 2930266584 sda
   8        1 2930265543 sda1
   8       32  117220824 sdc
   8       33     204800 sdc1
   8       34     512000 sdc2
   8       35   32768000 sdc3
   8       36   20480000 sdc4
   8       37   63254528 sdc5
   9        0 5860268032 md0
 253        0 5860268024 bcache0

At this point I can exit the maintenance shell and the rest of the boot-up completes successfully. However, I cannot reboot without manually performing this partition discovery each time.

Expected results:
The partitions are discovered and the system boots without failure.

Additional info:
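(For anyone debugging the same symptom before the cause is known, a minimal check of whether some other driver has quietly claimed the raw disks -- assuming the same sda/sdb names as in this report -- is:

# lsblk /dev/sda /dev/sdb
# ls /sys/block/sda/holders/
# dmsetup ls

lsblk shows whatever is stacked on top of the disks, the holders directory is non-empty when a device-mapper device such as a multipath map has claimed sda, and dmsetup ls lists any device-mapper maps that exist.)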
In double-checking the difference between the successful 3.17.3 boot and the failed 3.18 boot, I found that under 3.17.3 my partitions sda1 and sdb1 were being remapped by multipathd, so my RAID0 was actually being assembled from devices dm-3[0] and dm-2[1]. Blargh!

I always disable or remove multipathd because I've been bitten in the a** more than a few times by it causing problems on many different systems. Somehow it got re-enabled (probably by an overzealous update) and it was pure luck that it still worked until now.

That being said, something about 3.18 is not happy with multipathd (or vice-versa, more likely). I don't know if multipathd is failing to remap the partitions or what (nothing in any logs I can find).

I have worked around the problem by disabling multipathd -- again!:

# systemctl disable multipathd

and, just for good measure, in case something causes it to be forcefully started:

# cat >> /etc/multipath.conf << EOF
blacklist {
        devnode "*"
}
EOF

On other systems I always remove device-mapper-multipath and its ilk entirely, but here I'm running oVirt, and vdsm has an RPM dependency on device-mapper-multipath for whatever reason, so I cannot remove it.

There's a bug here somewhere, probably with multipathd (someone should just take it out behind the shed and **BLAMO**... put it out of its misery), but for now the above workaround allows the system to boot successfully.

I'm changing this bug's component to "device-mapper-multipath", because I've got a strong suspicion that's where the root of the problem lies. Someone feel free to reduce the Severity from urgent to high -- I don't seem to be able to.
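(A sanity check after a workaround like this -- sketched under the assumption that nothing else regenerates the config or starts the daemon -- is that after a reboot the array comes up directly on the sd partitions:

# systemctl is-enabled multipathd
# multipath -ll
# cat /proc/mdstat

The first should report "disabled", the second should print no maps since every devnode is blacklisted, and /proc/mdstat should show md0 assembled from sda1 and sdb1 rather than dm-* devices.)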
First off, are /dev/sda and /dev/sdb actually different paths to the same device? Or do you not have any devices with multiple paths? If you don't, then adding

defaults {
        find_multipaths yes
}

to /etc/multipath.conf, and then running

# rm /etc/multipath/wwids

should fix this. This should be the default set by anaconda, but there was recently a bug that caused it not to get set. It makes multipath ignore devices that do not have multiple paths.

You should also note that since your problem is happening in your initramfs, you need to remake the initramfs after changing /etc/multipath.conf, so that the file gets copied into the new initramfs image.
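(Spelled out, and assuming a stock Fedora setup where dracut builds the initramfs, the fix above would look roughly like this -- merge the defaults stanza into any existing defaults section rather than appending a duplicate, and skip the dracut step if you genuinely boot without an initramfs:

# cat >> /etc/multipath.conf << EOF
defaults {
        find_multipaths yes
}
EOF
# rm /etc/multipath/wwids
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
)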
Thanks for the info, Ben. Good to know that it is a misconfigured-by-default issue. For the record, my sda and sdb are independent devices (two 3TB WD Reds in a RAID0, then a bcache SSD layer on top of that). There are no devices in this system with multiple paths. Also, I do not use any initramfs. Since I have no need for multipathd, either fix effectively resolves my problem by making multipathd go sit quietly in the corner and get out of the way. I would just remove it entirely if not for the RPM dependency from vdsm. :-(
Also, if you remove /etc/multipath.conf entirely, systemd will never start up multipathd even if the service is enabled, and all multipath commands check for a configuration file as soon as they start and stop if it is not present. Once your system is running, I don't know of anything that would generate a multipath.conf file. Anaconda creates one at install time, and RHEV does as well, but that should be it.
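(If you prefer that route, a quick way to confirm the behavior described above -- assuming nothing on the system recreates the file -- is:

# mv /etc/multipath.conf /etc/multipath.conf.bak
# systemctl start multipathd
# systemctl status multipathd
# multipath -ll

With no /etc/multipath.conf present, the status output should show that the daemon did not actually run, and multipath -ll should bail out for the same reason.)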
*** This bug has been marked as a duplicate of bug 1160478 ***