Description of problem:
After upgrading from kernel 3.17.3 to 3.18, systemd fails to mount a volume and drops to the maintenance shell.

Version-Release number of selected component (if applicable):
Don't know if this is a systemd problem or what. systemd-208-28.fc20.x86_64

How reproducible:
Boot kernel 3.17.3: OK. Boot kernel 3.18(.0): fails.

Steps to Reproduce:
1. MD RAID0 on sda1 and sdb1
2. Boot kernel 3.18

Actual results:
GPT partitions on sda and sdb appear to be discovered by the kernel:

Dec 11 21:00:42 bang.int.primordial.ca kernel: sdc: sdc1 sdc2 sdc3 sdc4 sdc5
Dec 11 21:00:42 bang.int.primordial.ca kernel: sdb: sdb1
Dec 11 21:00:42 bang.int.primordial.ca kernel: sda: sda1

The root filesystem (sdc4) is found and systemd is started. Then:

Dec 11 21:00:43 bang.int.primordial.ca systemd-udevd[304]: inotify_add_watch(7, /dev/sda1, 10) failed: No such file or directory
Dec 11 21:00:43 bang.int.primordial.ca systemd-udevd[308]: inotify_add_watch(7, /dev/sdb1, 10) failed: No such file or directory
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Job dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device/start timed out.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device.
-- Subject: Unit dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-0f0a9437\x2d3bb8\x2d4b30\x2dab5b\x2de46dc749f88a.device has failed.
--
-- The result is timeout.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Dependency failed for /mnt/export/r0.
-- Subject: Unit mnt-export-r0.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mnt-export-r0.mount has failed.
--
-- The result is dependency.
Dec 11 21:02:12 bang.int.primordial.ca systemd[1]: Dependency failed for Local File Systems.
-- Subject: Unit local-fs.target has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit local-fs.target has failed.
--
-- The result is dependency.

I found this:

# cat /proc/partitions
major minor  #blocks  name

   8       16 2930266584 sdb
   8        0 2930266584 sda
   8       32  117220824 sdc
   8       33     204800 sdc1
   8       34     512000 sdc2
   8       35   32768000 sdc3
   8       36   20480000 sdc4
   8       37   63254528 sdc5

Why would there be no partitions listed for sda and sdb when I just saw the kernel find and report them?

Next I force a re-read of the partition tables:

# partprobe
md: bind<sda1>
md: bind<sdb1>
md/raid0:md0: md_size is 11720536064 sectors.
md: RAID0 configuration for md0 - 1 zone
md: zone0=[sda1/sdb1] zone-offset= 0KB, device-offset= 0KB, size=5860268032KB
md0: detected capacity change from 0 to 6000914464768
md0: unknown partition table
bcache: register_bdev() registered backing device md0
bcache: bch_cached_dev_attach() Caching md0 as bcache0 on set 2172a0ff-749a-4e02-b23e-fcfa05ae9805
BTRFS: device fsid 0f0a9437-3bb8-4b30-ab5b-e46dc749f88a devid 1 transid 115586 /dev/bcache0
BTRFS info (device bcache0): disk space caching is enabled
BTRFS: detected SSD devices, enabling SSD mode

# mount
...
/dev/bcache0 on /mnt/export/r0 type btrfs (rw,relatime,ssd,space_cache)

# cat /proc/partitions
major minor  #blocks  name

   8       16 2930266584 sdb
   8       17 2930265543 sdb1
   8        0 2930266584 sda
   8        1 2930265543 sda1
   8       32  117220824 sdc
   8       33     204800 sdc1
   8       34     512000 sdc2
   8       35   32768000 sdc3
   8       36   20480000 sdc4
   8       37   63254528 sdc5
   9        0 5860268032 md0
 253        0 5860268024 bcache0

At this point I can exit the maintenance shell and the rest of the boot-up completes successfully. However, I cannot reboot without manually performing this partition discovery each time.

Expected results:
The partitions are discovered and the system boots without failure.

Additional info:
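(For anyone debugging the same symptom before the cause is known, a minimal check of whether some other driver has quietly claimed the raw disks -- assuming the same sda/sdb names as in this report -- is:

# lsblk /dev/sda /dev/sdb
# ls /sys/block/sda/holders/
# dmsetup ls

lsblk shows whatever is stacked on top of the disks, the holders directory is non-empty when a device-mapper device such as a multipath map has claimed sda, and dmsetup ls lists any device-mapper maps that exist.)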
In double-checking the difference between the successful 3.17.3 boot and the failed 3.18 boot, I found that under 3.17.3 my partitions sda1 and sdb1 were being remapped by multipathd, so my RAID0 was actually being assembled from devices dm-3[0] and dm-2[1]. Blargh!

I always disable or remove multipathd because I've been bitten in the a** more than a few times by it causing problems on many different systems. Somehow it got re-enabled (probably by an overzealous update) and it was pure luck that it still worked until now.

That being said, something about 3.18 is not happy with multipathd (or vice-versa, more likely). I don't know if multipathd is failing to remap the partitions or what (nothing in any logs I can find).

I have worked around the problem by disabling multipathd -- again!:

# systemctl disable multipathd

and, just for good measure, in case something causes it to be forcefully started:

# cat >> /etc/multipath.conf << EOF
blacklist {
        devnode "*"
}
EOF

On other systems I always remove device-mapper-multipath and its ilk entirely, but here I'm running oVirt, and vdsm has an RPM dependency on device-mapper-multipath for whatever reason, so I cannot remove it.

There's a bug here somewhere, probably with multipathd (someone should just take it out behind the shed and **BLAMO**... put it out of its misery), but for now the above workaround allows the system to boot successfully.

I'm changing this bug's component to "device-mapper-multipath", because I've got a strong suspicion that's where the root of the problem lies. Someone feel free to reduce the Severity from urgent to high -- I don't seem to be able to.
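(A sanity check after a workaround like this -- sketched under the assumption that nothing else regenerates the config or starts the daemon -- is that after a reboot the array comes up directly on the sd partitions:

# systemctl is-enabled multipathd
# multipath -ll
# cat /proc/mdstat

The first should report "disabled", the second should print no maps since every devnode is blacklisted, and /proc/mdstat should show md0 assembled from sda1 and sdb1 rather than dm-* devices.)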
First off, are /dev/sda and /dev/sdb actually different paths to the same device? Or do you not have any devices with multiple paths? If you don't, then adding

defaults {
        find_multipaths yes
}

to /etc/multipath.conf, and then running

# rm /etc/multipath/wwids

should fix this. This should be the default set by anaconda, but there was recently a bug that caused it not to get set. It makes multipath ignore devices that do not have multiple paths.

You should also note that since your problem is happening in your initramfs, you need to remake the initramfs after changing /etc/multipath.conf, so that the file gets copied into the new initramfs image.
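(Spelled out, and assuming a stock Fedora setup where dracut builds the initramfs, the fix above would look roughly like this -- merge the defaults stanza into any existing defaults section rather than appending a duplicate, and skip the dracut step if you genuinely boot without an initramfs:

# cat >> /etc/multipath.conf << EOF
defaults {
        find_multipaths yes
}
EOF
# rm /etc/multipath/wwids
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
)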
Thanks for the info, Ben. Good to know that it is a misconfigured-by-default issue. For the record, my sda and sdb are independent devices (two 3TB WD Reds in a RAID0, then a bcache SSD layer on top of that). There are no devices in this system with multiple paths. Also, I do not use any initramfs. Since I have no need for multipathd, either fix effectively resolves my problem by making multipathd go sit quietly in the corner and get out of the way. I would just remove it entirely if not for the RPM dependency from vdsm. :-(
Also, if you remove /etc/multipath.conf entirely, systemd will never start up multipathd even if the service is enabled, and all multipath commands check for a configuration file as soon as they start and stop if it is not present. Once your system is running, I don't know of anything that would generate a multipath.conf file. Anaconda creates one at install time, and RHEV does as well, but that should be it.
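(If you prefer that route, a quick way to confirm the behavior described above -- assuming nothing on the system recreates the file -- is:

# mv /etc/multipath.conf /etc/multipath.conf.bak
# systemctl start multipathd
# systemctl status multipathd
# multipath -ll

With no /etc/multipath.conf present, the status output should show that the daemon did not actually run, and multipath -ll should bail out for the same reason.)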
*** This bug has been marked as a duplicate of bug 1160478 ***