Bug 1172510 - boot fails with "Failed to mount /boot."
Summary: boot fails with "Failed to mount /boot."
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dmraid
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1187753 (view as bug list)
Depends On:
Blocks:
 
Reported: 2014-12-10 09:16 UTC by Vilius Šumskas
Modified: 2016-07-19 12:30 UTC
CC: 23 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 12:30:14 UTC
Type: Bug
Embargoed:


Attachments
RAID metadata (320 bytes, application/x-gzip), 2015-02-05 08:59 UTC, Vilius Šumskas
blkid non working (454.52 KB, image/jpeg), 2015-10-11 15:30 UTC, Vilius Šumskas
lsblk non working (430.02 KB, image/jpeg), 2015-10-11 15:31 UTC, Vilius Šumskas
working journal log (85.74 KB, text/plain), 2015-10-11 15:48 UTC, Vilius Šumskas
non-working journal logs (68.51 KB, text/plain), 2015-10-11 15:50 UTC, Vilius Šumskas
udev.log.pre-trigger (1.09 KB, text/plain), 2015-11-24 19:56 UTC, Helmut Schlattl
udev.log.initqueue (2.43 KB, text/plain), 2015-11-24 19:57 UTC, Helmut Schlattl
Updated /lib/dracut/modules.d/90dmraid/dmraid.sh (1.49 KB, application/x-shellscript), 2015-11-24 20:00 UTC, Helmut Schlattl
Updated /lib/systemd/fedora-dmraid-activation (926 bytes, application/x-shellscript), 2015-11-24 20:04 UTC, Helmut Schlattl
rdsosreport.txt (1.05 MB, text/plain), 2015-11-25 17:36 UTC, Helmut Schlattl

Description Vilius Šumskas 2014-12-10 09:16:40 UTC
Description of problem:
I have an HP DL320 server with fake RAID, configured as a mirror. After upgrading from F20 to F21 the system no longer boots. The journal shows:

systemd-fsck[706]: /dev/sdb1 is in use.
systemd-fsck[706]: e2fsck: Cannot continue, aborting.
systemd-fsck[706]: fsck failed with error code 8.
systemd-fsck[706]: Ignoring error.
mount[711]: mount: /dev/sdb1 is already mounted or /boot busy
systemd[1]: boot.mount mount process exited, code=exited status=32
systemd[1]: Failed to mount /boot.
systemd[1]: Dependency failed for Local File Systems.
systemd[1]: Dependency failed for Relabel all filesystems, if necessary.
systemd[1]: Dependency failed for Mark the need to relabel after reboot.
systemd[1]: Unit boot.mount entered failed state.

It's strange, because my /boot is actually /dev/mapper/lsi_biijdfedaea1.

My /etc/fstab file:
/dev/mapper/fedora_tekila-root /                       ext4    defaults        1 1
UUID=4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0 /boot                   ext4    defaults        1 2
/dev/mapper/fedora_tekila-swap swap                    swap    defaults        0 0

[]# findfs UUID=4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0
/dev/mapper/lsi_biijdfedaea1

If I change the /etc/fstab entry to point to /dev/mapper/lsi_biijdfedaea1 directly, instead of using the UUID, the system boots normally.
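
For illustration, the workaround entry in /etc/fstab looks roughly like this (same mount options as the UUID-based line above):

/dev/mapper/lsi_biijdfedaea1 /boot                   ext4    defaults        1 2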

Version-Release number of selected component (if applicable):
systemd-216-12.fc21.i686

How reproducible:
Always


Additional info:
I'm not exactly sure whether this is an issue in systemd or in some other related component.

[]# blkid
/dev/sda2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/sdb1: UUID="4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0" TYPE="ext4" PARTUUID="0001a902-01"
/dev/sdb2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/mapper/lsi_biijdfedaea1: UUID="4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0" TYPE="ext4" PARTUUID="0001a902-01"
/dev/mapper/lsi_biijdfedaea2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/mapper/fedora_tekila-swap: UUID="0af3d641-d8a5-4fa5-a3a0-6faef0509973" TYPE="swap"
/dev/mapper/fedora_tekila-root: UUID="ad5700c7-e8f6-472f-891e-be057de9ce57" TYPE="ext4"
/dev/mapper/lsi_biijdfedaea: PTUUID="0001a902" PTTYPE="dos"

[]# more /proc/partitions
major minor  #blocks  name

   8        0   78150744 sda
   8        2   77637632 sda2
   8       16   78150744 sdb
   8       17     512000 sdb1
   8       18   77637632 sdb2
  11        0    1048575 sr0
 253        0   78150743 dm-0
 253        1     512000 dm-1
 253        2   77637632 dm-2
 253        3    5210112 dm-3
 253        4   72425472 dm-4

More errors which could be related (or not):

lvm[699]: device-mapper: reload ioctl on  failed: Invalid argument
kernel: device-mapper: table: 253:4: linear: dm-linear: Device lookup failed
kernel: device-mapper: ioctl: error adding target to table
kernel: device-mapper: table: 253:3: linear: dm-linear: Device lookup failed
kernel: device-mapper: ioctl: error adding target to table
lvm[699]: Failed to suspend root.
lvm[699]: device-mapper: reload ioctl on  failed: Invalid argument
lvm[699]: Failed to suspend swap.

Comment 1 Lennart Poettering 2015-02-04 20:06:24 UTC
My guess is that your fakeraid setup results in the raw devices not carrying any particular header, so udev cannot distinguish them from the assembled disk; by specifying the UUID it might therefore pick the raw partition instead of the fakeraid device.

Such a fakeraid technology is of course awful. However, this is nothing we can solve in the systemd context. Reassigning to the dm people.
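
One way to check which device node a given filesystem UUID currently resolves to (a diagnostic sketch using the UUID from this report):

  # findfs UUID=4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0
  # readlink -f /dev/disk/by-uuid/4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0
  # blkid -o device -t UUID=4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0

If the first two point at /dev/sdb1 rather than /dev/mapper/lsi_biijdfedaea1, the by-uuid link has been claimed by the raw partition, which would match the guess above.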

Comment 2 Peter Rajnoha 2015-02-05 08:35:11 UTC
Might be a blkid issue or an incorrect udev rule as well, I will check... For starters, would it be possible to get the output of "dmraid --dump_metadata" and "dmraid -r"?

Comment 3 Peter Rajnoha 2015-02-05 08:36:45 UTC
(...while using the FS UUID in fstab...) Does the problem still exist if you recreate the initramfs ("dracut --force")?

Comment 4 Vilius Šumskas 2015-02-05 08:59:50 UTC
Created attachment 988453 [details]
RAID metadata

Comment 5 Vilius Šumskas 2015-02-05 09:01:48 UTC
[]# dmraid -r
/dev/sdb: lsi, "lsi_biijdfedaea", mirror, ok, 156301487 sectors, data@ 0
/dev/sda: lsi, "lsi_biijdfedaea", mirror, ok, 156301487 sectors, data@ 0

Metadata dump attached.

I don't want to mess with recreating the initramfs right now, maybe later this week, but I think I've already tried that a couple of times.

Comment 6 Vilius Šumskas 2015-05-27 13:55:08 UTC
Not sure how, but this is fixed now in F22. I just upgraded this morning.

findfs now produces somewhat different results than before, but at least it works:

[]# findfs UUID=4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0
/dev/sda1

[]# more /proc/partitions
major minor  #blocks  name

  11        0    1048575 sr0
   8        0   78150744 sda
   8        1     512000 sda1
   8        2   77637632 sda2
   8       16   78150744 sdb
   8       18   77637632 sdb2
 253        0   78150743 dm-0
 253        1     512000 dm-1
 253        2   77637632 dm-2
 253        3    5210112 dm-3
 253        4   72425472 dm-4

By the way, the new systemd now shows these warnings. Not sure if that's related:

May 27 16:43:17 host systemd[1]: Device dev-disk-by\x2did-lvm\x2dpv\x2duuid\x2d1mKoSn\x2dTThI\x2ddtCN\x2dOZxg\x2dIX3z\x2dQNU0\x2dUxoJEH.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:02.0/ata4/host3
May 27 16:43:17 host kernel: Adding 5210108k swap on /dev/mapper/fedora_tekila-swap.  Priority:-1 extents:1 across:5210108k FS
May 27 16:43:17 host systemd[1]: Device dev-disk-by\x2duuid-4fcf2772\x2d4ad7\x2d43c3\x2dab9b\x2d313d1a2d45b0.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/pci0000:00/0000:00:02.0/a
May 27 16:43:17 host kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
May 27 16:43:17 host kernel: device-mapper: ioctl: device doesn't appear to be in the dev hash table.
May 27 16:43:17 host kernel:  sda: sda1 sda2
May 27 16:43:17 host systemd[1]: Device dev-disk-by\x2did-lvm\x2dpv\x2duuid\x2d1mKoSn\x2dTThI\x2ddtCN\x2dOZxg\x2dIX3z\x2dQNU0\x2dUxoJEH.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:02.0/ata4/host3
May 27 16:43:17 host systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
May 27 16:43:17 host systemd[1]: Device dev-disk-by\x2duuid-4fcf2772\x2d4ad7\x2d43c3\x2dab9b\x2d313d1a2d45b0.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/pci0000:00/0000:00:02.0/a
May 27 16:43:17 host systemd[1]: Device dev-disk-by\x2did-lvm\x2dpv\x2duuid\x2d1mKoSn\x2dTThI\x2ddtCN\x2dOZxg\x2dIX3z\x2dQNU0\x2dUxoJEH.device appeared twice with different sysfs paths /sys/devices/pci0000:00/0000:00:02.0/ata4/host3
May 27 16:43:07 host systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
May 27 16:43:07 host systemd[1]: Started Create Static Device Nodes in /dev.
May 27 16:43:07 host systemd[1]: Reached target Local File Systems (Pre).
May 27 16:43:07 host systemd[1]: Starting Local File Systems (Pre).
May 27 16:43:07 host systemd[1]: Starting udev Kernel Device Manager...
May 27 16:43:07 host systemd-udevd[589]: starting version 219
May 27 16:43:08 host systemd[1]: Started Configure read-only root support.
May 27 16:43:08 host systemd[1]: Starting Load/Save Random Seed...
May 27 16:43:08 host systemd[1]: Started udev Kernel Device Manager.
May 27 16:43:08 host systemd[1]: Started Load/Save Random Seed.
May 27 16:43:09 host systemd[1]: Found device /dev/disk/by-uuid/4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0.

Comment 7 Hin-Tak Leung 2015-07-23 01:37:24 UTC
The "...device appeared twice with different sysfs paths ..." is an upstream bug 
https://bugs.freedesktop.org/show_bug.cgi?id=90386
which apparently just zealous logging - the "fix" is simply to change it from warning to a debug. I'll file a bug here to track it...

Comment 8 Vilius Šumskas 2015-10-06 15:00:43 UTC
I just now realized that I forgot to update the ticket.

Unfortunately this problem was not solved in F22. When rebooting the server, sometimes it works and sometimes it doesn't.

Could it be related to the order in which the disks are found on the system? Maybe the order is randomly different every time the server is started, and one of the disks doesn't have the information needed to boot up properly?

Comment 9 Peter Rajnoha 2015-10-07 11:45:15 UTC
What do the outputs of blkid and lsblk look like in the working and non-working cases? (The blkid output for the non-working case is in comment #0 already, but let's get fresh results to be sure.)

Comment 10 Vilius Šumskas 2015-10-11 15:30:50 UTC
Created attachment 1081799 [details]
blkid non working

Comment 11 Vilius Šumskas 2015-10-11 15:31:35 UTC
Created attachment 1081800 [details]
lsblk non working

Comment 12 Vilius Šumskas 2015-10-11 15:45:13 UTC
I'm attaching the non-working condition as attachments (sorry, images only).

Working condition, which I managed to get after 5-6 reboots, below:

[vilius@tekila ~]$ blkid
/dev/sda1: UUID="4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0" TYPE="ext4" PARTUUID="0001a902-01"
/dev/sda2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/sdb2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/mapper/lsi_biijdfedaea1: UUID="4fcf2772-4ad7-43c3-ab9b-313d1a2d45b0" TYPE="ext4" PARTUUID="0001a902-01"
/dev/mapper/lsi_biijdfedaea2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member" PARTUUID="0001a902-02"
/dev/mapper/fedora_tekila-swap: UUID="0af3d641-d8a5-4fa5-a3a0-6faef0509973" TYPE="swap"
/dev/mapper/fedora_tekila-root: UUID="ad5700c7-e8f6-472f-891e-be057de9ce57" TYPE="ext4"
[vilius@tekila ~]$ lsblk
NAME                     MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINT
sda                        8:0    0 74.5G  0 disk
├─sda1                     8:1    0  500M  0 part
├─sda2                     8:2    0   74G  0 part
└─lsi_biijdfedaea        253:0    0 74.5G  0 dmraid
  ├─lsi_biijdfedaea1     253:1    0  500M  0 part
  └─lsi_biijdfedaea2     253:2    0   74G  0 part
    ├─fedora_tekila-swap 253:3    0    5G  0 lvm    [SWAP]
    └─fedora_tekila-root 253:4    0 69.1G  0 lvm    /
sdb                        8:16   0 74.5G  0 disk
├─sdb2                     8:18   0   74G  0 part
└─lsi_biijdfedaea        253:0    0 74.5G  0 dmraid
  ├─lsi_biijdfedaea1     253:1    0  500M  0 part
  └─lsi_biijdfedaea2     253:2    0   74G  0 part
    ├─fedora_tekila-swap 253:3    0    5G  0 lvm    [SWAP]
    └─fedora_tekila-root 253:4    0 69.1G  0 lvm    /
sr0                       11:0    1 1024M  0 rom

This is under the latest 4.1.10 Fedora kernel.

As you can probably see, they are essentially the same, except that /dev/mapper/lsi_biijdfedaea is a little bit out of order in the blkid output. But that should not be a problem, right?

What is interesting (and what I only realized now) is that when the server boots successfully, the /boot mount is automatically unmounted by systemd for some reason. If I boot the server with fstab changed to use the device path instead of the UUID, then the server boots normally every time and /boot is always mounted correctly.

I'm also attaching journal logs of working and non-working conditions.

Comment 13 Vilius Šumskas 2015-10-11 15:48:41 UTC
Created attachment 1081801 [details]
working journal log

Comment 14 Vilius Šumskas 2015-10-11 15:50:30 UTC
Created attachment 1081802 [details]
non-working journal logs

Comment 15 Peter Rajnoha 2015-10-12 11:11:27 UTC
(In reply to Vilius Šumskas from comment #12)
...
> /dev/sda2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member"
> PARTUUID="0001a902-02"
> /dev/sdb2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH" TYPE="LVM2_member"
> PARTUUID="0001a902-02"
...
> /dev/mapper/lsi_biijdfedaea2: UUID="1mKoSn-TThI-dtCN-OZxg-IX3z-QNU0-UxoJEH"
> TYPE="LVM2_member" PARTUUID="0001a902-02"

OK, in this case sda2 and sdb2 shouldn't have been identified as "LVM2_member"; only the top-level dmraid device is the correct one to use. So yes, there is some confusion in marking devices with the proper blkid labels, which could cause the wrong device to be used in the end.

I'll check the journals too. Thanks for the logs!

Comment 16 Karel Zak 2015-10-12 13:20:16 UTC
Vilius, please try: 

  # LIBBLKID_DEBUG=all blkid -p -o udev /dev/sda2 &> log
  
and add the "log" to the bugzilla. You can also try:

  # wipefs --no-act /dev/sda2

it will NOT erase anything, the command just lists all detectable signatures on the device.

It seems that the problem is in libblkid's LSI RAID signature detection.

If blkid -p or wipefs is not able to detect the RAID, it would be nice to have the superblock for (my) local testing; all you need to do is call:

 # dd if=/dev/sda2 of=raid.img bs=512 skip=$(( $(cat /sys/block/sda/sda2/size) - 2048 ))

 # gzip raid.img

This will dump the last 1 MiB of the device (2048 sectors of 512 bytes) to the file. Add this file to bugzilla if necessary (i.e. if wipefs does not see the RAID).

Thanks for your time!

Comment 17 Karel Zak 2015-10-12 13:41:53 UTC
Ah, I probably see (from the libblkid code) where the problem is. Please ignore comment #16.

libblkid assumes that firmware-based RAIDs are possible on whole-disk devices only, but you have it on partitions (sda2 and sdb2). So your RAID is invisible to libblkid (and hence to udev/systemd).

The problem has existed since util-linux v2.18 (2010)...

Comment 18 Vilius Šumskas 2015-10-12 14:41:28 UTC
Thank you for your investigation, Karel.

I'm just wondering, would this also somehow explain why it worked in F20, or why it now works 1 time out of 5?

P.S. I can still provide the logs; just tell me whether you need them from the working condition (system booted via the device path) or from the non-working one (system using the UUID).

Comment 19 Karel Zak 2015-10-13 09:08:06 UTC
Vilius, where is the partition table expected? Do you have:

 1/ a partitioned RAID
    - the partition table was created with fdisk on /dev/mapper/lsi_biijdfedaea

 2/ or were the original sda and sdb devices partitioned and the RAID created from sda2 and sdb2?

Now that I read your report again, I'm not sure if my conclusion from comment #17 is true. Please copy & paste the output of these to bugzilla:

 fdisk --list /dev/mapper/lsi_biijdfedaea
 fdisk --list /dev/sda
 fdisk --list /dev/sdb
 
Case 2) is pretty common with Linux SW RAID, but libblkid does not support it for your RAID format.

Comment 20 Vilius Šumskas 2015-10-13 10:50:39 UTC
That's a good question, actually. I don't remember exactly how that RAID was created, but it was something along these lines:

I installed Fedora a couple of years ago (it was something like F12 or F13). I definitely did not create /dev/mapper/lsi_biijdfedaea myself, and I didn't choose any RAID levels or other options. It was picked up automatically by anaconda; I just assumed that Fedora found my fake RAID device and automatically created a mapper device for it. That lsi_biijdfedaea device was then partitioned into lsi_biijdfedaea1 for /boot and lsi_biijdfedaea2 for LVM.

All UUIDs and the fstab were also generated at that time, and it worked without issues up until F21.

[]# fdisk --list /dev/mapper/lsi_biijdfedaea
Disk /dev/mapper/lsi_biijdfedaea: 74.5 GiB, 80026361344 bytes, 156301487 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0001a902

Device                       Boot   Start       End   Sectors  Size Id Type
/dev/mapper/lsi_biijdfedaea1 *       2048   1026047   1024000  500M 83 Linux
/dev/mapper/lsi_biijdfedaea2      1026048 156301311 155275264   74G 8e Linux LVM

[]# fdisk --list /dev/sda
Disk /dev/sda: 74.5 GiB, 80026361856 bytes, 156301488 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0001a902

Device     Boot   Start       End   Sectors  Size Id Type
/dev/sda1  *       2048   1026047   1024000  500M 83 Linux
/dev/sda2       1026048 156301311 155275264   74G 8e Linux LVM

[]# fdisk --list /dev/sdb
Disk /dev/sdb: 74.5 GiB, 80026361856 bytes, 156301488 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0001a902

Device     Boot   Start       End   Sectors  Size Id Type
/dev/sdb1  *       2048   1026047   1024000  500M 83 Linux
/dev/sdb2       1026048 156301311 155275264   74G 8e Linux LVM

Comment 21 Karel Zak 2015-10-13 12:32:02 UTC
Yeah, it really seems like a partitioned RAID1. In this case the sda{1,2} and sdb{1,2} devices are unwanted and incorrectly created by the kernel. Unfortunately, the kernel is not smart enough to detect the RAID signature before it parses the partition table on sda and sdb.

Vilius, try:

   blkid -p -o udev /dev/sda /dev/sdb

I guess the result will be "lsi_mega_raid_member".

IMHO the udev rules should be improved to call "partx -d" to delete partitions on the underlying whole-disk devices (aka raid members) before we assemble a RAID device from them.

(I thought we already had such a udev rule, because we had this problem years ago with Linux RAID, where the old (0.90) on-disk format uses the end of the device for the RAID metadata.)
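
For reference, a rough sketch of doing this by hand for the devices in this report (delete the stale partition nodes on the member disks, then activate the set):

   # partx -d --nr 1-1024 /dev/sda
   # partx -d --nr 1-1024 /dev/sdb
   # dmraid -ay lsi_biijdfedaea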

Comment 22 Vilius Šumskas 2015-10-13 12:37:51 UTC
Yep.

[]# blkid -p -o udev /dev/sda /dev/sdb
ID_FS_TYPE=lsi_mega_raid_member
ID_FS_USAGE=raid

ID_FS_TYPE=lsi_mega_raid_member
ID_FS_USAGE=raid

Comment 23 Harald Hoyer 2015-10-15 10:29:42 UTC
What is your kernel command line?

# cat /proc/cmdline

Comment 24 Karel Zak 2015-10-15 10:37:52 UTC
Just for the record (note for comment #21): we already have such a udev rule (for dracut); see /usr/lib/dracut/modules.d/90dmraid/61-dmraid-imsm.rules.

Comment 25 Vilius Šumskas 2015-10-15 12:21:23 UTC
[]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.1.10-200.fc22.i686+PAE root=/dev/mapper/fedora_tekila-root ro rd.lvm.lv=fedora_tekila/swap vconsole.font=latarcyrheb-sun16 rd.dm.uuid=lsi_biijdfedaea rd.lvm.lv=fedora_tekila/root rhgb quiet ipv6.disable=1 audit=0 LANG=en_US.UTF-8

Comment 26 Helmut Schlattl 2015-11-22 10:43:48 UTC
I have the same problem and submitted bug 1187753 some time ago. Perhaps it is helpful in solving the issue.

Comment 27 Helmut Schlattl 2015-11-24 19:54:59 UTC
Did some more debugging by playing around with rd.break=initqueue or rd.break=pre-trigger. 

rd.break=pre-trigger:
  sda, sdb and all subdevices (sda1, sdb1, etc.) were present

rd.break=initqueue:
  only sda and sdb were there. Thus the 'partx -d' from comment #21 is working!

To get rid of sda1 and sdb1 at rd.break=pre-trigger, I ran partx -d manually.
 
Then I did in both cases:
  udevadm monitor 1>/run/udev.log 2>&1 &
  dmraid -ay -i -p -Z pdc_bcdbecjef

Result for udev.log is the respective attachment udev.log....

As you can see, udev (or the kernel?) is adding the subdevices of sda and sdb again! I could not figure out why, but a fix is to remove the devices again with partx after activating the dmraid arrays.

In the attachment you'll find a replacement for /lib/dracut/modules.d/90dmraid/dmraid.sh. It's probably not the most elegant solution, but it works!
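
A minimal sketch of the idea (this is not the attached script, just an illustration; the set name and devices are the ones from this report):

  # activate the dmraid set; --rm_partitions asks dmraid to drop the
  # partitions on the member disks
  dmraid -ay -i -p --rm_partitions pdc_bcdbecjef
  # udev may have re-added sda1, sdb1, ... in the meantime; remove them again
  for dev in /dev/sda /dev/sdb; do
      partx -d --nr 1-1024 "$dev"
  done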

Comment 28 Helmut Schlattl 2015-11-24 19:56:50 UTC
Created attachment 1098333 [details]
udev.log.pre-trigger

Output of udevadm monitor at rd.break=pre-trigger

Comment 29 Helmut Schlattl 2015-11-24 19:57:49 UTC
Created attachment 1098335 [details]
udev.log.initqueue

Output of udevadm monitor at rd.break=initqueue

Comment 30 Helmut Schlattl 2015-11-24 20:00:10 UTC
Created attachment 1098336 [details]
Updated /lib/dracut/modules.d/90dmraid/dmraid.sh

Bug fix to remove subdevices (dracut)

Comment 31 Helmut Schlattl 2015-11-24 20:04:47 UTC
Created attachment 1098337 [details]
Updated /lib/systemd/fedora-dmraid-activation

Analogously, the activation of dmraid in systemd has to be adjusted.

Comment 32 Helmut Schlattl 2015-11-24 20:07:56 UTC
*** Bug 1187753 has been marked as a duplicate of this bug. ***

Comment 33 Harald Hoyer 2015-11-25 12:50:40 UTC
61-dmraid-imsm.rules should have taken care of this:


SUBSYSTEM!="block", GOTO="dm_end"
ACTION!="add|change", GOTO="dm_end"
# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="dm_end"

ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="dm_end"

ENV{ID_FS_TYPE}!="*_raid_member" , GOTO="dm_end"

ENV{ID_FS_TYPE}=="isw_raid_member", ENV{rd_NO_MDIMSM}!="?*", GOTO="dm_end"
ENV{ID_FS_TYPE}=="ddf_raid_member", ENV{rd_NO_MDDDF}!="?*", GOTO="dm_end"

ENV{rd_NO_DM}=="?*", GOTO="dm_end"

ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", GOTO="dm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="dm_end"

ENV{DEVTYPE}!="partition", \
    RUN+="/sbin/partx -d --nr 1-1024 $env{DEVNAME}"

RUN+="/sbin/initqueue --onetime --unique --settled /sbin/dmraid_scan $env{DEVNAME}"

LABEL="dm_end"


.... see the "partx -d"

Comment 34 Harald Hoyer 2015-11-25 12:52:58 UTC
maybe this was caused by:

commit ede344452a54e1c53f541cad12a06269a4fe96a9
Author: Kay Sievers <kay>
Date:   Wed Jun 4 13:30:24 2014 +0200

    udev: try first re-reading the partition table
    
    mounted partitions:
      # dd if=/dev/zero of=/dev/sda bs=1 count=1
      UDEV  [4157.369250] change   .../0:0:0:0/block/sda (block)
      UDEV  [4157.375059] change   .../0:0:0:0/block/sda/sda1 (block)
      UDEV  [4157.397088] change   .../0:0:0:0/block/sda/sda2 (block)
      UDEV  [4157.404842] change   .../0:0:0:0/block/sda/sda4 (block)
    
    unmounted partitions:
      # dd if=/dev/zero of=/dev/sdb bs=1 count=1
      UDEV  [4163.450217] remove   .../target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
      UDEV  [4163.593167] change   .../target6:0:0/6:0:0:0/block/sdb (block)
      UDEV  [4163.713982] add      .../target6:0:0/6:0:0:0/block/sdb/sdb1 (block)

Comment 35 Harald Hoyer 2015-11-25 12:58:50 UTC
Can you please boot your system with "rd.debug rd.udev.log-priority=debug" and save /run/initramfs/rdsosreport.txt as soon as dracut drops you to a shell?

Comment 36 Helmut Schlattl 2015-11-25 17:36:20 UTC
Created attachment 1098927 [details]
rdsosreport.txt


The critical lines are probably around line 9920.

Comment 37 Helmut Schlattl 2015-11-25 17:55:47 UTC
I am not familiar with the way udev is operating, so I can only guess.

I am simply wondering: Is
UDEV  [4163.713982] add      .../target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
executed after
UDEV  [4163.593167] change   .../target6:0:0/6:0:0:0/block/sdb (block)   ?
If yes, then any 'partx -d' executed during the 'change ... sdb' event has no effect, because the 'add ... sdb1' will add it again.

Perhaps a further helpful observation (at rd.break=initqueue):
When executing 'dmraid -ay -i -p pdc_bcdbecjef' (i.e. without asking dmraid to remove the partitions), there is no 'change .../sda' udev event, and thus the subdevices are not re-added.

Comment 38 Harald Hoyer 2015-11-26 15:41:52 UTC
[    2.000574] mainpc.home systemd-udevd[379]: starting '/sbin/partx -d --nr 1-1024 /dev/sdb'
[    2.097354] mainpc.home systemd-udevd[351]: Failure opening block device /dev/sdb1: No such file or directory
[    2.100466] mainpc.home systemd-udevd[343]: Failure opening block device /dev/sdb4: No such file or directory

.. so far so good.

then
[   21.762346] mainpc.home dracut-initqueue[330]: /sbin/dmraid_scan@38(main): dmraid -ay -i -p --rm_partitions pdc_bcdbecjef

which outputs:
RAID set "pdc_bcdbecjef" was activated
RAID set "pdc_bcdbecjef" was not activated

??? and then the partitions get re-added.

[   22.242296] mainpc.home kernel: device-mapper: ioctl: device doesn't appear to be in the dev hash table.
[   22.243179] mainpc.home kernel:  sda: sda1 sda4
[   22.244431] mainpc.home kernel:  sdb: sdb1 sdb4

reassigning to dmraid

Comment 39 Zdenek Kabelac 2015-11-26 15:51:35 UTC
dmraid has been considered obsolete for quite a few years.

Its functionality has been replaced by the 'mdadm' tool.

Why is 'mdadm' not used instead of dmraid?

Is mdadm missing some functionality?

Comment 40 Vilius Šumskas 2015-11-26 16:00:03 UTC
I didn't choose to use anything. All mapper devices on the system were automatically created by Anaconda years ago. If dmraid is considered obsolete, then Anaconda should not use it. At the minimum, this should have been mentioned in the release/upgrade notes of previous Fedora versions.

In any case, we need a viable plan. Should we move back from UUIDs to labels, given that the official Fedora recommendation was to move from labels to UUIDs? Should we just reinstall the server and hope that Anaconda will use mdadm now? Wait for the fix?

Comment 41 Harald Hoyer 2015-11-26 16:01:32 UTC
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.2.6-300.fc23.x86_64 root=UUID=e9229340-75b9-407b-afa5-3f4656dea902 ro fbcon=font:SUN8x16 LANG=de_DE.UTF-8 loop.max_loop=16 loop.max_part=8 selinux=0 rd.auto=1 rd.md=0 radeon.modeset=1 elevator=noop rd.debug rd.udev.log-priority=debug

so md is disabled on purpose with "rd.md=0"

You can just remove it and add "rd.auto".
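
A minimal sketch of one way to adjust that on Fedora, assuming a grubby-managed bootloader (adapt to your setup):

  # grubby --update-kernel=ALL --remove-args="rd.md=0" --args="rd.auto"

followed by a reboot.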

Comment 42 Harald Hoyer 2015-11-26 16:03:07 UTC
If your machine boots, then see what dracut suggests for the kernel cmdline:

# dracut --print-cmdline

Comment 43 Vilius Šumskas 2015-11-26 16:04:42 UTC
Sorry, when I said "labels" in the previous comment I meant "device paths".

Comment 44 Zdenek Kabelac 2015-11-26 16:12:09 UTC
So if this is the 'lsi' format, you might have no other option than using 'dmraid'.

Maybe the initramfs needs to be regenerated.

Moving to Heinz.

Comment 45 Harald Hoyer 2015-11-26 16:21:52 UTC
Ok, can you please try the following fix:

Edit /usr/lib/dracut/modules.d/90dmraid/61-dmraid-imsm.rules

After:
ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"

Add:
OPTIONS:="nowatch"


Then recreate the initramfs:

# dracut --kver <KERNEL_VERSION> --force
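
The relevant part of the rule would then read something like this (a sketch of the intended result, not a verbatim copy of the shipped file):

  ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
  OPTIONS:="nowatch"

"nowatch" tells udev not to install an inotify watch on the device, so a later close-after-write does not generate another "change" event that would re-read the partition table.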

Comment 46 Harald Hoyer 2015-11-26 16:24:46 UTC
(In reply to Harald Hoyer from comment #45)
> Ok, can you please try the following fix:
> 
> Edit /usr/lib/dracut/modules.d/90dmraid/61-dmraid-imsm.rules
> 
> After:
> ENV{ID_FS_TYPE}!="*_raid_member", GOTO="dm_end"
> 
> Add:
> OPTIONS:="nowatch"
> 
> 
> Then recreate the initramfs:
> 
> # dracut --kver <KERNEL_VERSION> --force

# dracut --kver 4.2.6-300.fc23.x86_64 --force

Comment 47 Helmut Schlattl 2015-11-27 16:00:05 UTC
The nowatch option (comment 45) did the trick! Now the subdevices are not added again. Great! So booting now works. We are now just left to get systemd working correctly afterwards as well, because /lib/systemd/fedora-dmraid-activation causes the same problem: it re-adds the subdevices sda1 etc.

For completeness the answers to the other comments:

To comment 38: Yes, they are re-added (but I guess this is clear by now)

To comment 39: Unless mdadm can cope with my existing dmraid-arrays, I need dmraid (at least, as long as I don't want to set up my system again with mdadm).

To comment 41: rd.md=0 was set on purpose to be sure that it does not interfere with dmraid (because of the described problem)

To comment 42: dracut --print-cmdline yields:
rd.dm.uuid=pdc_bcdbecjef rd.lvm.lv=vg01/swap 
resume=/dev/mapper/vg01-swap root=/dev/mapper/pdc_bcdbecjef1 rootfstype=btrfs rootflags=rw,relatime,space_cache,subvolid=5,subvol=/,

Comment 48 francesco 2015-12-01 11:58:29 UTC
Hi everyone,
I'm facing the same issue since updating to Fedora 21,
on my HP ML115 with NVIDIA RAID.

 dracut --print-cmdline 
rd.dm.uuid=nvidia_dbcdieji rd.lvm.lv=vg_NORAID/lv00 
resume=/dev/mapper/vg_NORAID-lv00 root=/dev/mapper/nvidia_dbcdieji2 rootflags=rw,relatime,data=ordered rootfstype=ext4

Regards

Comment 49 Fedora End Of Life 2016-07-19 12:30:14 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

