Bug 909727 - Dracut ignores network (iSCSI) md raid member when local (SATA) member is present
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: dracut-maint
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2013-02-10 18:47 UTC by Radek Hladik
Modified: 2013-08-01 18:37 UTC (History)
Last Closed: 2013-08-01 18:37:08 UTC
Type: Bug

Description Radek Hladik 2013-02-10 18:47:08 UTC
Description of problem:

I have a setup with the root FS on a Linux MD RAID1 with two drives: one is a remote drive on an iSCSI server and the other is a local SSD. The dracut configuration looks like this (in the config it is all on one line):

kernel /basefarm/vmlinuz-3.5.2-1.fc17.x86_64 ro   
netroot=iscsi:farmprivradekin:XXXXXX:farmprivradekout:XXXXX.16.1:3260::iqn.2000-10.cz.company:datajumbo.radek.data 
iscsi_initiator=iqn.2000-10.cz.company:basefarm17 
bridge=br0:eth0 ip=10.38.25.213::10.38.25.129:255.255.255.128:basefarm17.company.cz:br0:none 
rd_NO_FSTAB rd_NO_MDADMCONF rd_NO_LUKS rd_NO_LVM rd_NO_DM 
root=UUID=ac84adb2-4203-4f63-b878-6c06baa9bdfb rootfstype=ext4 
rd_MD_UUID=be648f2c:82d05fb3:bfa4ed0e:58f9ce1f
LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us nomodeset biosdevname=0

As you can see, it uses bridge br0 with eth0 as its slave and a static IP; both the MD RAID and the root FS are identified by UUID.

This works perfectly if there is only one RAID member, the one on iSCSI. Dracut brings up the br0 interface with eth0 as slave, connects to iSCSI, assembles the RAID and boots.

However, if there is a local member, dracut does not even try to bring any networking up and assembles the RAID from only that one member. To make it worse, the local drive is meant more as a "cache only" for the "slow" iSCSI disk, so this is the worst combination. If I had to choose, I would prefer to assemble the RAID from the remote drive only, as it is supposed to contain the "right" data.



Version-Release number of selected component (if applicable):

dracut-network-018-105.git20120927.fc17.noarch
dracut-018-105.git20120927.fc17.noarch
Linux basefarmf17 3.5.2-1.fc17.x86_64 #1 SMP Wed Aug 15 16:09:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Always

Steps to Reproduce:
1. Install Fedora with / on RAID1 with only one member, from the iSCSI server. I am not sure if this can be easily done by anaconda, but it's not a big problem to do it "by hand" from a "plain local installation". Dracut is a very helpful tool for this.
2. Boot the system (RAID with 1 member)
3. Add the local drive to the RAID1
4. Boot the system again (RAID with 2 members)
  
Actual results:
In step 2 dracut brings up networking according to its configuration and assembles the root RAID as [U] with the one remote iSCSI drive.

In step 4 dracut does not bring any networking up, does not connect to the iSCSI server (it does not even try to) and assembles root as [_U] with only the local SATA drive.

Expected results:
In step 4 dracut should bring up networking according to its configuration, connect to the iSCSI server and assemble the root RAID as [UU], or with the newer drive if the RAID is not in sync.

Additional info:

If I change the RAID UUID during boot (in grub), dracut is not able to find it on the local drive, so it brings up networking, connects to iSCSI and tries to find it there, and then drops to a shell as it cannot find the UUID.

This may be a really nasty situation and may lead to data loss if the local drive gets out of sync and the system is rebooted.

Comment 1 Harald Hoyer 2013-02-11 15:56:58 UTC
Have you tried dracut from F18? It might fix your issue.

Comment 2 Radek Hladik 2013-02-11 20:42:19 UTC
(In reply to comment #1)
> Have you tried dracut from F18? It might fix your issue.

I tried the newest version for F17. But I will try the one from F18 tomorrow to see if there is any difference...

Comment 3 Radek Hladik 2013-02-11 23:22:48 UTC
I was not able to use the F18 dracut RPMs directly on F17 because of their dependency on systemd (a version that is not available for F17). I am not sure that I want to upgrade systemd from F18 RPMs :)

So I took another machine with F18 with these versions

dracut-network-024-23.git20130118.fc18.x86_64
dracut-024-23.git20130118.fc18.x86_64

and I created new dracut initramfs:

dracut --nomdadmconf --no-hostonly initramfs-3.7.4-204.fc18.test.img 3.7.4-204.fc18.x86_64 -v

and I tried to run this initramfs and the matching F18 kernel (3.7.4-204...) on the F17 machine. I know that the system will not be able to boot completely, as there will be a kernel version mismatch, but I thought it should be enough to see how dracut assembles the root RAID.
In F18 systemd is so integrated into dracut that I was not able to find the exact moment when dracut finished its work, but I think the behaviour is the same as in the F17 version.

When I booted the F18 vmlinuz+initrd with the proper RAID UUID, I got into the systemd emergency shell with networking down, no iSCSI connection and the RAID assembled from only one mirror.

When I booted the F18 vmlinuz+initrd with an invalid RAID UUID, I got into the dracut shell, but before that dracut brought the networking up, connected to iSCSI and I was even able to see the remote /dev/sdX drive. Of course, the RAID was not assembled at all as the UUID was invalid.

From what I can see, this is consistent with F17 dracut. When there is a local RAID member, dracut assembles the RAID without even trying network+iSCSI (and then the boot failed and I got the systemd emergency shell). But when dracut is not able to find the RAID on local drives, it brings up networking+iSCSI and tries there...

Comment 4 Harald Hoyer 2013-02-12 12:57:57 UTC
(In reply to comment #3)

ok, thanks for trying. I will try to reproduce here and will add it to my dracut test suite.

Comment 5 Harald Hoyer 2013-02-12 13:08:22 UTC
(In reply to comment #3)

btw, you can get rid of systemd in the initramfs with:

# dracut --omit systemd ....

Comment 6 Radek Hladik 2013-03-03 13:12:09 UTC
I had to investigate the issue more deeply and found that the problem, unfortunately, lies in the very way dracut operates.

When dracut parses the command line, it creates simple check scripts that test whether the root device is ready. Then udev is started and asked to cold-plug certain devices. Udev rules are also generated that perform "md incremental assembly" if a RAID member is found.
After this, the "finished" checks are executed, and if they succeed dracut continues to the "find and mount root" phase. If any of the checks fails, dracut continues to execute other modules until all checks pass (or until a certain number of iterations is reached).
The check for md RAID is really simple: basically a test whether the appropriate device exists in /dev.
So what happens in my setup is that dracut launches udev, which finds the local SATA drive and does incremental assembly (so we have a partially assembled RAID in "auto-read-only" mode), and then dracut tests for the presence of the md device. The device is already present, so the check is satisfied and dracut continues to the next phase without even touching any other dracut module.
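
The control flow described above can be illustrated with a small, self-contained model. This is only a sketch, not dracut's real code: the function names `all_finished` and `run_initqueue`, the directory layout and the demo file names are all assumptions made up for illustration. It shows why a queued job (here standing in for the iSCSI login) is never run when every "finished" check already passes on the first iteration.

```sh
#!/bin/sh
# Simplified model of the "finished checks" loop described above.
# NOT dracut's implementation; all names and paths are hypothetical.

# all_finished DIR: succeed only if every *.sh check in DIR exits 0.
all_finished() {
    for check in "$1"/*.sh; do
        [ -e "$check" ] || continue      # no checks registered
        sh "$check" || return 1
    done
    return 0
}

# run_initqueue FINISHED_DIR JOBS_DIR MAX_ITER
# Runs queued jobs until all checks pass or MAX_ITER is reached.
run_initqueue() {
    finished_dir=$1
    jobs_dir=$2
    max_iter=$3
    i=0
    while [ "$i" -lt "$max_iter" ]; do
        all_finished "$finished_dir" && return 0
        for job in "$jobs_dir"/*.sh; do
            [ -e "$job" ] || continue
            sh "$job"
            rm -f "$job"                 # each job runs once
        done
        i=$((i + 1))
    done
    return 1
}

# Demo: the md device "already exists" (assembled from the local member),
# so the only check passes at once and the queued iSCSI job never runs.
tmp=$(mktemp -d)
mkdir -p "$tmp/finished" "$tmp/jobs"
touch "$tmp/md127"                                  # stand-in for /dev/md127
echo "[ -e '$tmp/md127' ]" > "$tmp/finished/md.sh"
echo "touch '$tmp/iscsistarted'" > "$tmp/jobs/iscsi.sh"
run_initqueue "$tmp/finished" "$tmp/jobs" 5
if [ -e "$tmp/iscsistarted" ]; then
    echo "iSCSI started"
else
    echo "iSCSI skipped"                            # prints "iSCSI skipped"
fi
rm -rf "$tmp"
```

Adding a second check that only passes once the job has left a marker file forces the loop to run the job first, which is the idea behind the workaround that follows.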

I managed to "solve" the issue with this simple workaround:

diff -urd 95iscsi/iscsiroot.sh 95iscsi.new/iscsiroot.sh
--- 95iscsi/iscsiroot.sh        2013-03-02 09:40:45.000000000 +0100
+++ 95iscsi.new/iscsiroot.sh    2013-03-02 10:10:16.000000000 +0100
@@ -171,6 +171,8 @@
     handle_netroot $iroot
 fi

+echo 'started' >/tmp/iscsistarted
+
 need_shutdown

 # now we have a root filesystem somewhere in /dev/sda*
diff -urd 95iscsi/parse-iscsiroot.sh 95iscsi.new/parse-iscsiroot.sh
--- 95iscsi/parse-iscsiroot.sh  2012-04-05 13:54:38.000000000 +0200
+++ 95iscsi.new/parse-iscsiroot.sh      2013-03-02 10:05:39.000000000 +0100
@@ -79,6 +79,8 @@
     modprobe -q iscsi_tcp || die "iscsiroot requested but kernel/initrd does not support iscsi"
 fi

+echo '[ -e "/tmp/iscsistarted" ]' > $hookdir/initqueue/finished/iscsi_started.sh
+
 # Done, all good!
 rootok=1


It basically adds another check to see whether the iSCSI module has finished. But it is not a proper solution. It would be much better to have some means for a check to say something like "yeah, if we needed to we could continue, but I think it would be better to try more (meaning: I have an array with only 1 of 2 disks)", or to avoid incremental assembly altogether, prepare as much as possible and then pick the best members for the requested RAID.
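
One possible shape for such a "better to try more" test is to look at the md sysfs attribute `degraded`, which reports how many members a running array is missing. The following is a minimal sketch under that assumption only; the function name `md_is_complete` and its optional sysfs-root argument (there so it can be exercised against a fake tree) are hypothetical and not part of dracut. A real check would also need a timeout, otherwise a genuinely dead member would block the boot forever.

```sh
#!/bin/sh
# md_is_complete NAME [SYSFS_ROOT]
# Succeeds only if array NAME exists and reports zero missing members.
# Hypothetical helper for illustration; SYSFS_ROOT defaults to /sys.
md_is_complete() {
    attr="${2:-/sys}/block/$1/md/degraded"
    [ -r "$attr" ] || return 1            # array not assembled (yet)
    missing=$(cat "$attr") || return 1
    [ "$missing" -eq 0 ] 2>/dev/null      # 0 means no member is missing
}

# Example use (md127 is the array name from this report):
if md_is_complete md127; then
    echo "md127 has all members"
else
    echo "md127 is missing or degraded, keep trying"
fi
```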

The issue with md incremental assembly is that it needs to see all the members before the array is first written to. I just tried this simple scenario:
1) md127=sda1+sdc (sda is SATA, sdc is iSCSI, md127 is RAID1 with /)
2) sync md127
3) unplug sda1 without any "--fail" or similar
4) the array is degraded with only sdc: md127=sdc
5) power off the machine
6) reconnect sda1
7) power on and see what incremental assembly does
8) udev sees sda1 and thinks "cool, it's a RAID member, let's start assembling the array". At this moment we have md127=sda1 and there is no way to find out that we have actually assembled the array with old data... So if dracut booted at this moment, we would have a serious issue.
9) if dracut is somehow forced to continue with iSCSI, it starts iSCSI and connects the remote drive. At that moment udev sees sdc and says "hey, a new member for md127, let's add it" and passes the command to mdadm. Mdadm gets sdc, sees that this member is actually newer than sda1, says "kicking non-fresh member sda1 from array" and adds sdc. Only now do we have the array in the correct state, md127=sdc.

Note: If you just --fail the member, you will get a different scenario, because mdadm will update the failed member's superblock with the information that it has been removed from the array, and incremental assembly will ignore such a member. The point is to simulate the situation where there is no chance to update the failed member's superblock and you need to rely on the event counter.
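
The event counter mentioned above can be inspected by hand with `mdadm --examine`, which prints an `Events` line for each member. Below is a minimal sketch for comparing two members. The helper names `md_events` and `pick_fresher` are hypothetical, and the parser assumes the counter is printed as a plain integer (as with 1.x metadata); other output formats are not handled.

```sh
#!/bin/sh
# md_events: read `mdadm --examine DEVICE` output on stdin and print
# the number from its first "Events : N" line. Hypothetical helper.
md_events() {
    sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' |
        head -n 1
}

# pick_fresher NAME_A EVENTS_A NAME_B EVENTS_B
# Prints the member with the higher event count ("equal" on a tie).
pick_fresher() {
    if [ "$2" -gt "$4" ]; then
        echo "$1"
    elif [ "$2" -lt "$4" ]; then
        echo "$3"
    else
        echo "equal"
    fi
}

# Example with real devices (requires root; device names are the ones
# from the scenario above):
#   a=$(mdadm --examine /dev/sda1 | md_events)
#   c=$(mdadm --examine /dev/sdc  | md_events)
#   pick_fresher sda1 "$a" sdc "$c"
```

In the scenario above, sdc kept being written to while sda1 was unplugged, so sdc would be expected to carry the higher count.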

Comment 7 Harald Hoyer 2013-05-29 12:50:09 UTC
commit c3dd68fcf108fc80e0bdcac64d553b1a3727be7a

Comment 8 Fedora End Of Life 2013-07-04 06:56:53 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Fedora End Of Life 2013-08-01 18:37:13 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

