Bug 557339 - FC12 is unbootable with Adaptec RAID
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-01-21 03:28 UTC by James.Schatzman@futurelabusa.com
Modified: 2010-04-15 14:14 UTC (History)
Last Closed: 2010-04-15 14:14:52 UTC

Description James.Schatzman@futurelabusa.com 2010-01-21 03:28:49 UTC
Upon upgrading from FC11 to FC12, a Linux system using a system disk with a software RAIDed root filesystem through an Adaptec 3960 SCSI controller becomes unbootable. A few seconds after selecting the FC12 kernel in Grub, the kernel starts to boot, the screen clears and a message (Boot has failed, sleeping forever), in a very illegible font, appears on the otherwise black screen.

This seems to be related to a message that appears when yum installs the kernel rpm:

W: Possible missing firmware aic94xx-seq.fw for module aic94xx.ko

This firmware module can be obtained from Adaptec:

aic94xx-seq.fw

or as an RPM

aic94xx-seq-30-1.tar.gz

Supposedly, you put the firmware module in /lib/firmware, write a special script, run update-initramfs (in Ubuntu, at least), and you are done. However, other sources suggest using dracut, with no details provided. Please help!!

Comment 1 Harald Hoyer 2010-01-21 11:31:31 UTC
might be another problem.

What is your current kernel command line?

Please add "rdshell rdinfo" to the kernel command line and remove "quiet rhgb" from it.

If you want more debug output, add "rdinitdebug" also.
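
As a rough sketch of the edit requested above, the following POSIX shell function rewrites a command line string: it drops "quiet" and "rhgb" and appends "rdshell rdinfo". The function name debug_cmdline is made up for illustration and is not part of dracut or grub; the result still has to be pasted into grub.conf by hand.

```shell
# Hypothetical helper (not part of dracut): rewrite a kernel command line
# for debugging. Drops "quiet" and "rhgb", then appends "rdshell rdinfo".
debug_cmdline() {
    out=""
    # $1 is left unquoted on purpose so the shell splits it into words.
    for word in $1; do
        case "$word" in
            quiet|rhgb) ;;          # removed so boot messages stay visible
            rdshell|rdinfo) ;;      # appended once below, avoids duplicates
            *) out="${out:+$out }$word" ;;
        esac
    done
    printf '%s\n' "${out:+$out }rdshell rdinfo"
}
```

For example, debug_cmdline 'ro root=LABEL=ROOT rhgb quiet' prints "ro root=LABEL=ROOT rdshell rdinfo". Adding "rdinitdebug" would work the same way.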

Comment 2 James.Schatzman@futurelabusa.com 2010-01-21 13:54:10 UTC
Command line for FC12 boot is

ro root=LABEL=ROOT rhgb quiet ata_piix rootdelay=200 all_generic_ide SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us

I will attempt reboot with

ro root=LABEL=ROOT ata_piix rootdelay=200 all_generic_ide rdshell rdinfo rdinitdebug SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us

Comment 3 James.Schatzman@futurelabusa.com 2010-01-21 14:14:16 UTC
When I then attempt to boot the FC12 kernel, here is the result.

Hundreds of lines of "dracut" output. The following error sequence is repeated about 10 times

udevadm settle --timeout=0
modprobe scsi_wait_scan
dracut: FATAL: Could not load /lib/modules/2.6.31.9-174.fc12.i686.PAE/modules.dep: No such file or directory

After emitting "No root device found", I then get kicked into an emergency shell.

The result of

ls /lib/modules

is

/lib/modules/2.6.30.8-64.fc11.i686.PAE

I am curious - why does the initramfs contain only a modules directory for a different kernel?

Also, there is NO output indicating that md has been loaded. When I try

modprobe md

I get the same message: "Could not load /lib/modules/2.6.31.9-174.fc12.i686.PAE/modules.dep: No such file or directory".

In order to boot my system, md and dm-mod are going to have to be loaded, I believe, since the root filesystem is ext3 on LVM on MD (RAID-1).

Please advise!

Comment 4 Harald Hoyer 2010-01-21 18:31:17 UTC
are you sure the kernel and the initramfs belong together? they have to have the same version number
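
A minimal sketch of that check, assuming Fedora-style file names (vmlinuz-<version> and initramfs-<version>.img). The function name same_kernel_version is hypothetical. Note that it only compares the names used in a grub entry; it cannot tell whether the contents of the image were built for that kernel.

```shell
# Hypothetical helper: succeed only if a kernel image and an initramfs
# image carry the same version string in their file names.
# Assumes names of the form vmlinuz-<version> and initramfs-<version>.img.
same_kernel_version() {
    kver=$(basename "$1"); kver=${kver#vmlinuz-}
    iver=$(basename "$2"); iver=${iver#initramfs-}; iver=${iver%.img}
    [ "$kver" = "$iver" ]
}
```

Usage would look like: same_kernel_version /boot/vmlinuz-<kernel version> /boot/initramfs-<kernel version>.img && echo match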

Comment 5 James.Schatzman@futurelabusa.com 2010-01-21 19:18:58 UTC
Apparently, there is a problem. All I did was use yum to install the kernel. As far as I can tell, the rpm installer creates the initramfs, and it seems to have done so incorrectly. Why??

Here is the grub data

title Fedora (2.6.31.9-174.fc12.i686.PAE) DEBUG
        root (hd0,0)
        kernel /vmlinuz-2.6.31.9-174.fc12.i686.PAE ro root=LABEL=ROOT ata_piix rootdelay=200 all_generic_ide rdshell rdinfo rdinitdebug SYSFONT=latarcyrheb-sunA16 LANG=en_US.UTF-8 KEYTABLE=us
        initrd /initramfs-2.6.31.9-174.fc12.i686.PAE.img

Comment 6 James.Schatzman@futurelabusa.com 2010-01-21 19:22:33 UTC
Where can I find a tutorial for dracut?

I will try to run it manually to create a new initramfs. The man page for dracut is fairly cryptic to a guy who was able to use mkinitrd but only with difficulty.

Thanks!

Comment 7 James.Schatzman@futurelabusa.com 2010-01-21 22:25:38 UTC
I created a new initramfs using dracut. I used the following commands

cd /boot
dracut initramfs-2.6.31.9-174.fc12.i686.PAE.img 2.6.31.9-174.fc12.i686.PAE


When I reboot - the behavior is exactly as before. The boot messages appear to be the same. Once I am dumped into an emergency shell, I find that the only subdirectory in /lib/modules is

/lib/modules/2.6.30.8-64.fc11.i686.PAE

Why is dracut putting a modules subdirectory for the wrong kernel in the initramfs?

After rebooting into the latest FC11 kernel, I do ls /lib/modules and I find

2.6.29.6-217.2.16.fc11.i586      2.6.30.8-64.fc11.i586      2.6.31.9-174.fc12.i686
2.6.29.6-217.2.16.fc11.i686.PAE  2.6.30.8-64.fc11.i686.PAE  2.6.31.9-174.fc12.i686.PAE

So... the appropriate subdirectory is present at the time I run dracut. 

Any ideas?

Comment 8 James.Schatzman@futurelabusa.com 2010-01-21 23:06:12 UTC
I also note that when I repeat dracut with "dracut -v", among other things I see in the output


I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/faulty.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/linear.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/multipath.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/dm-multipath.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/dm-round-robin.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/raid6_pq.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/dm-queue-length.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/dm-log-userspace.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/dm-service-time.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/raid0.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/crypto/xor.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/crypto/async_tx/async_tx.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/crypto/async_tx/async_memcpy.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/crypto/async_tx/async_xor.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/raid456.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/raid10.ko
I: Installing ///lib/modules/2.6.31.9-174.fc12.i686.PAE/kernel/drivers/md/raid1.ko

This seems to suggest that dracut thinks it is doing the right thing. Is it possible that Grub is loading the wrong initramfs file??  Again, the Grub commands I execute are

title Fedora (2.6.31.9-174.fc12.i686.PAE) DEBUG
        root (hd0,0)
        kernel /vmlinuz-2.6.31.9-174.fc12.i686.PAE ro root=LABEL=ROOT ata_piix rootdelay=200 all_generic_ide rdshell rdinfo rdinitdebug SYSFONT=latarcyrheb-sunA16 LANG=en_US.UTF-8 KEYTABLE=us
        initrd /initramfs-2.6.31.9-174.fc12.i686.PAE.img


Is there a tool to dump an initramfs so I can see what is in it, to confirm that dracut is building the file correctly?

Secondly, how can I confirm what initramfs file Grub is loading?
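
One possible way to approach the first question, as a sketch only: list the archive and look at which kernel versions appear under lib/modules. The helper below filters a file listing read from stdin. Its name, initramfs_module_versions, is made up for illustration, and the pipeline in the comment assumes the image is a gzip-compressed cpio archive, which may not hold for every image.

```shell
# Hypothetical helper: read a file listing (one path per line, as printed
# by "cpio -it") on stdin and print each kernel version directory found
# under lib/modules/, one per line, without duplicates.
initramfs_module_versions() {
    sed -n 's|^\.\{0,1\}/\{0,1\}lib/modules/\([^/]*\).*|\1|p' | sort -u
}

# Assumed usage on a gzip-compressed cpio image (the format is an
# assumption; use a different decompressor if yours differs):
#   zcat /boot/initramfs-<kernel version>.img | cpio -it | initramfs_module_versions
```

If the only version printed differs from the kernel named in the grub entry, the image was built from, or overwritten with, the wrong module tree.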

Comment 9 Harald Hoyer 2010-01-22 08:13:05 UTC
this is very odd!

where is your /boot located? Is it a real partition?

root(hd0,0) points to /dev/sda1

Is this the /boot partition you mounted?

Comment 10 James.Schatzman@futurelabusa.com 2010-01-22 18:15:04 UTC
/boot is a real EXT3 partition on /dev/sda1 - no RAID or any other complication

/ is an LVM on MD RAID1 partition (LVM logical volume on /dev/md0 on physical partitions /dev/sda3 and /dev/sdb3)

Grub doesn't seem to be having any trouble finding the boot partition. It pops up the boot menu exactly as expected. Then, it has no trouble finding any other kernels, but fails to boot FC12.

I have run dracut a few more times (running under FC11, but pointing it to the FC12 kernel). It seems odd to me, but every time I run it, with exactly the same command line options, the output file has a different size. With five seemingly identical runs in a row, the sizes are

11377134
11377343
11360746
11360000
11360539

Why is the size erratic?

I do get the following warning message from dracut:

W: Possible missing firmware ql8100_fw.bin for module qla2xxx.ko

but I have read that this message is not important and can be ignored.

Comment 11 James.Schatzman@futurelabusa.com 2010-01-22 18:57:49 UTC
Flash!  New development!

New kernel updates were apparently posted to the repos today. Doing another update, I got the following updates:


Installed:
  kernel.i686 0:2.6.31.12-174.2.3.fc12               kernel-PAE.i686 0:2.6.31.12-174.2.3.fc12              

Updated:
  bind.i686 32:9.6.1-15.P3.fc12                         bind-chroot.i686 32:9.6.1-15.P3.fc12              
  bind-libs.i686 32:9.6.1-15.P3.fc12                    bind-utils.i686 32:9.6.1-15.P3.fc12               
  dhclient.i686 12:4.1.0p1-17.fc12                      genisoimage.i686 0:1.1.10-1.fc12                  
  gvfs.i686 0:1.4.3-3.fc12                              gvfs-archive.i686 0:1.4.3-3.fc12                  
  gvfs-fuse.i686 0:1.4.3-3.fc12                         gvfs-gphoto2.i686 0:1.4.3-3.fc12                  
  gvfs-obexftp.i686 0:1.4.3-3.fc12                      gvfs-smb.i686 0:1.4.3-3.fc12                      
  hsqldb.i686 1:1.8.0.10-5.fc12                         icedax.i686 0:1.1.10-1.fc12                       
  kernel-firmware.noarch 0:2.6.31.12-174.2.3.fc12       kernel-headers.i686 0:2.6.31.12-174.2.3.fc12      
  perf.noarch 0:2.6.31.12-174.2.3.fc12                  policycoreutils.i686 0:2.0.78-10.fc12             
  policycoreutils-python.i686 0:2.0.78-10.fc12          wodim.i686 0:1.1.10-1.fc12    


Now, I don't have the same problem. No strange issues with /lib apparently coming from the wrong kernel. Booting gets past the issue with the kernel modules. 

However, the root filesystem cannot be mounted. MDADM reports

No devices listed in MDADM conf file could be found.


When I get kicked into a shell I note the following:

The RAID hard drives appear in /dev, but under unexpected device names. What were /dev/sda and /dev/sdb are now /dev/sdi and /dev/sdj. It looks like my USB drives (of which there are several) have now been enumerated prior to my SCSI drives.

I am guessing that this may be why mdadm is unable to find the partitions listed in mdadm.conf. 

Has there been (yet another) change to the ATA subsystem so that devices are numbered differently?  How do I get the initramfs to tell mdadm where to find the RAID partitions?  Can I tell it to enumerate the SCSI drives first (revert to old behavior)?

Comment 12 Harald Hoyer 2010-01-25 12:39:07 UTC
you can try to add "rd_NO_MDADMCONF" to the kernel command line.

see the dracut manpage

Comment 13 James.Schatzman@futurelabusa.com 2010-01-25 16:04:52 UTC
I tried adding rd_NO_MDADMCONF to the kernel command line in grub.conf - no change. Everything in the boot output looks fine until

dracut: Autoassembling MD raid
dracut: mdadm: No devices listed in conf file were found.

At that point, the boot process stops and drops me into a shell. The system drives are devices /dev/sdi and /dev/sdj (versus /dev/sda and /dev/sdb for FC11). While my working system (FC11) has a mdadm.conf file

DEVICE partitions
ARRAY /dev/md0 metadata=0.90 UUID=6c5f2fe8:b54cb47f:132783e8:19cdff95

from the FC12 emergency shell I note that /etc/mdadm.conf contains

DEVICE /dev/sda3,/dev/sdb3
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90 devices=/dev/sda3,/dev/sdb3

which is completely wrong, since FC12 has enumerated the SCSI drives differently from FC11.
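
To make the difference between the two files concrete, here is a small sketch that flags an mdadm.conf-style file whose DEVICE or ARRAY lines name /dev/sdX nodes explicitly. The function name mdadm_conf_pins_devices is hypothetical, and the pattern is deliberately simple: it does not handle abbreviated or lower-case keywords.

```shell
# Hypothetical helper: succeed if an mdadm.conf-style file names /dev/sdX
# nodes explicitly on its DEVICE or ARRAY lines, instead of relying on
# "DEVICE partitions" plus UUID= matching.
mdadm_conf_pins_devices() {
    grep -Eq '^[[:space:]]*(DEVICE|ARRAY)[[:space:]].*/dev/sd[a-z]' "$1"
}
```

Run against the two files quoted above, it would succeed for the FC12 initramfs copy and fail for the FC11 copy. On a running system, mdadm --examine --scan prints ARRAY lines that identify arrays by UUID, which could serve as a starting point for a configuration that survives device renumbering.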


1) Do I need to create a new initramfs, putting "rd_NO_MDADMCONF" on the dracut command line?  If so, do I need any other command line options?

2) Is there some way to prevent dracut from messing up the mdadm.conf file?  Wouldn't it be better just to use the version of the file directly from the running (FC11) system?  If that file is configured using UUIDs instead of physical device names then it should work fine. Why would anyone want dracut to hardwire the device names, given that they are likely to be wrong????

3) I notice that the shell I get dropped into is missing basic commands, like df and more. Is there a way to add helpful standard linux commands to the initramfs environment?

4) Again - is there a tutorial somewhere for dracut?? The man page is more than usually terse and uninformative. 

Thanks!

Comment 14 Harald Hoyer 2010-01-26 10:45:30 UTC
(In reply to comment #13)
> I tried adding rd_NO_MDADMCONF to the kernel command line in grub.conf - no
> change. Everything in the boot output looks fine until
> 
> dracut: Autoassembling MD raid
> dracut: mdadm: No devices listed in conf file were found.
> 
> At that point, the boot process stops and drops me into a shell. The system
> drives are devices /dev/sdi and /dev/sdj (versus /dev/sda and /dev/sdb for
> FC11). While my working system (FC11) has a mdadm.conf file
> 
> DEVICE partitions
> ARRAY /dev/md0 metadata=0.90 UUID=6c5f2fe8:b54cb47f:132783e8:19cdff95
> 
> from the FC12 emergency shell I note that /etc/mdadm.conf contains
> 
> DEVICE /dev/sda3,/dev/sdb3
> ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90
> devices=/dev/sda3,/dev/sdb3
> 
> which is completely wrong, since FC12 has enumerated the SCSI drives
> differently from FC11.
> 
> 
> 1) Do I need to create a new initramfs, putting "rd_NO_MDADMCONF" on the dracut
> command line?  If so, do I need any other command line options?

kernel command line is ok

> 
> 2) Is there some way to prevent dracut from messing up the mdadm.conf file? 
> Wouldn't it be better just to use the version of the file directly from the
> running (FC11) system?  If that file is configured using UUIDs instead of
> physical device names then it should work fine. Why would anyone want dracut to
> hardwire the device names, given that they are likely to be wrong????

dracut does not modify the mdadm.conf file; it uses the file that is installed on the system

> 
> 3) I notice that the shell I get dropped into is missing basic commands, like
> df and more. Is there a way to add helpful standard linux commands to the
> initramfs environment?

yes, with:
# dracut -a debug /boot/initramfs-<kernel version>.img <kernel version>

> 
> 4) Again - is there a tutorial somewhere for dracut?? The man page is more than
> usually terse and uninformative. 
> 
> Thanks!    

https://fedoraproject.org/wiki/Dracut

Comment 15 James.Schatzman@futurelabusa.com 2010-01-26 14:05:27 UTC
Thanks for the help. However, the problem remains. My FC11 system contains the working mdadm.conf file

DEVICE partitions
ARRAY /dev/md0 metadata=0.90 UUID=6c5f2fe8:b54cb47f:132783e8:19cdff95

but the mdadm.conf file in the FC12 boot environment contains

DEVICE /dev/sda3,/dev/sdb3
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90 devices=/dev/sda3,/dev/sdb3

which does not work because FC12 enumerates the system partitions as /dev/sdi3 and /dev/sdj3.

How can I fix this?

Thanks!

Comment 16 Harald Hoyer 2010-01-26 14:14:10 UTC

install dracut-004-4.fc12 from
http://admin.fedoraproject.org/updates/dracut-004-4.fc12    

edit /etc/dracut.conf

hostonly="yes"
mdadmconf="yes"

make sure /etc/mdadm.conf is correct

rebuild the initramfs:

# dracut -a debug /boot/initramfs-<kernel version>.img <kernel version>

Comment 17 James.Schatzman@futurelabusa.com 2010-01-26 20:00:28 UTC
Followed the instructions. Forced rpm to install dracut-004-2 on top of dracut-2-13.4 (there is a dependency it complains about otherwise).

Created a new initramfs module.

Rebooted.

Exactly the same result.

I note that the SCSI disks are detected:

[sdi] Attached SCSI disk
[sdj] Attached SCSI disk

Sometime later....

Scanning devices sda1 sde sdf for LVM volume groups.

Sometime later....

Autoassembling MD Raid
mdadm: No devices listed in conf file were found.

Kicked into emergency shell. (BTW, with -a debug, I note that there is now a "more" available but still no "df". Looking at /etc/mtab, I note the tmpfs, proc, etc., but no system or boot partitions.)

cat /etc/mdadm.conf

DEVICE /dev/sda3,/dev/sdb3
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90
devices=/dev/sda3,/dev/sdb3

Exactly as in previous attempts, this is wrong, and the system does not boot. Dracut IS NOT copying the mdadm.conf from the FC11 system, even though the /etc/dracut.conf file says

--------------------------------
# Sample dracut config file

# Specific list of dracut modules to use
#dracutmodules=""

# Dracut modules to omit
#omit_dracutmodules=""

# additional kernel modules to the default
#add_drivers=""

# list of kernel filesystem modules to be included in the generic initramfs
#filesystems=""

# build initrd only to boot current hardware
#hostonly="yes"
hostonly="yes"
#

# install local /etc/mdadm.conf
mdadmconf="yes"

# install local /etc/lvm/lvm.conf
lvmconf="yes"
----------------------------------


and, again, the FC11 /etc/mdadm.conf file is
---------------------------------
DEVICE partitions
ARRAY /dev/md0 metadata=0.90 UUID=6c5f2fe8:b54cb47f:132783e8:19cdff95
---------------------------------

which works perfectly well to boot several previous Fedora releases.

I did misstate one fact previously - the drives that FC12 enumerates earlier than FC11 are SATA. So... prior to the enumeration of sdi and sdj, I see the enumeration of all the SATA drives - no problems there.


BTW, as far as I can tell, dracut lacks the usual --version option. How can I confirm that I am actually running the correct version of dracut?

Comment 18 James.Schatzman@futurelabusa.com 2010-01-27 03:31:54 UTC
SUCCESS AT LAST!

Running dracut -v --debug ... &> dump

and looking at the resulting huge log file, I discovered that dracut was copying mdadm.conf from /etc/mdadm instead of /etc. This is odd, since mdadm itself uses /etc/mdadm.conf. For example, I can put erroneous information in my /etc/mdadm/mdadm.conf file, and it makes no difference whatsoever to mdadm.

So.... I copied my /etc/mdadm.conf to /etc/mdadm/mdadm.conf (which I didn't even know existed, but which had /dev/sda and /dev/sdb spelled out), ran dracut, and all was well.

I note from the debug output that dracut looks at both /etc/mdadm/mdadm.conf and /etc/mdadm.conf. I guess that if the config file exists in the first location, then it gets it from there. I didn't try deleting the /etc/mdadm/mdadm.conf file, but that might have worked for me as well. In any case, THIS IS NOT WHAT MDADM DOES!! Shouldn't dracut get its mdadm.conf the same way mdadm does?
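
A sketch of the lookup order guessed at above, assuming the copy under /etc/mdadm/ wins whenever it exists. This is only the behavior inferred from the debug log, not something taken from the dracut source, and the function name first_mdadm_conf is made up for illustration.

```shell
# Hypothetical helper: print the mdadm.conf that would be picked if the
# /etc/mdadm/ copy is preferred over /etc/mdadm.conf (the order guessed
# from the debug output, not confirmed against dracut itself).
# $1 is an optional root prefix, useful for trying this on a scratch tree.
first_mdadm_conf() {
    for f in "$1/etc/mdadm/mdadm.conf" "$1/etc/mdadm.conf"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

Called with no argument it inspects the running system. If it prints /etc/mdadm/mdadm.conf, comparing that file with /etc/mdadm.conf (for example with cmp -s) shows whether the two copies have drifted apart before the initramfs is rebuilt.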

For now - problem solved. I humbly suggest that it would be helpful if dracut's algorithm for creating mdadm.conf were documented - especially, as it is different from what mdadm does.  It would also be helpful if the kernel folks would warn people in big bold letters when the device enumeration algorithm gets changed. I would have had no problem with FC12 except that what happens is

FC11 - enumerates SCSI before SATA

FC12 - enumerates SATA before SCSI

Or maybe it has to do with the controller slot?  Or something else??

Thanks!

Jim

Comment 19 Harald Hoyer 2010-01-27 14:49:44 UTC
(In reply to comment #18)
> SUCCESS AT LAST!
> 
> Running dracut -v --debug ... &> dump
> 
> and looking at the resulting huge log file, I discovered that dracut was
> copying mdadm.conf from /etc/mdadm instead of /etc. This is odd, since mdadm
> itself uses /etc/mdadm.conf. For example, I can put erroneous information in my
> /etc/mdadm/mdadm.conf file, and it makes no difference whatsoever to mdadm.
> 
> So.... I copied my /etc/mdadm.conf to /etc/mdadm/mdadm.conf (which I didn't
> even know existed, but which had /dev/sda and /dev/sdb spelled out), ran
> dracut, and all was well.
> 
> I note from the debug output that dracut looks at both /etc/mdadm/mdadm.conf
> and /etc/mdadm.conf. I guess that if the config file exists in the first
> location, then it gets it from there. I didn't try deleting the
> /etc/mdadm/mdadm.conf file, but that might have worked for me as well. In any
> case, THIS IS NOT WHAT MDADM DOES!! Shouldn't dracut get its mdadm.conf the
> same way mdadm does?

nice! And yes, dracut should do the same. Thanks for finding this bug!

> 
> For now - problem solved. I humbly suggest that it would be helpful if dracut's
> algorithm for creating mdadm.conf were documented - especially, as it is
> different from what mdadm does.  It would also be helpful if the kernel folks
> would warn people in big bold letters when the device enumeration algorithm
> gets changed. I would have had no problem with FC12 except that what happens is
> 
> FC11 - enumerates SCSI before SATA
> 
> FC12 - enumerates SATA before SCSI
> 
> Or maybe it has to do with the controller slot?  Or something else??
> 
> Thanks!
> 
> Jim    

the numbering is a change in the kernel.

Comment 20 Harald Hoyer 2010-04-15 14:14:52 UTC
OK, fixed, apart from bug 559073

