Description of problem:
On one machine I have, half the drives are on ahci and the other half on mvsas. During initialization, dracut attempts to assemble the boot drive (/dev/md1) when only the ahci drives are available, resulting in failure. Assembling the arrays manually from the dracut error console works, after which boot proceeds.

Version-Release number of selected component (if applicable):
dracut-009-12.fc15.noarch
kernel-2.6.40.6-0.fc15.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up a machine as described above
2. Install Fedora 15 (may require slipstreamed boot media, as the mvsas driver on the F15 install medium doesn't work)
3. Try to boot

Actual results:
dracut fails to assemble /dev/md1 and drops to the dracut shell.

Expected results:
All arrays are assembled once the mvsas drives appear, and boot proceeds normally.

Additional info:
Created attachment 531962 [details] Boot messages (after manual restart of md1)
Created attachment 531963 [details] Screenshot - dropping to dracut shell
Created attachment 531964 [details] Screenshot - /proc/mdstat
Created attachment 531965 [details] Screenshot - manual restart of /dev/md1
Can you try dracut from Fedora 16? I changed the mdadm assemble strategy there.

http://koji.fedoraproject.org/koji/buildinfo?buildID=271877

# rpm -Uvh http://kojipkgs.fedoraproject.org/packages/dracut/013/18.fc16/noarch/dracut-013-18.fc16.noarch.rpm
# dracut -f
# reboot
Sorry for the delay... this is a machine on which I have to schedule reboots, so I will try it as soon as I can get a reboot window.
Bad news... this change actually made it worse, not better. Instead of failing to assemble just the root device (/dev/md1), it now partially assembles and then fails on all four md devices, requiring them all to be torn down and reassembled manually from the dracut shell.
Furthermore, after this change systemd doesn't boot all the way to the login prompt anymore; logging in via ssh shows the following in ps:

 2762 ?        Ss     0:00 /bin/plymouth --wait
 2763 ?        Ss     0:00 /bin/plymouth quit
Please add "rd.debug log_buf_len=1M" to the kernel command line and attach dmesg.
Created attachment 533306 [details] dmesg, as requested
[ 0.000000] Command line: ro root=UUID=28d969db-6776-497f-8bea-f967fd464a6e vga=0x317 selinux=off SYSFONT=latarcyrheb-sun16 LANG=en_US.utf8 KEYTABLE=us nomodeset rd.debug log_buf_len=1M

So, you did not specify rd.md.uuid=<md raid uuid>, and dracut therefore tries to assemble _every_ raid device it sees.

$ man dracut.kernel
...
    rd.md.uuid=<md raid uuid>
        only activate the raid sets with the given UUID. This parameter can be
        specified multiple times.
...

Because no rd.md.uuid exists on the kernel command line and /etc/mdadm.conf exists (it was copied into the initramfs), dracut is calling:

# mdadm -As --auto=yes

several times, but mdadm fails to add the newly appearing devices to the array, which, in my humble opinion, it should do.
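For illustration, a sketch of what an explicit kernel command line could look like (the rd.md.uuid values below are placeholders, one per array to activate; the real ones come from mdadm --detail):

ro root=UUID=28d969db-6776-497f-8bea-f967fd464a6e rd.md.uuid=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd rd.md.uuid=eeeeeeee:ffffffff:00000000:11111111

With those set, dracut would only activate the raid sets with the given UUIDs and leave everything else alone.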
Peter,

What is your RAID config? I presume you are using standard mdadm RAID with recent metadata? RAID1, RAID5, or something else? Once the RAID is assembled, could you post the output of mdadm --detail?

I have a box here with an hpt controller in it; I might be able to set up a RAID that spans both controllers (if I can convince the hpt not to export the devices in AHCI mode).

Cheers,
Jes
Peter,

One more note: I fixed a race in the assembly code in mdadm-3.2.2-10. Could you verify that you have at least -10 or later on the system as well?

Thanks,
Jes
(In reply to comment #11)
> Because no rd.md.uuid exists on the kernel command line and /etc/mdadm.conf
> exists (was copied in the initramfs), dracut is calling:
>
> # mdadm -As --auto=yes
>
> several times, but mdadm fails to add the new (appearing) devices to the
> array, which, in my humble opinion it should do.

Humble opinion aside, if you are calling mdadm -A and expecting it to add devices to an already existing array, then your code is broken. Assemble does one thing and one thing only: takes a list of currently free devices and tries to make runnable arrays out of them. It does not touch already assembled (or partially assembled) arrays, and it does not touch component devices that are already claimed in some way. It would take a major rearchitecting of assemble mode to support adding drives to existing arrays, which is why, when we wanted to support that, we wrote a new mode: incremental. If you want mdadm to support adding newly found devices to already created and partially populated arrays, then use incremental support. Anything else is a bug.
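To make the distinction concrete, a rough sketch of incremental assembly (the device names here are hypothetical; in practice udev invokes this once per device as it appears):

# mdadm --incremental /dev/sda2
# mdadm --incremental /dev/sdg2

The array starts automatically once enough members are present; a degraded start of whatever has been gathered so far can be forced with "mdadm --incremental --run --scan".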
The system has a mix of RAID1 (/boot) and RAID6 (the others); the metadata is 0.90 because these drives were pulled from an older system as-is. And no, I haven't specified rd.md.uuid, although with the earlier dracut it would successfully assemble all arrays *except* /dev/md1 (/).
mdadm is: mdadm-3.2.2-9.fc15.x86_64

There is no -10 in the F15 repos. I guess I could try upgrading to 16.
Well, I upgraded to Fedora 16, and it made absolutely no difference, including systemd never giving me a shell prompt (which has been the case since getting the fc16 dracut as requested in #5). Note that it never actually shows any kind of graphical display. ps still shows:

 2112 ?        Ss     0:00 /bin/plymouth --wait
 2115 ?        Ss     0:00 /bin/plymouth quit
(In reply to comment #15)
> The system has a mix of RAID1 (/boot) and RAID6 (the others); the metadata
> is 0.90 because these drives were pulled from an older system as-is.
>
> And no, I haven't specified rd.md.uuid, although with the earlier dracut it
> would successfully assemble all arrays *except* /dev/md1 (/).

v0.90? Uh oh, then you really want to be careful: you will want mdadm-3.2.2-14 (should be the latest), as there is a bug in the old version where you can destroy your raids if you try to upgrade to new drives > 2TB and then grow the raid beyond 2TB. For F15 the new version will limit you to 2TB per drive, and it should do 4TB per drive in F16. That said, I would recommend migrating to newer metadata at some point when you can.

I was trying to reproduce your problem using my hpt622, but I wasn't able to get it to run in non-ahci mode, so I will try with a sil controller as soon as it arrives.

Cheers,
Jes
HPA, the underlying problem appears to be that, for whatever reason, modprobe scsi_wait_scan is broken and not waiting until all scsi scans are complete. If dracut were doing assembly incrementally instead of relying on scsi_wait_scan to let it know that it can now run mdadm -As, then you would be OK, as the incremental assembly would happen eventually and the boot could continue after that.

To test that idea, can you add rd_MD_UUID= lines to the boot command line in grub and see if dracut's incremental mode works any better than its assemble mode?

Harald, maybe the proper thing to do here would be to use incremental assembly always, and if your root isn't available within the 180-second or so timeout, then drop to the rdshell. In any case, due to dracut's reliance on scsi_wait_scan, which appears broken, combined with its use of mdadm -As, which is *not* tolerant of a broken scsi_wait_scan and can *not* be used incrementally, we aren't booting.
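To spell out the test (the UUID shown is a placeholder; substitute your own):

$ mdadm --detail /dev/md1 | grep -i uuid
           UUID : aaaaaaaa:bbbbbbbb:cccccccc:dddddddd

and then append, once per array, to the kernel line in grub:

rd_MD_UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd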
dracut-013-19.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/dracut-013-19.fc16
please retry with dracut-013-19.fc16
Package dracut-013-19.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-013-19.fc16'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16098/dracut-013-19.fc16
then log in and leave karma (feedback).
Tested. Worked marvellously.
I was thinking some more about this change, and something that really concerns me (if done wrong) is that it could probabilistically start the system with one or more arrays in degraded mode, even though all the drives are there. Even if those drives are added later, they would by then be stale and require an entire resynchronization cycle, during which the array is not redundant.
What change would seem to probabilistically start the system with one or more arrays in degraded mode? You have to be more specific with your comments.
My understanding of the dracut change (in dracut-013-19.fc16) was that instead of relying on scsi_wait_scan, it would run mdadm incrementally until the array can be started. This doesn't mean the array is complete, however, and arguably there isn't any way to know if the array is ever going to complete. Consider the case of a RAID6: the array is startable with N-2 drives, but it isn't complete until N drives. Do you start it at N-2? Do you wait for N (what if a drive is missing?)? Do you wait for N but time out after some time T if you have at least N-2 drives available?
For the record: I have verified that the system can boot even with one drive physically removed.
mdadm arrays now start as soon as possible in a state called auto-read-only. This state will not dirty the array unless the filesystem above initiates a write. In this way, we keep the array clean until the last possible minute, and if no final drives have shown up by the time the filesystem finally starts issuing writes, then we go ahead and switch to read-write mode and treat the array as degraded.

As a practical matter, since udev processes events sequentially during boot up, we generally have all of our devices before the filesystem ever writes to the device (this is because the queued-up device events generally come before the queued-up filesystem-available udev event, although this behavior is not guaranteed).

However, in the event that the device goes live before all devices are present, and you want to minimize resync time, I suggest you add a bitmap to the device, as that will limit resyncs to just those sections of the drives that were dirtied prior to the drive being re-added. This can reduce resync times from days on huge arrays to just minutes. It does, however, come at the cost of some small overhead and latency on write requests.
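For example, adding an internal write-intent bitmap to an existing array is a one-liner (the device name here is illustrative):

# mdadm --grow --bitmap=internal /dev/md1

Afterwards /proc/mdstat should show a "bitmap:" line for that array.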
dracut-013-19.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
This fix as currently implemented is NOT SAFE. I just had an array failure because of this: dracut brought up the array with 3 drives (insufficient for the array to operate, as this is a RAID-6 with 6 drives), and now the kernel refuses to load the rest of the drives, as their serial numbers don't match. mdadm --assemble --force seems to work, and I'm hoping for minimal loss of actual content, but in effect this "fix" has promoted a boot failure into a data loss event.
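For anyone else hitting this state, the recovery path is roughly the following (a sketch; the member device names are illustrative, not my actual ones):

# mdadm --examine /dev/sd[a-f]1 | grep Events    (see which members went stale)
# mdadm --stop /dev/md1                          (tear down the partial array)
# mdadm --assemble --force /dev/md1 /dev/sd[a-f]1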
There should be no data loss, but there also shouldn't have been an event counter update. Can you elaborate on dracut bringing the raid array up with three devices, please? What happened afterwards? Did the machine attempt to continue booting up? Was it power cycled? How did you end up getting to the point where the drives no longer saw each other as in sync? If you could relay the entire sequence of events, starting from the last time the machine was shut down until you issued the --assemble --force, that would be most helpful.
This is what I *know* of the sequence:

- The machine went down for a kernel update on Feb 4.
- The machine never came back on. I incorrectly guessed that this was due to an SELinux relabel.
- I didn't have a console on the machine, so I reset it after about 24 hours.
- When it didn't come back online after several hours, I attached a monitor and found that it was sitting at the dracut shell, with three drives pulled into /dev/md1 and /dev/md3. Oddly enough, /dev/md2 (on the same drives) was correctly assembled.