Bug 466534
Summary: | can't find /dev/root - scsi + smp ??? | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | John Ellson <john.ellson> |
Component: | mkinitrd | Assignee: | Peter Jones <pjones> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 10 | CC: | amann, bertrand.benoit, dcantrell, eharrison, fred99, hdegoede, jdunn, katzj, kernel-maint, mwc, pallas, pjones, tilmann, tjb, wtogami, yaneti |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-12-10 20:27:03 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 438944 | ||
Attachments: |
Description
John Ellson
2008-10-10 19:18:35 UTC
Can you add a more complete log, please?

Yes. Give me a few hours. BTW, this might be caused by BZ #461850.

Well, that was a really frustrating exercise! What's the trick to getting a serial connection running these days? All the HOWTOs are useless. /etc/initrd has changed, /var/lock has changed, ... I had minicom-to-minicom working briefly, then nothing, and I couldn't get it to work again. Is something else locking up /dev/ttyS0?

-----------------

Anyway, I took some pictures instead. The attached one shows the boot failing, followed by sdb waking up. /dev/sdb is the second drive in the striped pair forming /dev/md0, which holds the root filesystem. Perhaps something is waiting only for the first SCSI drive to respond? My other system at work isn't RAID, but it does have 6 SCSI drives, so again, if something waits only for the first response, it would fail.

Created attachment 320082
final screen of failed boot sequence
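The hypothesis above is that something waits only for the first SCSI disk, while /dev/sdb (the second member of the /dev/md0 stripe) is reported later by the asynchronous scan. As a minimal sketch of the opposite behavior, the function below polls until every expected device node exists, with a timeout. The name wait_for_all_devices and the 0.1-second poll interval are hypothetical illustrations; this is not how mkinitrd or nash actually waits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until EVERY listed device node exists, or give up
# after TIMEOUT seconds. Stopping after the first disk appears would miss a
# second RAID member (like /dev/sdb here) that the async scan reports later.
# Usage: wait_for_all_devices TIMEOUT DEVICE...
wait_for_all_devices() {
    local timeout=$1
    shift
    local deadline=$(( SECONDS + timeout ))
    local dev missing
    while :; do
        missing=""
        for dev in "$@"; do
            [ -e "$dev" ] || missing="$missing $dev"
        done
        # Success only when nothing is missing.
        if [ -z "$missing" ]; then
            return 0
        fi
        # Give up once the deadline has passed, reporting what never appeared.
        if (( SECONDS >= deadline )); then
            echo "still missing:$missing" >&2
            return 1
        fi
        sleep 0.1
    done
}
```

For the setup described here, a call such as `wait_for_all_devices 30 /dev/sda /dev/sdb` would run before assembling /dev/md0; the device names and the 30-second timeout are only examples.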
Created attachment 320104
Full text capture of failed boot
[Isn't there a standard that requires ttyS0 on the top connector?! And where is my DB9 breakout box?]
Finally got the serial cable to work. Here is the capture of the failed boot.
bug #466607 might be the same problem. [OK, that's how to get bugs to hyperlink...]

Possibly the same problem: bug #466607, bug #464636, bug #462233, bug #461850, bug #459109, bug #454663

Going by <https://fedoraproject.org/wiki/QA/ReleaseCriteria>, I propose that this bug be an F10 blocker.

Still doesn't boot with kernel-2.6.27.2-23.rc1.fc10.i686 and mkinitrd-6.0.67-1.fc10.i386.

Still doesn't boot with kernel-2.6.27.4-58.fc10.i686 and mkinitrd-6.0.68-1.fc10.i386.

I just did a clean install of rawhide on a Dell Precision 670 and had the same problem. Booting rescue and remaking the initrd worked around the problem.

I should have said: remaking the initrd with the --with=scsi_wait-scan option worked around the problem.

Confirming that "mkinitrd --with=scsi_wait-scan ..." worked for me too, with kernel-2.6.27.4-68.fc10.i686 and mkinitrd-6.0.69-1.fc10.i386.

How does this help someone installing from a Fedora-10-Live DVD? Can this option be provided as a kernel option?

I was hoping that this would fix the problem:

    mkinitrd-6.0.70-1.fc10
    ----------------------
    * Tue Nov 4 17:00:00 2008 Peter Jones <pjones redhat com> - 6.0.70-1
    ...
    - Make scsi waiting happen on any device with a scsi modalias.

but no such luck. Using kernel-2.6.27.4-79.fc10.i686 and mkinitrd-6.0.70-1.fc10.i386, "mkinitrd --with=scsi_wait-scan ..." still provides a workaround. Capture of the failed boot coming next...

Created attachment 322760
Full text capture of failed boot - mkinitrd-6.0.70-1.fc10.i386
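The "mkinitrd --with=scsi_wait-scan ..." workaround confirmed in the comments above can be scripted roughly as follows. This is a sketch that assumes the Fedora 10 image naming /boot/initrd-<version>.img and mkinitrd's -f option (overwrite an existing image); the function name and the DRY_RUN switch are hypothetical conveniences. From rescue mode, pass the installed kernel's version explicitly, because `uname -r` then reports the rescue kernel.

```bash
#!/usr/bin/env bash
# Sketch of the "mkinitrd --with=scsi_wait-scan" workaround from this report.
# Assumptions: Fedora 10 image naming (/boot/initrd-<version>.img) and
# mkinitrd's -f flag to overwrite the existing image. Run as root.
# Hypothetical DRY_RUN=1 prints the command instead of executing it.
rebuild_initrd_with_scsi_wait() {
    local kver=${1:-$(uname -r)}          # kernel version to build for
    local image="/boot/initrd-${kver}.img"
    local -a cmd=(mkinitrd -f --with=scsi_wait-scan "$image" "$kver")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, `DRY_RUN=1 rebuild_initrd_with_scsi_wait 2.6.27.4-79.fc10.i686` prints the command for the kernel version reported above without touching /boot.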
I don't fully understand the implications of "... with a scsi modalias". In case it's relevant, my /etc/modprobe.conf contains:

    alias eth0 e100
    alias scsi_hostadapter aic7xxx
    install snd-emu10k1 /sbin/modprobe --ignore-install snd-emu10k1 && /usr/sbin/alsactl restore >/dev/null 2>&1 || :
    alias char-major-81 bttv
    alias usb-controller uhci-hcd

I'm kicking this one over to F10Target.

I just had the problem again with the latest kernel update. Is this not a blocker because it's not happening to everyone with SCSI/LVM?

That's correct. So far, very few people are reporting this. We'd take a fix for it, but I don't believe we'd delay the release for it.

Created attachment 323519
make scsi_scan_wait be used by default

> That's correct. So far, very few people are reporting this. We'd take a fix
> for it, but I don't believe we'd delay the release for it.

I respectfully disagree; turning on SCSI_SCAN_ASYNC and removing scsi_wait_scan from mkinitrd is a bad combination that has caused quite a few problems in Fedora: bug #471903, bug #470726, bug #466607, bug #466071, bug #466534, bug #465225, bug #454663

And in RHEL: bug #464636, bug #461850, bug #459109

These are only the ones assigned to mkinitrd; there are probably others that haven't (yet) been correctly assigned. Some of the bugs above have many "me too"s.

The attached patch adds scsi_wait_scan by default. This is how it used to be done before scsi_mod was built into the kernel (see bug #454663 for the first report). Another way to fix this is to add more tests that trigger the use of nash's stabilized() call. But LKML guidance suggests scsi_wait_scan is the right way (TM), and the current stabilized() call does not wait long enough in at least a couple of cases (bug #461850 and bug #466607).

(In reply to comment #20)
> bug #471903
sorry, that should be bug #471093

Created attachment 323920
Fix to mkinitrd to wait for scsi
Hello,
The attached patch fixes mkinitrd for me, but it mainly serves to show the root
cause, which is twofold. First, there is a case typo that prevents SCSI devices
from being recognized; consequently, the wait_for_scsi variable is never set to
"yes".
The second cause is that emitmodules() is called twice (once for the
GRAPHICSMODS modlist and once for the MODULES modlist, the default). When the
GRAPHICSMODS modlist is processed, the wait_for_scsi variable gets unset, so it
is not used later when the MODULES modlist is handled.
The patch fixes the typo and prevents the wait_for_scsi variable from being
unset while the GRAPHICSMODS modlist is handled.
I consider it rather dirty, but then so is the emitmodules() function, which has
side effects on global variables and is suddenly (I assume with the advent of
plymouth) called twice.
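The emitmodules() problem described above is a general shell pitfall: a function that resets a global flag on every call lets the last call decide the outcome. The following hypothetical reduction shows that pitfall and one way around it. The function names, and the use of the hard-coded module-name prefixes from mkinitrd's list as the "is this SCSI?" test, are illustrative assumptions; this is neither mkinitrd's actual code nor the attached patch.

```bash
#!/usr/bin/env bash
# Hypothetical reduction of the emitmodules() problem; not mkinitrd's code.
# "SCSI-ness" is approximated by module-name prefixes from mkinitrd's list.

is_scsi_module() {
    case "$1" in
        sata_*|pata_*|qla*|aic7*|mptbase|BusLogic) return 0 ;;
        *) return 1 ;;
    esac
}

# Buggy pattern: every call resets the global flag, so the LAST module list
# processed decides whether the initrd waits for SCSI.
emit_modules_buggy() {
    local m
    wait_for_scsi=no
    for m in "$@"; do
        if is_scsi_module "$m"; then wait_for_scsi=yes; fi
    done
}

# Safer pattern: the function only ever raises the flag; the caller resets
# it once, before processing any module list.
emit_modules_fixed() {
    local m
    for m in "$@"; do
        if is_scsi_module "$m"; then wait_for_scsi=yes; fi
    done
}
```

If a list containing aic7xxx is processed first and a graphics-only list second, the buggy version ends with wait_for_scsi=no, while the fixed version keeps wait_for_scsi=yes.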
Please fix this problem. Keep in mind that an unfixed version of Fedora cannot be installed on a SCSI system, and probably not on other systems either. Offering an update through an online repository is useless, because the system cannot boot after the initial installation. I suffered from this problem too and found "scsi_wait-scan" to be a workaround that finally let me use the system.

This will affect quite a few people. I just went through a several-hour process of diagnosing and hacking around this on an install of the Fedora 10 pre-release. My system configuration is similar to what has already been reported: dual-core processor, SCSI controller (3ware 9550SX), LVM. I'd like to add the factors that make this hard to diagnose:

- The new graphical boot that's enabled by default goes to 100% (a full white progress bar) and stops there. Hitting Esc to bring up the text display just shows a blank screen through the whole process; I presume this is due to the 'quiet' kernel parameter.
- The default grub.conf has a timeout of 0, so the kernel boots immediately, giving you no time to alter the kernel boot parameters (there might be a key you can hold during boot to stop this, but my grub-fu is weak).
- Once you do strip rhgb and quiet from the boot parameters, it's not very obvious that the delayed SCSI device identification is what causes LVM to fail. Typically the first checks are to make sure the LVM structures are OK on disk and that the appropriate drivers are being loaded in the initrd.
- The real difficulty here will be that there is no obvious string to search on to find this problem. Only after diagnosing the problem was I able to relate it to this bug. To most people, this will be "Fedora 10 doesn't boot after install", a rather vague problem.

On the up side, I now know more about the Fedora boot process :)

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'.
More information and the reason for this action are here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

This fixes the issue for Adaptec SCSI cards:

    # rpm -q mkinitrd
    mkinitrd-6.0.71-2.fc10.i386

    --- /sbin/mkinitrd.orig 2008-12-09 17:00:49.000000000 -0800
    +++ /sbin/mkinitrd 2008-12-09 17:30:54.000000000 -0800
    @@ -1518,6 +1518,7 @@
                    -o "BusLogic" == "$module" \
                    -o "mptbase" == "$module" \
                    -o "pata_" == "${module::5}" \
    +               -o "aic7" == "${module::4}" \
                    -o "qla" == "${module::3}" \
                    -o "sata_" == "${module::5}" \
                ]; then

http://kojipkgs.fedoraproject.org/packages/mkinitrd/6.0.71/3.fc10/
This was fixed in a more generic way (without hard coding controller names) in this build. Please test it and report back.

Works for me with kernel-2.6.28-0.121.rc7.git5.fc11.i686 and mkinitrd-6.0.73-5.fc11.i386. Do you want the console log?

The easiest path forward I see for a new install affected by this:

* Symptom: F10 installs OK, but your SCSI system won't boot after the install is done.
* Workaround:
  - Hit ESC (?) early enough to interrupt the boot.
  - Add "scsi_mod.scan=sync" to the kernel command line.
  - After boot and firstboot complete, update mkinitrd.
  - After updating mkinitrd, you must rebuild your /boot/initrd (run /usr/libexec/plymouth/plymouth-update-initrd as root).

Could this be added to the common bugs page at fedoraproject.org?

(In reply to comment #27)
> http://kojipkgs.fedoraproject.org/packages/mkinitrd/6.0.71/3.fc10/
> This was fixed in a more generic way (without hard coding controller names) in
> this build. Please test it and report back.

Confirmed that mkinitrd-6.0.71-3.fc10 works for me.

*** This bug has been marked as a duplicate of bug 470628 ***

*** This bug has been marked as a duplicate of bug 466607 ***
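The first two workaround steps in the list above, together with the earlier diagnostic advice to strip rhgb and quiet, amount to editing the kernel command line. This sketch shows that edit as a plain string transformation. debug_boot_cmdline is a hypothetical helper: it only prints the new line and does not modify grub.conf, and the root= value in the usage example is just an example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a kernel command line, drop "rhgb" and "quiet"
# (so boot messages stay visible) and append "scsi_mod.scan=sync" unless it is
# already present. Prints the result; it does not edit grub.conf.
debug_boot_cmdline() {
    local -a words out=()
    local word have_sync=no
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case "$word" in
            rhgb|quiet) continue ;;              # skip the graphical/quiet boot options
            scsi_mod.scan=sync) have_sync=yes ;;
        esac
        out+=("$word")
    done
    if [ "$have_sync" = no ]; then
        out+=("scsi_mod.scan=sync")
    fi
    printf '%s\n' "${out[*]}"
}
```

For example, `debug_boot_cmdline "ro root=/dev/VolGroup00/LogVol00 rhgb quiet"` prints `ro root=/dev/VolGroup00/LogVol00 scsi_mod.scan=sync`, which is the text you would type after the kernel image path at the GRUB edit prompt.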