Description of problem:
I built a system with a Supermicro P4DC6+ motherboard, dual Xeon 2.2GHz processors, an Adaptec 2005S ZCR RAID card, 4 Quantum SCA 36GB hds, and 1024MB ram. I have attempted (about a hundred times) to install several different versions of Red Hat, including 7.3 (+ errata updates) and 8.0. Every single time, no matter what I do during installation or what boot parameters I give the kernel, I get the exact same behavior: the installation process completes just fine, and the system goes for reboot. On boot, it goes through the hardware detection, finding the cdrom, the floppy, etc. At this point, on this and all subsequent reboots, I get the exact same error message:

loading scsi_mod module
kmod failed: /sbin/kmod -k -s block-major-8, errno=2
VFS: cannot open root device on "sda2" or 08:02
please append a correct "root=" boot option
Kernel Panic: VFS: unable to mount root FS on 08:02

However, if I boot to "linux rescue" or to a boot disk, booting is fine; it mounts my sysimage, and I can access all my mounted partitions, including / and /boot. The failure ONLY occurs on a normal boot, and it occurs when choosing either the "2.4.18-3SMP" or the "2.4.18-3" kernel from LILO or GRUB.

Through the course of all my investigations, I have determined the following things:

1. The dpt_i2o driver (which takes the place of the generic aic7xxx scsi driver) is the appropriate driver (according to Adaptec) for the RAID SCSI subsystem. I confirmed this by seeing that when the "rescue" or "boot disk" kernels load, they refer to loading the dpt_i2o driver, and all works fine. However, I do not see the "dpt_i2o" driver referred to during normal boot. I suspect that is because the boot is failing too early, though it would make sense that the driver would need to load earlier to mount the root FS.

2. In /etc/modules.conf, I do see:

alias scsi_hostadapter dpt_i2o

so I know the installation is telling the kernel to load the correct scsi driver.
I do not know where the "loading scsi_mod module" line is coming from in the boot process, but it appears right after the line about "RedHat Nash" starting, so maybe it's in there?

3. I have found hundreds of newsgroup posts on this subject, most of which seem to hint that maybe the Red Hat installation is not correctly building the dpt_i2o driver into the kernel, and that the system is then trying to load the generic "block-major-8" scsi driver instead, which fails to give access to the root FS. Other hints have been that "loading scsi_mod module" followed by kmod means it's trying to load the scsi driver as a module from the not-yet-mounted FS, so it obviously can't access the driver from an FS that isn't mounted yet.

4. I have tried about every possible kernel command out there, like "apic", "noapic", "noapm", and a whole host of others, turning off APM power modes, forcing single proc, etc. None have any effect whatsoever.

5. I have, however, found several different listings from people who HAVE successfully installed Red Hat on nearly identical configurations (motherboard, etc.), and I was very careful to check all the hardware compatibility lists for any known conflicts before venturing in this direction.

6. I also updated the motherboard's BIOS to the latest rev, 1.2c.

7. I found several newsgroup posts indicating that maybe "hyperthreading" was causing a problem, so I disabled that. No go.

I have stripped down the system completely: taking out the NIC, swapping video cards, taking the system down to one processor, taking out the RAID card (and installing to just one SCSI hd), and disabling the SCSI subsystem altogether (and trying to install to an IDE hd). None of this has made any difference in solving my problem!

I have spent hours on the phone with Red Hat support, with Adaptec support, and now with Supermicro tech support. Everyone seems to be baffled as to what might be the cause.
It seems to me that when a default install onto hardware which DOES otherwise work with other OSes like Microsoft's persistently gets this error, it has to be some bug or undocumented conflict. I would certainly appreciate some guidance in this. I'm at the end of my rope and my neck is on the line with my employer to figure this out!

Version-Release number of selected component (if applicable):
kernel-2.4.18-3SMP and kernel-2.4.18-3

How reproducible:
Every time

Steps to Reproduce:
1. Install Red Hat 7.3 or 8.0
2. Reboot
3. Choose either the SMP or the -up kernel (from either GRUB or LILO)

Actual results:
Booting began as normal, detecting the cdrom, floppy, etc. It gets to the line that says RedHat Nash starting, then things go haywire.

Expected results:
The booting should have continued, mounting the root FS on /dev/sda2.

Additional info:
This system has the Supermicro P4DC6+ motherboard, with dual Xeon 2.2GHz processors, 1024MB ram, the Adaptec 2005S ZCR RAID card, and 4 Quantum SCA 36GB hds. The scsi disks are set up as a RAID-5 array, with about 105 GB in the array. I am partitioning the array as follows:

/dev/sda1   /boot     128 MB
/dev/sda2   /        3000 MB
/dev/sda4   /usr    10000 MB
/dev/sda5   /tmp     1000 MB
/dev/sda6   /home    5000 MB
/dev/sda7   /swap    2048 MB
/dev/sda8   /var    85300 MB
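For readers trying to decode the numbers in the panic message quoted in the report, here is a minimal sketch. It assumes the conventional Linux 2.4 block device numbering (major 8 for SCSI disks, 16 minors per disk, minor 0 being the whole disk). The function names are hypothetical and are not part of any tool mentioned in this report.

```python
import errno

SD_MAJOR = 8          # assumed: conventional block major for SCSI disks (sd)
MINORS_PER_DISK = 16  # assumed: minor 0 is the whole disk, 1-15 are partitions


def decode_root_dev(text):
    """Split a hex 'MM:mm' pair, as printed in the panic, into (major, minor)."""
    major_hex, minor_hex = text.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def sd_name(major, minor):
    """Map an sd major/minor pair to a device name such as 'sda2'."""
    if major != SD_MAJOR:
        raise ValueError("not a SCSI disk major: %d" % major)
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"


major, minor = decode_root_dev("08:02")
device = sd_name(major, minor)
errno_name = errno.errorcode[2]  # symbolic name for errno=2

print(major, minor, device, errno_name)
```

Read this way, 08:02 is the same device as "sda2", "block-major-8" is a request for whatever driver serves block major 8, and errno=2 is ENOENT (no such file or directory), which would be consistent with the kernel failing to find the module loader before any root filesystem is available.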
What's the output of `/sbin/mkinitrd -v -f /tmp/initrd.test 2.4.18-3smp` if you boot into rescue mode and run it while chrooted into /mnt/sysimage?
using modules: ./kernel/drivers/scsi/scsi_mod.o ./kernel/drivers/scsi/sd_mod.o ./kernel/drivers/scsi/dpt_i2o.o
using loopback device /dev/loop1
/sbin/nash -> /tmp/initrd.ZQF4K3/bin/insmod
`/lib/modules/2.4.18-3smp/./kernel/drivers/scsi_mod.o' -> `/tmp/initrd.ZQF4K3/lib/scsi_mod.o'
`/lib/modules/2.4.18-3smp/./kernel/drivers/sd_mod.o' -> `/tmp/initrd.ZQF4K3/lib/sd_mod.o'
`/lib/modules/2.4.18-3smp/./kernel/drivers/dpt_i2o.o' -> `/tmp/initrd.ZQF4K3/lib/dpt_i2o.o'
Loading module scsi_mod
Loading module sd_mod
Loading module dpt_i2o
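The point of the mkinitrd check above is whether the three SCSI modules end up in the initrd. A minimal sketch of that check follows, assuming the verbose output contains a "using modules:" line as in the paste above. The helper name and the list of expected modules are illustrative assumptions, not part of mkinitrd.

```python
import os

# Assumed for illustration: the modules this report expects in the initrd.
EXPECTED = ("scsi_mod", "sd_mod", "dpt_i2o")


def modules_in_initrd(verbose_output):
    """Return module names (without .o) from the 'using modules:' line."""
    prefix = "using modules:"
    for line in verbose_output.splitlines():
        if line.startswith(prefix):
            paths = line[len(prefix):].split()
            return [os.path.splitext(os.path.basename(p))[0] for p in paths]
    return []


# Sample taken from the output pasted in this thread.
sample = (
    "using modules: ./kernel/drivers/scsi/scsi_mod.o "
    "./kernel/drivers/scsi/sd_mod.o ./kernel/drivers/scsi/dpt_i2o.o\n"
    "using loopback device /dev/loop1\n"
)

found = modules_in_initrd(sample)
missing = [m for m in EXPECTED if m not in found]
print("found:", found)
print("missing:", missing)
```

For the output in this thread nothing is reported missing, so a freshly built initrd would include dpt_i2o. That alone does not show that the initrd actually used at boot contains the same modules.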
Do you see the dpt_i2o module loaded when you boot? Does it find the drives correctly?
Created attachment 90851
JPG screen shot of the first page shown during a normal (non-rescue) boot.

Sorry this one is so blurry, something weird happened in the conversion. I tried adjusting the colors to make it somewhat possible to make out the words (don't stare too long, it'll hurt your eyes). You can make out things like the "kswapd" and "apm disabled - apm not SMP safe" and the "PIIX4" lines referring to loading IDE stuff right before it recognizes the CDROM on the IDE bus.
Created attachment 90852
JPG screen shot of the next page shown during a normal (non-rescue) boot.

This one is quite a bit clearer (for some weird reason) and you can make out most of what's happening, including the RAMdisk call and the "md" driver loading and detecting the RAID arrays and disks. Farther down, RedHat Nash starts, then VFS mounts root (as ext2), then "Loading scsi_mod module" appears, and then it continues with the kernel panic error I originally posted.
So, based on what I see in those screen shots, as well as what I see on the screen in person, I do not see any reference to the "dpt_i2o" driver being loaded, but I do see references to "scsi_mod" and "md" (I am wondering whether that is the same as sd_mod). However, as I mentioned before, during the rescue boot and the boot-disk boot, the blue text-GUI screen pops up (shortly after the place where the kernel panic happens in the regular boot) and refers to "Loading dpt_i2o". I figured it would be easier to give you the screen shots (blurry or not) to show you where in the boot process the error occurs, because I am not familiar enough with it to understand all of what I'm seeing.
So it's been 9 days since my last post, and I still haven't heard back from you all. Do you need more information? What can I do?
It's been a month, and I still haven't heard from anyone... what's going on, have you all given up on this problem? What can I do to get a resolution to it?
Does the new initrd (or a newer kernel erratum) work any better?
Well, about a month ago, we decided (after trying everything else and having no luck) to try to install Windows on the machine, and had similar problems with its install. That led us to believe there was some hardware problem. After swapping in and out everything else we could, we were left with trying different RAM. We bought 4 256MB chips of ram (to replace 2 512MB chips and 2 CRIMMs). Supermicro indicated that this board would accept up to 512MB chips, but the memory vendor we purchased from said that the board should only be used with 256MB chips. So we bought those, installed them, and subsequent installs of Red Hat and Windows worked flawlessly.

To me this indicates either corrupt RAM chips (which I have no way to test) or a flaw in how Supermicro lists the specs for that motherboard. In either case, apparently the root of the problem was RAM related and had nothing to do with the OS. I do apologize for reporting it to you as a bug. I wish there was some way for an OS installation (which all along was going just fine, and the problem occurred only on boot AFTER a successful install) to detect the hardware problem instead of the problem waiting until the first boot to show up. BUT, Microsoft's installations were just hanging in the middle, without even a boot, so I guess it's not something I could expect of either OS. <shrugs> Oh well, thanks anyway.