+++ This bug was initially created as a clone of Bug #253538 +++

Description of problem:

Installing kernel-2.6.18-25.el5 in a VMware ESX guest instance boots fine. In the same guest instance, installing kernel-2.6.18-26.el5 or later results in a panic on boot:

Loading ext3.ko module
mkrootdev: label / not found
Mounting root filesystem
mount: error 2 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Between 25 and 26 is this changelog entry:

- [scsi] update MPT Fusion to 3.04.04 (Chip Coldwell) [225177]

VMware emulates an LSI 1030 HBA.
If possible, please provide a crash dump, or at least a stack trace.
I'm building a kernel with some MPT debug flags set:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=955347

When the build finishes, could somebody with VMware try it out and post the kernel messages here?

Chip
(In reply to comment #3)
> I'm building a kernel with some MPT debug flags set
>
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=955347
>
> when the build finishes, could somebody with VMware try it out and post the
> kernel messages here.

Well, that didn't build. Let's try this:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=955422

> Chip
The moment that task is done I'll test it.
I tested kernel-2.6.18-45.el5.bz279571, but I still see the same panic because VolGroup00 cannot be found.
Created attachment 189931 [details] console from kernel-2.6.18-45.el5.bz279571 panic
Eric -- could you have a look at the debug info in comment #7 above and see if it sheds any light on why the VMware virtual mptspi adapter would stop working after the most recent driver update (in both RHEL-4 and RHEL-5)? The only thing I noticed that looked like an error/warning was this:

mptbase: ioc0: IOC operational unexpected
mptbase: whoinit 0x2 statefault 0 force 0
(In reply to comment #8)
> Eric -- could you have a look at the debug info in comment #7 above and see if
> it sheds any light on why the VMware virtual mptspi adapter would stop working
> after the most recent driver update (in both RHEL-4 and RHEL-5)?

BTW the RHEL-4 bug is bug 253538.

Chip
(In reply to comment #6)
> I tested kernel-2.6.18-45.el5.bz279571 but I still see the same panic as
> VolGroup00 can not be found.

OK, great. Now, could you bring up the same kernel on bare metal and post the dmesg right after boot? You may need to increase the size of the kernel ring buffer in order to hold all the debugging data.

Chip
(In reply to comment #8)
> Eric -- could you have a look at the debug info in comment #7 above and see if
> it sheds any light on why the VMware virtual mptspi adapter

That should be "mptsas" not "mptspi".

Chip
(In reply to comment #11)
> (In reply to comment #8)
> > Eric -- could you have a look at the debug info in comment #7 above and see if
> > it sheds any light on why the VMware virtual mptspi adapter
>
> That should be "mptsas" not "mptspi".

I'm building another debug kernel (unfortunately, with the same name) that adds some additional debugging info for the mptsas driver. Could someone please boot this on both bare metal and VMware and post the dmesg boot log?

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=956312

Chip
Sure, I'll load it asap.
Created attachment 190331 [details] console from kernel-2.6.18-45.el5.bz279571 panic take 2
(In reply to comment #15)
> FWIW, there is a Fedora BZ on this:
> https://bugzilla.redhat.com/show_bug.cgi?id=230703
>
> Eric Moore (eric.moore)
> I confirm that your intuition is true. max_id is indeed zero. If I change it to
> MPT_MAX_SCSI_DEVICES, devices are probed correctly. Below are the Port Facts
> returned by the VMWare firmware. PortSCSIID is equal to 7, but MaxDevices is 0,
> and that's what ioc->devices_per_bus is computed from (is this computation
> correct with respect to the semantic of the PortFacts values ?).

If I understand Eric Moore correctly, then the kernel here (when it finishes building)

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=958967

implements a workaround for this VMWare bug.

Chip
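For anyone following along, here is a rough user-space sketch of why a zero MaxDevices leaves the root disk undiscovered. It is not the driver code: struct port_facts, the helper names, and the scan loop are made up for illustration, and the value 16 for MPT_MAX_SCSI_DEVICES is an assumption (a wide SCSI bus). Only the PortSCSIID = 7 / MaxDevices = 0 values come from the quoted report.

```c
/* Assumed value for illustration only: 16 IDs, as on a wide SCSI bus. */
#define MPT_MAX_SCSI_DEVICES 16

/* Hypothetical, simplified stand-in for the PortFacts reply. */
struct port_facts {
    unsigned int port_scsi_id; /* the initiator's own ID (7 in the report) */
    unsigned int max_devices;  /* 0 on the affected VMware emulation */
};

/* Post-update behavior as described above: trust the firmware value as-is. */
unsigned int max_id_from_firmware(const struct port_facts *pf)
{
    return pf->max_devices;
}

/* Workaround sketch: fall back to the constant when firmware reports 0. */
unsigned int max_id_with_fallback(const struct port_facts *pf)
{
    return pf->max_devices ? pf->max_devices : MPT_MAX_SCSI_DEVICES;
}

/*
 * Count how many target IDs a scan over [0, max_id) would probe,
 * skipping the initiator's own ID. With max_id == 0 nothing is probed,
 * so no disk (and no root filesystem) is ever found.
 */
unsigned int ids_probed(unsigned int max_id, unsigned int own_id)
{
    unsigned int id, probed = 0;

    for (id = 0; id < max_id; id++)
        if (id != own_id)
            probed++;
    return probed;
}
```

With the reported values, the firmware-derived limit probes no targets at all, while the fallback probes every ID except the initiator's own.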
Created attachment 191681 [details] workaround VMWare bug This is the patch in the kernel at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=958967
(In reply to comment #17)
> Created an attachment (id=191681)
> workaround VMWare bug
>
> This is the patch in the kernel at
>
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=958967

This build has finished now ... Chris, can you grab the kernel and let me know what happens?
Chip, you should be using mptspi, instead of mptsas, for VMware. I looked at the log in comment #7; it indicates no devices. The fix in #16 would probably fix it. Christoph Hellwig rejected the patch. There are newer ESX servers that have fixed the problem; you should talk to Ed Goggin. I believe this same issue was covered previously in bugzilla 230703.
(In reply to comment #19)
> Chip, you should be using mptspi, instead of mptsas, for vmware.

The log in comment #7 was loading both, so I got confused.

> I looked at the log in comment #7, indicates no devices. Probably the fix in
> #16 would fix it. Christoph Hellwig rejected the patch. THere are newer ESX
> servers that have fix the problem, you should talk to Ed Goggin. I believe
> this same issue was covered previously in bugzilla 230703.

Thanks.

Chip
Created attachment 191801 [details] mptspi.c patch
(In reply to comment #18)
> (In reply to comment #17)
> > Created an attachment (id=191681)
> > workaround VMWare bug
> >
> > This is the patch in the kernel at
> >
> > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=958967
>
> This build has finished now ... Chris can you grab the kernel and let me know
> what happens?

Ignore that one. Use this one instead:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=959303
The patch in comment #21 should do. I don't have access to brewweb.devel.redhat.com; I get a "Bad Gateway" error.
(In reply to comment #23)
> The patch in comment #21 should do. I don't have access to
> brewweb.devel.redhat.com, I get a "Bad Gateway" error.

When the build finishes, I'll copy the kernel out to an external web server where you can reach it.

Thanks for reviewing the patch,
Chip
(In reply to comment #25)
> Chip,
>
> That latest kernel from
> http://brewweb.devel.redhat.com/brew/getfile?taskID=958970&name=kernel-2.6.18-45.el5.bz279571.i686.rpm
> boots just fine.

That one implements a fix that is somewhat more similar to the (rejected) upstream patch here:

http://marc.info/?l=linux-scsi&m=117432237404247

> Did you also want me to test
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=959303

Actually, I think what I want to do is to use the literal patch from the link above and see if that fixes the problem. Sorry about all the churn.

Chip
Created attachment 191841 [details] patch as submitted upstream by Eric Moore of LSI.
Chip, yeah, I believe we will need the last patch, because pfacts->PortSCSIID would have been zero (due to the VMware emulation not initializing this in the config page), and when the driver did IocInit, we would have told the firmware that we don't support any devices.
(In reply to comment #28)
> Chip, yeah, I believe we will need the last patch, because pfacts->PortSCSIID
> would of been zero (due to vmware emulation not initializing this in the
> config page), and when the driver did IocInit, we would of told firmware that
> we don't support any devices.

You're referring to the patch in comment #27, right? I'm building two kernels with that patch, one for RHEL-4 (bug 253538) and one for RHEL-5. If you will sign off on that patch (or even test the kernels), that will grease our internal code review process.

Thanks-a-million,
Chip
Build finished. RPMs are available from http://brewweb.devel.redhat.com/brew/taskinfo?taskID=959396 for folks on the internal Red Hat network, and at http://people.redhat.com/coldwell/kernel/bugs/279571/ for folks outside the Red Hat network. Thanks for any and all testing. Chip
James:

Can you comment on this (from Doug Ledford):

> +	case SPI:
> +	default:
> +		max_id = MPT_MAX_SCSI_DEVICES;

Aside from this little bit that would appear to set the max devices as though the card is wide SCSI without actually checking that it is, which then implies that if there ever was a narrow SCSI MPT controller, and you ran this driver on it, it better not break when scanned for devices that are too large to be on a narrow bus, it looks fine to me. And this could be fine too, I just don't know enough about the MPT hardware to know (and for that matter, the chances of someone *still* running a narrow SCSI controller are somewhat slim, although not non-existent). If you are sure this is safe, then ACK.
in 2.6.18-47.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
*** Bug 280301 has been marked as a duplicate of this bug. ***
(In reply to comment #32)
> James:
>
> Can you comment on this (from Doug Ledford):
>
> > +	case SPI:
> > +	default:
> > +		max_id = MPT_MAX_SCSI_DEVICES;
>
> Aside from this little bit that would appear to set the max devices as
> though the card is wide SCSI without actually checking that it is,

Follow-up from Chip: I think we're OK, or at least safe from regressions. The MPT update patch which introduced the problem with VMWare contained this:

@@ -943,14 +1354,13 @@ mptspi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	 * max_lun = 1 + actual last lun,
 	 *	see hosts.h :o(
 	 */
-	sh->max_id = MPT_MAX_SCSI_DEVICES;
+	sh->max_id = ioc->devices_per_bus;

IOW, all MPT SPI devices used to set sh->max_id to MPT_MAX_SCSI_DEVICES, and the patch above restores this behavior to the post-update driver.
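To make the regression argument concrete, here is a small stand-alone model of the three behaviors discussed in this bug. It is a sketch, not the driver source: struct ioc is reduced to two fields, the function names are invented, the value 16 for MPT_MAX_SCSI_DEVICES is assumed, and the FC/SAS branch is a guess, since the thread only shows the SPI/default case.

```c
/* Assumed value for illustration only. */
#define MPT_MAX_SCSI_DEVICES 16

/* Bus types as named in the quoted patch hunk; the enum itself is a sketch. */
enum bus_type { FC, SPI, SAS };

/* Hypothetical, heavily reduced stand-in for the adapter structure. */
struct ioc {
    enum bus_type bus_type;
    unsigned int devices_per_bus; /* derived from firmware PortFacts */
};

/* Before the MPT Fusion update: always the compile-time constant. */
unsigned int max_id_pre_update(const struct ioc *ioc)
{
    (void)ioc; /* the firmware value was not consulted */
    return MPT_MAX_SCSI_DEVICES;
}

/* After the update: whatever the firmware reported, even if it is 0. */
unsigned int max_id_post_update(const struct ioc *ioc)
{
    return ioc->devices_per_bus;
}

/* Patched: SPI (and unknown bus types) go back to the constant. */
unsigned int max_id_patched(const struct ioc *ioc)
{
    unsigned int max_id;

    switch (ioc->bus_type) {
    case FC:
    case SAS:
        /* Assumption: other bus types keep the firmware value. */
        max_id = ioc->devices_per_bus;
        break;
    case SPI:
    default:
        max_id = MPT_MAX_SCSI_DEVICES;
        break;
    }
    return max_id;
}
```

For an SPI adapter reporting devices_per_bus = 0, the pre-update and patched functions agree, which is the sense in which the patch only restores earlier behavior for SPI.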
There's only one controller that works with the mptspi driver, which is wide.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html