From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040514
Description of problem:
We ran several tests (same machine and adapter). The combinations and results are listed below.
Booting on LSI22320-R:
- RHAS2.1-U4 : KO
- RHAS3-U3 (early release) : KO
- RHAS2.1-U3 : OK
- RHAS3-U1 : OK
Booting on Adaptec SCSI CARD 39160:
- OK on all four RHAS versions listed above.
For RHAS2.1, Issue Tracker #43391 tracks this problem.
Version-Release number of selected component (if applicable): kernel-2.4.21-15.5.EL
How reproducible: Always
Steps to Reproduce:
1. Use LSI22320-R as boot device.
2. Try to boot ...
3. The same boot with Adaptec SCSI works fine.
Actual Results: Unable to boot.
Expected Results: We should boot ...
Additional info:
Issue Tracker #43391 identified this problem as one to be fixed on RHEL2.1-U4 too. A workaround has been found: use the previous driver version 2.05.05 instead of the current version 2.05.11. This regression, found on RHEL2.1-U4 (vs. U3) and RHEL3-U3 (vs. U1), should be fixed in the current RHEL3-betaU3.
A version has been posted for RH2.1. Could we expect to get the same fix for RHEL3 soon? Thanks in advance.
We're waiting for the fix to be integrated in RHEL3-U3 as soon as possible. We just got the "RHEL 2.1 Update 5 Beta Preview ISOs" but it seems the fix was not integrated there either.
We are trying to reproduce this. We have several adapters that use the same driver, but we do not seem to have an LSI22320-R adapter. I will continue to investigate. Does the LSI22320-R device work correctly when the system boots off something else? That is, when you boot one of the failing systems (like RHAS2.1-U4 or RHAS3-U3) from the 39160 adapter, does the LSI22320-R adapter work correctly after the system is up, when it is used to access secondary storage?
On both of our machines (Bull HW and Intel HW) we experienced problems with the LSI22320-R adapter when going through it to access the boot disk. I've been told that another Bull team hit this problem too when they were booting from an Adaptec card but an LSI22320-R adapter was connected to another disk (not the boot disk). I didn't check their configuration myself, or which LSI22320-R adapter/FW version was used, so I cannot be 100% sure either. If you could get such an LSI22320-R adapter to set up an in-house configuration, it will be easier for you to investigate this problem. Also, it has been well identified that it's linked to the LSI22320-R driver level (see IT #43391: 2.05.05 version is OK, 2.05.11 version is NOT). Thanks in advance for your investigation.
I just booted RHEL 3 U3 on an Intel Tiger. The boot device is an: Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07) I know the LSI22320-R is also a dual Ultra320 SCSI, but I am not sure if this is an exact match or not. The driver version is: Fusion MPT SCSI Host driver 2.05.16 scsi0 : ioc0: LSI53C1030, FwRev=01030600h I would like to compare my system to yours in detail. Please post a sysreport for your Intel system with the LSI22320-R installed. Ideally, this would be from an o.s. version that fails to boot LSI22320-R but works with the 39160 adapter. If this is not readily available, then a sysreport from any o.s. you can boot on the Intel box will provide a starting point.
It has already been posted on IT #43391. Please check comment "Event posted 07-06-2004 07:00am by Pierre.Fumery". Explanation and sysreport are already available there. Thanks for your investigation.
Thanks for the pointer to the sysreport in IT #43391, I had not seen it. The sysreport (cbrunet.975.tar.bz2) shows a successful boot of a disk attached to an LSI Logic adapter:
- boot disk is sdb at scsi2, channel 0, id 2, lun 0
- scsi2 : ioc2: LSI53C1030, FwRev=01030a00h
- disk is Vendor: MAXTOR Model: ATLAS10K4_73SCA Rev: DFV0
- RHEL 3 pre-U3 (-15.18.EL).
- Fusion MPT SCSI Host driver 2.05.16
- lspci shows two LSI Logic adapters:
  Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
  Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
  (not sure if the rev difference matters)
So, in fact, it is possible to boot some storage from an LSI Logic adapter with a recent driver version. The problem, as you have said, occurs when you try to boot the Chaparral SR0812 storage on the LSI Logic adapter. Can you confirm that the Chaparral is connected to a different LSI Logic adapter than the MAXTOR ATLAS? Is it possible to try it on the same adapter as the MAXTOR ATLAS?
So, if you would, please try booting from the MAXTOR ATLAS, with the Chaparral attached to the other port on the same adapter as the MAXTOR ATLAS. Then try it again with the Chaparral on one of the ports on the other mpt fusion adapter. Please post any error messages that occur when the system tries to configure the Chaparral. Thanks.
Thanks for your investigation. I apologize for not answering your questions sooner, but July 14th is "Bastille" day here and nobody was on site to have a look at your request. Unfortunately, Claude, who has done all the test investigation for the past month, left for vacation last Tuesday and won't be able to answer himself about the tests he performed to identify/reproduce/isolate this problem. However, I checked with him before he left and he confirmed his machine did boot on an Adaptec card only and it didn't boot on an LSI22320-R one. From your investigation (Additional Comment #7 From Tom Coughlan), it seems you discovered this machine successfully booted from scsi2 : ioc2: LSI53C1030 (Fusion MPT SCSI Host driver 2.05.16), and I would suspect that is the chip on its motherboard. Anyway, I'll need to investigate further and ask our test team to assign someone else to run other tests if/when needed. It'll be harder, as Claude already spent one month on this and we'll have to start investigating/testing again.
Tom, I just double-checked Claude's logbooks and it seems "stlinux9" was his victim. I found the following:
stlinux9 motherboard
ioc2: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=255
ioc3: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=255
It would confirm what you saw in his traces, I mean that booting from the motherboard chip did work (LSI internal chip) but booting from LSI22320-R (add-on card) did fail. Do you know if LSI53C1030 (internal chip) uses the same driver as LSI22320-R (LSI adapter card)? I ask because I also noticed that your traces said: driver 2.05.16
We know that this LSI22320-R card was working well (driver 2.05.05) on RHAS2.1-U3 and RHAS3-U1.
We know that this LSI22320-R card did NOT work (driver 2.05.11) on RHAS2.1-U4 and RHAS3-betaU3.
We know that this LSI22320-R card did NOT work (driver ??.??.??) on RHAS2.1-betaU5.
Do you know which LSI driver version is integrated in RHAS2.1-betaU5? I would expect 2.05.11, as for RHAS3-betaU3, which would explain why it does not work. Could we upgrade the LSI driver version on both RHAS2.1-betaU5 and RHAS3-betaU3 to 2.05.16? I would expect that it could work. Would you agree? Could you check on your side whether you use 2.05.16 as well when it works?
Yes, I was aware of your holiday. No problem at all and, Happy Bastille day! Both the internal chip and the add-on card use the same driver (mpt fusion). RHEL 3 U3 and AS 2.1 U5 both contain mptfusion driver 2.05.16. (Bastien, please update the Issue Tracker, I do not have write access to it.) So at this point we have:
- 2.05.05 is OK,
- 2.05.11 fails with add-on card connected to Chaparral storage,
- 2.05.16 fails, as above.
I am currently investigating a possible issue that is specific to the Chaparral. I'll be looking at driver differences as well. If you are able at some point to test the Chaparral connected to the internal chip, that may be helpful. Also, if there are any additional error messages when the failure occurs, that will help. (I understand about delays due to vacations. No problem.) Tom
I'll try to figure out how (people+machine) to test what you asked for. In the meantime, I dug a little further into Claude's logbooks and found the following (I just translated his sentences): During another test, if you try to install RHL AS2.1 U4 on an SR0812 connected through an LSI22320-R, installation hangs when the mptscsih driver is being loaded. I'm not sure, but it may be another small clue ...
Also some other tests done with SJ0812 gave different results but it's not obvious to understand what did work and what didn't. As far as I understood and as far as I know, when using RAID feature (SJ0812) it could have worked one time through LSI22320-R (with RHEL3 Update pre-beta3 (=kernel 2.4.21-15.18)). Are there different configuration paths ? I'm not sure at all about what I tried to understand and I wouldn't like to add confusion. So, please don't put too much importance on this current note. But if it could help ...
I investigated a little more around here and I think I can answer your initial questions resulting from your sysreport analysis.
Can you confirm that the Chaparral is connected to a different LSI Logic adapter than the MAXTOR ATLAS?
===> Yes, the MAXTOR ATLAS is the internal boot disk, which is reached through the LSI Logic adapter (on-board LSI53C1030 chip).
===> "The LSI22320-R adapter is not used to access to the boot disk; it's only used to access data in a SCSI disk subsystem (SR0812 from Chaparral)." (see IT #43391).
Is it possible to try it on the same adapter as the MAXTOR ATLAS?
===> I'm not sure we can connect the Chaparral to the on-board LSI53C1030 chip. I need to ask the people who work on these machines (1) if it can be done and then (2) to try it. It'll be tomorrow as it's already late here.
I apologize for my incorrect statement this morning (on 2004-07-15 05:18), which may have misled you, when I wrote: "he confirmed his machine did boot on an Adaptec card only and it didn't boot on a LSI22320-R one." In fact, tests were done by using (accessing data on, not booting from) the SR0812 from Chaparral either through an Adaptec adapter (result OK) or through an LSI22320-R adapter (result KO). Boot was done on the internal MAXTOR ATLAS disk reached through the on-board LSI53C1030 chip.
I have a theory about the cause of this problem. The most interesting difference between the 2.05.05 driver that works, and the next version, 2.05.11.03, that does not work, is:
-#define MPT_LAST_LUN 31
+#define MPT_LAST_LUN 255
I have seen some external SCSI RAID subsystems that do not handle being probed for LUNs > 31 very well at all. One way to test this theory is to change the SCSI "whitelist" so that the system will only probe the Chaparral box for sequential LUNs, up until it finds a gap, where it will stop probing. This change is shown in the attached patch. I have also attached a re-built scsi_mod.o with this patch applied. It is built for -15.18.EL. If you need something different let me know. To test this,
1. mv /lib/modules/2.4.21-15.18.EL/kernel/drivers/scsi/scsi_mod.o to a safe place.
2. put the attached scsi_mod.o in /lib/modules/2.4.21-15.18.EL/kernel/drivers/scsi/scsi_mod.o
3. make a new initrd (mkinitrd), and add a new entry to elilo.conf.
4. boot the new initrd, with the Chaparral attached to the LSI22320-R.
Thanks. Tom
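To illustrate the theory in the comment above, here is a minimal sketch of the two probing strategies it contrasts: probing every LUN up to the driver's last LUN versus probing sequentially and stopping at the first gap. This is a toy model, not the kernel's scanning code. The function names and the set of "present" LUNs are assumptions made for illustration; only the values 31 and 255 come from the driver diff quoted above.

```python
# Toy model of LUN probing. Function names and the example LUN sets are
# hypothetical; only the MPT_LAST_LUN values (31, 255) come from the thread.

def probe_all(last_lun):
    """Probe every LUN from 0 through last_lun, regardless of gaps."""
    return list(range(last_lun + 1))


def probe_sequential(present_luns, last_lun):
    """Probe LUN 0, 1, 2, ... and stop after the first LUN that is absent."""
    probed = []
    for lun in range(last_lun + 1):
        probed.append(lun)
        if lun not in present_luns:
            break  # first gap: stop probing
    return probed


def configured(probed, present_luns):
    """LUNs that were probed and actually exist on the device."""
    return [lun for lun in probed if lun in present_luns]


if __name__ == "__main__":
    present = {0, 1, 2}
    print("probe all, last LUN 255:", len(probe_all(255)), "probes")
    print("sequential probes:", probe_sequential(present, 255))
    # A gap hides any LUN that comes after it:
    print("with a hole at 2:", configured(probe_sequential({0, 1, 3}, 255), {0, 1, 3}))
```

In this model the sequential strategy never touches LUNs above the first missing one, which is why it avoids high-numbered probes, and also why any LUN placed after a hole would not be found.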
Created attachment 102053 [details] patch to Chaparral entry in the SCSI whitelist
Created attachment 102054 [details] scsi_mod.o for 2.4.21-15.18.EL, with modified entry for Chaparral Use bunzip2 to restore the file.
Created attachment 102055 [details] Patch for scsi_scan, to limit LUN scan on Chaparral (second try) Oops. That first patch had an extra change in it that I did not intend. Only the last hunk was intended. The scsi_mod.o file is okay as-is.
Hi Tom, We did try without success to use your scsi_mod.o binary. "mkinitrd" failed before I discovered that your binary was an ia32/x86 binary. We're using IA64 boxes and we need ia64/Itanium binaries. Could you please provide us such ia64 scsi_mod.o binary to let us try your patch ? People testing it are test people and they don't have the whole right environment to compile and to get right binary format. Thanks in advance.
I took the following comments from IT #43391, which was for RHEL2.1. This defect is for RHEL3 and this patch has been built against 2.4.21-15.18.EL, so it should be for RHEL3.
My guess is that the Chaparral is having a hard time when it is probed for LUNs > 31. This patch will prevent this by making the system stop probing when it finds the first undefined LUN.
- {"CNSi", "JSS122", "*", BLIST_SPARSELUN}, // Chaparral SR0812 SR1422
+ {"CNSi", "JSS122", "*", BLIST_FORCELUN}, // Chaparral SR0812 SR1422
The attached scsi_mod.o has this patch. This module is built for 2.4.21-15.18.EL SMP ia64. Please test it by following these steps:
1. mv /lib/modules/2.4.21-15.18.EL/kernel/drivers/scsi/scsi_mod.o to a safe place.
2. bunzip2 the attached file and put the resulting scsi_mod.o in /lib/modules/2.4.21-15.18.EL/kernel/drivers/scsi/scsi_mod.o
3. make a new initrd (mkinitrd), and add a new entry to elilo.conf.
4. boot the new initrd, with the Chaparral attached to the LSI22320-R.
Tom
Tom, Could you please (re)post your ia64 binary in this defect, as we have not been able to extract it from IT #43391? Thanks.
Created attachment 102205 [details] 64bit version of this tentative patch Got this patch by E-mail.
Good news: With the new scsi_mod release (patch delivered by email to Pierre Fumery), the error message "MID not found" no longer appears and the server boots correctly with an SR0812 disk subsystem linked to an LSI22320-R adapter. The tests were done using an internal disk as system disk (not a disk in the SR0812). In the same configuration, the server couldn't boot correctly with the scsi_mod release delivered in RHEL3 Update 3. Questions:
1- With this patch, it is mandatory that the LUNs in the SR0812 are numbered consecutively from 0 (with no hole). Is my understanding correct?
2- What will be the status of the "official" delivery?
Regards, Claude.
1. Yes, with this patch it is mandatory that the LUNs be numbered consecutively, starting at zero.
2. In the latest RHEL 3 U3 respin we have restored the 2.05.05 driver, in addition to the 2.05.11.03 and 2.05.16 drivers. This was done so that customers can switch to the older driver, in case we are not able to ship a better solution in time. AS 2.1 x86 and IPF also have all three driver versions.
3. Now that we know what the problem is, we need to pick the best solution for U3 and U5, assuming that we are able to make any changes at this late stage. I am still investigating our options here. I hope to have an answer on this tomorrow.
Tom
Here is an update. I don't have a final resolution yet. Recent versions of the mpt fusion and aic79xx drivers increase the max_lun parameter from 64 to 256. When SPARSELUN is set for a device, the SCSI midlayer unconditionally probes LUN values up to max_lun, so high-numbered LUNs are being probed on these devices. The problem occurs because on a parallel SCSI bus, the driver must use the packetized protocol to address LUNs > 63. These drivers are apparently not doing this, and instead, they are using the non-packetized protocol for LUN 64 and above. This can cause a system hang, or non-existent devices to be configured, depending on the details of the device and the driver. The right solution to this is to fix the drivers. I have started a discussion with the driver maintainers on this. It is not likely that we will be able to make significant changes to these drivers at this late stage in U3/U5. Instead we will look for a workaround, like the patch that removes the SPARSELUN flag from some devices. The problem is knowing which devices. I will work with the driver maintainers to determine what our options are, and pick the best one for U3/U5.
I have confirmed that the problem is in the mpt fusion and aic79xx drivers. They are probing LUNs > 63 on devices that do not support the packetized protocol. I expect to have a fix for this in U4/U6, but it is too late to make a change like this in U3/U5. It would also be inappropriate to remove the SPARSELUN setting for the JSS122 Chaparral storage device in U3/U5, because there is nothing wrong with the way the device is behaving, and because if we change it, then some customers who are running it with different drivers will likely find that their LUNs are no longer configured. The best solution available for U3/U5 is to manually switch to the older mpt fusion v2.05.05. This can be selected during install with the "expert noprobe" option. If you are not installing to the mpt fusion device, then edit /etc/modules.conf and re-make the initrd after installation. There is no simple workaround for the aic79xx driver, but we have not currently received any bug reports, and neither has the Adaptec maintainer. Given the fact that U3/U5 have essentially shipped at this point, it is best to fix the drivers in U4/U6.
It looks like the needed module is not present in the BOOT kernel because of the limited size of the floppy. To work around this issue, we will have to create a driver disk with the old driver inside. A driver disk with the LSI 2.05.05 driver will be created in a couple of days (our goal is to have it done by the end of the current week if possible). As some machines do not have a floppy drive, the actual medium will be CD-ROM based.
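The interaction described in the update above can be sketched as a small model: given a max_lun value and whether the device supports the packetized protocol, which probes fall into the problematic range? This is an illustration only, not driver code. The function name is hypothetical, and the sketch assumes max_lun is a count, so that LUNs 0 through max_lun - 1 are probed; the limit of 63 and the values 64 and 256 are taken from the comment.

```python
# Illustrative model only. Assumption: max_lun is a count, so a SPARSELUN
# device is probed at LUNs 0 .. max_lun - 1. The function name is made up.

NON_PACKETIZED_LAST_LUN = 63  # from the thread: LUNs > 63 need packetized protocol


def problematic_probes(max_lun, device_supports_packetized):
    """Return the LUNs that would be probed beyond what the device can address."""
    if device_supports_packetized:
        return []
    return [lun for lun in range(max_lun) if lun > NON_PACKETIZED_LAST_LUN]


if __name__ == "__main__":
    print(len(problematic_probes(64, False)))   # old max_lun
    print(len(problematic_probes(256, False)))  # new max_lun
```

Under these assumptions the old value of 64 produces no probes above LUN 63, while 256 produces probes for LUNs 64 through 255 on a non-packetized device.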
Created attachment 103031 [details] ISO image driver disk for AS2.1 U5 This driver disk should work for AS2.1 U5. Boot the system using the option noprobe (aka, type linux noprobe at the elilo prompt), when asked to select drivers, tell it you have a driver disk on CD, put in this CD image, load the mptscsih_20505 driver, should be able to proceed after that.
Created attachment 103032 [details] Driver disk for RHEL3 U3 Same thing, RHEL3 U3. If either disk fails to work, please report the exact problem back here in this bugzilla and I'll get it taken care of.
The driver disk you delivered to us is declared as "bad" when we try an installation in noprobe mode (elilo linux noprobe, or elilo linux expert noprobe). We got the following error messages:
- screen from "Ctrl Alt F4":
<4>FAT: bogus logical sector size 0
<4>VFS: Can't find a valid FAT filesystem on dev 03:00
<4>VFS: Can't find a valid ext2 filesystem on dev ide0(3:0)
- screen from "Ctrl Alt F3":
trying to mount hda
not a new format driver, checking for old
can't find either disk identifier, bad driver disk.
Remark: We encountered another kind of problem with the driver CD for AS2.1 U5 (see IT#43391)
Regards, Claude.
Could you tell us what the final solution might be for this problem, which is major for Bull? What do you think about the proposal made by Didier Marcon (see his email to Susan S. Denham)? Could Red Hat deliver to Bull a specific CD1 that uses the old driver (2.05.05) during the boot phase and afterwards as the default driver, and ensure full support for this delivery? Regards, Claude.
To the Bull folks: Since late last week, we've been running into a few problems making this driver disk. (Unfortunately, yesterday was the Labor Day holiday in the U.S.) We expect to have an update on the problem today. The goal is obviously to provide you with a driver disk as quickly as possible so that you can use it to run the certification tests on the NS5160 and 6160.
Trying this now.
This is still a no go..
elilo linux dd
Prompted for the Driver Disk, inserted and received this error:
No Devices of the appropriate type were found on this driver disk. Would you like to manually select the driver, continue anyway, or load another driver disk?
On virtual console 3 I have this output:
modules to insert e100 e1000 mptbase_20505 mptscsih_20505 qla2300
module(s) e100 e1000 mptbase_20505 mptscsih_20505 qla2300 not found
load module set done
I then selected the "Manually choose" option and scrolling to the bottom of the list is:
mptfusion SCSI driver module (mptscsih_20505)
And again on VC3:
modules to insert mptbase_20505 mptscsih_20505
module(s) mptbase_20505 mptscsih_20505 not found
load module set done
Created attachment 103552 [details] Yet another disk. Found a bug in the creation of the modules.cgz file on the disk image. Try and see if this image solves the problem.
Created attachment 103554 [details] New AS2.1 disk image Same bug existed on the AS2.1 disk image, so new image uploaded.
Gets further! Driver disk recognizes the mptscsih_20505 driver and loads it. Further on in the install right after I fill in the info for Network install (NFS server and directory) I get a page fault. I'll post a picture of the panic but I'm afraid the interesting info has scrolled off. I'll get serial console going and capture it again.
Created attachment 103560 [details] Picture of Panic
Created attachment 103561 [details] serial dump of the panic. I didn't see it in the picture, but in the serial output you can see mpt_base_reply is referenced.
Created attachment 103565 [details] Yet another RHEL3 driver disk iso I would have expected the base scsi module to have been loaded by the loader already, I didn't see that in the output, so I put scsi_mod.o into the modules.cgz file and explicitly called it out in modules.dep. See if that solves your problem.
No Change...
Pid: 0, comm: swapper
EIP is at mpt_base_reply [mptbase_20505] 0x290 (2.4.21-20.EL)
psr : 0000101008022038 ifs : 800000000000040b ip : [<a0000000002f0a30>] Not tainted
unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
rnat: 0000000000000000 bsps: e000000004cafd00 pr : 80000000af756927
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
b0 : a0000000002f0990 b6 : e0000000044a0b60 b7 : a0000000002f07a0
f6 : 1003e00000000000000de f7 : 0ffdd8000000000000000
f8 : 1003e0000000000005340 f9 : 1003e0000000000000060
r1 : a000000000306938 r2 : 0000000000000000 r3 : 0000000000000008
r8 : e00000000645fd08 r9 : 0000000000005340 r10 : e0000000066128a8
r11 : 00000000000000de r12 : e0000000049f7ca0 r13 : e0000000049f0000
r14 : 0000000000000060 r15 : e00000000498c4e8 r16 : e0000000044a0b60
r17 : 0000000000000060 r18 : 0000000000000000 r19 : a000000000307c78
r20 : 000000000000000f r21 : 0000000000000060 r22 : e0000000064500b8
r23 : e0000000064500c0 r24 : 0000000000000060 r25 : e00000007f41804a
r26 : 0000000000000001 r27 : 0000000000000060 r28 : e00000007f41804e
r29 : 0000000000000001 r30 : 0000000078f34040 r31 : e0000000064e4000
Call Trace:
[<e0000000044158e0>] sp=0xe0000000049f78a0 bsp=0xe0000000049f1578 show_stack [kernel] 0x80
[<e000000004431ae0>] sp=0xe0000000049f7a70 bsp=0xe0000000049f1550 die [kernel] 0x200
[<e000000004451330>] sp=0xe0000000049f7a70 bsp=0xe0000000049f14f8 ia64_do_page_fault [kernel] 0x310
[<e00000000440e9a0>] sp=0xe0000000049f7b00 bsp=0xe0000000049f14f8 ia64_leave_kernel [kernel] 0x0
[<a0000000002f0a30>] sp=0xe0000000049f7ca0 bsp=0xe0000000049f14b8 mpt_base_reply [mptbase_20505] 0x290
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Did you get a serial console dump from the updated driver disk? I'm curious to know if the scsi subsystem initialization message shows up this time.
Created attachment 103596 [details] another serial console capture... this is the serial console output.. I don't think you're going to see the scsi subsystem initialization from the serial console. When I load the driver disk with the console going to video I can switch to console 4 (I think it's console 4) and see the scsi disks (sda1, sda2, etc...)
When is the panic happening then? If you are using the driver disk, and you switch to console 4 and see the SCSI disks, at what stage does the kernel oops?
OK, I see it now. This is a bug in Anaconda I think. Bill, can you try this on your machine and see if it works (it does here):
boot with linux noprobe dd
load the driver disk, select the mptscsih_20505 driver
select the proper network driver(s)
run the install
That should work. The problem appears to be that if you don't disable the autoprobing of devices then anaconda tries to load the mptscsih module (the new one that's on our boot disks, not the version 2.05.05 from the driver disk that we've already loaded) and when you try to load that driver twice, both copies of the driver end up trying to access the same hardware and of course things break horribly. Using the noprobe option here settles that issue, but does mean all devices have to be selected from the list, none are found automatically.
Yup, adding noprobe prevents the system panic. It's installing now. This will work but I'm going to open an anaconda bug stating that it should remove internal pci-ids that are provided by driver disks. That won't help for this time though.
Pierre, Can you have the Bull team confirm that by using the boot syntax noted in comment #61 that the install can be accomplished? By communicating this information to your customers you should then have a viable workaround.
As we explained some time ago, including by phone with some people at Red Hat, the machine does not have a floppy drive, so it is not possible to insert a floppy in the machine. For AS 2.1, in my opinion there are only three ways to get a correct boot:
1- a new version of the mpt driver without the regression is built into the boot CD
2- the 2.05.05 version is built into the boot CD
3- the loader releases the CD so that a new driver CD can be inserted.
For RHEL 3 AS U3 we are still waiting for a driver CD. Regards
For RHEL3, the driver CD on this bugzilla is *the* driver CD. It's done. You have to use the noprobe option or the machine will flake out, but that can't be avoided without breaking other things, so this is the best that it's going to get without spinning a new U3 install ISO that has a fixed version of anaconda.
What you delivered to us DOES NOT WORK. Could you confirm that your "final" proposal is:
===================================================
- EFI command: elilo linux noprobe dd
- Driver CD from the attachment delivered in "Comment #56" (2004-09-07 17:54)
Content of this CD:
===================
[root@stlinux11 root]# ll /mnt/cdrom
total 231
-r-xr-xr-x 1 root root 63 Aug 24 18:34 modinfo
-r-xr-xr-x 1 root root 233278 Sep 7 23:52 modules.cgz
-r-xr-xr-x 1 root root 39 Sep 7 23:52 modules.dep
-r-xr-xr-x 1 root root 541 Aug 24 18:45 pcitable
-r-xr-xr-x 1 root root 51 Aug 24 18:08 rhdd-6.1
[root@stlinux11 root]#
ERROR Message returned:
=======================
No devices of the appropriate type were found on this driver disk. Would you like to manually select the driver, continue anyway or load another driver.
REMARK:
=======
- using "elilo linux noprobe expert", we got the same error
- using "elilo linux noprobe", the CD1 cannot be ejected when the window "Insert your Driver Disk" is displayed.
- the CD itself has a readable content (no transfer problem). In modules.cgz, there is a mptscsih_20505.o file but under the "2.4.21-20.EL" directory. Is it OK?
Could you tell me what to do?
The missing piece here is a proper document describing the steps needed. I will attempt to fill in the best I can but Docs will have to produce a proper procedure for this. When you received your "error message" you then have to choose to manually select the driver. This is because auto-probing has to be turned off to prevent the broken driver from loading later. You will be presented with a list of different drivers. Scroll all the way to the bottom and you will see an entry for the mptscsih_20505 driver. Hit return here and the driver will load. If the install is being done from CDROM you can stop loading drivers and continue with the install. If a network install is desired then the network drivers will have to be loaded in this fashion as well.
GOOD NEWS: it is OK! It was not obvious how to find the mptscsih_20505 driver:
- go on after the "error message" by choosing the "Manually choose" option
- look all the way to the end of the long list to find mptscsih_20505 (and not the line for "MPT Fusion" at the beginning of the list)
... after that the installation is OK. I will have to ask you for some clarifications about the final result, but I didn't want to wait any longer before sending you this good news. Regards, Claude.
And now, my questions about the installation result:
1- In /etc/modules.conf, there is no line for "mptbase". Usually there is such a line, even after a standard RHEL3 Update3 installation. Is this line useless? Was this line useless in the previous releases for RHEL3 and RHEL2.1?
2- The qla2300 and aic7xxx drivers are not automatically loaded by this installation. This is not the standard behaviour. Moreover, these drivers are defined in /etc/modules.conf but are not present in /boot/efi/efi/redhat/initrd-2.4.21-20.EL.img (they are not in linuxrc). This is inconsistent. A final "mkinitrd" command seems to have been forgotten at the end of the installation phase. Warning! The /etc/modules.conf file has to be modified before running mkinitrd because the 2 MPT Fusion releases are simultaneously present: mptscsih and mptscsih_20505. Running mkinitrd with an unmodified modules.conf produces a kernel panic when the server is rebooted.
3- To have a completely correct installation, I had to:
- remove the "alias scsi_hostadapter1 mptscsih" line in /etc/modules.conf
- run mkinitrd
- reboot.
The installation itself in "noprobe dd" mode is not simple at all. After that, it is very hard to ask our customers to modify the /etc/modules.conf file, run a mkinitrd command and reboot the server again. What could you propose to simplify all of that?
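The modules.conf change described in item 3 above could be scripted rather than done by hand. The following is a minimal sketch under stated assumptions, not a supported procedure. The helper name is hypothetical, and the sample file content is invented for illustration; only the "alias scsi_hostadapter1 mptscsih" line comes from the comment. The sketch works on text in memory and deliberately does not touch /etc/modules.conf or run mkinitrd.

```python
# Sketch only: drop alias lines that point at one exact module name.
# The helper name and the sample content are hypothetical; only the
# "alias scsi_hostadapter1 mptscsih" line is taken from the thread.

def drop_alias(conf_text, module):
    """Return conf_text without 'alias <name> <module>' lines for this module."""
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Match the module name exactly so mptscsih_20505 is left alone.
        if len(fields) == 3 and fields[0] == "alias" and fields[2] == module:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "alias eth0 e1000\n"
        "alias scsi_hostadapter mptscsih_20505\n"
        "alias scsi_hostadapter1 mptscsih\n"
    )
    print(drop_alias(sample, "mptscsih"), end="")
```

Comparing the third field exactly, rather than searching for a substring, is what keeps the mptscsih_20505 entry in place while removing the mptscsih one.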
Back to the actual patch issue, restated here again to re-ground everyone in the discussion for what version of the mptfusion driver is requested in RHEL 3 U4 and RHEL 2.1 U6:
"LSI mptfusion driver 2.05.16 included in RHEL 3 U3 and RHEL 2.1 U5 provided new features but introduced a regression from at least version 2.05.11 that prevents several of Bull's NovaScale systems from booting. As a result, several Bull NovaScale systems that contain the LSI adapter are in manufacturing Stop Ship. This regression has been discussed on both RHEL2.1-U4 (IT #43391) and RHEL3-U3 (BZ #127385). Bull needs an updated LSI mptfusion driver version into RHEL 2.1 U6 that fixes the regression."
Here is the status of including the fix in a RHEL update: Red Hat *will* include a patch in RHEL 3 U4 and RHEL 2.1 U6 that updates the mptfusion driver from 2.05.16 to 2.05.16.02. This patch addresses BZ 127385, FZ 131392 (AS2.1) and FZ 131393 (RHEL3).
And this from the mptfusion maintainer Eric Dean Moore at LSI on 15 September; note that LSI says that the 2.05.23 driver **is not** ready for submission upstream:
"Please apply the 2.05.16.02 driver to your Red Hat kernels. This is the driver which solved the "max_lun on non-packetized SCSI devices" issue reported by Tom Coughlan back in July. This fix was implemented by Larry Stephens, and perhaps Larry could submit his changes/patch upstream to Kernel.org. Regarding the 2.05.23 driver. I spoke to my manager, Terry Gibbons, just yesterday on having this submitted upstream. Terry suggested that we hold off on submitting that driver, as this driver version hasn't been widely accepted by various customers."
Sue here again: As a result (and to repeat), RH will not include the 2.05.23 driver in RHEL 3 U4 and RHEL 2.1 U6 and *will* include the LSI-recommended 2.05.23 driver.
I think you meant "*will* include the LSI-recommended 2.05.16.02 driver." Thanks.
Sue here: Yes, that's what I meant. Bad cut and paste : ( Thanks for correcting.
Posted to IT 43391 by Jeremy Katz 9/28: There isn't a way to automatically run a script on the normal install path. But, if you take a different approach, then it can work easily. Instead of replacing the mptfusion modules, you will want to do the following. * Copy mptbase_20505.o and mptscsih_20505.o into modules.cgz * Do as before without the rename * Edit /modules/pcitable. * Replace all instances of mptscsih with mptscsih_20505 * Edit /modules/modinfo. * Replace all instances of mptscsih with mptscsih_20505. * Replace all instances of mptbase with mptbase_20505. * Edit /modules/modules.dep * Replace all instances of mptscsih with mptscsih_20505. * Replace all instances of mptbase with mptbase_20505. * Recreate boot CD based on boot.img with these changes. Then, the old module will get loaded during the install. It will also be set up for use post-install.
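The "replace all instances" steps in the instructions above could be sketched as a small text transformation. This is only an illustration of the renaming, not part of the installer tooling. The function name and the sample modules.dep line are assumptions; the module names come from the instructions. One practical detail the sketch handles is not renaming a name that already carries the _20505 suffix, so running it twice is harmless.

```python
import re

# Sketch only. The function name and sample line are hypothetical; the
# module names are the ones given in the instructions above.
RENAMES = {"mptscsih": "mptscsih_20505", "mptbase": "mptbase_20505"}


def rename_modules(text, renames=RENAMES):
    """Replace whole-word module names, leaving already-renamed ones untouched."""
    for old, new in renames.items():
        # "_" counts as a word character, so \b does not match inside
        # "mptscsih_20505" and the suffix is never appended twice.
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text


if __name__ == "__main__":
    print(rename_modules("mptscsih: mptbase"))
```

For pcitable, where only mptscsih is to be replaced, the same function could be called with a single-entry mapping such as `{"mptscsih": "mptscsih_20505"}`.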
Also posted to IT 43391 by Susan Denham 9/28: We assume that you are taking the following steps in order to ensure that your RHEL 2.1 U5 Itanium systems have the correct mptfusion driver (v2.05.16.02) on the boot image. Please confirm that you are:
1. Shipping the unmodified RHEL 2.1 U5 CDs.
2. Creating and including in your Bull RHEL 2.1 U5 package a new boot CD that contains a Bull-modified initrd. You will not be calling or labelling this "Red Hat Enterprise Linux" but will instead call this a "Bull Boot CD for Intel Itanium2 systems running Red Hat Enterprise Linux 2.1 U5" or some such name.
3. Creating this modified initrd (for the Bull Boot CD) using the instructions that Jeremy Katz, RHEL installer maintainer, provided above in the previous event.
We will officially support Bull's customer using this modified initrd. I do, for obvious reasons, hope that it is indeed only one customer! It will no longer be necessary for Bull to ship this modified initrd once RH delivers RHEL 2.1 U6 because U6 will contain the correct version of the mptfusion driver (v2.05.16.02) that solves the failure to boot problem you're currently seeing on your RHEL 2.1 U5 Itanium systems.
And just to cover the RHEL 4 angle for the mpt fusion driver in this bugzilla as well: RH QA has tested RHEL 4 beta 1 on the NovaScale 6160 in Westford and has confirmed that the mpt fusion patch that is required on RHEL 2.1 and 3 (included in mpt fusion 2.05.16.02) for the Bull system is _not_ needed for RHEL 4. Background: the patch ensures that the mpt fusion driver will not scan LUNs that are larger than the storage device can address. This problem is not seen in RHEL 4 because the LUN scanning in RHEL 4 is completely different from 2.4 kernels. In RHEL 4, the SCSI layer requests a LUN inventory (Report LUNs) from the storage device, and it only configures those specific LUNs, rather than scanning the whole LUN number space. The only time that the bug in the mpt fusion driver will exhibit itself on RHEL 4 is when all of the following are true with respect to the storage device:
- it does not support Report LUNs
- it does not support the packetized SCSI protocol, and,
- it is on the SCSI whitelist with the SPARSELUN flag.
The Chaparral storage in the Bull system meets the second two, but not the first. LSI Logic is planning to incorporate this fix into an upstream version. RH wants to see the fix get upstream, and then inherit the fix in a RHEL 4 update at the earliest opportunity.
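To make the three-way condition above easier to check against other storage devices, here it is written as a small Python predicate. This is only a restatement of the comment, not kernel or driver code; the function and parameter names are made up for illustration.

```python
def mpt_lun_bug_can_appear_on_rhel4(supports_report_luns,
                                    supports_packetized_scsi,
                                    whitelisted_with_sparselun):
    """Return True only when all three conditions from the comment hold.

    Hypothetical helper: each argument is a boolean describing the
    storage device, as stated in the comment above.
    """
    return (not supports_report_luns
            and not supports_packetized_scsi
            and whitelisted_with_sparselun)


# The Chaparral storage in the Bull system meets the second and third
# conditions but not the first, which means it does support Report LUNs.
chaparral_affected = mpt_lun_bug_can_appear_on_rhel4(
    supports_report_luns=True,
    supports_packetized_scsi=False,
    whitelisted_with_sparselun=True,
)
```

With those inputs `chaparral_affected` is `False`, which matches the QA result that the patch is not needed on RHEL 4 for this system.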
Could you please answer the questions in my comment #73?
- Is there a way to simplify the installation, especially when there is a QLogic adapter in the machine? (As the driver for this board is not automatically loaded, it is mandatory to create a new initrd file and to reboot again!)
- The content of the /etc/modules.conf file is erroneous.
- Is mptbase useful in /etc/modules.conf?
Regards, Claude.
This BZ spawned BZ 131393 and BZ 131392, to represent the specific fix needed in RHEL 3 and RHEL 2.1 IPF. Those BZ are now in the modified state. I think it is time to close the BZ.
Tom, thanks a lot for your investigation and your help in addressing this issue. We made good progress on it, and now we only need to check that this problem has been fixed in the x86 binaries as well.
Pierre, the fix for this problem is in RHEL 2.1 IPF U6 and RHEL 3 U4 (all architectures, of course). We did not receive any reports of this problem on RHEL 2.1 x86, presumably because this combination of hardware and driver is not being used there. As a result, we did not include the fix in RHEL 2.1 x86 U6. If you need this fix in RHEL 2.1 x86 U7, I have suggested that JoAnne open a new Bugzilla.
I already opened IT #54259 and JoAnne created BZ #139042, but I have no access to this BZ. Could you add me to the Cc: list so that I can track progress on this issue for x86? Thanks in advance.
BZ #139042 is a "Featurezilla". I guess you should just track status in the IT.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-504.html