A server with redundant QLogic HBAs can be installed just fine as long as you boot the installer with "linux noprobe" and then manually load the qla2300 module with the option "ql2xfailover=1". This prevents you from seeing 2 instances of the SAN disks. I'd recommend that, if possible, the installer add this option automatically, as the failover version of the driver was the default in previous releases.

However, in order to get the driver module configured to properly support the redundant paths and failover features of a SAN, it's necessary to use a gui program (QLogic SANSurfer or IBM FASTT_MSJ depending on the branding) which ultimately adds a number of nearly-unintelligible parameters to modules.conf. In order to save the settings, the qlremote program triggers something that tries to load the qla2300_conf.o kernel module, which is missing from the RHEL distribution. It looks like some of the C files are in the kernel sources, but enough files are missing that this module cannot be compiled. Additionally, the version of the qla2300 driver that Red Hat is using doesn't seem to be readily available, so I can't add these files back and compile just a qla2300_conf module that is compatible with the version in RHEL ES3.

Unfortunately, without the qla2300_conf.o module, having the qla2300.o module in the distribution doesn't allow the creation of a reliable environment. Would it be possible to get the qla2300_conf.o module added to future kernels? Yes, a 3rd-party binary is needed to make use of this module, but I can't think of any other practical way of supporting a SAN with the qla2300 drivers.
In case there needs to be an actual "bug" to help justify adding the "feature", it might be worthwhile to note that as shipped, the qla driver *will* allow a group of linux boxes to failover (after a minute or two), but when service is restored a second failover will hang every linux box that failed over the first time.
Another note... After spending a day with an IBM support engineer on site, I've learned that qla2300_conf is not only required to write an updated configuration, but it's also needed to read that configuration before the qla2300 module itself loads. My "IBM approved" current modules.conf looks like this:

alias scsi_hostadapter0 qla2300_conf
alias scsi_hostadapter1 qla2300
alias scsi_hostadapter2 qla2300_conf
alias scsi_hostadapter3 qla2300
alias block-major-2 off
options scsi_mod max_scsi_luns=128
post-remove qla2300 rmmod qla2300_conf
options qla2300 ConfigRequired=1 ql2xfailover=1 ql2xuseextopts=1

It might be good for Anaconda to add the qla2300_conf lines at install time once the modules ship with the kernel. Hope this info is useful.
Jeremy, would this be an anaconda or kudzu issue ?
First, it is a kernel issue, because we do not build the qla2300_conf module. Adding this module has not been a high priority for us, because it adds to the maintenance and QA burden, just to enable binary-only features that we have seen cause problems for some customers. The preferred way to do multipath failover is to use the MD driver. In fact, when the qlogic FC driver is added to the 2.6 kernel it will be stripped of its multipath functionality. This does not suggest that we should invest effort into it in RHEL.
Did I overlook a way to set up a bootable Multipath failover MD array in anaconda? I also assume this means that upgrading a server farm from Red Hat 2.1 (with QLA Failover support) to Red Hat 3 requires a complete re-formatting of the entire SAN? The IBM BladeCenter needs to boot off the SAN b/c there is not room in the system for redundant hard drives. The qlogic solution worked in 2.1, and works under 3 with qlogic's drivers. However, without the qla2300_conf module in the RPMs, it becomes almost impossible to upgrade the kernel after enabling qlogic-style failover. I'll admit that I don't know much about the MD driver yet, so I'm sorry if I'm overlooking something, but it looks like the exclusion of this one module turns a simple upgrade into a nightmare and an enterprise-quality SAN system into a big hack.
I understand the preference for the use of the MD driver, however there are supported configurations that preclude the use of Software Raid, specifically RedHat Cluster Suite. In this case, it is my understanding that failover must be done in the driver and not the os. My configuration is 2 IBM xSeries 445 each with 2 fc2-133 HBAs (aka. ql2300), each HBA is directly connected to a single FasTt600 enclosure. Since the ql2300 is in the "Supported" drivers pkgs, and there are other "Supported" components that require this functionality, shouldn't the full functionality of the driver be included? It is my understanding that to use my configuration, I'll now have to rebuild the driver from sources, for each and every kernel update. This is something I'd very much prefer to avoid due to impacts on TCO.
My apologies if my last comment was overly harsh. The comment from Red Hat arrived right while I was trying to deal with the issue in upgrading to the Update 1 kernels, so it rubbed me the wrong way. :-)

Anyway, I've really been trying to understand Red Hat's reluctance to build this module... so would someone mind answering these questions, which might give me some insight into the rationale for the current status of the module?

- What were the "issues" that were seen by using the qla2300_conf.o module? I've heard complaints that qla2300.o itself is not the best-written module in the world, but how does _conf complicate matters? Searching for qla2300_conf on google only brings up my own bug report and mailing list complaints. :-) Are there other entries in bugzilla that I've overlooked?
- Can the Multipath MD driver be configured to 1) be bootable, 2) failover in 2-3 seconds rather than 30, and 3) set a preferred path per-lun (e.g. sda on controller 0 and sdb on controller 1)? These features would be required to match the feature set of qla2300 & qla2300_conf.
- If the _conf module was a known "issue", was there then a reason that Red Hat did not continue to use the 6.05.00 driver, which does not require that module, rather than 6.06?

Finally, is this issue still being actively considered, or is the "too much effort" statement the final word? The Assigned status of the bug seems to contradict the last comment from anyone at redhat. Thanks
I am in a similar position. My configuration is 2 IBM xSeries 365 each with 2 fc2-133 HBAs (aka. ql2300), each HBA is directly connected to a single StorageTek D280 (aka FasTt900) enclosure. I also need all the functionality for this driver. For support from my storage vendor StorageTek, I need to run RHEL 2.1, compile this driver from source against the current kernel source (qla2x00-v6.05.60-fo-dist.tgz), and make a new initrd. This is a non-trivial task that will need to be performed fairly regularly. Any additional supported configurations in the Red Hat native module would be a tremendous help. Thanks.
I will take another look at the _conf module and see whether it is possible to include it in a quarterly update. I'll keep you posted.
Created attachment 97289 [details] qla2300 failures on uni-processor /var/log/messages extract

Much appreciated... This will help me; I spent all weekend tearing my hair out on this stuff. I got the conf module to compile on AS 3 update 1 and it seems to work, but I was unable to get qla2300 and qla2200 to work... Unresolved symbols. I am currently running the qla2300_conf I built and the RH distributed qla2300.o together, but I've only just begun testing.

I also stumbled on another issue. Under the uni and BOOT kernels the driver takes timeouts attempting to initialize the luns... See attached extract from /var/log/messages. This may be parameter related; as noted above, I've just begun to test things. But these errors occur in the modules as distributed by RH. These are not fatal in my configuration, since I only need these kernels for disaster recovery, but I thought you should be aware of it, since you're taking a second look anyway. Feel free to contact me if you require more information.
I am in the same boat with the QLA2342 HBAs. It is really unfortunate that I have to rebuild a kernel every time I upgrade parts of the OS. HP is only supporting ES 2.1, so it is really confusing to get this to work the way it is supposed to. We have unplugged the second cable to the Brocade switches and will need to wait for a workable solution or rebuild the kernel ourselves. That will cause another problem that could have to do with security: if it is too time consuming to upgrade the kernels, we will have to run with one cable attached.
Created attachment 98322 [details] ql2300 info and /var/log/messages
We are experiencing similar problems on SMP/Hugemem kernels on both RHEL 3 and RHEL 3 Update 1. Compiling the latest source from QLogic appears to solve some of our problems... but this is not the preferred method of maintaining our systems. It would be nice to see this work out of the box from Red Hat. Attached is a snip from our messages. Bug 109403 is related.
FYI: I was able to compile the QLogic version of the driver on the latest AS 3 U1 errata kernel: 2.4.21-9.0.1.ELsmp. I've retested both the Red Hat version and the QLogic version that I built on the Uni-Kernel, and the SCSI failure condition still exists on the Uni. Everything is working well on hugemem and smp kernels.
We are planning to update the QLogic driver in the next RHEL 3 quarterly update, assuming all goes well during qualification. We will also provide the _conf.o modules. We are not planning to change the installer to automatically configure or use the QLogic multipath functionality. This will need to be done manually. Jim, I will look into the problem you are having with the uniprocessor kernel. What QLogic driver version are you using?
Thanks to the Red Hat crew for listening to our concerns and adding this. It will certainly make my life easier. :-)
Glad to hear about the inclusion. This will be a big help. Doing the qla build for 3 kernels on 2 systems each time the kernel is updated is a medium-size thorn in my side. I understand about the multi-path, and that's probably best anyway. BTW, the problem only happens on the uni kernel; smp with maxcpus=1 works fine. I am currently running qla2x00-v6.06.10-dist.tgz. I am using a pair of IBM fc2-133 cards (aka qla2312), which are not listed on the qlogic site, so I pulled the driver from the qla2340 page.
Would it be pushing our luck too much to ask about maybe seeing some pre-release kernels with this module included at some point in the near future? Ideally, I'd like to be able to test migrating from our current setup to the Red Hat-built setup, and then upgrading between two Red Hat kernels. Just thought it might be easier to release the kernels and get some feedback earlier rather than as part of the beta cycle. And since compiling the module in ourselves voids support, it's not like we're losing anything by testing these kernels out. :-) Just a suggestion. Thanks.
I made a pre-beta-release kernel rpm available to Dana. If anyone else is interested and able to test this before the RHEL 3 beta ships (planned for 3/31) please contact me.
Following up in bugzilla as Tom suggested... he said he was unable to get SANSurfer working with the beta kernels running the 6.07.whatever qlogic driver. I can confirm that although I can connect to the qlremote program running on a box with the U2beta kernel, it cannot update the configuration with this kernel. I get no useful debugging output... only that updating the configuration failed. I have had no problems with connecting and configuring -9 and earlier kernels in which I've replaced Redhat's module with a generic qlogic/IBM 6.06.60-fo kernel, however. My results are the same in both cases using both SANSurfer and IBM's FAStT_MSJ (same product, repackaged).

I've never been able to figure out where redhat is getting their qla2x00 driver modules from, as I've never found those versions of the drivers on qlogic's web site. So, I am unable to test whether the 6.07 version works in the same way that the modules have worked for me in the past (building the module from sources after the kernel is installed). Quite willing to help with any testing to get this included in Update 2. This will save me about 10 hours of work (by the time I update 22 boxes) every time there is a kernel update. :-)
I got this response from Qlogic: "The GUI/agent looks for the qla2300_conf.o and qla2200_conf.o in the ../scsi directory and not the ../addon directory." We are not going to change the driver location in RHEL 3, but you can manually copy them, or create a symlink. Please post the results.
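The symlink workaround suggested above could be scripted roughly as follows. This is only a sketch: the `kernel/drivers/addon/qla2200` and `kernel/drivers/scsi` subdirectories are taken from the mkinitrd output quoted later in this report, the assumption that `qla2200_conf.o` sits beside `qla2300_conf.o` is mine, and `link_conf_modules` is a hypothetical helper name, not part of any Red Hat or QLogic tool.

```python
import os

# Module files the GUI/agent is reported to look for in the scsi directory.
CONF_MODULES = ("qla2300_conf.o", "qla2200_conf.o")


def link_conf_modules(modules_root,
                      addon_subdir="kernel/drivers/addon/qla2200",
                      scsi_subdir="kernel/drivers/scsi"):
    """Symlink each *_conf.o module from the addon directory into scsi.

    modules_root is a directory such as /lib/modules/<kernel-version>.
    Both subdirectory defaults are assumptions based on paths seen in
    this report. Returns the list of links that were created; missing
    sources and already-existing targets are skipped untouched.
    """
    addon_dir = os.path.join(modules_root, addon_subdir)
    scsi_dir = os.path.join(modules_root, scsi_subdir)
    created = []
    for name in CONF_MODULES:
        src = os.path.join(addon_dir, name)
        dst = os.path.join(scsi_dir, name)
        if not os.path.isfile(src) or os.path.lexists(dst):
            continue
        os.symlink(src, dst)
        created.append(dst)
    return created
```

It would be called with the module tree of the kernel being configured, for example `link_conf_modules("/lib/modules/2.4.21-14.ELsmp")`, and needs to run as root on a real system.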
Doing this does allow the configuration file (/etc/qla2300.conf) to be written, but it fails from there. Normally, once I build the qlogic driver to get qla2300_conf, I add these 4 lines to modules.conf:

alias scsi_hostadapter0 qla2300
alias scsi_hostadapter1 qla2300_conf
alias scsi_hostadapter2 qla2300
alias scsi_hostadapter3 qla2300_conf

and then the qlremote/gui adds this to modules.conf:

options qla2300 ConfigRequired=1 ql2xfailover=1 ql2xuseextopts=1

From there, after I've made the configuration changes in the gui (to distribute LUNs across the interfaces), I normally have to do a mkinitrd and then reboot. With this driver/kernel, the startup fails with unable to open root device. Any thoughts?
BTW, is there anything more I can be doing to move this along? Do I need to pester IBM in some way? I have names for regional support managers and such from a recent debacle and they "owe us one," so I'm quite willing to call anyone I need to. But, if this is simply something that has to be sorted out on the redhat/qlogic front, then I guess there's little I can do. At this point I'm so un-eager to spend another 10-15 hours upgrading another round of kernels, that I'm willing to get a BladeCenter from the IBM loaner pool, get in my car & drive to North Carolina, and have Red Hat folks watch over my shoulder while I put this stuff together and try whatever you tell me to make this work. :-)
Can you attach a copy of your qla2300_conf file, so I can take a look at it and duplicate the issue here? I'm thinking that mkinitrd is picking up the wrong qla2300_conf file. Did you copy the files over to ../scsi or did you create a symlink?
Created attachment 100057 [details] qla2300.conf file from unbootable blade

I'm assuming you meant qla2300.conf rather than qla2300_conf, and it is attached. I tried a symlink, then I tried copying the module. Tomorrow I'll experiment some more, but I don't know how I could be using the wrong qla2300_conf.o module as I'm testing on a freshly installed blade... the only qla2300_conf.o that should be anywhere on the system is the new one from kernel -14. Can you tell me how closely tied the qlremote/sansurfer are to the kernel version? Is there a very specific version I need to be using? I tried both the latest IBM FASTt_MSJ and SanSurfer from the 3.0.0 CD.
For good measure, I moved qla2300.o and qla2300_conf.o into the scsi directory, and then moved the addons/qla2200* directories to /tmp so they wouldn't be found. Ran qlremote and configured the disks, then ran mkinitrd and rebooted. System still stopped on reboot, unable to find /.
I'd like to have the actual binary file qla2300_conf.o, so I can do an "od -xc qla2300_conf.o" and verify that qlremote/sansurfer wrote the correct configuration. Good test, I'm surprised it didn't work. Is mkinitrd picking up the right modules? Try adding -v to the mkinitrd command and redirecting the output to a file, so we can see what it is doing. I would think both versions of sansurfer should work with the 6.07.02_RH2 driver. But I need to verify this with our gui guys.
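To see which module files mkinitrd actually picked up, the saved `mkinitrd -v` output can be scanned for its copy lines. The sketch below assumes the copy lines have the `` `source' -> `destination' `` shape that appears in the mkinitrd output quoted elsewhere in this report; `copied_modules` is a hypothetical helper name.

```python
import re

# Matches copy lines of the form:  `/path/to/src.o' -> `/tmp/initrd.X/lib/src.o'
COPY_LINE = re.compile(r"`(?P<src>[^']+)'\s*->\s*`(?P<dst>[^']+)'")


def copied_modules(mkinitrd_output):
    """Map each copied file name to the source path it was taken from.

    Lines without the backquote/quote pair (for example the nash and
    insmod copies) are ignored.
    """
    result = {}
    for match in COPY_LINE.finditer(mkinitrd_output):
        src = match.group("src")
        result[src.rsplit("/", 1)[-1]] = src
    return result
```

Looking up `"qla2300_conf.o"` in the returned dictionary shows whether the initrd took the module from the addon directory or from drivers/scsi.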
Created attachment 100087 [details] Output of mkinitrd -v & modules.conf

New install of the latest sansurfer from download.qlogic.com, so hopefully that is ruled out as a possible cause. I'm booted in to the 2.4.21-14.ELsmp kernel, with the addons/qla2200* directories removed and the qla2300*.o files in drivers/scsi. I ran qlremote and SANSurfer, configured with lun 0 on one path and lun 1 on the other path. After saving the configuration, the timestamp on qla2300_conf.o was indeed updated. Still the same result. Unable to locate / partition.
OK, it seems we need to reverse the order of loading between the qla2300 module and the qla2300_conf module. qla2300_conf must be loaded first. Change /etc/modules.conf to look like the following:

alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter0 qla2300_conf
alias scsi_hostadapter1 qla2300
alias scsi_hostadapter2 qla2300_conf
alias scsi_hostadapter3 qla2300
options scsi_mod max_scsi_luns=32
options qla2300 ConfigRequired=1 ql2xfailover=1 ql2xuseextopts=1
alias usb-controller usb-ohci

The mkinitrd output should look like this:

Looking for deps of module scsi_mod
Looking for deps of module sd_mod scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module unknown
Looking for deps of module qla2300_conf
Looking for deps of module qla2300 scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module qla2300_conf
Looking for deps of module qla2300 scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module aic7xxx scsi_mod
Looking for deps of module scsi_mod
Looking for deps of module ide-disk
Looking for deps of module ext3 jbd
Looking for deps of module jbd
Using modules: ./kernel/drivers/scsi/scsi_mod.o ./kernel/drivers/scsi/sd_mod.o ./kernel/drivers/addon/qla2200/qla2300_conf.o ./kernel/drivers/addon/qla2200/qla2300.o ./kernel/drivers/scsi/aic7xxx/aic7xxx.o ./kernel/fs/jbd/jbd.o ./kernel/fs/ext3/ext3.o
Using loopback device /dev/loop0
/sbin/nash -> /tmp/initrd.rHFXDc/bin/nash
/sbin/insmod.static -> /tmp/initrd.rHFXDc/bin/insmod
`/lib/modules/2.4.21-12.EL/./kernel/drivers/scsi/scsi_mod.o' -> `/tmp/initrd.rHFXDc/lib/scsi_mod.o'
`/lib/modules/2.4.21-12.EL/./kernel/drivers/scsi/sd_mod.o' -> `/tmp/initrd.rHFXDc/lib/sd_mod.o'
`/lib/modules/2.4.21-12.EL/./kernel/drivers/scsi/qla2300_conf.o' -> `/tmp/initrd.rHFXDc/lib/qla2300_conf.o'
`/lib/modules/2.4.21-12.EL/./kernel/drivers/scsi/qla2300.o' -> `/tmp/initrd.rHFXDc/lib/qla2300.o'
`/lib/modules/2.4.21-12.EL/./kernel/drivers/scsi/aic7xxx/aic7xxx.o' -> `/tmp/initrd.rHFXDc/lib/aic7xxx.o'
`/lib/modules/2.4.21-12.EL/./kernel/fs/jbd/jbd.o' -> `/tmp/initrd.rHFXDc/lib/jbd.o'
`/lib/modules/2.4.21-12.EL/./kernel/fs/ext3/ext3.o' -> `/tmp/initrd.rHFXDc/lib/ext3.o'
Loading module scsi_mod
Loading module sd_mod
Loading module qla2300_conf
Loading module qla2300 with options ConfigRequired=1 ql2xfailover=1 ql2xuseextopts=1
Loading module aic7xxx
Loading module jbd
Loading module ext3
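Since the boot failure came down to the order of the scsi_hostadapter aliases, a quick check of modules.conf before running mkinitrd may save a reboot. The sketch below assumes the modules are loaded in ascending alias number, which matches the "Loading module" sequence shown in this comment but is not confirmed against the mkinitrd source; both function names are hypothetical.

```python
import re

# alias scsi_hostadapter<N> <module>; a missing number is treated as 0.
ALIAS = re.compile(r"^\s*alias\s+scsi_hostadapter(\d*)\s+(\S+)")


def scsi_load_order(modules_conf_text):
    """Return the scsi_hostadapter modules ordered by alias number."""
    entries = []
    for line in modules_conf_text.splitlines():
        match = ALIAS.match(line)
        if match:
            index = int(match.group(1)) if match.group(1) else 0
            entries.append((index, match.group(2)))
    # Sort on the alias number only, keeping file order for ties.
    entries.sort(key=lambda entry: entry[0])
    return [name for _, name in entries]


def conf_loads_before_driver(modules_conf_text, driver="qla2300"):
    """True if <driver>_conf first appears before <driver> in load order."""
    order = scsi_load_order(modules_conf_text)
    conf = driver + "_conf"
    if driver not in order or conf not in order:
        return False
    return order.index(conf) < order.index(driver)
```

Run against the corrected modules.conf from this comment, `conf_loads_before_driver` returns True; with the qla2300 and qla2300_conf aliases transposed it returns False.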
OK, brown paper bag on my head. On this "clean test box" I transposed the 4 lines in the modules.conf, just as you mention (in fact, I was writing this as you were submitting your response, it seems). And now it boots as expected with the configured qla2300_conf. So, comments #27-33 are moot. My apologies.

I'm still concerned about kernel upgrades, though. I'm going to go revert this box to the -11 or -12 kernel and test the upgrade process. Duane says it should work, but I really don't understand how. My understanding is this:

- qlremote writes the configuration to /etc/qla2300.conf and into the binary qla2300_conf.o module.
- mkinitrd allows the new configuration to take effect at boot time.
- If compiling a new kernel, the make install writes the information from /etc/qla2300.conf into the newly-compiled qla2300_conf.o.

So, I don't understand how that will happen when I upgrade the kernel using RPM. RPM is just going to install a new qla2300.o in a new directory. What tells the system to write my old config into the new qla2300_conf.o? I might be way out in left field, but I wanted to voice that concern even before testing, as it is a reason to leave the bug open.
Installed 2.4.21-11.ELsmp on a box, configured it with qlremote, and made it boot happily. Then I ran rpm -Uvh kernel-smp-2.4.21-14.ELsmp, and rebooting failed. In the best case, I'm assuming there is something I need to run after upgrading the kernel in order to migrate the configuration. In the worst case, it looks like I need to comment out ConfigRequired=1 and ql2xuseextopts=1 in modules.conf before upgrading kernels, then reboot, configure with qlremote in the new kernel, mkinitrd, and reboot again. Suggestions, anyone?
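The "worst case" workaround (taking ConfigRequired=1 and ql2xuseextopts=1 out of modules.conf before the kernel upgrade) could be done with a small text filter. This is a sketch under my own reading of that step: it removes the two options from the `options qla2300` line and leaves ql2xfailover=1 in place, rather than commenting out the whole line. `drop_config_options` is a hypothetical helper, and the result should be written back only after keeping a backup of the original file.

```python
def drop_config_options(modules_conf_text, module="qla2300",
                        drop=("ConfigRequired", "ql2xuseextopts")):
    """Return modules.conf text with the named options removed.

    Only "options <module> ..." lines are touched. If removing the
    options would leave the line empty, the original line is commented
    out instead of being deleted.
    """
    out = []
    for line in modules_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "options" and parts[1] == module:
            kept = [p for p in parts[2:] if p.split("=", 1)[0] not in drop]
            line = " ".join(parts[:2] + kept) if kept else "# " + line
        out.append(line)
    return "\n".join(out) + "\n"
```

After the new kernel has been configured with qlremote, the GUI is reported above to add the options line back itself.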
There is a program called "qla_opts" that is invoked by the qlogic makefile when you do a "make install":

install_qla_opts: qla_config qla_opts
	install -d -o root -g root /usr/local/sbin/
	install -o root -g root qla_opts /usr/local/sbin/
	@if [ -f "/etc/$${drv}.conf" ] ; then \
		echo "qla2x00: Updating qla_config module with /etc/$${drv}.conf data..." ; \
		./qla_opts -w qla_config ; \
	fi

I don't think this utility is included as a part of the Red Hat distribution, so you may have to install it.
Two questions, then... For Duane, is qla_opts specific to the kernel version? For Tom, is there any chance of getting this added to the kernel RPM? It seems to be open source and included in the driver package. If nothing else, could the source and makefile be included in the kernel-source package so it is somewhat trivial to build this ourselves? Thanks for all the help guys... off to test it...
It works! I'm entirely too excited about this. :-) I suppose it would be pushing my luck entirely too far to ask if the kernel RPM could just include the qla_opts program and then, as part of %post, simply do what qlogic's makefile does? Namely, the section that does:

@if [ -f "/etc/$${drv}.conf" ] ; then \
	./qla_opts -w qla_config ; \
fi

It seems a pretty trivial addition in order to make RPMs work cleanly for us SAN users. In any case, I'll write up some instructions for this and send them out to the groups I think might get some use out of them. Thanks again.
One thing that is interesting about the qla2300_conf.o module and the FAStT MSJ tool is that the use of the MSJ tool to configure the driver actually does a binary patch against the qla2300_conf.o file to poke in configuration. This probably does not matter a whole lot to the kernel upgrade process, except the user needs to remember to re-run MSJ after a kernel upgrade and re-make the initrd.
I thought that was exactly what SANsurfer did, too, both via the qlremote module (in fact, I've used MSJ and SANSurfer interchangeably with the same qlremote running on the remote host). And having to re-run MSJ and re-make the initrd is exactly what I'm trying to avoid. If the postinstall script can use the qla_opts scriptlet, it would save a step that, if overlooked, will make a system unbootable. If it's something that can be prevented using the open-source tools, I would consider that a bug worth fixing, regardless of the underlying causes.
From the help output of qla_opts, it looks like it could be run during the kernel upgrade process to pull settings from the qla2300_conf module in the currently running kernel and write those to the qla2300_conf module in the new kernel just before building the initrd. A quick scan of the qla_opts.c that comes in the driver source suggests you might need 2 copies of qla_opts: one for the old kernel release, and one for the new. It'd be a convenient fix for those of us using the qla2300 driver, but it seems like a lot of hacking on the install process for one driver.
The amount of special-case code that is required for the installer to deal with qla_opts is not practical, especially since this is not a long-term (post 2.4) solution. The _conf modules are being provided in RHEL 3 (as of U2 and later), as originally requested. I'm closing this bug.