Description of problem:
When using a driver disk, it's possible that the disk provides a newer driver than the one available internally to anaconda. If an older driver is also available from anaconda, anaconda will load that driver as well. This causes a panic (see bug #127385). If anaconda dynamically removed from its internal list the pci-ids that are listed on the driver disk, we should be able to proceed without resorting to noprobe.

Version-Release number of selected component (if applicable):
9.1.3-3.RHEL

How reproducible:
Every time

Steps to Reproduce:
1. Use a driver disk that has a newer driver than what's on anaconda
2. It also has to have a different name than the driver on anaconda
3.

Actual results:
The system will most likely panic when it probes the device and tries to load another driver for the same device.

Expected results:
Anaconda should not load drivers for pci-ids that are referenced on a driver disk from anywhere but the driver disk.

Additional info:
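The filtering requested above could look roughly like the following. This is only a sketch of the idea, not anaconda's actual code: the function name `drop_overridden_ids`, the dict-based table layout, and the sample IDs are all assumptions made for illustration.

```python
# Hypothetical sketch: each table maps a (vendor, device) PCI ID pair
# to the name of the module that claims it. Not anaconda's real data model.

def drop_overridden_ids(internal_table, driver_disk_table):
    """Return a copy of internal_table without any ID the driver disk claims."""
    claimed = set(driver_disk_table)
    return {
        pci_id: module
        for pci_id, module in internal_table.items()
        if pci_id not in claimed
    }


# Illustrative values only.
internal = {
    (0x1000, 0x0030): "mptscsih",
    (0x8086, 0x1229): "e100",
}
driver_disk = {
    (0x1000, 0x0030): "mptscsih_20505",
}

remaining = drop_overridden_ids(internal, driver_disk)
print(remaining)
```

With the internal entry removed, probing would only ever find the driver disk's module for that device.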
Honestly, I think that in pretty much all cases, this should be considered a driver bug. Even if we did something like this, there's still the chance that you'll end up loading the other driver anyway (for different hardware) and that will still cause the problem. Unless notting has a better idea anyway.
The driver panics because they both try to access the same resources? Last time I checked the kernel had resource locking, perhaps it should use it. :) As Jeremy stated, this doesn't help the case where you still end up loading the driver for some other hardware.
No, the panic is different and can't be resolved by the kernel. So, there are a number of situations where a driver disk is needed that we should consider:

1) Driver disk to provide a driver not on the install media at all. We probably don't need to worry about PCI table conflicts; the hardware in question is most likely not supported by any other driver on our disks. The new driver still needs to be copied over to the installed system after the kernel RPM install, and /etc/modules.conf and /etc/modprobe.conf need to be handled correctly.

2) Driver disk to provide an updated version of a driver that is on the install media. This is needed when we froze the kernel before a driver got in, and the new version enables new hardware, etc. Generally speaking, this is a backward compatible driver, and anything the old driver supported, the new one will as well. So the old driver needs to be completely replaced by the new one, both during the install and on the installed system afterwards.

3) Driver disk to provide a backdated version of a driver that exists on the install media (the usage that triggered this bug). The new driver in the respun ISOs regressed and quit supporting hardware in the field, so we had to provide a driver disk to fix that. However, because we had saved the old driver in the new respun ISOs under a different name, mptscsih_20505 instead of the default mptscsih, extra problems ensued. If I name the old driver mptscsih_20505 on the driver disk, then it will match the post-installation driver name and things will work. If I name it mptscsih, then it will overwrite the updated driver that regressed and will work on the initial install, but the first security erratum or kernel update that hasn't fixed the regression will overwrite the driver with the new one again and the system dies. Catch-22. We named the driver according to the new driver name scheme so that any update kernels will pick up the right driver.
Anaconda, although it likely detected the conflicting PCI table entries, didn't consider them the same driver because one was mptscsih and the other mptscsih_20505. For disks created in the wild, there is likely nothing we can do about that. For any disks we create, renamed drivers are always <drivername>_<version> when we make a backup driver like that, so doing strncmp() on the two driver names and just making sure that they match up to any _<version> component *would* catch this and keep anaconda from trying to load the same module twice.

I know Bill brought up the kernel resource locking as the right way to handle this, but that doesn't work in the case of the mpt drivers in particular, and possibly a few other drivers. Specifically, the problem happens prior to any possible resource locking in the kernel: it happens at link time with insmod. The mpt driver is split into mptbase, mptscsih and mptlan. The base driver is basically nothing more than an access control driver, and the lan and scsih drivers are what implement stacks on the shared hardware, so the mptscsih and mptlan drivers link against exported symbols in the mptbase driver. When you try to load this driver twice, you end up with two copies of mptbase in memory, each exporting identical symbols. When you then go to load mptscsih, insmod doesn't know which one to link against, links against the first one found, and the system goes boom because you are linking a version <foo> mptscsih file against a version <bar> mptbase file. It's basically the same as if you loaded a version 2.6.6 scsi_mod.o and a version 2.6.8 scsi_mod.o and then tried to load some actual scsi drivers. Resource allocation won't help; the problem is being hit much sooner than that.
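The name comparison suggested above can be sketched in a few lines. This is an illustration, not anaconda code: the helper names `base_name` and `same_driver` are made up, and the sketch assumes the _<version> suffix is an underscore followed only by digits, as in mptscsih_20505.

```python
import re

# Assumption: a backup driver's version suffix is "_" plus digits only,
# e.g. "mptscsih_20505". Names like "scsi_mod" must not be truncated.
_VERSION_SUFFIX = re.compile(r"_\d+$")


def base_name(module):
    """Strip a trailing _<version> component, if there is one."""
    return _VERSION_SUFFIX.sub("", module)


def same_driver(a, b):
    """True if two module names differ at most by a _<version> suffix."""
    return base_name(a) == base_name(b)


print(same_driver("mptscsih", "mptscsih_20505"))
print(same_driver("scsi_mod", "sd_mod"))
```

Matching on a digits-only suffix, rather than cutting at the first underscore, keeps modules such as scsi_mod and sd_mod from being mistaken for versioned copies of something else.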
So, here's my recommendation on how to handle these different scenarios in Anaconda:

1) If the new driver's pcitable is a proper superset of any other pcitable from the install media, disable the superseded driver entirely and make sure it doesn't show up in the /etc/modules.conf or /etc/modprobe.conf files post install.

2) If the new driver's pcitable only overlaps with some other driver's pcitable (rather than being a superset of it) *and* it has a unique name not counting the _<version> portion, then for shared pci entries treat the new one as the default and load it first. Only if there are additional pci entries present in lspci that are not handled by the new driver should Anaconda even consider loading the superseded driver, and in that case it should ask first and load on user confirmation (IMO; others may say just do it, but the mptscsih thing is an example of when the just-do-it method breaks). Only if the user confirmed the load of the old driver, and the old driver actually attached to some hardware, do you let the old driver name leak into the modules conf files in /etc.

3) If neither pcitable is a superset of the other, and the names are not unique, then loading both modules at the same time should be considered an expert option and require something like booting anaconda with some "really_unsafe_module_loading_allowed" flag to enable. Do what you want with the modules config files in this case, just don't send me the bugzilla entries.

That's my opinion anyway.

One last thing: if a driver disk has a module dependency on a module that we don't autoload at startup (such as scsi_mod.o), and the needed module exists on our install media, a modprobe on the driver disk doesn't detect that. For example, we don't autoload libata.o by default off the install media even though it's there, so a driver disk for some sata driver that needs libata.o has to include it on the driver disk in addition to the specific driver file, in case we haven't loaded it already. This is a pain, but more specifically it can lead to exactly the sort of double loading of certain modules that I was referring to in #3. Probably should be fixed as well.
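As a rough illustration of the three recommendations above, the decision could be sketched like this. Everything here is hypothetical: the function `classify`, its return strings, and the digits-only _<version> rule are assumptions for the example, not anything anaconda implements. The sketch also treats identical tables the same as a proper superset, and adds an "independent" result for tables that share no entries.

```python
import re

# Assumption: a version suffix is "_" followed by digits, e.g. "_20505".
_VERSION_SUFFIX = re.compile(r"_\d+$")


def _base(name):
    """Module name with any trailing _<version> removed."""
    return _VERSION_SUFFIX.sub("", name)


def classify(new_ids, old_ids, new_name, old_name):
    """Decide how to treat an install-media driver given a driver-disk driver.

    new_ids / old_ids are iterables of PCI IDs claimed by the driver-disk
    module and the install-media module respectively.
    """
    new_ids, old_ids = set(new_ids), set(old_ids)
    if not (new_ids & old_ids):
        return "independent"             # no shared entries, no conflict
    if new_ids >= old_ids:
        return "disable_old"             # case 1: new table covers the old one
    if _base(new_name) != _base(old_name):
        return "prefer_new_confirm_old"  # case 2: overlap, distinct names
    return "expert_only"                 # case 3: overlap, same base name


def leftover_ids(new_ids, old_ids, present_ids):
    """IDs present in the machine that only the old driver handles (case 2)."""
    return (set(old_ids) & set(present_ids)) - set(new_ids)
```

In case 2, the old driver would only be offered to the user when `leftover_ids` returns a non-empty set for the hardware actually found.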
This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.