132107 – anaconda should remove internal pci-ids which are provided by driver disks

Summary: anaconda should remove internal pci-ids which are provided by driver disks

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	anaconda
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jeremy Katz
QA Contact:	Mike McLean
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-09-08 21:11 UTC by Bill Peck
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 19:18:54 UTC
Target Upstream Version:
Embargoed:

Description Bill Peck 2004-09-08 21:11:30 UTC

Description of problem:

When using a driver disk, it's possible that the disk will provide a
newer driver than the one available internally to anaconda.

But if an older driver is available from anaconda, it will load that
driver as well.  This will cause a panic (see bug # 127385).

If anaconda dynamically removed from its internal list the pci-ids
which are listed on the driver disk, then we should be able to proceed
without resorting to noprobe.

Version-Release number of selected component (if applicable):
9.1.3-3.RHEL

How reproducible:
Every time

Steps to Reproduce:
1. Use a driver disk that has a newer driver than what's on anaconda.
2. The driver must also have a different name than the driver on anaconda.
  
Actual results:
The system will most likely panic when it probes the device and tries
to load another driver for the same device.

Expected results:
Anaconda should not load drivers for pci-ids which are referenced on a
driver disk from anywhere but the driver disk.
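
A minimal sketch of the requested behaviour, assuming both pcitables
have already been parsed into dictionaries that map a (vendor, device)
pair to a module name.  The function names and the sample IDs are
illustrative only; this is not anaconda's actual code.

```python
from typing import Dict, Tuple

PciId = Tuple[int, int]  # (vendor id, device id); hypothetical representation


def prune_internal_table(internal: Dict[PciId, str],
                         driver_disk: Dict[PciId, str]) -> Dict[PciId, str]:
    """Return a copy of the internal table without any ID the driver disk claims."""
    return {pci_id: module
            for pci_id, module in internal.items()
            if pci_id not in driver_disk}


def build_probe_table(internal: Dict[PciId, str],
                      driver_disk: Dict[PciId, str]) -> Dict[PciId, str]:
    """Merge the tables so each PCI ID maps to exactly one module.

    IDs listed on the driver disk always resolve to the driver disk module.
    """
    merged = prune_internal_table(internal, driver_disk)
    merged.update(driver_disk)
    return merged


# Illustrative data only.
internal = {(0x1000, 0x0030): "mptscsih", (0x8086, 0x1229): "eepro100"}
disk = {(0x1000, 0x0030): "mptscsih_20505"}
probe_table = build_probe_table(internal, disk)
```

With a table built this way, probing a device can never select the
internal driver for an ID that the driver disk also covers.  As the
comments below point out, it does not stop the internal driver from
being loaded for some other device.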

Additional info:

Comment 1 Jeremy Katz 2004-09-10 17:11:20 UTC

Honestly, I think that in pretty much all cases, this should be
considered a driver bug.  Even if we did something like this, there's
still the chance that you'll end up loading the other driver anyway
(for different hardware) and that will still cause the problem.  

Unless notting has a better idea anyway.

Comment 2 Bill Nottingham 2004-09-10 18:41:29 UTC

The drivers panic because they both try to access the same resources?

Last time I checked the kernel had resource locking, perhaps it should
use it. :)

As Jeremy stated, this doesn't help the case where you still end up
loading the driver for some other hardware.

Comment 3 Doug Ledford 2004-09-24 16:11:53 UTC

No, the panic is different and can't be resolved by the kernel.

So, there are a number of situations where a driver disk is needed
that we should consider:

1)  Driver disk to provide driver not on install media at all.
Probably don't need to worry about PCI table conflicts; the hardware
in question is most likely not supported by any other driver on our
disks.  New driver still needs to be copied over to installation media
after kernel RPM install, and /etc/modules.conf and /etc/modprobe.conf
need to be handled correctly.

2)  Driver disk to provide updated version of driver that is on
install media.  Needed when we froze the kernel before a driver update
got in and the new version enables new hardware, etc.  Generally
speaking, this is a backward compatible driver, and anything the old
driver supported, the new one will as well.  So, completely replacing
usage of the old with the new both during the install and on the post
installation media is needed.

3)  Driver disk to provide backdated version of driver that exists on
install media (usage that triggered this bug).  The new driver and
respun ISOs regressed and quit supporting hardware in the field, so we
had to provide a driver disk to fix that.  However, because we had
saved the old driver in the new respun ISOs under a different name,
mptscsih_20505 instead of the default mptscsih, extra problems ensued.

If I name the old driver mptscsih_20505 on the driver disk, then it
will match the post installation media driver name and things will
work.  If I name it mptscsih, then it will overwrite the updated
driver which regressed and work on the initial install, but the first
security erratum or kernel update that hasn't fixed the regression will
overwrite the driver with the new one again and the system dies.
Catch-22.  We named the driver according to the new driver name scheme
so that any update kernels will pick up the right driver.

Anaconda, although it likely detected the conflicting PCI table
entries, didn't consider them the same driver because one was mptscsih
and the other mptscsih_20505.  For disks created in the wild, there is
likely nothing we can do about that.  For any disks we create, since
renamed drivers are always <drivername>_<version> when we make a backup
driver like that, doing strncmp() on the two driver names and just
making sure that they match up to any _<version> component *would*
catch this and keep anaconda from trying to load the same module twice.

I know Bill brought up the kernel resource locking as the right way to
handle this, but that doesn't work in the case of mpt drivers in
particular and possibly a few other drivers.  Specifically, the problem
happens prior to any possible resource locking in the kernel; it
happens at link time with insmod.  The mpt driver is split into
mptbase, mptscsih and mptlan, where the base driver is basically
nothing more than an access control driver and the lan and scsih
drivers are what implement stacks on the shared hardware, so the
mptscsih and mptlan drivers link against exported symbols in the
mptbase driver.  When you try to load this driver twice, you end up
with two copies of mptbase in memory, each exporting identical symbols.
When you then go to load mptscsih, insmod doesn't know which one to
link against, links against the first found, and the system goes boom
because you are linking a version <foo> mptscsih file against a version
<bar> mptbase file.  It's basically the same as if you loaded a version
2.6.6 scsi_mod.o and a version 2.6.8 scsi_mod.o and then tried to load
some actual scsi drivers.  Resource allocation won't help; the problem
is being hit much sooner than that.
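
The name comparison suggested above (match the two driver names up to
any _<version> component) might look roughly like the following sketch.
It assumes that a version suffix always starts with a digit after the
underscore, which holds for mptscsih_20505 but is only an assumption
about the naming scheme in general.  The helper names are hypothetical.

```python
import re

# Assumption: a backup driver is named <drivername>_<version> and the
# version part begins with a digit, so "scsi_mod" is left untouched.
_VERSION_SUFFIX = re.compile(r"_\d[\w.]*$")


def base_name(module: str) -> str:
    """Strip a trailing _<version> component from a module name, if any."""
    return _VERSION_SUFFIX.sub("", module)


def same_driver(name_a: str, name_b: str) -> bool:
    """True when two module names differ only by a _<version> suffix."""
    return base_name(name_a) == base_name(name_b)
```

A check like same_driver("mptscsih", "mptscsih_20505") would then let
the loader treat the two modules as one driver and refuse to load both.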

So, here's my recommendation on how to handle these different
scenarios in Anaconda:

1) if the new driver pcitable is a proper superset of any other
pcitable from the install media, disable the superseded driver
entirely and make sure it doesn't show up in the /etc/modules.conf or
/etc/modprobe.conf files post install

2) if the new driver pcitable only partially overlaps some other
driver's pcitable *and* it has a unique name not counting the
_<version> portion, then for shared pci entries treat the new one as
the default and load it first.  Only if there are additional pci
entries present in lspci that are not handled by the new driver should
Anaconda even consider loading the superseded driver, and in that case
it should ask first and load on user confirmation (IMO; others may say
just do it, but the mptscsih thing is an example of when the just-do-it
method breaks).  Only if the user confirmed the load of the old
driver and the old driver actually attached to some hardware do you
let the old driver name leak into the modules conf files in /etc.

3)  if neither pcitable is a superset of the other, and the
names are not unique, then loading both modules at the same time
should be considered an expert option and require something like
booting anaconda with some "really_unsafe_module_loading_allowed" flag
to enable.  Do what you want with the modules config files in this
case, just don't send me the bugzilla entries.
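
One possible reading of the three rules above, written as a small
decision function.  This is a sketch under stated assumptions:
pcitables are modelled as sets of (vendor, device) pairs, the caller
has already decided whether the two module names are unique once any
_<version> suffix is ignored, and every combination the rules do not
spell out falls through to the expert-only case.  All names here are
hypothetical.

```python
from enum import Enum
from typing import Set, Tuple

PciId = Tuple[int, int]  # (vendor id, device id); hypothetical representation


class Action(Enum):
    NO_CONFLICT = "tables do not overlap; load both normally"
    DISABLE_OLD = "rule 1: disable the superseded driver entirely"
    PREFER_NEW = "rule 2: load new first, ask before loading old"
    EXPERT_ONLY = "rule 3: loading both requires an explicit expert flag"


def classify(new_ids: Set[PciId], old_ids: Set[PciId],
             names_unique: bool) -> Action:
    """Pick a policy for a driver-disk module versus an install-media module."""
    if not (new_ids & old_ids):
        return Action.NO_CONFLICT
    if new_ids > old_ids:  # strict (proper) superset
        return Action.DISABLE_OLD
    if names_unique:
        return Action.PREFER_NEW
    return Action.EXPERT_ONLY


def leftover_devices(old_ids: Set[PciId], new_ids: Set[PciId],
                     present: Set[PciId]) -> Set[PciId]:
    """Devices actually present that only the old driver claims (rule 2)."""
    return (old_ids - new_ids) & present
```

Under rule 2, the old driver would only be offered to the user when
leftover_devices() returns a non-empty set.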

That's my opinion anyway.  One last thing: if a driver disk has a
module dependency on a module that we don't autoload at startup (such
as scsi_mod.o), and the needed module exists on our install media, a
modprobe on the driver disk doesn't detect that.  For example, we
don't autoload libata.o by default off the install media even though
it's there, so a driver disk for some sata driver that needs libata.o
has to include it on the driver disk in addition to the specific
driver file in case we haven't loaded it already.  This is a pain, but
more specifically it can lead to exactly the sort of double loading of
certain modules that I was referring to in #3.  Probably should be
fixed as well.
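
A rough sketch of how a loader could resolve dependencies across both
sources without loading anything twice.  It assumes the dependency map
and the module lists of the driver disk and install media are already
known, and it prefers the driver disk copy when both sources have a
module; that preference is an assumption, not something stated above.
The module name sata_foo is made up for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def plan_loads(module: str,
               deps: Dict[str, Sequence[str]],
               loaded: Set[str],
               driver_disk: Set[str],
               install_media: Set[str]) -> List[Tuple[str, str]]:
    """Return (module, source) pairs in load order, skipping loaded modules."""
    plan: List[Tuple[str, str]] = []
    seen = set(loaded)  # modules already loaded or already planned

    def visit(name: str) -> None:
        if name in seen:
            return  # never load the same module name twice
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)  # dependencies go in before the module itself
        if name in driver_disk:
            source = "driver disk"
        elif name in install_media:
            source = "install media"
        else:
            raise LookupError("module not found: " + name)
        plan.append((name, source))

    visit(module)
    return plan


# Hypothetical example: the driver disk ships only the sata driver.
deps = {"sata_foo": ["libata"], "libata": ["scsi_mod"]}
plan = plan_loads("sata_foo", deps, loaded={"scsi_mod"},
                  driver_disk={"sata_foo"},
                  install_media={"libata", "scsi_mod"})
```

Here the plan pulls libata from the install media and skips scsi_mod
because it is already loaded, so the driver disk would not need to
carry its own copy of either.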

Comment 4 RHEL Program Management 2007-10-19 19:18:54 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
