Bug 232526

Summary: Urgent: RHEL4u5 beta1/beta2 backport of irq_handler_t breaks all e1000, ixgb compile
Product: Red Hat Enterprise Linux 4 Reporter: Auke Kok <auke-jan.h.kok>
Component: kernelAssignee: Jeff Garzik <jgarzik>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.0CC: agospoda, bruce.w.allan, charles_rose, fnovak, jesse.brandeburg, jfeeney, john.ronciak, karen.skweres, konradr, linville, ltroan, nhorman, peterm, sbenjamin, thomas_chenault, wwlinuxengineering
Target Milestone: ---Keywords: Regression, Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0304 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-05-08 05:02:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Auke Kok 2007-03-15 23:25:06 UTC
Description of problem:

RHEL4u5 beta1 and beta2 break out-of-tree drivers due to backport of 2.6.19
kernel symbol irq_handler_t.

in rhel4u5 update 1 and 2 the following line(36) was added to linux/interrupt.h:

+typedef irqreturn_t (*irq_handler_t)(int, void *, struct pt_regs *);

This symbol was needed for a libata subsystem irq handler. However, out-of-tree
drivers already require to provide this symbol to be able to compile against
2.6.9 and e1000 has this exact define for this reason for 2.6.18 and lower kernels.

Now that RHEL backported this line to 2.6.9-49.EL every single e1000 out-of-tree
driver tarball fails to compile. Backwards compatibility is completely broken.


Version-Release number of selected component (if applicable):

rhel4u5 beta1 / beta2

How reproducible:

Attempt to compile any of the e1000 drivers from http://e1000.sf.net/ against
rhel4u5 beta1/2 kernel rpms.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
compile fails on "/home/ahkok/work-e1000/src/kcompat.h:1120: error: redefinition
of typedef 'irq_handler_t'"

Expected results:
compile should succeed for out-of-tree drivers for 2 reasons:
1) backwards compatibility
2) troubleshooting and debugging driver issues

Additional info:
I am the e1000 driver maintainer.

Also affected: ixgb, possibly all other out-of-tree drivers that have interrupt
handlers.


Suggested fix:
provide the typedef in question in only the subsystem that needs it so that
out-of-tree drivers don't collide with it. (In this case libata).

Comment 1 Ronald Pacheco 2007-03-20 17:37:34 UTC
Per Intel, this is a regression.

Comment 4 RHEL Program Management 2007-03-20 18:05:36 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 7 Ronald Pacheco 2007-03-26 17:35:14 UTC
Intel:

Our engineering team has reviewed this request.  Based on our analysis, there is
no regression from the drivers that Red Hat currently ships and supports on
RHEL4.    The report, as we understood it, is with out-of-tree drivers, not RHEL
provided drivers.  If you beleive this analysis is in error, then please provide
a clarfication.  For now, this will be closed as notabug.

Comment 8 John Ronciak 2007-03-26 18:08:33 UTC
While it is true that this problem only effects out of tree drivers it does
affect _all_ out of tree drivers, not just those from Intel.  All (and I mean
all!) of our OEMs use our out of tree drivers on this version of RHEL.  They
need it and want it since Red Hat has not resobably driver update mechanism.  So
some OEMs even ship these drivers, like Dell.  They are going to be very unahppy
with this decision.  I wil be sending mail to all of them today explaining what
RH has done here.  Please be prepared to answer there mails.  There will be
plenty of them.

Know also that change will break all out of tree wireless drivers as well.  This
_is_ very much a problem.

Also, I guess it's OK for RH to break kABI but when others rely upon it not
being broke RH ignores that.

Comment 9 Amit Bhutani 2007-03-26 22:10:42 UTC
If the out-of-tree drivers for RHEL4 fail to compile, this would be a *huge*
deal for Dell as we have several of those being loaded in the factory and our
redeployment media on the basis of our baseline being RHEL 4.0 Gold (Yes, that
would be kernel 2.6.9-5.EL).

This is a first-pass request that this issue be reopened and the suggested fix
by Ron be implemented or RHEL 4.5 will be DOA (Dead On Arrival) to Dell. Please
note that in the interest of time, I am making the above statements on behalf of
the Dell RHEL4 project lead Charles Rose. I am sure Charles wil have more to say
once he is brought into the loop with this issue which should be in the next 12
hours (as he's in the Indian Time Zone).

Comment 10 Charles Rose 2007-03-27 04:00:17 UTC
Reopening this bug. To add to Comment #9, This is a blocker for Dell.

Comment 11 Jeff Garzik 2007-03-27 07:09:23 UTC
Can Dell be more specific about precisely what problems you are seeing, with
which specific drivers?

And is Dell not using the RHEL4.5 e1000 and ixgb drivers?

Comment 12 Charles Rose 2007-03-27 07:56:50 UTC
(In reply to comment #11)
> Can Dell be more specific about precisely what problems you are seeing, with
> which specific drivers?

e1000 version 7.4.35 from http://e1000.sf.net/ throws out this message when
compiled against 2.6.9-50:

In file included from /home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000.h:35,
                 from
/home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000_main.c:51:
/home/charles/rhel4.5-problem/e1000-7.4.35/src/kcompat.h:1119: error:
redefinition of typedef 'irq_handler_t'
include/linux/interrupt.h:36: error: previous declaration of 'irq_handler_t' was
here
make[2]: *** [/home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000_main.o] Error 1
make[1]: *** [_module_/home/charles/rhel4.5-problem/e1000-7.4.35/src] Error 2
make: *** [default] Error 2

> 
> And is Dell not using the RHEL4.5 e1000 and ixgb drivers?

While we would ideally like to support our platforms with the native drivers on
RHEL, there are usually bug fixes and/or support to newer hardware in the
vendor's drivers (in this case, Intel) which are not committed to the RHEL
products at the required time. For this reason, the vendor's stable drivers are
always newer than the native drivers in RHEL. 

Vendors package the driver in the DKMS
(http://linux.dell.com/projects.shtml#dkms) format which is consumed by the Dell
factory, the Dell Server Assistant CDROM and published on the Dell support
website. Customers download these drivers and use the DKMS framework to update
their drivers.

The issue in discussion would break this driver update in our factories, the
Dell Server Assistant based Installs and customer initiated driver updates.

Comment 13 Charles Rose 2007-03-27 08:40:43 UTC
To add to comment #12, DKMS drivers are packaged to work with all kernels of a
particular product as the kabi would be maintained across updates. So, we expect
the same dkms driver package to compile on RHEL 4 GA through RHEL 4.5 and future
versions.

Comment 15 Jeff Garzik 2007-03-27 17:42:54 UTC
Please note this is /not/ a kABI issue.

This is a source code API issue.

Once compiled, the out-of-tree Intel drivers will load just fine on RHEL4.5.


Comment 16 John Ronciak 2007-03-27 18:00:39 UTC
The API used by out of tree drivers has now changed due a change made to
interrupt.h which out to tree need to include.  This prevents the driver from
compiling (see #12 above).  Call it whatever you'd like,  the API that a lot of
out of tree drivers (not just Intel drivers) rely on changed which prevents them
compiling.

Comment 17 Jeff Garzik 2007-03-27 18:10:58 UTC
To be more specific, this only affects out-of-tree drivers that randomly pull
new symbols from recent upstream kernels, without notifying or coordinating with
Red Hat.

Comment 18 Neil Horman 2007-03-27 18:13:57 UTC
I think what Jeff is saying is that because this is _not_ a kabi issue, the out
of tree drivers that you may be shipping alongside RHEL should not need to be
recompiled at all to work properly with RHEL4.5.  Modprobe the previously
compiled driver (possibly with a -f and all should be well).  If, in the event
you do have some reason to recompile an out of tree driver for RHEL4.5, you
should be able to take the extra 5 minutes to remove the (now redundant)
definition of irq_handler_t from your sources.

To summarize:

1) This isn't a kabi break.  Any out of tree modules that you have previously
built for RHEL4 should continue to work without recompilation on the RHEL4.5
kernel update.

2) In the event that you do have some need to recompile your out of tree driver,
then you are clearly modifying yoursource code anyway, and you should be able to
take the extra few minutes to remove the redundant irq_handler_t definition from
your sources and your compile problem will be resoved moving forward.

All that needs to be done here is to fix up any out of tree drivers with this
problem, and this is all resolved.  And even that effort can be deferred, since
this is not a binary compatibility problem, but rather a source code
compatibility issue.  So don't rebuild anything until you have some other reason
to do so, and when you do, go ahead and fix this up as well.

Comment 19 John Ronciak 2007-03-27 18:27:47 UTC
The e1000 driver did not add any new symbols at all nor were any randomly pulled
into it.  The stand-alone drivers which Intel provide to Dell compile and run on
a very wide range of kernel version.  Everything from RHEL3U8 to RHEL5.1 and all
of the 2.6 upstream kernels as well as 2.4.21 and later kernels.  Jeff very much
knows how this works for our e1000 driver.  This is done for Dell and other
because RH is unable to rev the drivers fast enough for Dell's HW release.  So
Dell and other OEMs provide these new drivers to support new Intel HW which are
on their motherboards.  RH is always offered to take the driver updates at the
same time the driver chagnes are accepted ustream.  Most times RH decides not to
include the changes for one reason or another which puts the OEMs into this
position.  

We made no change to e1000 driver but due to the interrupt.h change in a 4.5
beta the smae driver will no longer compile.  This is what Dell as pointed out
above.  So the API has changed (via interrupt.h change) which breaks drivers
that were using it. I thought that was why changes like this were not accepted
into updates.  Additions are fine but changes to the existing exported symbols
were not.  Guess I was wrong.

Comment 20 Jeff Garzik 2007-03-27 18:56:26 UTC
Responding to Neil (comment #18):

Close.  The issue is not simply that this is a source code issue.  In most
cases, especially kernel-wide cases like this one, Red Hat should not /change/
the general facilities in the kernel-wide source code API.

It's about forseeability.

It is logically impossible for Red Hat to determine whether NEW ADDITIONS to the
C namespace will break out-of-tree drivers over which we have no control.


Comment 21 Jeff Garzik 2007-03-27 19:23:05 UTC
Responding to John (comment #19):
irq_handler_t was most definitely pulled from a more recent upstream kernel. 
The irq handler signature that has existed since before Linux 1.x.x days was
only changed in Linux 2.6.19 (Nov 29, 2006), in this commit:

> commit da482792a6d1a3fbaaa25fae867b343fb4db3246
> Author: David Howells <dhowells>
> Date:   Thu Oct 5 13:06:34 2006 +0100
> 
>     IRQ: Typedef the IRQ handler function type


The current symbol conflict exists because both Intel and Red Hat independently
pulled a symbol from a more-recent upstream kernel.

Every RHEL 2.1, 3, 4, and 5 update stream has included NEW ADDITIONS to the C
namespace.  Every single one.  The open issue is mainly to educate everyone on
precisely why this broke, and to develop a good process to deal with this issue.


Comment 22 Jeff Garzik 2007-03-27 19:51:51 UTC
More background:

irq_handler_t was added to out-of-tree e1000 version 7.3.20, dated Dec 15, 2006,
to the C header file specifically designed by Intel to mitigate changes between
different kernel versions: src/kcompat.h

The relevant code in Intel's src/kcompat.h clearly indicates it was pulled from
recent upstream kernel 2.6.19, as I indicated:

#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) )
...
typedef irqreturn_t (*irq_handler_t)(int, void*, struct pt_regs *);
...



Comment 23 John Ronciak 2007-03-27 21:13:10 UTC
Jeff, we never said taht we didn't add the typedef for irqreturn_t.  We did as
you show.  The point is and what Dell is asking as well is the exact version of
this driver compiled and worked on RHEL4.4 and early RHEL4.5 betas.  Then the
libata support is backported and the same exact version that compile fine
earlier is now broke due to the backport of the interrupt.h typedef.  This is
_the_ API change that broke the out of tree drivers.  It's not just ours, it's a
lot of them.  Dell needs to be able to have these same drivers (some might just
need to be recompiled as there is no update for them) compile across all of the
versions of RHEL4 that they are shipping.  It's more than any one RHEL version.
 They need the same driver to work across them all.  Our stand-alone driver does
this but now it's been broke due to a change made to the latest RHEL4.5 beta kernel.

Also, in reply to #18 above,we took the driver (e1000.ko) that came with REHL4.4
and tried to insmod it into RHEL4.5 GA Snapshot3 and it doesn't load due to a
symbol error.  This was the stock driver from 4.4, not touched at all.  So I
think the claim being made there is incorrect.

More on #18, having to add more to the compat stuff to support individual
updates has not been done before.  As I stated above we support a very large set
to kernels but have never had to change support for RHEL updates which is the
case here.


Comment 24 Neil Horman 2007-03-28 00:59:53 UTC
"This is _the_ API change that broke the out of tree drivers..."
As far as I know, we don't support Source level API consistency, only ABI
compatibility.  We simply can't guarantee that an out of tree driver will always
compile with a given RHEL stream, even if it is itself unchanged.

"Also, in reply to #18 above,we took the driver (e1000.ko) that came with REHL4.4
and tried to insmod it into RHEL4.5 GA Snapshot3 and it doesn't load due to a
symbol error.  This was the stock driver from 4.4, not touched at all.  So I
think the claim being made there is incorrect."
It shouldn't be.  If it is that could be a real problem.  What symbol are you
having difficulty with?


Comment 25 Jeff Garzik 2007-03-28 01:26:24 UTC
Agreed with comment #24: please give us more info on the e1000.ko loading
problem (perhaps in a fresh bugzilla, since its a separate issue?).

That definitely should not be happening, and warrants investigation.

Comment 26 Auke Kok 2007-03-28 15:08:58 UTC
In reply to #24 and #25, the tester who reported this issue to us initially
retested the issue and reports (quote):

"The reason I was having a problem before is because I was running the “up” 
driver on the “smp” kernel which is incorrect."

Apologies for the confusion this caused.

Comment 27 Neil Horman 2007-03-28 16:13:37 UTC
Ok, so from comment #26, can we take then that using the proper module from a
release n-1 then works properly on release n?  Thats what I would expect.  

As such, this problem then is back to the summary I provided in comment #18. 
You don't need to rebuild modules across releases since we maintain ABI.  The
only reason you should need to rebuild is if you are providing source code
changes in your out-of-tree driver, in which case problems like the one being
addressed here can be fixed up at the same time. 

 The only thing we have left to tackle is the "forseeablility" issue, which Jeff
raised in comment 22.  It would be nice if we could in some way, shape of form,
provide a heads up to partners to let them know that a change we make will
result in a required change to out-of-tree drivers so that buildable source code
can be released for those drivers in parallel with the new kernels.  I haven't
thought this all the way though yet, but just to get it out there, what about a
mechanism by which partners register a list of header files to monitor with us
to watch for changes.  We could automatically scan the commits list for matches
on those files when they change and use those triggers to generate an email to
partners to warn them that they should do a rebuild, and that code modification
may be needed in response to bz X (where X is extracted from the commit message).

Comment 28 Trung V Nguyen 2007-03-29 15:17:06 UTC
From IBM System x, it is critical that RH team works with Intel team to help 
resolve / prevent any potential issue from this situation, as RHEL 4.5 is in 
System x server plan and we have many Intel-based GbE products. Thanks !

Comment 29 Jeff Garzik 2007-03-29 15:37:10 UTC
Responding to IBM in Comment #28: RHEL 4.5 supports Intel-based GbE products out
of the box.  You don't need a third party, unsupported driver.

Comment 30 John Ronciak 2007-03-29 15:47:21 UTC
The third party driver being talked about is fully supported and an OEM can use
them with full support from Intel.  An OEM may want to use a newer driver which
contain a bug fix that RH has either chosen not to take or due to RH's long
delays in getting fixes out.  They may also need to use a newer fully supported
driver to support new HW on there system that RH has not updated their for.  The
fixes or HW additions are normally already upstream.  This is clearly the case
here for IBM as well as Dell has stated above.  There are others that use our
out-of-tree drivers as well for the same reasons stated here.

Comment 31 Samuel Benjamin 2007-03-29 16:37:20 UTC
What Dell has informed me in this matter is that they do not wish to maintain
two separate code packages for the same driver. One that will have to be
provided for versions prior to 4.5 and another one for versions 4.5 and forward.
This will create confusion and potential customer calls for support  if they
encounter the compiler error when building a newer driver.

Comment 32 Neil Horman 2007-03-29 17:25:18 UTC
Sammy, they won't have to if intel updates this properly.  Intel can fix this by
ifdefing out the definition of irq_handler_t in their driver for all kernel
releases greater than or equal too the 4.5 kernel.  We just need to provide them
release level granularity to check against in the preprosessor.  Dell will be
able to compile the same source package from RHEL4 GA forward after that is fixed.

Comment 33 John Ronciak 2007-03-29 17:33:30 UTC
We are working on a solution for the e1000 driver that will indeed allow a
single driver to be compiled for all RHEL4 (and other) versions of the kernel
including 4.5 where this problem was introduced.  Intel OTC engineers are
driving this and once it's finalized Intel LAD will rev our out-of-tree driver
to support the new solution.  This will be resolved to allow a single driver to
work everywhere just as it has been with our out-of-tree (stand-alone) driver. 
Once we rev the driver I'll reply again to this BZ with the fixed version number.

Comment 34 Jeff Garzik 2007-03-29 19:06:31 UTC
Responding to comment #30:

The out-of-tree Intel driver is ABSOLUTELY NOT SUPPORTED by Red Hat.  Please
stop trying to confuse the issue.

Red Hat Bugzilla is not an Intel driver support channel.

Red Hat supports the Intel GbE ethernet driver shipped with RHEL.  If there are
missing bug fixes or features in that driver, open a separate bugzilla.

Comment 35 Konrad Rzeszutek 2007-03-29 20:20:34 UTC
(In reply to comment #32)
> releases greater than or equal too the 4.5 kernel.  We just need to provide them
> release level granularity to check against in the preprosessor.  Dell will be

FYI: Neil: This "release level granuality" is actually tracked in BZ 232533 for
RHEL4 U6.


Comment 36 Neil Horman 2007-03-29 20:34:32 UTC
perfect.  Thanks Konrad.  That should be what exactly what we need to provide to
vendors with a way to provide common source bases accross RHEL update which may
potentially alter API's

Comment 37 John Ronciak 2007-03-29 20:48:10 UTC
Commenting on #34, please read the entire sentance.  The part you didn't read is
"with full support from Intel.".  Nothing was stated about RH support of our
stable out of tree drivers.  So it is very clear as to what I said and meant. 
We have never nor do we expect RH to support our stand-alone (out-of-tree)
drivers.  We just don't expect compiles to break between updates which is what
happened here.  Our (RH and Intel) OEM's demand that be the case.  Procedures
are being put in place to correct this going forward.

Comment 38 Frank Novak 2007-03-29 21:23:06 UTC
re: comment #36, now just need to get this into 4.5.... :-)

Comment 39 Neil Horman 2007-03-29 23:26:15 UTC
Responding to comment #37:
Setting the support issue aside (since it indeed clear that intel is supporting
this out of tree variant of their driver), You're expectation that compiles wont
break is in error.  We guarantee ABI, not API, as we have said several times in
this thread now.  For us to maintain an API to the degree required to  guarantee
that any and all out of tree drivers continue to compile accross all updates of
a given RHEL stream severely inhibits our ability to incorporate upstream
changes to any of our RHEL releases.  

We are providing a method to precisely determine  the exact release of a given
kernel in bz 232533, as per comment 35.  Given that feature, out of tree
maintainers can selectively ifdef their code to accomodate API changes in
forward updates.  In comment 12 its clear that your code is released to your
OEMs periodically to accomodate new hardware.  From this we can see that:

a) you can manage API changes as they occur in the kernel
b) you have the opportunity to release them to your oems on a regular basis

So despite what you demand, you don't seem to actually need a stable API.  All
you need (in addition to (a) and (b) above) is the fair warning in advance of an
update release on our part to ensure that your latest driver builds with our
latest kernel.  Once you have that sufficent lead time, you can correct for any
API changes, and release to your OEMs to prevent the problem described in this
bug, without impacting Red Hats ability to keep up with upstream changes.

This lead time, or "forseeability" as Jeff has previously described is what we
need to focus on and improve.

Comment 40 John Ronciak 2007-03-29 23:39:16 UTC
Agreed.  A #define (or something like it) will work as you describe and we can
work with that.  Hopefully with enough notice that we won't have broke drivers
going out to our OEMs.

Comment 41 Jay Turner 2007-04-04 23:39:34 UTC
QE ack for 4.5.  Very simple changes to maintain some version information in the
package.  Is really quite late in the game for this kind of change, but since
the risk is minimal and there's customers/partners banking on the change, QE is
OK with it.

Comment 42 John Jarvis 2007-04-05 18:22:18 UTC
*** Bug 232533 has been marked as a duplicate of this bug. ***

Comment 43 Jason Baron 2007-04-05 19:21:10 UTC
committed in stream U5 build 53. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/


Comment 46 Issue Tracker 2007-04-19 21:34:41 UTC
A fix for this issue has been included in RHEL4.5. Please test the Release
Candidate of RHEL4.5, which was released today to Partners, and let us know
if the problem is resolved. The Release Candidate can be downloaded from
here:

ftp://partners.redhat.com/af38ac4316ba20df2dec5f990913396d

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client
Resolution set to: 'RHEL 4.5'

This event sent from IssueTracker by gcase 
 issue 117192

Comment 48 Red Hat Bugzilla 2007-05-08 05:02:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html