Bug 232526
Summary: | Urgent: RHEL4u5 beta1/beta2 backport of irq_handler_t breaks all e1000, ixgb compile | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Auke Kok <auke-jan.h.kok> |
Component: | kernel | Assignee: | Jeff Garzik <jgarzik> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.0 | CC: | agospoda, bruce.w.allan, charles_rose, fnovak, jesse.brandeburg, jfeeney, john.ronciak, karen.skweres, konradr, linville, ltroan, nhorman, peterm, sbenjamin, thomas_chenault, wwlinuxengineering |
Target Milestone: | --- | Keywords: | Regression, Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-05-08 05:02:10 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Auke Kok
2007-03-15 23:25:06 UTC
Per Intel, this is a regression. This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP. Intel: Our engineering team has reviewed this request. Based on our analysis, there is no regression from the drivers that Red Hat currently ships and supports on RHEL4. The report, as we understood it, is with out-of-tree drivers, not RHEL provided drivers. If you beleive this analysis is in error, then please provide a clarfication. For now, this will be closed as notabug. While it is true that this problem only effects out of tree drivers it does affect _all_ out of tree drivers, not just those from Intel. All (and I mean all!) of our OEMs use our out of tree drivers on this version of RHEL. They need it and want it since Red Hat has not resobably driver update mechanism. So some OEMs even ship these drivers, like Dell. They are going to be very unahppy with this decision. I wil be sending mail to all of them today explaining what RH has done here. Please be prepared to answer there mails. There will be plenty of them. Know also that change will break all out of tree wireless drivers as well. This _is_ very much a problem. Also, I guess it's OK for RH to break kABI but when others rely upon it not being broke RH ignores that. If the out-of-tree drivers for RHEL4 fail to compile, this would be a *huge* deal for Dell as we have several of those being loaded in the factory and our redeployment media on the basis of our baseline being RHEL 4.0 Gold (Yes, that would be kernel 2.6.9-5.EL). This is a first-pass request that this issue be reopened and the suggested fix by Ron be implemented or RHEL 4.5 will be DOA (Dead On Arrival) to Dell. Please note that in the interest of time, I am making the above statements on behalf of the Dell RHEL4 project lead Charles Rose. I am sure Charles wil have more to say once he is brought into the loop with this issue which should be in the next 12 hours (as he's in the Indian Time Zone). Reopening this bug. To add to Comment #9, This is a blocker for Dell. Can Dell be more specific about precisely what problems you are seeing, with which specific drivers? And is Dell not using the RHEL4.5 e1000 and ixgb drivers? (In reply to comment #11) > Can Dell be more specific about precisely what problems you are seeing, with > which specific drivers? e1000 version 7.4.35 from http://e1000.sf.net/ throws out this message when compiled against 2.6.9-50: In file included from /home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000.h:35, from /home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000_main.c:51: /home/charles/rhel4.5-problem/e1000-7.4.35/src/kcompat.h:1119: error: redefinition of typedef 'irq_handler_t' include/linux/interrupt.h:36: error: previous declaration of 'irq_handler_t' was here make[2]: *** [/home/charles/rhel4.5-problem/e1000-7.4.35/src/e1000_main.o] Error 1 make[1]: *** [_module_/home/charles/rhel4.5-problem/e1000-7.4.35/src] Error 2 make: *** [default] Error 2 > > And is Dell not using the RHEL4.5 e1000 and ixgb drivers? While we would ideally like to support our platforms with the native drivers on RHEL, there are usually bug fixes and/or support to newer hardware in the vendor's drivers (in this case, Intel) which are not committed to the RHEL products at the required time. For this reason, the vendor's stable drivers are always newer than the native drivers in RHEL. Vendors package the driver in the DKMS (http://linux.dell.com/projects.shtml#dkms) format which is consumed by the Dell factory, the Dell Server Assistant CDROM and published on the Dell support website. Customers download these drivers and use the DKMS framework to update their drivers. The issue in discussion would break this driver update in our factories, the Dell Server Assistant based Installs and customer initiated driver updates. To add to comment #12, DKMS drivers are packaged to work with all kernels of a particular product as the kabi would be maintained across updates. So, we expect the same dkms driver package to compile on RHEL 4 GA through RHEL 4.5 and future versions. Please note this is /not/ a kABI issue. This is a source code API issue. Once compiled, the out-of-tree Intel drivers will load just fine on RHEL4.5. The API used by out of tree drivers has now changed due a change made to interrupt.h which out to tree need to include. This prevents the driver from compiling (see #12 above). Call it whatever you'd like, the API that a lot of out of tree drivers (not just Intel drivers) rely on changed which prevents them compiling. To be more specific, this only affects out-of-tree drivers that randomly pull new symbols from recent upstream kernels, without notifying or coordinating with Red Hat. I think what Jeff is saying is that because this is _not_ a kabi issue, the out of tree drivers that you may be shipping alongside RHEL should not need to be recompiled at all to work properly with RHEL4.5. Modprobe the previously compiled driver (possibly with a -f and all should be well). If, in the event you do have some reason to recompile an out of tree driver for RHEL4.5, you should be able to take the extra 5 minutes to remove the (now redundant) definition of irq_handler_t from your sources. To summarize: 1) This isn't a kabi break. Any out of tree modules that you have previously built for RHEL4 should continue to work without recompilation on the RHEL4.5 kernel update. 2) In the event that you do have some need to recompile your out of tree driver, then you are clearly modifying yoursource code anyway, and you should be able to take the extra few minutes to remove the redundant irq_handler_t definition from your sources and your compile problem will be resoved moving forward. All that needs to be done here is to fix up any out of tree drivers with this problem, and this is all resolved. And even that effort can be deferred, since this is not a binary compatibility problem, but rather a source code compatibility issue. So don't rebuild anything until you have some other reason to do so, and when you do, go ahead and fix this up as well. The e1000 driver did not add any new symbols at all nor were any randomly pulled into it. The stand-alone drivers which Intel provide to Dell compile and run on a very wide range of kernel version. Everything from RHEL3U8 to RHEL5.1 and all of the 2.6 upstream kernels as well as 2.4.21 and later kernels. Jeff very much knows how this works for our e1000 driver. This is done for Dell and other because RH is unable to rev the drivers fast enough for Dell's HW release. So Dell and other OEMs provide these new drivers to support new Intel HW which are on their motherboards. RH is always offered to take the driver updates at the same time the driver chagnes are accepted ustream. Most times RH decides not to include the changes for one reason or another which puts the OEMs into this position. We made no change to e1000 driver but due to the interrupt.h change in a 4.5 beta the smae driver will no longer compile. This is what Dell as pointed out above. So the API has changed (via interrupt.h change) which breaks drivers that were using it. I thought that was why changes like this were not accepted into updates. Additions are fine but changes to the existing exported symbols were not. Guess I was wrong. Responding to Neil (comment #18): Close. The issue is not simply that this is a source code issue. In most cases, especially kernel-wide cases like this one, Red Hat should not /change/ the general facilities in the kernel-wide source code API. It's about forseeability. It is logically impossible for Red Hat to determine whether NEW ADDITIONS to the C namespace will break out-of-tree drivers over which we have no control. Responding to John (comment #19): irq_handler_t was most definitely pulled from a more recent upstream kernel. The irq handler signature that has existed since before Linux 1.x.x days was only changed in Linux 2.6.19 (Nov 29, 2006), in this commit: > commit da482792a6d1a3fbaaa25fae867b343fb4db3246 > Author: David Howells <dhowells> > Date: Thu Oct 5 13:06:34 2006 +0100 > > IRQ: Typedef the IRQ handler function type The current symbol conflict exists because both Intel and Red Hat independently pulled a symbol from a more-recent upstream kernel. Every RHEL 2.1, 3, 4, and 5 update stream has included NEW ADDITIONS to the C namespace. Every single one. The open issue is mainly to educate everyone on precisely why this broke, and to develop a good process to deal with this issue. More background: irq_handler_t was added to out-of-tree e1000 version 7.3.20, dated Dec 15, 2006, to the C header file specifically designed by Intel to mitigate changes between different kernel versions: src/kcompat.h The relevant code in Intel's src/kcompat.h clearly indicates it was pulled from recent upstream kernel 2.6.19, as I indicated: #if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19) ) ... typedef irqreturn_t (*irq_handler_t)(int, void*, struct pt_regs *); ... Jeff, we never said taht we didn't add the typedef for irqreturn_t. We did as you show. The point is and what Dell is asking as well is the exact version of this driver compiled and worked on RHEL4.4 and early RHEL4.5 betas. Then the libata support is backported and the same exact version that compile fine earlier is now broke due to the backport of the interrupt.h typedef. This is _the_ API change that broke the out of tree drivers. It's not just ours, it's a lot of them. Dell needs to be able to have these same drivers (some might just need to be recompiled as there is no update for them) compile across all of the versions of RHEL4 that they are shipping. It's more than any one RHEL version. They need the same driver to work across them all. Our stand-alone driver does this but now it's been broke due to a change made to the latest RHEL4.5 beta kernel. Also, in reply to #18 above,we took the driver (e1000.ko) that came with REHL4.4 and tried to insmod it into RHEL4.5 GA Snapshot3 and it doesn't load due to a symbol error. This was the stock driver from 4.4, not touched at all. So I think the claim being made there is incorrect. More on #18, having to add more to the compat stuff to support individual updates has not been done before. As I stated above we support a very large set to kernels but have never had to change support for RHEL updates which is the case here. "This is _the_ API change that broke the out of tree drivers..." As far as I know, we don't support Source level API consistency, only ABI compatibility. We simply can't guarantee that an out of tree driver will always compile with a given RHEL stream, even if it is itself unchanged. "Also, in reply to #18 above,we took the driver (e1000.ko) that came with REHL4.4 and tried to insmod it into RHEL4.5 GA Snapshot3 and it doesn't load due to a symbol error. This was the stock driver from 4.4, not touched at all. So I think the claim being made there is incorrect." It shouldn't be. If it is that could be a real problem. What symbol are you having difficulty with? Agreed with comment #24: please give us more info on the e1000.ko loading problem (perhaps in a fresh bugzilla, since its a separate issue?). That definitely should not be happening, and warrants investigation. In reply to #24 and #25, the tester who reported this issue to us initially retested the issue and reports (quote): "The reason I was having a problem before is because I was running the “up” driver on the “smp” kernel which is incorrect." Apologies for the confusion this caused. Ok, so from comment #26, can we take then that using the proper module from a release n-1 then works properly on release n? Thats what I would expect. As such, this problem then is back to the summary I provided in comment #18. You don't need to rebuild modules across releases since we maintain ABI. The only reason you should need to rebuild is if you are providing source code changes in your out-of-tree driver, in which case problems like the one being addressed here can be fixed up at the same time. The only thing we have left to tackle is the "forseeablility" issue, which Jeff raised in comment 22. It would be nice if we could in some way, shape of form, provide a heads up to partners to let them know that a change we make will result in a required change to out-of-tree drivers so that buildable source code can be released for those drivers in parallel with the new kernels. I haven't thought this all the way though yet, but just to get it out there, what about a mechanism by which partners register a list of header files to monitor with us to watch for changes. We could automatically scan the commits list for matches on those files when they change and use those triggers to generate an email to partners to warn them that they should do a rebuild, and that code modification may be needed in response to bz X (where X is extracted from the commit message). From IBM System x, it is critical that RH team works with Intel team to help resolve / prevent any potential issue from this situation, as RHEL 4.5 is in System x server plan and we have many Intel-based GbE products. Thanks ! Responding to IBM in Comment #28: RHEL 4.5 supports Intel-based GbE products out of the box. You don't need a third party, unsupported driver. The third party driver being talked about is fully supported and an OEM can use them with full support from Intel. An OEM may want to use a newer driver which contain a bug fix that RH has either chosen not to take or due to RH's long delays in getting fixes out. They may also need to use a newer fully supported driver to support new HW on there system that RH has not updated their for. The fixes or HW additions are normally already upstream. This is clearly the case here for IBM as well as Dell has stated above. There are others that use our out-of-tree drivers as well for the same reasons stated here. What Dell has informed me in this matter is that they do not wish to maintain two separate code packages for the same driver. One that will have to be provided for versions prior to 4.5 and another one for versions 4.5 and forward. This will create confusion and potential customer calls for support if they encounter the compiler error when building a newer driver. Sammy, they won't have to if intel updates this properly. Intel can fix this by ifdefing out the definition of irq_handler_t in their driver for all kernel releases greater than or equal too the 4.5 kernel. We just need to provide them release level granularity to check against in the preprosessor. Dell will be able to compile the same source package from RHEL4 GA forward after that is fixed. We are working on a solution for the e1000 driver that will indeed allow a single driver to be compiled for all RHEL4 (and other) versions of the kernel including 4.5 where this problem was introduced. Intel OTC engineers are driving this and once it's finalized Intel LAD will rev our out-of-tree driver to support the new solution. This will be resolved to allow a single driver to work everywhere just as it has been with our out-of-tree (stand-alone) driver. Once we rev the driver I'll reply again to this BZ with the fixed version number. Responding to comment #30: The out-of-tree Intel driver is ABSOLUTELY NOT SUPPORTED by Red Hat. Please stop trying to confuse the issue. Red Hat Bugzilla is not an Intel driver support channel. Red Hat supports the Intel GbE ethernet driver shipped with RHEL. If there are missing bug fixes or features in that driver, open a separate bugzilla. (In reply to comment #32) > releases greater than or equal too the 4.5 kernel. We just need to provide them > release level granularity to check against in the preprosessor. Dell will be FYI: Neil: This "release level granuality" is actually tracked in BZ 232533 for RHEL4 U6. perfect. Thanks Konrad. That should be what exactly what we need to provide to vendors with a way to provide common source bases accross RHEL update which may potentially alter API's Commenting on #34, please read the entire sentance. The part you didn't read is "with full support from Intel.". Nothing was stated about RH support of our stable out of tree drivers. So it is very clear as to what I said and meant. We have never nor do we expect RH to support our stand-alone (out-of-tree) drivers. We just don't expect compiles to break between updates which is what happened here. Our (RH and Intel) OEM's demand that be the case. Procedures are being put in place to correct this going forward. re: comment #36, now just need to get this into 4.5.... :-) Responding to comment #37: Setting the support issue aside (since it indeed clear that intel is supporting this out of tree variant of their driver), You're expectation that compiles wont break is in error. We guarantee ABI, not API, as we have said several times in this thread now. For us to maintain an API to the degree required to guarantee that any and all out of tree drivers continue to compile accross all updates of a given RHEL stream severely inhibits our ability to incorporate upstream changes to any of our RHEL releases. We are providing a method to precisely determine the exact release of a given kernel in bz 232533, as per comment 35. Given that feature, out of tree maintainers can selectively ifdef their code to accomodate API changes in forward updates. In comment 12 its clear that your code is released to your OEMs periodically to accomodate new hardware. From this we can see that: a) you can manage API changes as they occur in the kernel b) you have the opportunity to release them to your oems on a regular basis So despite what you demand, you don't seem to actually need a stable API. All you need (in addition to (a) and (b) above) is the fair warning in advance of an update release on our part to ensure that your latest driver builds with our latest kernel. Once you have that sufficent lead time, you can correct for any API changes, and release to your OEMs to prevent the problem described in this bug, without impacting Red Hats ability to keep up with upstream changes. This lead time, or "forseeability" as Jeff has previously described is what we need to focus on and improve. Agreed. A #define (or something like it) will work as you describe and we can work with that. Hopefully with enough notice that we won't have broke drivers going out to our OEMs. QE ack for 4.5. Very simple changes to maintain some version information in the package. Is really quite late in the game for this kind of change, but since the risk is minimal and there's customers/partners banking on the change, QE is OK with it. *** Bug 232533 has been marked as a duplicate of this bug. *** committed in stream U5 build 53. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/ A fix for this issue has been included in RHEL4.5. Please test the Release Candidate of RHEL4.5, which was released today to Partners, and let us know if the problem is resolved. The Release Candidate can be downloaded from here: ftp://partners.redhat.com/af38ac4316ba20df2dec5f990913396d Internal Status set to 'Waiting on Customer' Status set to: Waiting on Client Resolution set to: 'RHEL 4.5' This event sent from IssueTracker by gcase issue 117192 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html |