Description of problem:
From IT#50093:

I've reproduced this locally. The crash is ultimately due to an assumption in e100_free_tcb_pool() that the pool is, indeed, allocated upon entry to the function. Or, it could be said that the calling function is at fault for calling it to free the pool when none is allocated. Either way, e100 tries to dereference a NULL pointer, and then bad things happen.

[snip]

The ethtool implementation in this generation of e100 has some problems dealing with failures. The e100_open() function can fail if one of its memory allocations fails, and most ethtool commands that change parameters do a "down up" cycle to free and reallocate the ethtool-modifiable parameters, in this case the tx ring.

I've generated a patch that eliminates the panic and adds some error returns for a few ethtool commands (-G tx being one). This is still not quite right, though: an "ethtool -g eth0" after such a failure will report as many tx buffers allocated as were requested in the previous (failing) -G request. The device also will not function, even though it's nominally up and running.

This driver is outdated; the current driver from Intel is a complete rewrite, so I'm not sure how much effort we want to put into fixing this one. The patch is for the RH 2.4.21-18.EL kernel. (See the IT issue for a discussion of whether the current Intel driver can be used to replace this older version.)

Version-Release number of selected component (if applicable):
RH 2.4.21-18.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Run "ethtool -G eth* tx 1024" on an e100 card.

Actual Results:
Sometimes the kernel panics.

Expected Results:
Should never panic.
Note - I didn't recreate this - the person who filed the issue did. Patch coming.
Created attachment 107227
Patch supplied with Issue
I think the attachment is busted -- it looks like nothing but HTML to me...
Created attachment 107250
The patch (really!)

Try this. Operator error.
As you say, the current version (U4) of the driver is quite different -- e100_free_tcb_pool() doesn't even exist anymore. The attached patch won't apply to the current sources. While I appreciate the patch, I'm going to have to close this as NEXTRELEASE (U4)...
Since U4 is not the next release (RHEL4 is), and since U4 is not actually released yet (it's still in beta), I'm reverting this bug to MODIFIED state. The upgrade of the e100 driver (committed in kernel version 2.4.21-20.11.EL) has presumably resolved this bug, which will be set to CLOSED/ERRATA automatically when U4 is released.
Re-opening due to likely back-rev of e100 driver in RHEL3...
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-27.EL). The fix was applied to the back-rev'ed (to 2.3.43-k1) e100 driver.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html