Bug 140405 (IT#50093)

Summary: panic: e100_free_tcb_pool() called when no pool allocated
Product: Red Hat Enterprise Linux 3
Reporter: Steve Conklin <sconklin>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: petrides, riel, tao
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-12-20 20:56:59 UTC
Attachments:
  107227  Patch supplied with Issue  (flags: none)
  107250  The patch (really!)  (flags: none)

Description Steve Conklin 2004-11-22 19:37:16 UTC
Description of problem:
From IT#50093:

I've reproduced this locally.  The crash is ultimately
due to an assumption in e100_free_tcb_pool() that the
pool is, indeed, allocated upon entry to the function.
Or, it could be said that the calling function is at
fault for calling to free the pool when none is
allocated.  Either way, e100 tries to dereference a
NULL pointer, and the kernel panics.
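
As a rough illustration of the failure mode described above, here is a
minimal userspace sketch of a free routine that tolerates being called
when no pool is allocated.  All names (fake_adapter, fake_tcb,
fake_alloc_tcb_pool, fake_free_tcb_pool) are hypothetical stand-ins and
are not taken from the e100 source or from the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for driver state; NOT the real e100 structures. */
struct fake_tcb {
    void *buf;                  /* per-entry resource released on free */
};

struct fake_adapter {
    struct fake_tcb *tcb_pool;  /* NULL when no pool is allocated */
    size_t tcb_count;           /* number of entries in the pool */
};

/* Allocate the pool; returns 0 on success, -1 on failure. */
int fake_alloc_tcb_pool(struct fake_adapter *a, size_t count)
{
    size_t i;

    assert(a->tcb_pool == NULL);    /* caller must not leak an old pool */
    a->tcb_pool = calloc(count, sizeof(*a->tcb_pool));
    if (a->tcb_pool == NULL)
        return -1;
    for (i = 0; i < count; i++)
        a->tcb_pool[i].buf = NULL;
    a->tcb_count = count;
    return 0;
}

/* Free the pool; safe to call when nothing is allocated. */
void fake_free_tcb_pool(struct fake_adapter *a)
{
    size_t i;

    if (a->tcb_pool == NULL)
        return;     /* without this check the loop below dereferences NULL */

    for (i = 0; i < a->tcb_count; i++)
        free(a->tcb_pool[i].buf);
    free(a->tcb_pool);
    a->tcb_pool = NULL;
    a->tcb_count = 0;
}
```

The check sits in the free routine here; the same effect could be had
by making the caller skip the call when no pool exists, which is the
other reading of the fault given above.
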
[snip]
The ethtool implementation in this generation of
e100 has some problems dealing with failures.  The
e100_open() function can fail if one of its memory
allocations fails, and most ethtool commands that
change parameters will do a "down up" cycle to free
and reallocate the ethtool-modifiable parameters, in
this case, the tx ring.
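
To make the "down up" cycle concrete, the following userspace sketch
models a resize that closes the device, stores the new ring size, and
reopens it, passing a failed reopen back to the caller.  The names
(fake_dev, fake_open, fake_close, fake_set_tx_ring) and the
fail_next_alloc test hook are assumptions made for illustration, not
the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the device state; NOT the real e100 structures. */
struct fake_dev {
    void *tx_ring;          /* NULL while the device is down */
    size_t tx_count;        /* ring size to use on the next open */
    int fail_next_alloc;    /* test hook: force the next allocation to fail */
};

/* "Up": allocate the tx ring.  Returns 0 or -ENOMEM. */
int fake_open(struct fake_dev *d)
{
    void *ring;

    assert(d->tx_ring == NULL);     /* device must be down before open */
    ring = d->fail_next_alloc ? NULL : calloc(d->tx_count, 16);
    d->fail_next_alloc = 0;
    if (ring == NULL)
        return -ENOMEM;
    d->tx_ring = ring;
    return 0;
}

/* "Down": release the tx ring if there is one. */
void fake_close(struct fake_dev *d)
{
    free(d->tx_ring);               /* free(NULL) is a no-op */
    d->tx_ring = NULL;
}

/* Resize via a down/up cycle, reporting a failed reopen to the caller. */
int fake_set_tx_ring(struct fake_dev *d, size_t new_count)
{
    fake_close(d);
    d->tx_count = new_count;    /* stored before we know the reopen works */
    return fake_open(d);        /* on failure the device stays down */
}
```

In this model a failed reopen leaves tx_ring NULL while tx_count
already holds the new value, so any later code that assumes a ring
exists is working from bad state.
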

I've generated a patch that eliminates the panic, and
adds some error returns for a few ethtool commands
(-G tx being one).  This is still not quite right,
though, as an "ethtool -g eth0" after such a failure
will claim that there are however many tx buffers
allocated as was requested in the previous (failing)
-G request.  The device also will not function,
even though it's nominally up and running.
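
One way the leftover inconsistency could be avoided, sketched here in
userspace, is to allocate the new ring before giving up the old one and
to record the new size only once the allocation has succeeded.  This is
an alternative technique shown for illustration, not what the attached
patch does; the names (fake_dev, fake_alloc_ring, fake_resize_tx_ring)
and the fail_next_alloc hook are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the device state; NOT the real e100 structures. */
struct fake_dev {
    void *tx_ring;          /* currently allocated ring, or NULL */
    size_t tx_count;        /* size of the ring that is actually allocated */
    int fail_next_alloc;    /* test hook: force the next allocation to fail */
};

/* Allocate a ring of 'count' entries, honouring the failure hook. */
void *fake_alloc_ring(struct fake_dev *d, size_t count)
{
    if (d->fail_next_alloc) {
        d->fail_next_alloc = 0;
        return NULL;
    }
    return calloc(count, 16);
}

/* Resize the ring, committing the new size only on success. */
int fake_resize_tx_ring(struct fake_dev *d, size_t new_count)
{
    void *new_ring;

    assert(new_count > 0);
    new_ring = fake_alloc_ring(d, new_count);
    if (new_ring == NULL)
        return -ENOMEM;     /* old ring and old count are left untouched */

    free(d->tx_ring);
    d->tx_ring = new_ring;
    d->tx_count = new_count;
    return 0;
}
```

With this ordering a failed request leaves both the ring and the
reported count as they were.  The cost is that both rings exist at once
for a moment, and a real driver would still have to quiesce the
hardware around the swap.
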

This driver is outdated; the current driver from Intel is a complete
rewrite, so I'm not sure how much effort we want to put into fixing
this one.

The patch is for the RH 2.4.21-18.EL kernel.

(See the IT issue for a discussion of whether the current
Intel driver can be used to replace this older version.)

Version-Release number of selected component (if applicable):
RH 2.4.21-18.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Run "ethtool -G eth* tx 1024" on an e100 card, where eth* is the
   e100 interface (e.g. eth0).

Actual Results:  Sometimes the kernel panics

Expected Results:  Should never panic

Additional info:

Comment 1 Steve Conklin 2004-11-22 19:38:44 UTC
Note - I didn't recreate this - the person who filed the issue did.

Patch coming.

Comment 2 Steve Conklin 2004-11-22 19:39:35 UTC
Created attachment 107227 [details]
Patch supplied with Issue

Comment 3 John W. Linville 2004-11-22 21:45:34 UTC
I think the attachment is busted -- it looks like nothing but HTML to
me...

Comment 4 Steve Conklin 2004-11-22 21:56:32 UTC
Created attachment 107250 [details]
The patch (really!)

Try this. Operator error.

Comment 5 John W. Linville 2004-11-22 22:07:23 UTC
As you say, the current version (U4) of the driver is quite different
-- e100_free_tcb_pool() doesn't even exist anymore.  The attached
patch won't apply to the current sources.

While I appreciate the patch, I'm going to have to close this as
NEXTRELEASE (U4)...

Comment 6 Ernie Petrides 2004-11-22 23:17:11 UTC
Since U4 is not the next release (RHEL4 is), and since U4 is not
actually released yet (it's still in beta), I'm reverting this
bug to MODIFIED state.  The upgrade of the e100 driver (committed
in kernel version 2.4.21-20.11.EL) has presumably resolved this bug,
which will be set to CLOSED/ERRATA automatically when U4 is released.


Comment 7 John W. Linville 2004-11-29 20:39:21 UTC
Re-opening due to likely back-rev of e100 driver in RHEL3...

Comment 8 Ernie Petrides 2004-12-02 03:03:05 UTC
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-27.EL).

The fix was applied to the back-rev'ed (to 2.3.43-k1) e100 driver.


Comment 9 John Flanagan 2004-12-20 20:56:59 UTC
An erratum has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html