Bug 147039

Summary: System freezes shortly after activating 3c590 NIC card
Product: Red Hat Enterprise Linux 3
Reporter: Thomas Payerle <payerle>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED CANTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: petrides, riel
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-09-16 14:48:17 UTC
Attachments (description, flags):
  Assorted details re bug 147039 (flags: none)
  results of sysreport (flags: none)
  jwltest-3c59x-3c905b_1-reset.patch (flags: none)

Description Thomas Payerle 2005-02-03 21:38:39 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20050104 Red Hat/1.4.3-3.0.7

Description of problem:
I have a Dell Optiplex GX1p that had been running RHL 7.x for years and
that I upgraded to RHEL AS 3 from the Update 1 CDs.  The only hardware
change during the upgrade was to make the old drive a slave and add a new
master HDD.

The system has eth0 as a 3c905B and eth1 as a 3c590.
The former is on a public internet, the latter on a small private net.
Not set up to do routing or anything fancy.

Both were detected and configured during installation from CD, and the
install process proceeded normally to the first reboot.  After the reboot,
the graphical first-boot screen came up, and the system froze before I
could finish reading the license (a few minutes after X came up).  The
screen did not change, but there was no response from mouse or keyboard
(the Num Lock LED still worked, but that was it; Caps Lock did not).
Eventually I was able to boot to the old RH7.x setup, and it behaved
normally for several hours.
I could not even ping the machine after it froze.

Booting to single-user mode, the system seems to stay up indefinitely
(15+ minutes, 3-5 times longer than the onset of problems when they occur).
I manually ran the S* scripts from /etc/rc5.d, and the system froze several
minutes after they finished.  By selectively running scripts, I narrowed
the culprit down to the S10network script (after running only the S0*
scripts the system stays up; running S10network after the others locks
it up).
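
The narrowing-down described above can be sketched roughly as follows. This is only an illustration, not the exact procedure that was used: the helper name scripts_before is hypothetical, and the function only lists the start scripts that sort before a suspect script rather than running anything.

```shell
#!/bin/sh
# Hypothetical helper: list the rc start scripts, in start order, that
# come before a suspect script.  Nothing is executed; names are printed.
scripts_before() {
    rc_dir=$1      # e.g. /etc/rc5.d
    suspect=$2     # e.g. S10network
    for script in "$rc_dir"/S*; do
        # Skip the unexpanded pattern if the directory has no S* entries.
        [ -e "$script" ] || continue
        name=$(basename "$script")
        # Stop as soon as the suspect script is reached.
        [ "$name" = "$suspect" ] && break
        echo "$name"
    done
}

# Print the scripts that would be run before S10network.
scripts_before /etc/rc5.d S10network
```

Each listed script could then be started by hand (for example with its "start" argument), leaving S10network for last so that a freeze can be tied to it.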

I renamed /etc/sysconfig/network-scripts/ifcfg-eth1, and the system would
come up in full X11 multiuser mode and stay up indefinitely (it ran
without problems for on the order of 12 hours or more).  If I run ifconfig
to configure the 3c590 NIC, the system freezes several minutes later.

Updated with up2date with all available patches, so it should be a
current Update 4/Taroon now, and the problem persists.

Rebooting to single-user mode, I tried running the S0* scripts in
/etc/rc5.d, then ifup eth1 (after restoring the original ifcfg-eth1), so
eth0 was not configured.  The system still locked up after a couple of
minutes.




Version-Release number of selected component (if applicable):
kernel-2.4.21-9.EL

How reproducible:
Always

Steps to Reproduce:
1. boot without eth1 activated
2. ifconfig eth1 172.17.172.17 netmask 255.255.0.0 broadcast 172.17.255.255
3. system will freeze after a couple of minutes
    

Actual Results:  machine would not respond to keyboard, mouse, or
pings.  Screen stayed normal.

Expected Results:  system should still run

Additional info:

Additional details to be supplied as attachments

Comment 1 Thomas Payerle 2005-02-03 21:50:03 UTC
Created attachment 110627 [details]
Assorted details re bug 147039

Comment 3 Suzanne Hillman 2005-02-04 21:08:56 UTC
Internal RFE bug #147218 entered; will be considered for future releases.

Comment 4 Ernie Petrides 2005-02-07 22:06:28 UTC
The 3c59x driver was updated in RHEL3 U4.  Please retest this on a recent
release of RHEL3 and let us know if the problem is resolved.  The latest
RHEL3 kernel is 2.4.21-27.0.2.EL, which was released as RHSA-2005:043.

Thanks in advance.  -ernie


Comment 5 Thomas Payerle 2005-02-07 23:04:16 UTC
Just tried with 2.4.21-27.0.2.EL, and got the same symptoms.  (Actually,
some of the previously reported results may also have been with that
kernel: I waited until I had done a full update before submitting, but did
not notice that the new kernel was not the default, so I had probably been
mixing the two kernel versions in previous reports.)  But just now, I
verified that I was in the newer kernel and reproduced the symptoms.

Comment 6 John W. Linville 2005-02-22 21:59:21 UTC
3c590 and 3c905B are handled by same driver...very strange...I'm
curious what happens if you don't bring-up the 3c905B but bring-up the
3c590 instead?  Also curious about if you physically swap the cards'
positions?  What about when you remove the 3c905B (i.e. only use the
3c590)?

Could you attach the output from running sysreport on the box (w/ both
cards still installed)?  Thanks!

Comment 7 Thomas Payerle 2005-02-22 23:44:21 UTC
Created attachment 111316 [details]
results of sysreport

as per request

Comment 8 Thomas Payerle 2005-02-23 00:16:15 UTC
I realize they use the same driver.  The 3c905 (eth0) is on the motherboard,
which makes it awkward to remove or swap positions with the 3c590 :) (the
3c590 is eth1, in a PCI slot).

As mentioned in the last paragraph of the initial problem description, I had
tried bringing up the system in single-user mode, running the init scripts in
/etc/rc5.d below S10network, then restoring ifcfg-eth1 in
/etc/sysconfig/network-scripts and running ifup eth1 manually (eth0 should not
have been configured).  The system still crashed shortly afterwards.
(ifcfg-eth1 was renamed to keep it from being
configured automatically).  This is as close as I can come to booting without
the 3c905 (the card is present and detected, but unconfigured).  On the other
hand, this system stays up and is stable with the 3c590 in it as long as I do
not attempt to configure it.

Despite the fact that the initial description states the kernel as
kernel-2.4.21-9.EL, I am fairly confident that the test above was done with
the 2.4.21-27.0.2.EL kernel.  (I did a full up2date, including kernel before
submitting report, so 2.4.21-27 was installed on the system, but I did not
immediately notice that grub was still set to use the old kernel by default.
Since I would have had to manually alter the grub settings to get into single
user mode, I am fairly confident the new kernel was used for that.  It was
just when I did not bother to hang around during the reboot afterwards that
the old kernel was run, and that's what appeared in the uname output when I
submitted the initial report).

Comment 9 John W. Linville 2005-02-28 21:00:14 UTC
Not really sure where to start...hmmm...

Let's try to start by reverting the 3c59x driver to what was available
in RH 7.3...maybe that will be informative?  Anyway, prebuilt kernels
and a source RPM are available here:

   http://people.redhat.com/linville/kernels/rhel3/

Please give that a try and let me know the results...thanks!

Comment 10 Thomas Payerle 2005-03-01 22:53:06 UTC
Tried your kernel-2.4.21-28.EL.jwltest.2.i686.rpm and got the same
results.  Booted with /etc/sysconfig/network-scripts/ifcfg-eth1 renamed
(so only the 3c905, eth0, is configured), and the system comes up OK.
When I run ifconfig eth1 to set an IP address etc., the system freezes
shortly thereafter.

Repeated the single-user test (boot to single user, run the rc5.d scripts
before S10network manually, rename ifcfg-eth1 back, ifup eth1, so that
eth1/3c590 is configured but not eth0/3c905), and the system freezes.

BTW, the previous OS (before the upgrade) was RHL 7.1 according to
/etc/redhat-release, running kernel 2.4.3-12 (I was not using RHN or
up2date, so I was patching manually as things appeared necessary;
as I was the only user of the box and no network services ran on it,
local privilege escalations were not a big concern).  Not sure if this
means anything, but thought I would mention it.

Comment 11 John W. Linville 2005-03-03 18:41:45 UTC
Thomas,

You stepped-off the update train a lot earlier than I thought! :-)

I revamped the patch to go all the way back to what shipped w/
2.4.3-12.  Pre-built test kernels are available at the same link that
was in comment 9.  Please give those a try and report the results. 
Thanks!

Comment 12 Thomas Payerle 2005-03-06 01:36:28 UTC
The older 3c590 patch seems to work.  I rebooted the system and enabled
the interface, and the system has now been up for about 24 hours.  The
3c590 interface works.



Comment 13 John W. Linville 2005-03-07 22:10:09 UTC
Thomas,

Thanks for the data point!  Unfortunately, we are far from narrowing
this down... :-(

I have another version I'd like for you to try.  This one is based on
kernel version 2.4.9-37 which is at the next "change plateau" which
the 3c59x driver went through in RH7.1.

The kernels are available at the same link that was in comment 9. 
Please give them a try and report the results.  Thanks!

Comment 14 Thomas Payerle 2005-03-11 22:43:08 UTC
Sorry about the delay in responding.

The 2.4.9-37 patches reintroduced the problem: the system booted up
fine, but on configuring the 3c590 the system froze a few minutes
later.  An additional datum: although the shell prompt came back after
the ifconfig, and ifconfig -a showed the 3c590 appropriately configured
(at least from the quick glance I got), the interface was not working;
e.g. pinging a host failed.  Probably not surprising, but I don't
believe I tested that before.  I won't swear that ifconfig -a showed the
interface as UP and RUNNING, as I was feeling hopeful that it would stay
up and was mainly trying to determine whether I had entered an incorrect
netmask.


Comment 15 John W. Linville 2005-03-14 19:12:42 UTC
Thomas, I think we are getting closer... :-)

I now have a patch that is in between the last two patches.  I think
it will restore functionality, which will narrow this down to a problem
either with resetting the chip or with power management (probably the
latter).

Pre-built test kernels are available at the same link that
was in comment 9.  Please give those a try and report the results. 
Thanks!



Comment 16 Thomas Payerle 2005-03-28 20:44:52 UTC
Sorry about the delay in response.  Busy couple of weeks.

Booted to 2.4.21-31.EL.jwltest.11 and enabled eth1 (3c590), and the machine
appears to be staying up (only about 10 minutes thus far, but problems usually
appear much sooner than that), and eth1 appears to be working normally.

Comment 17 John W. Linville 2005-03-30 16:19:07 UTC
Thomas,

Before I spin another patch, I'd like you to go back to do some testing with the
2.4.21-27.0.2.EL kernel.  I'd like you to try booting with the various
combinations of "acpi=off" and/or "noapic" on the kernel command-line.

Please post the results of using various combinations of those kernel
command-line parameters with the 2.4.21-27.0.2.EL kernel.  Thanks!
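
As a minimal sketch of the test matrix requested above, the two options give four boot configurations, including the baseline with neither option. The function name boot_param_combinations is hypothetical; the script only prints the option strings that would be appended to the kernel line in the boot loader and changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: print every combination of the two kernel
# command-line options under test, one combination per line in brackets.
boot_param_combinations() {
    for acpi in "" "acpi=off"; do
        for apic in "" "noapic"; do
            # Unquoted on purpose: empty values vanish and the remaining
            # options end up separated by a single space.
            combo=$(echo $acpi $apic)
            echo "[$combo]"
        done
    done
}

boot_param_combinations
```

The empty brackets stand for the unmodified command line, which serves as the control case.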

Comment 18 Thomas Payerle 2005-04-11 21:50:31 UTC
Tried 2.4.21-27.0.2.EL with "acpi=off" and "noapic" singly and together and in
all three cases system still froze shortly after activating 3c590.  I did not
retest the case with neither option as that was what was tried in Comment #5 
(and behaved similarly).

Comment 19 John W. Linville 2005-06-01 17:17:15 UTC
As I indicated in comment 15, I thought the problem might relate to some power 
management changes.  There was a fairly recent patch relating to power 
management and the 3c59x driver.  It is a bit of a long shot, but I'd like you 
to try it.  I have it as part of the test kernels at the same location as in 
comment 9. 
 
Would you please try those kernels and post the results here?  Thanks! 

Comment 20 Thomas Payerle 2005-07-11 23:04:03 UTC
Tested kernel vmlinuz-2.4.21-32.9.EL.jwltest.36  and it also crashes about
a minute after 3c590 card is activated.


Comment 21 John W. Linville 2005-07-13 20:23:10 UTC
Created attachment 116721 [details]
jwltest-3c59x-3c905b_1-reset.patch

Comment 22 John W. Linville 2005-07-13 20:29:10 UTC
One of the changes in between the versions that work for you and the versions 
that don't is that resets were changed to reset less of the chip for better 
performance.  Later I had to put that back for some cards to work properly. 
 
It looks like this issue results in the reset logic in your card doing some 
screwy stuff (like receiving 8k frames on an interface w/ MTU of 1500).  I 
don't know precisely what might account for the crash, but I thought that a 
more extensive reset might be in order.  The patch in comment 21 implements 
that for your card. 
 
Test kernels w/ the above patch are available here: 
 
   http://people.redhat.com/linville/kernels/rhel3/ 
 
I don't know if this will solve the problem or not, but please give them a try 
and report the results...thanks! 

Comment 23 John W. Linville 2005-09-16 14:48:17 UTC
Closed due to lack of response.  Please re-open when the requested test 
results are available...thanks! 

Comment 24 Ernie Petrides 2005-10-06 23:39:02 UTC
*** Bug 147218 has been marked as a duplicate of this bug. ***