Bug 147039
Summary: | System freezes shortly after activating 3c590 NIC card
---|---
Product: | Red Hat Enterprise Linux 3
Reporter: | Thomas Payerle <payerle>
Component: | kernel
Assignee: | John W. Linville <linville>
Status: | CLOSED CANTFIX
QA Contact: | Brian Brock <bbrock>
Severity: | high
Priority: | medium
Version: | 3.0
CC: | petrides, riel
Hardware: | i686
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2005-09-16 14:48:17 UTC
Description
Thomas Payerle
2005-02-03 21:38:39 UTC
Created attachment 110627
Assorted details re bug 147039

Internal RFE bug #147218 entered; will be considered for future releases.

The 3c59x driver was updated in RHEL3 U4. Please retest this on a recent release of RHEL3 and let us know if the problem is resolved. The latest RHEL3 kernel is 2.4.21-27.0.2.EL, which was released as RHSA-2005:043. Thanks in advance. -ernie

Just tried with 2.4.21-27.0.2.EL, and same symptoms. (Actually, some of the previously reported stuff may also have been with that kernel: I waited until I had done a full update before submitting, but did not notice that the new kernel was not the default, so I had probably been mixing the two kernel versions in previous reports.) But just now, I verified that I was in the newer kernel and reproduced the symptoms.

3c590 and 3c905B are handled by the same driver...very strange...I'm curious what happens if you don't bring up the 3c905B but bring up the 3c590 instead? Also curious what happens if you physically swap the cards' positions? What about when you remove the 3c905B (i.e. only use the 3c590)? Could you attach the output from running sysreport on the box (w/ both cards still installed)? Thanks!

Created attachment 111316
results of sysreport
as per request
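Editorial note: the kernel mix-up described above (the updated kernel was installed, but grub still booted the old one by default) is easy to guard against by recording the running release alongside each test result. The following is a minimal sketch, not something from the thread. It assumes a modern Python 3 interpreter, which a stock RHEL 3 install would not have, and the function name `running_kernel_matches` is hypothetical. On the box itself, comparing the output of `uname -r` by eye achieves the same thing.

```python
import platform


def running_kernel_matches(expected_release, running_release=None):
    """Return True if the running kernel release equals the expected one.

    running_release defaults to platform.release(), which on Linux is the
    same string that `uname -r` prints. It can be passed in explicitly to
    check a release string copied from another machine.
    """
    if running_release is None:
        running_release = platform.release()
    return running_release == expected_release


# Release strings taken from this report: the intended test kernel and the
# older kernel that grub was still booting by default.
intended = "2.4.21-27.0.2.EL"
old_default = "2.4.21-9.EL"

# The old default does not match the kernel the test was meant to cover.
mismatch_detected = not running_kernel_matches(intended, old_default)
```

Logging the result of such a check with every freeze report would remove the doubt about which kernel a given observation applies to.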
I realize they use the same driver. The 3c905 (eth0) is on the motherboard, which makes it awkward to remove or swap position with the 3c590 :) (the 3c590 is eth1, in a PCI slot). As mentioned in the last paragraph of the initial problem description, I had tried bringing up the system in single user mode, running the init scripts in /etc/rc5.d below S10network, then restoring ifcfg-eth1 in /etc/sysconfig/network-scripts and running ifup eth1 manually (eth0 should not have been configured). The system still crashed shortly afterwards. (ifcfg-eth1 had been renamed to keep it from being configured automatically.) This is as close as I can come to booting without the 3c905 (the card is present and detected, but unconfigured). On the other hand, this system stays up and is stable with the 3c590 in it as long as I do not attempt to configure it.

Despite the fact that the initial description states the kernel as kernel-2.4.21-9.EL, I am fairly confident that the test above was done with the 2.4.21-27.0.2.EL kernel. (I did a full up2date, including the kernel, before submitting the report, so 2.4.21-27 was installed on the system, but I did not immediately notice that grub was still set to use the old kernel by default. Since I would have had to manually alter the grub settings to get into single user mode, I am fairly confident the new kernel was used for that. It was just when I did not bother to hang around during the reboot afterwards that the old kernel was run, and that's what appeared in the uname output when I submitted the initial report.)

Not really sure where to start...hmmm... Let's try to start by reverting the 3c59x driver to what was available in RH 7.3...maybe that will be informative? Anyway, prebuilt kernels and a source RPM are available here: http://people.redhat.com/linville/kernels/rhel3/ Please give that a try and let me know the results...thanks!

Tried your kernel-2.4.21-28.EL.jwltest.2.i686.rpm and got same results.
Booted with /etc/sysconfig/network-scripts/ifcfg-eth1 renamed (so only the 3c905, eth0, is configured), and the system comes up OK. When I ifconfig eth1 to set an IP address et al, the system freezes shortly thereafter. Repeated the single-user test (boot to single user, run the rc5.d scripts before S10network manually, rename ifcfg-eth1 back, ifup eth1, so that eth1/3c590 is configured but not eth0/3c905), and the system freezes.

BTW, the previous OS (before the upgrade) was RHL7.1 according to /etc/redhat-release, running kernel 2.4.3-12. (I was not using RHN or up2date, so I was patching manually as things appeared necessary, and as I was the only user of the box and no network services ran on it, local priv escalations were not a big concern.) Not sure if this means anything, but thought I would mention it.

Thomas, You stepped off the update train a lot earlier than I thought! :-) I revamped the patch to go all the way back to what shipped w/ 2.4.3-12. Pre-built test kernels are available at the same link that was in comment 9. Please give those a try and report the results. Thanks!

The older 3c590 patch seems to work. The system was rebooted, I enabled the interface, and the system has been up for about 24 hours now. The 3c590 interface works.

Thomas, Thanks for the data point! Unfortunately, we are far from narrowing this down... :-( I have another version I'd like you to try. This one is based on kernel version 2.4.9-37, which is at the next "change plateau" that the 3c59x driver went through in RH7.1. The kernels are available at the same link that was in comment 9. Please give them a try and report the results. Thanks!

Sorry about the delay in responding. The 2.4.9-37 patches reintroduced the problem: the system booted up fine, but on configuring the 3c590 the system froze up a few minutes later. An additional datum: although the shell prompt came back after the ifconfig, and ifconfig -a showed the 3c590 appropriately configured (at least from the quick glance I got), the interface was not working, e.g. pinging a host failed. Probably not surprising, but I don't believe I tested that before. I won't swear ifconfig -a showed the interface as UP and RUNNING, as I was feeling hopeful it would stay up and was mainly trying to determine whether I had entered an incorrect netmask.

Thomas, I think we are getting closer... :-) I now have a patch that is in between the last two patches. I think it will restore functionality, which will narrow this down to a problem either with resetting the chip or with power management (probably the latter). Pre-built test kernels are available at the same link that was in comment 9. Please give those a try and report the results. Thanks!

Sorry about the delay in response. Busy couple of weeks. Booted to 2.4.21-31.EL.jwltest.11 and enabled eth1 (3c590), and the machine appears to be staying up (only about 10 minutes thus far, but problems usually appear much sooner than that), and eth1 appears to be working normally.

Thomas, Before I spin another patch, I'd like you to go back and do some testing with the 2.4.21-27.0.2.EL kernel. I'd like you to try booting with the various combinations of "acpi=off" and/or "noapic" on the kernel command-line. Please post the results of using those combinations of kernel command-line parameters with the 2.4.21-27.0.2.EL kernel. Thanks!

Tried 2.4.21-27.0.2.EL with "acpi=off" and "noapic" singly and together, and in all three cases the system still froze shortly after activating the 3c590. I did not retest the case with neither option, as that was what was tried in Comment #5 (and behaved similarly).

As I indicated in comment 15, I thought the problem might relate to some power management changes. There was a fairly recent patch relating to power management and the 3c59x driver. It is a bit of a long shot, but I'd like you to try it. I have it as part of the test kernels at the same location as in comment 9. Would you please try those kernels and post the results here? Thanks!
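Editorial note: the boot-parameter test requested above amounts to a small test matrix over two independent kernel command-line options. The sketch below simply enumerates that matrix so no combination is skipped. It is an illustration, not part of the original exchange: the helper name `boot_param_combinations` is hypothetical, and it only builds the strings that would be appended to the kernel line in the boot loader; it does not touch grub or boot anything.

```python
from itertools import product

# The two kernel command-line options named in the request.
FLAGS = ("acpi=off", "noapic")


def boot_param_combinations(flags=FLAGS):
    """Return every on/off combination of the given flags.

    Each combination is a space-separated string suitable for appending
    to the kernel command line; the empty string means "neither option".
    """
    combos = []
    for mask in product((False, True), repeat=len(flags)):
        enabled = [flag for flag, on in zip(flags, mask) if on]
        combos.append(" ".join(enabled))
    return combos


combinations = boot_param_combinations()
```

For the two options here this yields four cases: neither option, `noapic` alone, `acpi=off` alone, and both together. Per the reply above, the three cases with at least one option set all froze, and the case with neither option had already been covered by the earlier test on the same kernel.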
Tested kernel vmlinuz-2.4.21-32.9.EL.jwltest.36 and it also crashes about a minute after the 3c590 card is activated.

Created attachment 116721
jwltest-3c59x-3c905b_1-reset.patch

One of the changes in between the versions that work for you and the versions that don't is that resets were changed to reset less of the chip for better performance. Later I had to put that back for some cards to work properly. It looks like this issue results in the reset logic in your card doing some screwy stuff (like receiving 8k frames on an interface w/ an MTU of 1500). I don't know precisely what might account for the crash, but I thought that a more extensive reset might be in order. The patch in comment 21 implements that for your card.

Test kernels w/ the above patch are available here: http://people.redhat.com/linville/kernels/rhel3/ I don't know if this will solve the problem or not, but please give them a try and report the results...thanks!

Closed due to lack of response. Please re-open when the requested test results are available...thanks!

*** Bug 147218 has been marked as a duplicate of this bug. ***