Bug 799021 - kernel 2.6.18-318 (RH 5.8) causes non-operation of the vlan with tg3 driver
Status: CLOSED DUPLICATE of bug 797011
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.8
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: John Feeney
QA Contact: Red Hat Kernel QE team
Reported: 2012-03-01 10:27 EST by philippe.camps
Modified: 2012-06-04 11:17 EDT (History)
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2012-06-04 11:17:27 EDT

Attachments
the messages when I start the network (75.51 KB, text/plain)
2012-03-01 10:27 EST, philippe.camps

Description philippe.camps 2012-03-01 10:27:06 EST
Created attachment 566866
the messages when I start the network

Description of problem:
I updated my server to Red Hat 5.8 and then rebooted into kernel 2.6.18-318. Since then the network no longer works on my VLAN interfaces eth1.34 and eth1.607. It still works on the eth0 interface, which has no VLAN.
The interfaces report status UP, but the ping command fails.
In /var/log/messages I can see errors from the tg3 driver.
For instance:
kernel: tg3 0000:02:01.0: eth0: 0x00000000: 0x164814e4, 0x22b00146, 0x02000003, 0x00804010
...
kernel: tg3 0000:02:01.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
kernel: tg3 0000:02:01.0: eth0: DMA Status error.  Resetting chip

When I reboot with the old kernel (2.6.18-274.18.1), the network works perfectly.
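
To check whether a given boot hit the same failure, the log can be filtered for the two tg3 messages quoted above. This is a minimal sketch: the helper name `find_tg3_errors` and the pattern list are assumptions made for illustration, and the patterns cover only the messages seen in this report, not every error tg3 can emit.

```python
import re

# Patterns taken from the messages quoted in this report; illustrative only,
# not an exhaustive catalogue of tg3 failure messages.
TG3_ERROR_PATTERNS = [
    re.compile(r"tg3 \S+: tg3_stop_block timed out"),
    re.compile(r"tg3 \S+: \S+: DMA Status error"),
]


def find_tg3_errors(lines):
    """Return the log lines that match one of the tg3 error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in TG3_ERROR_PATTERNS)
    ]
```

The function accepts any iterable of lines, so an open handle on /var/log/messages (usually readable only by root) can be passed in directly.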

Version-Release number of selected component (if applicable):
filename:       /lib/modules/2.6.18-274.18.1.el5/kernel/drivers/net/tg3.ko
version:        3.116
-> it's OK

filename:       /lib/modules/2.6.18-308.el5/kernel/drivers/net/tg3.ko
version:        3.119
-> it's not working

How reproducible:
# service network restart
# ping myserver
-> it is OK
# ping remoteserver
-> it is not OK
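
The same check can be scripted so it is easy to repeat under each kernel. This is a rough sketch, assuming the Linux iputils `ping` (its `-c` count and `-W` timeout options); the function names are made up for illustration, and `myserver` and `remoteserver` are the placeholder host names used above.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    # -c (count) and -W (reply timeout in seconds) are Linux iputils options.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_hosts(hosts, ping=ping_once):
    """Map each host name to whether it answered a single ping."""
    return {host: ping(host) for host in hosts}
```

With the failure described here, `check_hosts(["myserver", "remoteserver"])` would be expected to report the local host as reachable and the remote one as unreachable.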

Comment 1 Dmitry Butskoy 2012-03-07 08:15:49 EST
A similar "DMA Status error.  Resetting chip" issue is present under RHEL 6.2.

The current RHEL 6 kernel, 2.6.32-220.4.2, has tg3 driver version 3.119.

A solution that works for me is to use the latest kernel from Fedora 14, 2.6.35.14-106, which has tg3 driver version 3.110.

It seems something changed between 3.110 and 3.119.

There is a proposed patch that possibly fixes this issue, found on a kernel mailing list:
http://lkml.indiana.edu/hypermail/linux/kernel/1109.3/01938.html
but it seems it has not been applied yet.
Comment 2 philippe.camps 2012-03-08 07:39:50 EST
I meant kernel 2.6.18-308
Comment 3 Boris Aelen 2012-03-09 09:25:59 EST
I have the same problem.

I have a bond interface over two NICs.
One NIC uses the bnx2x driver and the other the tg3 driver.
On the bond I have created two interfaces, one for VLAN 108 and one for VLAN 110.
If the bond is running on the NIC with the bnx2x driver, everything works fine.
When I switch to the tg3 NIC with ifenslave -c bond1 eth3, both interfaces on the bond stop responding.
Comment 4 Boris Aelen 2012-03-09 09:32:14 EST
I've also found a bug report for CentOS 5.8: http://bugs.centos.org/view.php?id=5572
Just FYI.
It also refers to https://bugzilla.redhat.com/show_bug.cgi?id=798939,
which I am not allowed to access.

The weird thing is that the exact same driver version (3.119) is in RHEL 6.2, but I don't have the problem there.
Comment 5 Dmitry Butskoy 2012-03-11 08:44:48 EDT
For comment #2:
> I meant kernel 2.6.18-308
It seems the same driver version was backported to 2.6.18 as well as to 2.6.32 (note that Fedora's 2.6.35 kernel still has the previous version).

What version do you see from "modinfo tg3 | head"?

For comment #4:
I have the problem with 3.119 in RHEL 6.2; it looks similar to the reports you linked above.

If anybody is working on a fix, I have a reliably reproducible case on a "not very production" server, and I can compile external test drivers for the kernel and run any tests needed.
Comment 6 philippe.camps 2012-03-12 05:51:16 EDT
(In reply to comment #5)

#modinfo /lib/modules/2.6.18-308.el5/kernel/drivers/net/tg3.ko  |head
filename:       /lib/modules/2.6.18-308.el5/kernel/drivers/net/tg3.ko
version:        3.119
license:        GPL
description:    Broadcom Tigon3 ethernet driver
author:         David S. Miller (davem@redhat.com) and Jeff Garzik (jgarzik@pobox.com)
srcversion:     60107A5DC3274A475062EA0
alias:          pci:v0000106Bd00001645sv*sd*bc*sc*i*
alias:          pci:v0000173Bd000003EAsv*sd*bc*sc*i*
alias:          pci:v0000173Bd000003EBsv*sd*bc*sc*i*
alias:          pci:v0000173Bd000003E9sv*sd*bc*sc*i*

I checked 2.6.32 and you are right: it has the same version of the tg3 module. But I don't know whether it has the same issue:
# modinfo tg3 | head
filename:       /lib/modules/2.6.32-220.4.1.el6.i686/kernel/drivers/net/tg3.ko
firmware:       tigon/tg3_tso5.bin
firmware:       tigon/tg3_tso.bin
firmware:       tigon/tg3.bin
version:        3.119
license:        GPL
description:    Broadcom Tigon3 ethernet driver
author:         David S. Miller (davem@redhat.com) and Jeff Garzik (jgarzik@pobox.com)
srcversion:     7459752B93C367486FB6547
alias:          pci:v0000106Bd00001645sv*sd*bc*sc*i*
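
When comparing several kernels, the version field can be pulled out of `modinfo` output like the two listings above. This is a small sketch; `parse_modinfo` and `tg3_version` are names invented here, and the code assumes only the "key: value" layout visible in the output above.

```python
def parse_modinfo(text):
    """Parse modinfo output into a dict; repeated keys keep all values."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip shell prompts and other lines without "key:"
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def tg3_version(text):
    """Return the driver version string, or None if the field is missing."""
    versions = parse_modinfo(text).get("version")
    return versions[0] if versions else None
```

Splitting on the first colon only keeps values such as the `pci:...` alias strings intact.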
Comment 7 Boris Aelen 2012-03-14 05:54:41 EDT
(In reply to comment #5)
> for comment #2 :
> > I meant kernel 2.6.18-308
> It seems that the same driver version was backported for 2.6.18 as well for
> 2.6.32 (note Fedora's 2.6.35 kernel still has the previous version).
> 
> What version you see when "modinfo tg3 | head" ?
> 
> for comment #4 :
> I have problem with 3.119 in RHEL6.2, it looks similar to the links you
> mentioned above.

---> Maybe you are right. I didn't test it on RHEL 6.2; I thought I had. Sorry for the confusion.

> 
> If anybody work on fix it, I have a "good reproduceable" issue on a "not very
> production" server, and I'm capable to compile external "test" drivers for
> kernel etc. and perform all needed tests.
Comment 9 John Feeney 2012-04-30 16:52:30 EDT
Thank you for all the input regarding this matter. A fix for this vlan problem 
has been posted for another bugzilla which is now in POST state for RHEL5.9. 
In addition, a z-stream fix for RHEL5.8 is being proposed so the fix should 
be available before RHEL5.9 ships.
Comment 10 philippe.camps 2012-06-01 10:01:14 EDT
Kernel 2.6.18-308.8.1.el5 fixes the tg3 module's problem with 802.1Q. Thanks.
Comment 11 John Feeney 2012-06-04 11:17:27 EDT
Given comment #10, I am closing this as a duplicate of bz797011.

*** This bug has been marked as a duplicate of bug 797011 ***
