Bug 809908

Summary: pch_gbe transfer length error with VLAN tagging and default MTU
Product: [Fedora] Fedora
Reporter: Carsten Clasohm <clasohm>
Component: kernel
Assignee: Veaceslav Falico <vfalico>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 16
CC: arcress, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, peterm, sgruszka, vfalico
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 851682 Environment:
Last Closed: 2012-09-04 19:00:18 UTC
Type: Bug
Attachments:
  sosreport from embedded device (flags: none)
  lines from /var/log/messages (flags: none)
  count vlan header on length verification (flags: none)
  upstream/fedora patch (flags: none)

Description Carsten Clasohm 2012-04-04 16:14:17 UTC
Created attachment 575180 [details]
sosreport from embedded device

Description of problem:

While testing an embedded device with an Intel EG20T Gigabit Ethernet Controller, we encountered a problem with the pch_gbe kernel module and VLAN tagging.


Version-Release number of selected component (if applicable):

kernel-3.1.0-7.fc16.i686


How reproducible:

always


Steps to Reproduce:
1. configure tagged VLANs
2. ping -s 3000 IP_ADDR
  
Actual results:

100% packet loss, and the following message is displayed on the console each time a ping packet is sent:

[  422.262549] pch_gbe: Transfer length Error: skb len: 1518 > max: 1518


Expected results:

ping should be able to send large packets.


Additional info:

Adding "MTU=1518" in /etc/sysconfig/network-scripts/ifcfg-p4p1 and restarting the network fixes the problem. (The MTU of the VLAN interface is kept at 1500.)
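
The numbers in the log line can be reproduced with a little frame-size arithmetic. The Python sketch below is a model, not driver code: the header sizes are the standard Ethernet values, while the comparison against the maximum frame size minus the 4-byte FCS is an assumption inferred from the log line, which reports an error even though both printed numbers are equal. The function names are made up for illustration.

```python
# Model of the frame-size arithmetic behind the error message.
ETH_HLEN = 14     # destination MAC + source MAC + EtherType
VLAN_HLEN = 4     # 802.1Q tag inserted for tagged VLANs
ETH_FCS_LEN = 4   # frame check sequence appended by the hardware


def max_frame_size(parent_mtu):
    """Largest untagged frame, including FCS, for the parent interface MTU."""
    return parent_mtu + ETH_HLEN + ETH_FCS_LEN


def tagged_skb_len(vlan_mtu):
    """Length of a full-size tagged packet as handed to the driver (no FCS)."""
    return vlan_mtu + ETH_HLEN + VLAN_HLEN


def passes_length_check(skb_len, parent_mtu):
    # ASSUMPTION: the limit is the max frame size minus the FCS, which would
    # explain why "1518 > max: 1518" is reported as an error.
    return skb_len <= max_frame_size(parent_mtu) - ETH_FCS_LEN


skb_len = tagged_skb_len(1500)
print(skb_len, max_frame_size(1500))        # 1518 1518, as in the log line
print(passes_length_check(skb_len, 1500))   # False: default MTU on p4p1
print(passes_length_check(skb_len, 1518))   # True: MTU=1518 workaround
```

The 3000-byte ping matters only because it forces IP fragmentation, so the leading fragments fill the VLAN interface's 1500-byte MTU completely.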

Comment 1 Carsten Clasohm 2012-04-04 16:15:13 UTC
Created attachment 575181 [details]
lines from /var/log/messages

Comment 2 Josh Boyer 2012-04-04 16:28:05 UTC
(In reply to comment #0)
> Created attachment 575180 [details]
> sosreport from embedded device
> 
> Description of problem:
> 
> While testing an embedded device with an Intel EG20T Gigabit Ethernet
> Controller, we encountered a problem with the pch_gbe kernel module and VLAN
> tagging.
> 
> 
> Version-Release number of selected component (if applicable):
> 
> kernel-3.1.0-7.fc16.i686

That's relatively old now.  Can you try with the 3.3.1 kernel update?  If it still has issues with that, you might want to email the netdev list.

Comment 3 Carsten Clasohm 2012-04-04 21:01:34 UTC
We tried to install Fedora 16 on the device and then update the kernel, but were unable to boot afterwards. For some reason, anaconda created a bios_grub partition, and the system said "no operating system found" after the reboot.

Because our priority is on getting pch_gbe to work with RHEL, we have decided to not spend more time on this.

Comment 4 Veaceslav Falico 2012-04-05 05:19:04 UTC
Created attachment 575272 [details]
count vlan header on length verification

Here's the proposed patch (against rhel6, but should apply cleanly to fedora kernel also).
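
The attachment is not reproduced in this report. As a rough illustration of what "count vlan header on length verification" could mean, here is a Python model of a transmit length check that grants tagged frames four extra bytes. It is a sketch under assumptions (the function name, its parameters, and the FCS subtraction are hypothetical), not the attached patch and not necessarily the change that was eventually committed.

```python
ETH_FCS_LEN = 4   # frame check sequence, not part of the skb
VLAN_HLEN = 4     # 802.1Q tag


def tx_length_ok(skb_len, max_frame_size, vlan_tagged):
    """Hypothetical length check that accounts for the VLAN header."""
    limit = max_frame_size - ETH_FCS_LEN   # assumed original limit
    if vlan_tagged:
        limit += VLAN_HLEN                 # allow room for the tag
    return skb_len <= limit


# Full-size tagged frame from the report: accepted once the tag is counted.
print(tx_length_ok(1518, 1518, vlan_tagged=True))
# The same length without a tag is still rejected as oversized.
print(tx_length_ok(1518, 1518, vlan_tagged=False))
```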

Comment 6 Josh Boyer 2012-04-05 12:18:08 UTC
(In reply to comment #4)
> Created attachment 575272 [details]
> count vlan header on length verification
> 
> Here's the proposed patch (against rhel6, but should apply cleanly to fedora
> kernel also).

Except it doesn't apply at all.  The Fedora kernels are all on 3.3 or newer and this patch is against either RHEL or an older Fedora kernel.

Could you send this fix, with a proper changelog and signed-off-by, to netdev?

Comment 7 Veaceslav Falico 2012-04-10 07:21:42 UTC
(In reply to comment #6)
> Except it doesn't apply at all.  The Fedora kernels are all on 3.3 or newer and
> this patch is against either RHEL or an older Fedora kernel.

Yep, sorry, my bad. I will now attach the corrected patch for upstream/Fedora; please test it if you have the hardware.

> 
> Could you send this fix, with a proper changelog and signed-off-by, to netdev?

Will do, once it has been tested (I don't have the necessary hardware at the moment).

Comment 8 Veaceslav Falico 2012-04-10 07:22:43 UTC
Created attachment 576409 [details]
upstream/fedora patch

Comment 9 Josh Boyer 2012-04-10 18:21:47 UTC
(In reply to comment #8)
> Created attachment 576409 [details]
> upstream/fedora patch

I'm guessing you can do a scratch build in koji with this for testing purposes?  If not, let me know and I can get one built.  I don't have the hardware either, so best to wait until you have access.

Comment 10 Andy Cress 2012-06-11 18:45:30 UTC
Veaceslav,

It appears from the 2012-04-10 comment that this patch tested ok on F16 with an EG20T PCH, is that right?

We have had a similar problem trying to get the pch_gbe driver working on RHEL6 (for the same customer that Carsten was working with).  
Backporting the 1.00 driver from F15/F16 has issues starting tx/rx after the link up completes.  
We got pretty close with the pch_gbe 0.91-NAPI driver base and adding some patches, but it keeps getting transmit timeouts.
Can you attach the driver source/tar that you mentioned above for el6?   

Andy

Comment 11 Josh Boyer 2012-09-04 17:46:36 UTC
(In reply to comment #8)
> Created attachment 576409 [details]
> upstream/fedora patch

Did that ever get sent upstream?  It doesn't seem to be in the latest kernel tree.

Comment 12 Andy Cress 2012-09-04 18:37:02 UTC
Here's the commit for it:
commit 4487e64de63b8e42efe5a5543871c42c5a5859d9
Author: Andy Cress <andycress>
Date:   Thu Jul 26 06:01:17 2012 +0000
    pch_gbe: vlan skb len fix

Comment 13 Josh Boyer 2012-09-04 19:00:18 UTC
OK.  That's in 3.6-rc1, so it's fixed in F18.