1069028 – ixgbevf prematurely strips VLAN tags

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1069028 - ixgbevf prematurely strips VLAN tags

Summary: ixgbevf prematurely strips VLAN tags

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Nikolay Aleksandrov
QA Contact:	Xin Long
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	994246 1094287
TreeView+	depends on / blocked

Reported:	2014-02-24 01:33 UTC by David Gibson
Modified:	2018-12-06 15:56 UTC (History)
CC List:	15 users (show)
Fixed In Version:	kernel-2.6.32-461.el6
Doc Type:	Bug Fix
Doc Text:	Due to a bug in the ixgbevf driver, the stripped VLAN information from incoming packets on the ixgbevf interface could be lost, and such packets thus did not reach a related VLAN interface. This problem has been fixed by adding the packet's VLAN information to the Socket Buffer (skb) before passing it to the network stack. As a result, the ixgbevf driver now passes the VLAN-tagged packets to the appropriate VLAN interface.
Clone Of:
Environment:
Last Closed:	2014-10-14 05:58:12 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
ixgbevf vlan stripping fix when not in netpoll path (1.19 KB, patch) 2014-03-03 14:19 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
ixgbevf vlan stripping fix when not in netpoll path (v2) (1.22 KB, patch) 2014-03-05 12:05 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
Revised approach to fix (780 bytes, patch) 2014-03-11 12:11 UTC, David Gibson	no flags	Details \| Diff
ixgbevf vlan stripping fix when not in netpoll path (v3) (1.15 KB, patch) 2014-03-11 12:13 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
ixgbevf vlan stripping fix when not in netpoll path (v4) (2.54 KB, patch) 2014-03-13 11:43 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
ixgbevf vlan stripping fix when not in netpoll path (v4) (2.52 KB, patch) 2014-03-13 11:55 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
ixgbevf vlan stripping fix when not in netpoll path (v4) (2.82 KB, patch) 2014-03-13 17:07 UTC, Nikolay Aleksandrov	no flags	Details \| Diff
Show Obsolete (6) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2014:1392	0	normal	SHIPPED_LIVE	Important: kernel security, bug fix, and enhancement update	2014-10-14 01:28:44 UTC

Description David Gibson 2014-02-24 01:33:33 UTC

Description of problem:

I'm not sure if this is a driver bug, a firmware bug, or an interaction of both.

When a vlan interface eth1.NNN is created on top of a ixgbevf interface eth1, vlan tags are stripped from incoming packets before they reach eth1.  The packets therefore never correctly reach eth1.NNN

Version-Release number of selected component (if applicable):

kernel-2.6.32-431.5.1.el6.x86_64 (guest & host)
Not sure how to check the firmware version of the hosting ixgbe card

How reproducible:


Steps to Reproduce:
1. Configure at least one VF on a machine with ixgbe hardware
2. Use libvirt to create a VM which has a VF assigned to it via PCI passthrough.
3. *Don't* assign a VLAN to the VF using libvirt or ip link set - we want the guest to have a "trunk port" on the ixgbe's internal switch.
4. On the host create a vlan interface, tag NNN on top of the ixgbe PF, and assign it an address
5. In the guest create a vlan interface, tag NNN, on top of the ixgbevf VF, and assign it an address in the same subnet as the host.
6. Attempt to ping from the guest to the host

Expected results:

Both host and guest are on the same subnet on the same logical VLAN and should have full connectivity.

Actual results:

Ping will fail.  Using tcpdump -e on host and guest (raw interfaces) reveals that:
 * Correctly tagged ARP requests are sent by the guest
 * Correctly tagged ARP requests are received by the host
 * Correctly tagged ARP replies are issued by the host
 * The ARP replies are received in the guest, but they are missing a tag, and hence delivered to the eth1 logical interface, instead of eth1.NNN.

Additional info:

It looks like the problem may be in ixgbevf_vlan_rx_register() which contains this code:

	for (i = 0; i < adapter->num_rx_queues; i++) {
		j = adapter->rx_ring[i].reg_idx;
		ctrl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(j));
		ctrl |= IXGBE_RXDCTL_VME;
		IXGBE_WRITE_REG(hw, IXGBE_VFRXDCTL(j), ctrl);
	}

This unconditionally enables the VME (strip VLAN tags) bit in the (virtual) hardware as soon as any VLAN is configured on the underlying interface.  This means the VLAN tags are no longer available to the kernel to properly route incoming packets to the right VLAN logical interface.

Comment 1 Nikolay Aleksandrov 2014-02-28 14:07:07 UTC

The description is correct, I just tested disabling IXGBE_RXDCTL_VME in the VM
and it started working. My check if we're in a VLAN was not reliable and isn't
applicable here. I'll give it some more thought and try to propose a solution.

Comment 2 David Gibson 2014-03-03 00:35:50 UTC

Yes, I've also tried disabling VME and I get the same behaviour - it then works for guest configured tags, but stops working for host configured tags.

As far as I can tell, what we need here is for IXGBE_RXDCTL_VME to be set if and only if we're inserting VLAN tags on the Tx path, i.e. when IXGBE_VMVIR_VLANA_DEFAULT is set.  The trouble is that VME is under guest/VF control whereas VLANA_DEFAULT is under host/PF control.

This could be solved with an extension to the host/guest mailbox protocol, but that's pretty awkward.  I'm hoping there's a way to detect this state from the guest that I haven't spotted yet.

Comment 3 Nikolay Aleksandrov 2014-03-03 12:53:01 UTC

I don't think you can configure VF as a trunk port currently, I can see the
card has the option to trust the descriptors but I don't think it's used.
Even libvirt complains when you try and configure the interface as a trunk port:
error: unsupported configuration: vlan trunking is not supported by SR-IOV network devices

Yes, this will work if we don't specify the "vlan trunk" part in libvirt, but
then we shouldn't expect a port to act as a trunk in the first place, this would
be inconsistent behaviour.
I think there's still more work to be done both on the driver part and the 
libvirt part in order to be able to make a trunk port out of an SR-IOV device.

Comment 4 Nikolay Aleksandrov 2014-03-03 13:05:50 UTC

(In reply to Nikolay Aleksandrov from comment #3)
> I don't think you can configure VF as a trunk port currently, I can see the
> card has the option to trust the descriptors but I don't think it's used.
> Even libvirt complains when you try and configure the interface as a trunk
> port:
> error: unsupported configuration: vlan trunking is not supported by SR-IOV
> network devices
> 
> Yes, this will work if we don't specify the "vlan trunk" part in libvirt, but
> then we shouldn't expect a port to act as a trunk in the first place, this
> would
> be inconsistent behaviour.
> I think there's still more work to be done both on the driver part and the 
> libvirt part in order to be able to make a trunk port out of an SR-IOV
> device.
Actually I don't think the "Additional Info" is correct, we should have the vlan 
information available after the stripping, I'll have to investigate what happens
there, I feel like I'm missing something but I'm new to this driver. I'll comment
again after I get some results on what's going on.

Comment 5 Nikolay Aleksandrov 2014-03-03 13:28:13 UTC

(In reply to Nikolay Aleksandrov from comment #4)
> (In reply to Nikolay Aleksandrov from comment #3)
> > I don't think you can configure VF as a trunk port currently, I can see the
> > card has the option to trust the descriptors but I don't think it's used.
> > Even libvirt complains when you try and configure the interface as a trunk
> > port:
> > error: unsupported configuration: vlan trunking is not supported by SR-IOV
> > network devices
> > 
> > Yes, this will work if we don't specify the "vlan trunk" part in libvirt, but
> > then we shouldn't expect a port to act as a trunk in the first place, this
> > would
> > be inconsistent behaviour.
> > I think there's still more work to be done both on the driver part and the 
> > libvirt part in order to be able to make a trunk port out of an SR-IOV
> > device.
> Actually I don't think the "Additional Info" is correct, we should have the
> vlan 
> information available after the stripping, I'll have to investigate what
> happens
> there, I feel like I'm missing something but I'm new to this driver. I'll
> comment
> again after I get some results on what's going on.
Yes, I was correct. We have the VLAN information but simply don't pass it up.
There's a very simple fix to this issue, just use __vlan_hwaccel_put_tag(skb, tag); if we have a VLAN present and vlgrp registered for the adapter.
So the new version of ixgbevf_receive_skb looks like:
static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
                                struct sk_buff *skb, u8 status,
                                union ixgbe_adv_rx_desc *rx_desc)
{
        struct ixgbevf_adapter *adapter = q_vector->adapter;
        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
        u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);

        if (adapter->vl_grp && is_vlan)
                __vlan_hwaccel_put_tag(skb, tag);

        if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
                if (adapter->vlgrp && is_vlan)
                        vlan_gro_receive(&q_vector->napi,
                                         adapter->vlgrp,
                                         tag, skb);
                else
                        napi_gro_receive(&q_vector->napi, skb);
        } else
                netif_rx(skb);
}

Then the information is passed no matter which way is taken, I've just tested
this and the connection between the host and VM works.
I'll check with Andy about the approach since he's the maintainer of this driver
and if everything's alright I'll move to post this for 6.6.

Cheers,
 Nik

Comment 6 Nikolay Aleksandrov 2014-03-03 13:29:16 UTC

<snip>
> So the new version of ixgbevf_receive_skb looks like:
> static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
>                                 struct sk_buff *skb, u8 status,
>                                 union ixgbe_adv_rx_desc *rx_desc)
> {
>         struct ixgbevf_adapter *adapter = q_vector->adapter;
>         bool is_vlan = (status & IXGBE_RXD_STAT_VP);
>         u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
> 
>         if (adapter->vl_grp && is_vlan)
>                 __vlan_hwaccel_put_tag(skb, tag);
I had too much coffee :) vl_grp -> vlgrp without the "_"

Comment 7 Andy Gospodarek 2014-03-03 14:07:50 UTC

(In reply to Nikolay Aleksandrov from comment #5)
> Then the information is passed no matter which way is taken, I've just tested
> this and the connection between the host and VM works.
> I'll check with Andy about the approach since he's the maintainer of this
> driver
> and if everything's alright I'll move to post this for 6.6.

This idea (and the correction in comment #6) looks good to me.  Feedback from the customer would be nice to verify this will resolve it.

Comment 8 Nikolay Aleksandrov 2014-03-03 14:19:08 UTC

Created attachment 869940 [details]
ixgbevf vlan stripping fix when not in netpoll path

This is the proposed patch, it'd be great if the customer could test it so I can
post it for 6.6.

Comment 9 David Gibson 2014-03-03 22:40:58 UTC

So, if I'm understanding correctly, when the VME bit is set, the adaptor strips the tag from the packet, but records the ID in the descriptors somewhere.  The call to __vlan_hwaccel_put_tag(skb, tag); puts it back in the packet.  Is that correct?

The if (vlgrp) doesn't seem right to me.  Won't that mean that if neither the host, nor the guest has vlan tags configured, but a vlan tagged packet is received by the guest, its vlan tag will be incorrectly removed? The packet should be received with tag intact - which will mean the packet is correctly ignored, since it's not for the master vlan.

I'll build a test kernel, then do some experimentation and also pass it on to the customer.

Comment 11 Nikolay Aleksandrov 2014-03-03 23:22:18 UTC

(In reply to David Gibson from comment #9)
> So, if I'm understanding correctly, when the VME bit is set, the adaptor
> strips the tag from the packet, but records the ID in the descriptors
> somewhere.  The call to __vlan_hwaccel_put_tag(skb, tag); puts it back in
> the packet.  Is that correct?
> 
Kind of, yes. The call to vlan_hwaccel_put_tag puts in the skb structure
i.e. it doesn't alter the packet (like adding it back as a header).

> The if (vlgrp) doesn't seem right to me.  Won't that mean that if neither
> the host, nor the guest has vlan tags configured, but a vlan tagged packet
> is received by the guest, its vlan tag will be incorrectly removed? The
> packet should be received with tag intact - which will mean the packet is
> correctly ignored, since it's not for the master vlan.
> 
It will be removed always, yes. This is how upstream ixgbevf does it, I chose
to stay close to upstream as that's the usual practice.

> I'll build a test kernel, then do some experimentation and also pass it on
> to the customer.
Thanks!

Comment 12 David Gibson 2014-03-04 02:04:52 UTC

If upstream ixgbevf gets it wrong, we should fix upstream as well.

So, my point remains, if the VF is logically connected to the master VLAN, then we should not strip VLAN tags, regardless of guest side configuration.

Comment 13 Nikolay Aleksandrov 2014-03-04 12:43:33 UTC

(In reply to David Gibson from comment #12)
> If upstream ixgbevf gets it wrong, we should fix upstream as well.
> 
> So, my point remains, if the VF is logically connected to the master VLAN,
> then we should not strip VLAN tags, regardless of guest side configuration.
What do you mean by master VLAN ? Stripping is done always, regardless of 
how the host is configured. As we found out, there's no reliable way to tell
if we're in a host vlan, besides this is not what's described in this BZ.

For this BZ's case keeping it as upstream makes sense, if we don't have any
registered VLANs no tagged traffic will arrive due to the VLAN filtering done.
See chapter 7.4.4, that is enabled always in the host when SR-IOV is used.

Comment 14 Nikolay Aleksandrov 2014-03-04 14:07:39 UTC

(In reply to Nikolay Aleksandrov from comment #13)
> (In reply to David Gibson from comment #12)
> > If upstream ixgbevf gets it wrong, we should fix upstream as well.
> > 
> > So, my point remains, if the VF is logically connected to the master VLAN,
> > then we should not strip VLAN tags, regardless of guest side configuration.
> What do you mean by master VLAN ? Stripping is done always, regardless of 
> how the host is configured. As we found out, there's no reliable way to tell
> if we're in a host vlan, besides this is not what's described in this BZ.
> 
> For this BZ's case keeping it as upstream makes sense, if we don't have any
> registered VLANs no tagged traffic will arrive due to the VLAN filtering
> done.
> See chapter 7.4.4, that is enabled always in the host when SR-IOV is used.
Actually to elaborate, after my patch the case where the guest is in a single
host vlan is broken if you configure *any* vlans on top of the VF because the
traffic that comes to the guest is tagged with the hypervisor tag and if the
appropriate VLAN interface isn't created, it won't be processed.
If there're no VLANs created on top and the guest is in a hypervisor vlan, then
there's no problem. I'll have to augment the patch to actually check not only
if vlgrp exists but also if the vlan is in it before adding it to the skb,
otherwise we risk breaking configurations which are in a hypervisor vlan but
try to create another vlan, this would be avoided of course if there was a way
for us to know that we're in a hypervisor vlan and could advertise the
VLAN_CHALLENGED so no vlans could be configured on top.

Comment 15 David Gibson 2014-03-05 06:03:29 UTC

Ah, yes, that makes sense.

I think we really need to devise a way for the guest to know if it's in a host VLAN, even if it does mean an ugly extension to the mailbox protocol.

Comment 16 Nikolay Aleksandrov 2014-03-05 12:05:32 UTC

Created attachment 870927 [details]
ixgbevf vlan stripping fix when not in netpoll path (v2)

I've attached the fixed patch which not only checks for vlgrp, but also
if the vlan is actually present, now the only way to "break" the setup with
hypervisor vlan is to create the same vlan id in the VM, other vlans shouldn't
break it anymore.
About the VLAN_CHALLENGED and the mbox solution, please take a look at my comment
in the other BZ, it is related to this fix, but not really necessary for it to
work.

Comment 17 David Gibson 2014-03-06 05:46:20 UTC

Customer has now tried the earlier version of your patch, without success.  They did some debugging and discovered that it seems to be because of this oddity in the ixgbevf driver:

	adapter->flags |= IXGBE_FLAG_IN_NETPOLL;
	ixgbevf_for_each_ring(ring, q_vector->rx)
		clean_complete &= ixgbevf_clean_rx_irq(q_vector, ring,
						       per_ring_budget);
	adapter->flags &= ~IXGBE_FLAG_IN_NETPOLL;

Setting the NETPOLL flag like that causes ixgbevf_receive_skb() to use netif_rx() instead of the vlan_gro_receive() path, which appeared to miss the put_tag().

I haven't had a chance to check if your revised version already fixes this problem.

Comment 18 Nikolay Aleksandrov 2014-03-06 10:59:49 UTC

(In reply to David Gibson from comment #17)
> Customer has now tried the earlier version of your patch, without success. 
> They did some debugging and discovered that it seems to be because of this
> oddity in the ixgbevf driver:
> 
> 	adapter->flags |= IXGBE_FLAG_IN_NETPOLL;
> 	ixgbevf_for_each_ring(ring, q_vector->rx)
> 		clean_complete &= ixgbevf_clean_rx_irq(q_vector, ring,
> 						       per_ring_budget);
> 	adapter->flags &= ~IXGBE_FLAG_IN_NETPOLL;
> 
> Setting the NETPOLL flag like that causes ixgbevf_receive_skb() to use
> netif_rx() instead of the vlan_gro_receive() path, which appeared to miss
> the put_tag().
> 
> I haven't had a chance to check if your revised version already fixes this
> problem.

This is weird, it actually means my patch hasn't been applied because it puts
the tag no matter if we're in a netpoll or not (i.e. which path we take in ixgbevf_receive_skb), which was actually the problem in the first place,
previously the tag would work only in the NETPOLL path, now I've made it work
for both and both the old version and this one of the patch should work.

Comment 19 David Gibson 2014-03-11 04:41:24 UTC

Hrm.  So, the fact is the customer found has tried the patch and it did not work.  They tried both
    a) removing the unconditional setting of the IN_NETPOLL flag in poll()
and b) changing ixgbevf_reeceive_skb() to have this:
      if (is_vlan && vlan_group_get_device(adapter->vlgrp, tag)) {
                vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
                return;
      }

Both variants (a) anb (b) resulted in working guest-configured vlan tags.

Comment 20 David Gibson 2014-03-11 06:21:38 UTC

Ok, I've now had a chance to investigate this on some test hardware.

The problem with the current patch is that __vlan_hwaccel_put_tag() isn't sufficient to direct packets to the right VLAN interface.  It doesn't adjust skb->dev to point at the vlan device instead of the main device - the existing call sites (all in the core networking code, or openvswitch) for __vlan_hwaccel_put_tag() adjust skb->dev nearby.

It really looks like this function isn't built for direct use from drivers.  I think instead we should be using vlan_hwaccel_rx() which does tge put_tag() but also adjusts skb->dev if necessary and has some other sane looking checks before calling netif_rx().


[ASIDE: at present the ixgbevf driver also explicitly calls netif_rx() from the poll() path - from examples in other drivers, that looks wrong.  Shouldn't it be vlan_hwaccel_receive_skb()/netif_receive_skb() for the poll() path and vlan_hwaccel_rx()/netif_rx() for the non-poll() path?]

Comment 21 Nikolay Aleksandrov 2014-03-11 11:28:33 UTC

(In reply to David Gibson from comment #19)
> Hrm.  So, the fact is the customer found has tried the patch and it did not
> work.  They tried both
>     a) removing the unconditional setting of the IN_NETPOLL flag in poll()
> and b) changing ixgbevf_reeceive_skb() to have this:
>       if (is_vlan && vlan_group_get_device(adapter->vlgrp, tag)) {
>                 vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
>                 return;
>       }
> 
> Both variants (a) anb (b) resulted in working guest-configured vlan tags.
>
>Ok, I've now had a chance to investigate this on some test hardware.
>
>The problem with the current patch is that __vlan_hwaccel_put_tag() isn't >sufficient to direct packets to the right VLAN interface.  It doesn't adjust >skb->dev to point at the vlan device instead of the main device - the existing >call sites (all in the core networking code, or openvswitch) for >__vlan_hwaccel_put_tag() adjust skb->dev nearby.
>
Aha! Yes, you're correct and the fact that it did work for me is because I
had a version of the patches for the other ixgbevf BZ applied and the offloads were off, that's why I was confused.

>It really looks like this function isn't built for direct use from drivers.  I >think instead we should be using vlan_hwaccel_rx() which does tge put_tag() but >also adjusts skb->dev if necessary and has some other sane looking checks >before calling netif_rx().
>
Yes, that's correct. Sorry for my confusion but I had a version of the patches
for the other BZ applied and it only seemed like this patch was working. The patch that I'm about to attach has been tested on a cleanly/newly installed system so this should be the final version which uses vlan_hwaccel_rx().

>
>[ASIDE: at present the ixgbevf driver also explicitly calls netif_rx() from the >poll() path - from examples in other drivers, that looks wrong.  Shouldn't it >be vlan_hwaccel_receive_skb()/netif_receive_skb() for the poll() path and >vlan_hwaccel_rx()/netif_rx() for the non-poll() path?]
from commit 366c1099123a0084cda581bee632911822748c61 ("ixgbevf: Add flag to indicate when rx is in net poll"):
"napi_gro_receive shouldn't be called from netpoll context.  Doing
 so was causing kernel panics when jumbo frames larger than 2K were set.
 Add a flag to check if the Rx ring processing is occurring from interrupt
 context or from netpoll context and call netif_rx() if in the polling
 context."

Comment 22 Nikolay Aleksandrov 2014-03-11 12:09:33 UTC

<snip>
> >
> >[ASIDE: at present the ixgbevf driver also explicitly calls netif_rx() from the >poll() path - from examples in other drivers, that looks wrong.  Shouldn't it >be vlan_hwaccel_receive_skb()/netif_receive_skb() for the poll() path and >vlan_hwaccel_rx()/netif_rx() for the non-poll() path?]
> from commit 366c1099123a0084cda581bee632911822748c61 ("ixgbevf: Add flag to
> indicate when rx is in net poll"):
> "napi_gro_receive shouldn't be called from netpoll context.  Doing
>  so was causing kernel panics when jumbo frames larger than 2K were set.
>  Add a flag to check if the Rx ring processing is occurring from interrupt
>  context or from netpoll context and call netif_rx() if in the polling
>  context."
Just to clarify, you're correct, but looking at the mailing list history for
that patch David Miller also had concerns and the Intel folks said they'll look
into the netpoll issue, but I think they should've used netif_receive_skb().
Anyway, this is not a big problem (more of a performance issue) and it'll get fixed once they fix the netpoll issues on upstream, so I'll stick to their version for now.

Comment 23 David Gibson 2014-03-11 12:11:38 UTC

Created attachment 873096 [details]
Revised approach to fix

I think this is a closer-to-correct patch for the problem.  In my tests it works with guest configured vlan tags, and works with host configured vlan tags providing that no vlans are configured on the guest side.

Comment 24 Nikolay Aleksandrov 2014-03-11 12:13:44 UTC

Created attachment 873098 [details]
ixgbevf vlan stripping fix when not in netpoll path (v3)

This is the patch that uses vlan_hwaccel_rx() instead of __vlan_hwaccel_put_tag(). I've tested this on a clean install and it works, that is with the patch applied I get connectivity between vlans on the VM's VF and the hypervisor's PF.

Comment 25 Nikolay Aleksandrov 2014-03-11 12:20:03 UTC

(In reply to David Gibson from comment #23)
> Created attachment 873096 [details]
> Revised approach to fix
> 
> I think this is a closer-to-correct patch for the problem.  In my tests it
> works with guest configured vlan tags, and works with host configured vlan
> tags providing that no vlans are configured on the guest side.

Both patches (mine that checks exactly for that vlan) and yours (which only checks if vlgrp exists) will work. There's one problem with checking only vlgrp though, if we're in a hypervisor vlan and we create another vlan on the VF in the VM, the hypervisor vlan will stop working because the accelerated path will be chosen for it (since the tag is seen) and unless a vlan with the same ID exists in the VM, the received packets will get dropped.

Comment 26 David Gibson 2014-03-12 05:46:25 UTC

Ah, yes, v3 looks better.

I was using the simple if (vlgrp) test based on the non-poll branch, forgetting that specifically checking for the right VLAN makes it more robust against the problems from bug 1059532.

In fact I suspect the non-poll path should use the same test for the same reason - I think we're just getting away with it for now because the existing hack setting the IN_NETPOLL flag means we basically always use the polling path.

Anyway, with v3, I get connectivity from host to all 4 combinations of bridged guest vs SR-IOV guest and host configured or guest configured VLAN tags.  In combination with the workaround from https://access.redhat.com/site/solutions/735403, I get connectivity between all 4 guests.

Comment 27 Nikolay Aleksandrov 2014-03-12 10:43:55 UTC

(In reply to David Gibson from comment #26)
> Ah, yes, v3 looks better.
> 
> I was using the simple if (vlgrp) test based on the non-poll branch,
> forgetting that specifically checking for the right VLAN makes it more
> robust against the problems from bug 1059532.
> 
> In fact I suspect the non-poll path should use the same test for the same
> reason - I think we're just getting away with it for now because the
> existing hack setting the IN_NETPOLL flag means we basically always use the
> polling path.
> 
Yes, I'll fix it up there as well before posting just to be consistent.

> Anyway, with v3, I get connectivity from host to all 4 combinations of
> bridged guest vs SR-IOV guest and host configured or guest configured VLAN
> tags.  In combination with the workaround from
> https://access.redhat.com/site/solutions/735403, I get connectivity between
> all 4 guests.

This is great, thanks a lot for testing!
I'll move forward and post the patch for review.

Comment 28 David Gibson 2014-03-13 02:41:30 UTC

np.  I've also passed the patch on to the customer for their own testing as well.

Comment 30 Nikolay Aleksandrov 2014-03-13 11:43:47 UTC

Created attachment 873936 [details]
ixgbevf vlan stripping fix when not in netpoll path (v4)

I actually found a problem with the last patch, because the vlgrp can be freed
while we're checking it. Thus to fix it we acquire rcu_read_lock, then check if
the device exists and note it in a local variable to be used later when determining which path to take.
Now the device can disappear after the check but that's really not an issue as
the packet will be handled properly later.

Comment 32 Nikolay Aleksandrov 2014-03-13 11:55:10 UTC

Created attachment 873949 [details]
ixgbevf vlan stripping fix when not in netpoll path (v4)

Comment 34 Nikolay Aleksandrov 2014-03-13 17:07:59 UTC

Created attachment 874080 [details]
ixgbevf vlan stripping fix when not in netpoll path (v4)

Found another issue, we should mask the proper bits of "tag" since it's really
vlan_tci and that could lead to accessing out-of-boundary elements in vlan_group_get_device.

I'm brewing this version which should be the final one and will test it tonight.

Comment 35 David Gibson 2014-03-13 23:42:48 UTC

Thanks for the revisions.  I just tested v4 and it seems to be working.

There actually are still some connectivity problems between certain guests.  Specifically things only sometimes work between an SR-IOV guest and a bridged guest with guest-side configured VLANs.  But I've realised that's just due to bug 1067722 (on the host, the tagged packets are eaten by the host's VLAN interface before the non-vlanned software bridge can forward them).

Comment 37 David Gibson 2014-03-16 23:07:35 UTC

Ok, now we have an identified fix, is it time to clone this to 6.5.z and RHEL7 - customer has requested fixes for both of these.

Comment 39 Nikolay Aleksandrov 2014-03-17 11:43:39 UTC

AFAICT RHEL7 is not affected by this, the ixgbevf driver there looks like upstream (i.e. correct).

Comment 42 RHEL Program Management 2014-04-08 22:20:53 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 43 Rafael Aquini 2014-04-30 02:21:04 UTC

Patch(es) available on kernel-2.6.32-461.el6

Comment 48 Xin Long 2014-05-26 06:43:03 UTC

do test like https://bugzilla.redhat.com/show_bug.cgi?id=1069028#c0

reproduce on kernel-2.6.32-431.el6

verify on kernel-2.6.32-461.el6

Comment 49 David Gibson 2014-05-28 05:52:04 UTC

Bug has no customer information, making public (again) at customer request.

Comment 51 errata-xmlrpc 2014-10-14 05:58:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html

Note You need to log in before you can comment on or make changes to this bug.