Bug 743739 - DCB: Setting socket priority on VLAN with alpha characters in name fails to map correctly
Keywords:
Status: CLOSED DUPLICATE of bug 703245
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-05 20:58 UTC by john.r.fastabend
Modified: 2011-10-25 12:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-25 12:22:02 UTC
Target Upstream Version:


Attachments
stap script to produce log of priority values in skb at dev_pick_tx (144 bytes, application/octet-stream)
2011-10-18 19:59 UTC, Neil Horman

Description john.r.fastabend 2011-10-05 20:58:47 UTC
On DCB-enabled systems, users set the socket priority with SO_PRIORITY in user space to steer packets (skbs) to the underlying hardware traffic classes. On a device named 'eth3.101' this works as expected:

(1) user sets SO_PRIORITY
(2) writes to socket
(3) frame proceeds down the stack
(4) skb_tx_hash maps skb->priority onto traffic class
(5) driver receives frame on a tx queue associated with the correct traffic class
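
Steps (1) and (2) can be sketched from user space as follows. This is a minimal illustration, not the reporter's test program: the helper name open_priority_socket, the destination address, and the port are assumptions made for the example. SO_PRIORITY is Linux-specific.

```python
import socket


def open_priority_socket(priority):
    """Return a UDP socket whose outgoing packets carry the given priority.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Step (1): SO_PRIORITY sets the socket priority, which the kernel
    # copies into skb->priority for every packet sent on this socket.
    # Values 0-6 need no privilege; higher values need CAP_NET_ADMIN.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    return sock


if __name__ == "__main__":
    sock = open_priority_socket(5)
    # Step (2): write to the socket. The address and port are placeholders.
    sock.sendto(b"test", ("127.0.0.1", 5005))
    # Reading the option back confirms what the kernel stored.
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY))
    sock.close()
```

A successful setsockopt() only shows that the socket priority was stored. It says nothing about which tx queue the frame ends up on, which is why the driver statistics have to be checked separately.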

Now if I rename the device to 'eth3.foo', the setsockopt() call returns success and the write to the socket returns correctly, but somehow the frame is not steered to the correct queue. That is, in step (5), the driver stats from 'ethtool -S' show skbs being transmitted on what appear to be random(?) tx queues.

This is specific to the RHEL 6.2 Alpha kernel and is not seen on older RHEL kernels or upstream kernels. I may continue to dig into this, but I was hoping someone with access to the git logs and so forth might take a look.

Comment 2 john.r.fastabend 2011-10-05 21:18:53 UTC
Ran a quick test with pktgen (with skb->priority hard-coded in ./net/core/pktgen.c) and this works, meaning priorities are mapped to the correct tx queues. This works on both eth3.101 and eth3.foo. Could skb->priority be getting mangled above the vlan xmit routines?
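
For readers unfamiliar with the mapping that the pktgen test exercises, here is a rough model of priority-based tx queue selection. It is a simplified sketch based on how upstream skb_tx_hash handles devices with traffic classes, and the RHEL 6.2 kernel may differ in detail. The function name pick_tx_queue and the example tables are hypothetical.

```python
TC_BITMASK = 15  # the priority-to-traffic-class table has 16 entries


def pick_tx_queue(priority, flow_hash, prio_tc_map, tc_to_txq):
    """Model of priority-based tx queue selection (illustrative only).

    prio_tc_map: 16 traffic-class indices, indexed by skb priority.
    tc_to_txq:   (offset, count) queue ranges, indexed by traffic class.
    flow_hash:   32-bit flow hash that spreads flows within one range.
    """
    tc = prio_tc_map[priority & TC_BITMASK]
    offset, count = tc_to_txq[tc]
    # Scale the 32-bit hash into [0, count), then shift into the tc's range.
    return offset + ((flow_hash * count) >> 32)


# Assumed layout: 8 traffic classes with 8 queues each, priority i -> tc i.
PRIO_TC_MAP = [p if p < 8 else 0 for p in range(16)]
TC_TO_TXQ = [(tc * 8, 8) for tc in range(8)]

if __name__ == "__main__":
    # Priority 5 lands somewhere in queues 40-47, priority 0 in queues 0-7.
    print(pick_tx_queue(5, 0xDEADBEEF, PRIO_TC_MAP, TC_TO_TXQ))
    print(pick_tx_queue(0, 0xDEADBEEF, PRIO_TC_MAP, TC_TO_TXQ))
```

In this model, an skb whose priority has been reset to 0 before queue selection always lands in the queue range of the traffic class mapped to priority 0, regardless of what the application requested.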

Comment 3 RHEL Program Management 2011-10-07 15:53:37 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Neil Horman 2011-10-13 18:10:06 UTC
Do you have a reproducer that you can share here so we can re-create this locally?  Are you seeing one other queue get selected specifically, or are the queue selections random?

Also, what driver are you using with this test?  ixgbe?  And I presume the standard 802.1q vlan driver?

If you can't provide a reproducer please let me know and I'll post a stap script for you to run to see where in the stack the skb->priority is changing.

Comment 5 john.r.fastabend 2011-10-13 18:54:54 UTC
The skb appears to be hitting the skb_hash with priority 0. I'll try to provide a reproducer here in the next day or two. This is with the standard 8021q and ixgbe drivers.

Comment 6 Ross Brattain 2011-10-13 19:40:13 UTC
A systemtap script would be awesome.  I tried using a systemtap script of my own but it complained about missing debuginfo for Beta1.

Comment 7 Neil Horman 2011-10-18 19:47:07 UTC
Well, any stap script I provide to you will also need the debuginfo packages for the RHEL6 beta.  You can download them via the beta channel on RHN.  Do you need me to provide you with specific beta debuginfo packages?

Comment 8 Neil Horman 2011-10-18 19:59:02 UTC
Created attachment 528883
stap script to produce log of priority values in skb at dev_pick_tx

It will probably take a few iterations to nail down the root of the problem, but this stap script should at least confirm for us whether the priority of an skb is still correct or has been reset at various stages of transmission.

Comment 9 Neil Horman 2011-10-21 13:29:28 UTC
Ross, have you had a chance to run that script yet?

Comment 10 john.r.fastabend 2011-10-21 15:49:13 UTC
Hi Neil,

My current theory is that these tests were run without this patch:

* Tue Jun 14 2011 Aristeu Rozanski <arozansk> [2.6.32-158.el6]
- [net] vlan: remove multiqueue ability from vlan device (Neil Horman) [703245]


Its absence would cause something like the symptom described. I'm not sure where the eth3.fcoe notation came from, though. Anyway, I'll try to get to the bottom of this; I can't reproduce it on my systems.

By the way, comment 5 was just a bad test on my part; I had some setup issues.

Thanks, John.

Comment 11 Neil Horman 2011-10-21 16:12:48 UTC
I could certainly see that being the problem, but that seems like an awfully old kernel to be testing.  Ross, can you provide the kernel version that you have been testing with?

Comment 12 Ross Brattain 2011-10-24 22:24:29 UTC
This was originally reported on 2.6.32-195.el6.x86_64.

I have no results from SystemTap due to the lack of debuginfo RPMs.  I am unable to find the kernel-debuginfo-* RPMs for 2.6.32-195.el6.x86_64 or 2.6.32-207.el6.x86_64 on RHN.

We are unable to reproduce this issue on 2.6.32-207.el6.x86_64.  

Testing steps:

SUT and Peer connected back-to-back on eth2.

SUT and Peer:
1. ip link add dev eth2.102-iperf link eth2 type vlan id 102
2. ip addr add 10.2.102.X/24 dev eth2.102-iperf
3. ip link set dev eth2.102-iperf up
4. dcbtool sc eth2 dcb on
5. dcbtool sc eth2 app:fcoe e:1

Run tshark on SUT
6. tshark -i eth2 -z proto,colinfo,vlan.priority,vlan.priority udp

Run socat on Peer (send UDP traffic to SUT on priority 5)

7. socat -u -T 5 file:<data-file> udp:10.2.102.X:5005,reuseaddr,sourceport=5005,priority=5

8. Verify vlan.priority in the tshark capture.
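
For step 8, the value tshark reports as vlan.priority is the 3-bit priority code point (PCP) in the 802.1Q tag. The sketch below shows how that field can be decoded from a raw frame, for example when checking a capture by hand. The function name vlan_priority and the sample frame are made up for illustration; they are not part of the test setup above.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_priority(frame):
    """Return (pcp, vid) from an 802.1Q-tagged Ethernet frame, else None."""
    if len(frame) < 18:  # dst MAC + src MAC + 4-byte tag + EtherType
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    # TCI layout: 3 bits PCP, 1 bit DEI, 12 bits VLAN ID.
    return tci >> 13, tci & 0x0FFF


if __name__ == "__main__":
    # Hand-built sample frame: VLAN 102, priority 5, IPv4 EtherType.
    tag = struct.pack("!HH", TPID_8021Q, (5 << 13) | 102)
    frame = b"\x00" * 12 + tag + struct.pack("!H", 0x0800) + b"payload"
    print(vlan_priority(frame))
```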

Comment 13 Neil Horman 2011-10-25 12:22:02 UTC
OK, copy that.  Given that this can't be reproduced on the most recent kernel, I'm going to assume that it is fixed by the patch referenced in comment 10.  Closing as a duplicate of bug 703245.

*** This bug has been marked as a duplicate of bug 703245 ***

