Bug 1264316
Summary: | vlan tag removed in promiscuous mode from kernel-2.6.32-504.23.4.el6.i686 to kernel-2.6.32-573.3.1.el6.i686 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Anitha <anitha.lashmi> |
Component: | kernel | Assignee: | Ken Cox <jkc> |
kernel sub component: | NIC Drivers | QA Contact: | Jianlin Shi <jishi> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | ajith.p, anitha.lashmi, danken, fcolumbu, jkc, network-qe, prarit, sbonazzo |
Version: | 6.8 | Keywords: | Reopened |
Target Milestone: | alpha | Flags: | ajith.p: needinfo? |
Target Release: | 6.8 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-12-06 10:56:32 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1269638 | ||
Attachments: |
Description
Anitha
2015-09-18 07:41:51 UTC
*** Bug 1263561 has been marked as a duplicate of this bug. ***

Are you monitoring on the physical interface? Or on the VLAN interface?

Monitoring on the physical interface. When VLAN packets are pinged from an external device and reach Linux, the VLAN tag is stripped. Below is the debug output from the peer end when pinging with VLAN tag 77, but this VLAN tag is not seen in the tcpdump output on Linux.

    *Jan 11 19:50:54.921 EST: IP: s=77.7.7.1 (local), d=77.7.7.20 (Vlan77), len 100, sending
    *Jan 11 19:50:54.921 EST: IP: s=77.7.7.1 (local), d=77.7.7.20 (Vlan77), len 100, sending full packet

Attached is the tcpdump output captured with the command tcpdump -i eth3 -w failure.pcap.

Created attachment 1076261 [details]
failure log of vlan tag not found in linux failed kernel version 2.6.32-573.3.1.el6.i686
What hardware is in use? Please also attach the output of running 'dmesg' after the failure and the output of running 'lsmod'... thanks! Also, do you see these frames on the vlan77 interface as well? Or only on the physical interface?

Thanks for your support on this issue. The hardware uses the e1000 driver; the version is below.

    driver: e1000
    version: 7.3.21-k8-NAPI

We set the interfaces to promiscuous mode. We don't see any specific message in dmesg at the time of failure, but here is the dmesg output:

    device eth0 entered promiscuous mode
    device eth1 entered promiscuous mode
    device eth3 entered promiscuous mode

    [root@iot-5921-lnx1 lib64]# ethtool -d eth3 | grep VLAN
    VLAN mode:     disabled
    VLAN filter:   disabled
    [root@iot-5921-lnx1 lib64]#
    [root@iot-5921-lnx1 lib64]# ethtool -k eth3 | grep vlan
    rx-vlan-offload: on [fixed]
    tx-vlan-offload: on [fixed]
    rx-vlan-filter: on [fixed]
    vlan-challenged: off [fixed]
    [root@iot-5921-lnx1 lib64]#

We don't configure VLAN 77 on the Linux interface, but the Linux physical interface accepts the packet. Only the VLAN tagging differs between the passing and failing kernel versions. Please let me know if you need further information. I am attaching the lsmod output here.

Created attachment 1076390 [details]
lsmod output of 2.6.32-573.3.1.el6.x86_64
lsmod output as requested.
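The thread does not show how the reporter's application captures frames. Assuming it reads them from a raw AF_PACKET socket bound to eth3 (the socket type that PACKET_AUXDATA applies to), a minimal sketch of opening such a socket and requesting promiscuous mode through the packet(7) membership API might look like this. The function name open_promisc_packet_socket is hypothetical; setting promiscuous mode with ifconfig or ip link, as the reporter may have done, works independently of this.

```c
/* Sketch (hypothetical helper): open a raw AF_PACKET socket on one
 * interface and request promiscuous mode for it.
 * Needs CAP_NET_RAW (normally root). */
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll, struct packet_mreq */
#include <net/if.h>           /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a socket descriptor, or -1 on any failure. */
int open_promisc_packet_socket(const char *ifname)
{
    struct sockaddr_ll sll;
    struct packet_mreq mr;
    unsigned int ifindex = if_nametoindex(ifname);
    int fd;

    if (ifindex == 0)
        return -1;                      /* no such interface */

    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* Receive frames of every protocol, but only from this interface. */
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = (int)ifindex;
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
        goto fail;

    /* The promiscuous membership is dropped when the socket is closed. */
    memset(&mr, 0, sizeof(mr));
    mr.mr_ifindex = (int)ifindex;
    mr.mr_type = PACKET_MR_PROMISC;
    if (setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mr, sizeof(mr)) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

For example, open_promisc_packet_socket("eth3") run as root should produce a kernel message like the "device eth3 entered promiscuous mode" lines in the dmesg output above.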
OK, so you are _not_ configuring a VLAN interface. You are sending VLAN-tagged frames on the physical network attached to the e1000 interface, and the e1000 (in promiscuous mode) is receiving the frames. But the received frames have their VLAN tag stripped before tcpdump sees them. Is this correct?

Hi John, yes. We are not configuring a VLAN interface; we use the e1000 interface. One correction: we could "see" the VLAN tag in tcpdump on the failing kernel version. We are running an application that expects VLAN-tagged frames. On the passing version, kernel-2.6.32-431.el6.i686, the application/process receives VLAN-tagged frames from Linux. On the failing version, kernel-2.6.32-573.3.1.el6.i686, the same application/process receives the expected frame, but without the VLAN tag.

We narrowed down the passing and failing kernel versions. The nearest passing and failing kernels are:

    kernel-2.6.32-504.16.2.el6.i686 (passes)
    kernel-2.6.32-504.23.4.el6.i686 (fails)

Please let me know whether the kernel could be stripping the VLAN tag. Also let me know if further information is required, as this issue is becoming high priority for us. Thanks much for your help on this issue.

Created attachment 1076919 [details]
vlan tag seen in tcpdump in failed kernel version
Please refer to packet 8, where VLAN tag 77 is seen in the tcpdump output. However, this tag is not seen by our application on the failing kernel version, 2.6.32-573.3.1.el6.i686.
I have been able to reproduce this problem and am investigating now.

It turns out that the tag is not missing, but is accessible via the PACKET_AUXDATA cmsg (in the tp_vlan_tci field). The PACKET_AUXDATA cmsg will need to be checked for every frame in order to retrieve the VLAN tag.

Hi Ken, thanks for your input. We are investigating further to retrieve the tag in our application. Please let me know whether this behavior is intentional from a specific kernel version onward. Is there a bug ID or change set that causes this behavior? Do you think the patch below causes this intentional change in VLAN tag handling? Please correct me if I am wrong.

- [net] vlan: Always untag vlan-tagged traffic on input (Jiri Pirko) [1173501 1135347]
https://www.mail-archive.com/stable@vger.kernel.org/msg97900.html

Thanks,
Anitha

Hi Anitha, the patch you mention above is responsible for this change. The change is intentional and brings the kernel in line with upstream kernel behavior. The bug associated with this can be found at: https://bugzilla.redhat.com/show_bug.cgi?id=1135347. This change was first available in kernel-2.6.32-511.el6. I will have to check which Z-stream version it first appeared in.

Hi Ken, could you recommend any option to retrieve the VLAN tag other than PACKET_AUXDATA and the tp_vlan_tci field? It looks like the tp_vlan_tci field is available in recent kernel versions, but our compiler environment is 2.6.18, which does not have the field. This may be a silly question, and I understand that the compilation and target environments should be the same, but I am thinking aloud in case there is a chance. Could you suggest any alternative way to retrieve the VLAN tag? Thanks for your help.

Thanks,
Anitha

Hi Anitha, I am not aware of another way to retrieve the VLAN tag. You will need to compile in an updated RHEL6 environment in order to access this info.

Hi Ken, we are looking into the possibility of compiling in an updated environment.
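Ken's explanation can be sketched in C: enable PACKET_AUXDATA on the packet socket once, receive with recvmsg() rather than recvfrom() so that control messages are delivered, and look for the tpacket_auxdata cmsg on every frame. This is a minimal sketch under those assumptions; the helper names are hypothetical, while PACKET_AUXDATA, struct tpacket_auxdata, tp_vlan_tci and TP_STATUS_VLAN_VALID come from <linux/if_packet.h>. The #ifdef is one way to build the same source against headers with and without TP_STATUS_VLAN_VALID; without that flag, a tag whose TCI is 0 cannot be told apart from an untagged frame.

```c
/* Sketch: recover the VLAN tag that the kernel moved out of the frame
 * and into the PACKET_AUXDATA control message. Helper names are
 * hypothetical. */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/if_packet.h>  /* PACKET_AUXDATA, struct tpacket_auxdata */

/* Ask the kernel to attach a PACKET_AUXDATA cmsg to every received frame. */
int enable_packet_auxdata(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
}

/* Scan the control messages of one received frame. Returns 1 and stores
 * the 802.1Q TCI (priority, DEI and VLAN ID) in *tci if the kernel
 * reports a tag for this frame, 0 otherwise. */
int vlan_tci_from_cmsg(struct msghdr *msg, uint16_t *tci)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        struct tpacket_auxdata aux;

        if (cmsg->cmsg_level != SOL_PACKET || cmsg->cmsg_type != PACKET_AUXDATA ||
            cmsg->cmsg_len < CMSG_LEN(sizeof(aux)))
            continue;
        memcpy(&aux, CMSG_DATA(cmsg), sizeof(aux));
#ifdef TP_STATUS_VLAN_VALID
        /* Newer headers: the flag also marks a tag whose TCI is 0. */
        if (aux.tp_vlan_tci == 0 && !(aux.tp_status & TP_STATUS_VLAN_VALID))
            return 0;
#else
        /* Older headers: a TCI of 0 is treated as "no tag". */
        if (aux.tp_vlan_tci == 0)
            return 0;
#endif
        *tci = aux.tp_vlan_tci;
        return 1;
    }
    return 0;
}

/* Receive one frame into buf. Returns its length (or -1 on error) and,
 * on success, sets *has_tag and *tci from the auxiliary data. */
ssize_t recv_frame_with_tci(int fd, void *buf, size_t len, int *has_tag, uint16_t *tci)
{
    union {
        char data[CMSG_SPACE(sizeof(struct tpacket_auxdata))];
        struct cmsghdr align;   /* forces cmsghdr alignment */
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.data;
    msg.msg_controllen = sizeof(ctl.data);

    n = recvmsg(fd, &msg, 0);
    if (n >= 0)
        *has_tag = vlan_tci_from_cmsg(&msg, tci);
    return n;
}
```

The VLAN ID is the low 12 bits of the TCI (tci & 0x0fff); for the frames in this report it would be 77. The check runs once per frame, which matches Ken's advice that the cmsg must be inspected for every packet.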
BTW, I don't have access to the two bug IDs posted in this thread, https://bugzilla.redhat.com/show_bug.cgi?id=1135347 and 1269638. Can I have access to them?

Thanks,
Anitha

Hi Ken, I understand this is a behavior change in the kernel and not a bug, but I need your support for further clarification. As you suggested, we could read the VLAN tag from the auxiliary data, and the VLAN ping now succeeds. We want to add this VLAN tag reconstruction to our code under conditional compilation, for kernels at or after the fix version. As I understand it, the behavior changed exactly in kernel-2.6.32-504.23.4.el6.i686. In this case, could you suggest a macro that uniquely identifies the version/release for conditional compilation? I see the details below in version.h. Please advise.

    /usr/include/linux/version.h
    #define LINUX_VERSION_CODE 132640
    #define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
    #define RHEL_MAJOR 6
    #define RHEL_MINOR 7
    #define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
    #define RHEL_RELEASE_CODE 1543
    #define RHEL_RELEASE 572
    #define RHEL_16KSTACK_BUILD 520

Thanks,
Anitha

Hi Anitha, unfortunately, the macros in version.h cannot be used to differentiate between builds kernel-2.6.32-504.16.2.el6.i686 and kernel-2.6.32-504.23.4.el6.i686. You could possibly parse KERNELRELEASE in your makefile and act accordingly. KERNELRELEASE is a string that contains the full identifier for that particular build of the kernel, such as: 2.6.32-279.14.1.el6.x86_64

Hi Ken, thanks for the clarification. Basically, our goal is to run the same code on older and newer kernels. While searching, I saw that some kernels use TP_STATUS_VLAN_VALID to set the tp_status field in the auxiliary data, but that flag is not supported in our target kernels. It would be helpful to have a common data structure for checking the VLAN tag on older and newer kernels, other than the newly added fields (tp_vlan_tci and tp_padding).
Otherwise, we would need to read the tag (up to the EtherType) from the CMSG for every packet, even though this is not necessary on older kernels, where we use recvfrom(). I believe this might slow down performance for live traffic in our application. I would appreciate your thoughts on this.

Thanks,
Anitha

Hi Anitha, the tp_vlan_tci field has been present in all of the RHEL6 kernels. The problem was that VLAN tags were not being handled consistently because of differences in how individual NICs handle VLAN tags. That has now been rectified, and the RHEL6 kernel has been aligned with upstream in the way that VLAN tags are passed to the user. The PACKET_AUXDATA cmsg should be checked for every packet, even on older kernels. Apparently, your application just happened to work because of the NICs that happened to be present on the platform you are running on. You might look at something like libpcap and see how it processes received packets.

Hi Ken, thanks for the pointer. We looked at the libpcap source file, but I am still unable to find how it sets HAVE_LINUX_TPACKET_AUXDATA_TP_VLAN_TCI to handle VLAN-tagged frames, for example in version 1.4.0: https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-linux.c. I understand this may be out of scope for you.

I have one more clarification regarding kernel releases. During our testing we found the following working and failing releases:

1. Last working version before VLAN tag stripping by the kernel: kernel-2.6.32-504.16.2.el6.i686
2. First version with the behavior change: kernel-2.6.32-504.23.4.el6.i686

With version 2.6.36, however, we don't see any issues. As part of our code fix, we plan to differentiate the versions based on #1 and #2 above. Since 2.6.36 is a later version, I wonder whether the 1135347 fix might not have entered 2.6.36? Could you please help me understand?

Thanks,
Anitha

Hi Ken, per your comment #17, kernel-2.6.32-511.el6 is the first release with the fix.
But in our testing, the behavior is seen from kernel-2.6.32-504.23.4.el6 itself. Could you please confirm the exact release in which the bug fix below is integrated?

- [net] vlan: Always untag vlan-tagged traffic on input (Jiri Pirko) [1173501 1135347]

From many email discussions, it looks like the fix below is also responsible for the new behavior:

- [net] vlan: make non-hw-accel rx path similar to hw-accel (Jiri Pirko) [1173501 1135347]

I appreciate your support on this.

Thanks,
Anitha

This change went into 3 different kernel streams:

    6.5.z: 2.6.32-431.61.2
    6.6.z: 2.6.32-504.23.4.el6
    6.7:   2.6.32-511.el6

So any kernel within each of these streams that is at or later than the above versions would have the change.

Hi Ken, thanks for the clarification. We are planning to put a fix in our application based on the kernel version in which the behavior change was introduced.

* Can we assume that any version greater than 2.6.32-511.el6 will have the behavior change?
* Has the behavior change been backported to the 431 and 504 releases?
* Should we also worry about any releases in between, like 2.6.32-505, 506, etc., up to 511?

This information would be really helpful in deciding on the fix in our application.

Thanks,
Anitha

(In reply to Anitha from comment #30)
> Hi Ken,
>
> Thanks for the clarification.
>
> we are planning to put fix in our application based on the kernel version
> the behavior change is introduced.
>
> * Can we assume that any version greater than 2.6.32-511.el6 will have the
> behavior change ?

Yes.

>
> * Is the behavior change fix back ported to 431 and 504 releases ?

Yes, as mentioned above, this also went into the z-streams. Anything in the z-stream after 2.6.32-431.61.2 for the 6.5 z-stream or 2.6.32-504.23.4.el6 for the 6.6 z-stream will have the change.

>
> * Should we also bother about any releases in between like 2.6.32-505, 506,
> etc. until 511 ?

No. This change does not appear in 2.6.32-505, 506, etc. until 2.6.32-511.
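The version.h values quoted earlier encode only the upstream version and the RHEL minor release: LINUX_VERSION_CODE 132640 = KERNEL_VERSION(2, 6, 32) = 2 * 65536 + 6 * 256 + 32, and RHEL_RELEASE_CODE 1543 = RHEL_RELEASE_VERSION(6, 7) = 6 * 256 + 7, so nothing there separates 504.16.2 from 504.23.4. If an application still wants to gate its behavior on the release string, the first affected builds Ken listed (6.5.z: 431.61.2, 6.6.z: 504.23.4, 6.7: 511) could be encoded as below. This is a sketch, not a recommendation: Ken's advice is to check PACKET_AUXDATA on every packet regardless of version. The function names are hypothetical, builds outside those three streams are assumed unaffected, and the check reads uname(2) at run time because the build host's headers (2.6.18 in the reporter's case) say nothing about the target kernel.

```c
/* Sketch (hypothetical helpers): decide from a RHEL 6 kernel release
 * string whether the "always untag vlan-tagged traffic on input" change
 * is present, using the first affected builds listed in this bug. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Parse "2.6.32-504.23.4.el6.i686" into {504, 23, 4}; missing parts
 * become 0 ("2.6.32-431.el6" -> {431, 0, 0}). Returns -1 if the string
 * is not a 2.6.32 (RHEL 6) release. */
static int parse_rhel6_build(const char *rel, unsigned long v[3])
{
    const char *p;
    char *end;
    int n = 0;

    if (strncmp(rel, "2.6.32-", 7) != 0)
        return -1;
    p = rel + 7;
    v[0] = v[1] = v[2] = 0;
    while (n < 3 && isdigit((unsigned char)*p)) {
        v[n++] = strtoul(p, &end, 10);
        p = end;
        if (*p != '.')
            break;
        p++;                            /* skip '.'; "el6" stops the loop */
    }
    return n > 0 ? 0 : -1;
}

/* 1 = change present, 0 = absent, -1 = not a RHEL 6 kernel string. */
int rhel6_release_untags_vlan(const char *rel)
{
    unsigned long v[3];

    if (parse_rhel6_build(rel, v) < 0)
        return -1;
    if (v[0] >= 511)                    /* 6.7 stream: from 2.6.32-511 */
        return 1;
    if (v[0] == 504)                    /* 6.6.z: from 504.23.4 */
        return v[1] > 23 || (v[1] == 23 && v[2] >= 4);
    if (v[0] == 431)                    /* 6.5.z: from 431.61.2 */
        return v[1] > 61 || (v[1] == 61 && v[2] >= 2);
    return 0;                           /* 505-510; other builds assumed unaffected */
}

/* Same check for the kernel that is actually running. */
int running_kernel_untags_vlan(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return rhel6_release_untags_vlan(u.release);
}
```

For example, rhel6_release_untags_vlan("2.6.32-504.16.2.el6.i686") returns 0 and rhel6_release_untags_vlan("2.6.32-504.23.4.el6.i686") returns 1, matching the last working and first changed kernels reported in this thread.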
>
> This information would really helpful to decide our fix in application.
>
> Thanks,
> Anitha

Hi Ken, I am sorry to keep asking further questions. Could you please give some input on z-stream numbering?

1. Is kernel version numbering per stream OR universal across streams?
2. If per stream, what does /etc/*release look like for a z-stream? Is it possible for us to download and test it? For example, how do we know the latest version of 6.6.z?
3. If universal across streams, can you give us the exact release versions between 431 and 504 that will and will not have the kernel patch for VLAN? We assume that 505-510 will never have the VLAN patch in the future either.
4. Does 2.6.32-504.16.2 fall under 6.6.z? Is any minor version after 504 part of the z-series? As I understand it:

    2.6.32-431      => 6.5
    2.6.32-431.11.2 => 6.5.z

    2.6.32-504      => 6.6
    2.6.32-504.16.2 => 6.6.z
    2.6.32-504.23.4 => 6.6.z

Please correct me if I am wrong. Thanks for your help on this.

Anitha

Hi Anitha, your example in point 4 below is correct, so I think that answers your other questions. I believe that Z-stream releases are available by subscription, but I do not know the details, so you would need to reach out to your account management team for information on how to obtain Z-stream releases.

(In reply to Anitha from comment #33)
> hi Ken,
>
> I am sorry to keep further questions.
>
> could you please give inputs on z-stream numbering?
>
> 1. kernel version numbering is per stream OR universal across streams ?
>
> 2. If per stream, how do the /etc/*release for a z-stream look like ?
> Is it possible for us to download and test ? For example, how do we know
> the latest version of 6.6.z?
>
> 3. If universal across streams, can you give us the exact release versions
> in between 431 and 504 that will and will not have the kernel patch for vlan
> ?
> We assume that 505-510 will never have the vlan patch in future also.
>
> 4. Does 2.6.32-504.16.2 falls under 6.6.z? does any minor version after 504
> is z-series?
> as per my understanding,
>
> 2.6.32-431 =>6.5
> 2.6.32-431.11.2 =>6.5.z
>
> 2.6.32-504 => 6.6
> 2.6.32-504.16.2 => 6.6.z
> 2.6.32-504.23.4 => 6.6.z
>
> Please correct me if I am wrong.
>
> Thanks for your help on this.
>
> Anitha

Hi, I am facing the same issue on the latest kernels (CentOS 6.7). I am a beginner in this area. Could you please help me out with a detailed step-by-step procedure to fix this issue? It would be very helpful. Thanks in advance.

Regards,
Ajith

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/
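The reporter's application expects VLAN-tagged frames, as it received them on older kernels. Ken suggested looking at how libpcap processes received packets; its Linux capture code reinserts the tag from the auxiliary data into the packet buffer before handing the packet to the caller. A minimal sketch of that reconstruction follows, assuming the standard 802.1Q TPID 0x8100 (other TPIDs such as 0x88a8 are ignored) and a receive buffer with at least 4 spare bytes; the function and macro names are hypothetical.

```c
/* Sketch (hypothetical helper): write an 802.1Q tag back into a received
 * Ethernet frame, given the TCI reported in tpacket_auxdata.tp_vlan_tci. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12      /* destination + source MAC addresses */
#define VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q    0x8100  /* assumed: plain 802.1Q tagging */

/* frame holds len bytes in a buffer of cap bytes. Inserts the tag right
 * after the MAC addresses and returns the new length, or 0 if the frame
 * is too short or the buffer has no room for 4 more bytes. */
size_t reinsert_vlan_tag(uint8_t *frame, size_t len, size_t cap, uint16_t tci)
{
    if (len < ETH_ADDRS_LEN || cap < len + VLAN_TAG_LEN)
        return 0;

    /* Shift the original EtherType and payload right by 4 bytes. */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN, frame + ETH_ADDRS_LEN,
            len - ETH_ADDRS_LEN);

    /* Both fields are stored in network (big-endian) byte order. */
    frame[ETH_ADDRS_LEN + 0] = (uint8_t)(TPID_8021Q >> 8);
    frame[ETH_ADDRS_LEN + 1] = (uint8_t)(TPID_8021Q & 0xff);
    frame[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
    frame[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);
    return len + VLAN_TAG_LEN;
}
```

With tci taken from tp_vlan_tci, a frame with a 14-byte Ethernet header gains an 18-byte tagged header: for VLAN 77 with priority 0, bytes 12 to 15 become 81 00 00 4d and the original EtherType moves to bytes 16 and 17.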