Bug 1567072

Summary: [ovs] Can't capture outgoing packets with tcpdump when an nfp NIC is added to an OVS bridge with vlan_mode=native-tagged
Product: Red Hat Enterprise Linux 7 Reporter: LiLiang <liali>
Component: tcpdump Assignee: Michal Ruprich <mruprich>
Status: CLOSED NOTABUG QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: medium    
Version: 7.5 CC: atragler, echaudro, jakub.kicinski, jomiller, liali, louis.peens, msekleta, pablo.cascon, pieter.jansenvanvuuren, qding, simon.horman
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-08-21 13:57:33 UTC Type: Bug

Description LiLiang 2018-04-13 11:46:02 UTC
Description of problem:
After adding an nfp NIC to an OVS bridge and setting vlan_mode=native-tagged, I can't capture outgoing packets on this NIC with tcpdump; only incoming packets are captured.

Version-Release number of selected component (if applicable):
[root@hp-dl380g9-01 topo]# uname -r
3.10.0-860.el7.x86_64
[root@hp-dl380g9-01 topo]# ethtool -i ens2np0
driver: nfp
version: 3.10.0-860.el7.x86_64 SMP mod_u
firmware-version: 0.0.3.5 0.22 nic-2.0.4 nic
expansion-rom-version: 
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
[root@hp-dl380g9-01 topo]# lspci -s 0000:0b:00.0
0b:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000


How reproducible:
always

+----------------------------------------+
|                                        |
|                                        |
|        systemA                         |
|                                        |
|        +------+           +--------+   |
|        |ovsbr0|----vnet1--|   vm   |   |
|        +------+           +--------+   |
|       /                                |
|      /                                 |
+----------------------------------------+
      |nfp nic    
      |            
      |            
      |            
      |            
      |            
      |            
      |any nic      
+---------------------------------------+
|      \                                |
|       \                               |
|        +------+           +--------+  |
|        |ovsbr0|----vnet1--|   vm   |  |
|        +------+           +--------+  |
|                                       |
|        systemB                        |
|                                       |
|                                       |
+---------------------------------------+


Steps to Reproduce:
1. On system A, create an OVS bridge, add vnet1 and the nfp NIC to it, and set vlan_mode=native-tagged on the nfp NIC:
#ovs-vsctl add-br ovsbr0
#ovs-vsctl add-port ovsbr0 ens2np0 tag=3 vlan_mode=native-tagged
#ovs-vsctl add-port ovsbr0 vnet1 tag=3

2. On system B, do the same as on system A.

3. From the VM on system B, ping the VM on system A; the ping succeeds:
[root@localhost ~]# ping 172.31.220.1
PING 172.31.220.1 (172.31.220.1) 56(84) bytes of data.
64 bytes from 172.31.220.1: icmp_seq=1 ttl=64 time=1.08 ms
64 bytes from 172.31.220.1: icmp_seq=2 ttl=64 time=0.469 ms
64 bytes from 172.31.220.1: icmp_seq=3 ttl=64 time=0.592 ms
64 bytes from 172.31.220.1: icmp_seq=4 ttl=64 time=0.562 ms
64 bytes from 172.31.220.1: icmp_seq=5 ttl=64 time=0.488 ms

4. On system A, capture packets on the nfp NIC; only incoming packets are captured, no outgoing packets:
[root@hp-dl380g9-01 topo]# tcpdump -i ens2np0 host 172.31.220.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens2np0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:18:00.616990 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 114, length 64
07:18:01.616915 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 115, length 64
07:18:02.616876 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 116, length 64
07:18:03.616885 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 117, length 64
07:18:04.616862 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 118, length 64
07:18:05.616859 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 119, length 64
07:18:06.616743 IP 172.31.220.2 > 172.31.220.1: ICMP echo request, id 7440, seq 120, length 64


Actual results:
tcpdump can't capture outgoing packets on the nfp NIC when it is added to an OVS bridge with vlan_mode=native-tagged.

Expected results:


Additional info:

Comment 2 Eelco Chaudron 2018-04-20 06:43:51 UTC
Can you let me know if this is with or without hardware offload (tc flower)?

Comment 3 LiLiang 2018-04-20 06:58:31 UTC
(In reply to Eelco Chaudron from comment #2)
> Can you let me know if this is with or without hardware offload (tc flower)?

without hardware offload

Comment 4 Eelco Chaudron 2018-04-20 07:31:30 UTC
I'm assuming this works on the other system with the non-NFP NIC. Pablo, is this a known issue/limitation? Can you quickly check before I dig deeper into this?

Comment 5 Pablo Cascon (Netronome) 2018-04-20 09:29:39 UTC
Thanks for the bug report and for highlighting this. This is not expected. If you can, please check the netdev statistics with "ethtool -S" and ifconfig to see whether the outgoing packets are accounted for somehow. If I'm not mistaken, tcpdump should show the outgoing packets regardless of the netdev/NIC.

Comment 6 LiLiang 2018-04-20 10:02:24 UTC
(In reply to Pablo Cascon from comment #5)
> Thanks for the bug report and highlighting. This is not expected. If you can
> please check the netdev statistics with "ethtool -S" and ifconfig to see if
> the "output packet" are accounted somehow. tcpdump should show the outgoing
> packets if I'm not mistaken regardless of the netdev/NIC

I ran ping with the option -i 0.001: ping 192.168.3.253 -i 0.001 &
It seems all outgoing packets are accounted for.


[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 5766
[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 6802
[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 7425
[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 8013
[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 8670
[root@hp-dl380g9-01 topo]# ethtool -S ens2np0 | grep dev_tx_pkts
     dev_tx_pkts: 9154

[root@hp-dl380g9-01 topo]# ifconfig ens2np0
ens2np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 2001::215:4dff:fe13:424  prefixlen 64  scopeid 0x0<global>
        ether 00:15:4d:13:04:24  txqueuelen 1000  (Ethernet)
        RX packets 24674  bytes 2508003 (2.3 MiB)
        RX errors 0  dropped 500  overruns 0  frame 0
        TX packets 23312  bytes 2377176 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hp-dl380g9-01 topo]# ifconfig ens2np0
ens2np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 2001::215:4dff:fe13:424  prefixlen 64  scopeid 0x0<global>
        ether 00:15:4d:13:04:24  txqueuelen 1000  (Ethernet)
        RX packets 25504  bytes 2592621 (2.4 MiB)
        RX errors 0  dropped 500  overruns 0  frame 0
        TX packets 24141  bytes 2461734 (2.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hp-dl380g9-01 topo]# ifconfig ens2np0
ens2np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 2001::215:4dff:fe13:424  prefixlen 64  scopeid 0x0<global>
        ether 00:15:4d:13:04:24  txqueuelen 1000  (Ethernet)
        RX packets 30470  bytes 3099213 (2.9 MiB)
        RX errors 0  dropped 500  overruns 0  frame 0
        TX packets 29097  bytes 2967246 (2.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Comment 7 Pablo Cascon (Netronome) 2018-04-20 10:13:19 UTC
Hmm, so dev_tx_pkts goes up, which is the NIC saying the packets are sent. A tcpdump filter issue?

Comment 9 LiLiang 2018-04-23 01:01:30 UTC
(In reply to Pablo Cascon from comment #7)
> Hmm so dev_tx_pkts goes up, that's the NIC saying the pkts are sent, tcpdump
> filter issue?

I don't think so. This issue does not occur when using tcpdump with other drivers.

Comment 10 Eelco Chaudron 2018-04-23 07:56:13 UTC
Assigning this to Pablo as he is troubleshooting this already.

Comment 11 Pablo Cascon (Netronome) 2018-04-24 15:27:51 UTC
Bug replicated even without VMs (just regular netdevs). Without the tcpdump filter all packets can be seen; with the filter only the ingress ones are shown. Investigating.

Comment 12 jakub.kicinski 2018-04-25 00:03:39 UTC
This is due to our CoreNIC firmware choosing not to support VLAN stripping and inserting in the datapath.

tcpdump's generated filters do not account for the VLAN case. Most NICs today strip the VLAN tag and report it out-of-band, so it is never part of the frame data. This has negligible performance advantages, so we don't do that; in our case the VLAN tag is present inside the frame data. You can see RX frames in tcpdump because the stack strips the VLAN tag before the tcpdump hook. On the TX path the hook runs after VLAN insertion, so the BPF filter uses the wrong offsets.
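
To illustrate the offset shift described above, here is a minimal Python sketch (not part of the original report). It builds one untagged and one 802.1Q-tagged Ethernet frame and reads the 16-bit value at byte offset 12, which is where a filter expects the EtherType. The helper names and the source MAC address are hypothetical; VLAN ID 3 matches the tag used in the reproduction steps.

```python
import struct

ETH_P_IP = 0x0800     # EtherType for IPv4
ETH_P_8021Q = 0x8100  # TPID of an 802.1Q VLAN tag

def ethernet_frame(payload, vlan_id=None):
    """Build a minimal Ethernet frame, optionally with an in-frame VLAN tag."""
    dst = bytes.fromhex("00154d130424")  # MAC shown by ifconfig in this report
    src = bytes.fromhex("020000000001")  # hypothetical source MAC
    header = dst + src
    if vlan_id is not None:
        # The 4-byte tag (TPID + TCI) sits between the MACs and the EtherType.
        header += struct.pack("!HH", ETH_P_8021Q, vlan_id)
    header += struct.pack("!H", ETH_P_IP)
    return header + payload

def ethertype_at_12(frame):
    """Read the 16-bit big-endian value at offset 12, as 'ldh [12]' does."""
    return struct.unpack_from("!H", frame, 12)[0]

untagged = ethernet_frame(bytes(20))
tagged = ethernet_frame(bytes(20), vlan_id=3)

print(hex(ethertype_at_12(untagged)))  # 0x800
print(hex(ethertype_at_12(tagged)))    # 0x8100
```

With the tag in the frame data, every later field (including the IP addresses) sits 4 bytes further in than a filter compiled for untagged frames assumes.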

One can investigate the filter byte code with -d option, with sufficient BPF knowledge :)

# tcpdump -pi p4p1 -d 'host 10.22.100.2'
(000) ldh      [12]
(001) jeq      #0x800           jt 2	jf 6
(002) ld       [26]
(003) jeq      #0xa166402       jt 12	jf 4
(004) ld       [30]
(005) jeq      #0xa166402       jt 12	jf 13
(006) jeq      #0x806           jt 8	jf 7
(007) jeq      #0x8035          jt 8	jf 13
(008) ld       [28]
(009) jeq      #0xa166402       jt 12	jf 10
(010) ld       [38]
(011) jeq      #0xa166402       jt 12	jf 13
(012) ret      #262144
(013) ret      #0

You can see the offsets are hard-coded and there is no test for 0x8100; the EtherType at offset 12 is only compared against 0x0800 (IP), 0x0806 (ARP) and 0x8035 (RARP).
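
As a rough illustration (not part of the original comment), the bytecode above can be emulated in a few lines of Python. This is only a sketch of the logic in the listing, not tcpdump's actual implementation; the function names and the `base` parameter are hypothetical, and the addresses are the ones from the reproduction steps.

```python
import socket
import struct

def host_filter(frame, host, base=0):
    """Emulate the 'host' filter listing; 'base' shifts every offset."""
    addr = struct.unpack("!I", socket.inet_aton(host))[0]

    def half(off):
        return struct.unpack_from("!H", frame, off + base)[0]

    def word(off):
        return struct.unpack_from("!I", frame, off + base)[0]

    try:
        ethertype = half(12)                  # (000) ldh [12]
        if ethertype == 0x0800:               # (001) IPv4
            return word(26) == addr or word(30) == addr
        if ethertype in (0x0806, 0x8035):     # (006)-(007) ARP / RARP
            return word(28) == addr or word(38) == addr
        return False                          # (013) ret #0
    except struct.error:
        return False  # out-of-bounds load: treat as no match

def ipv4_frame(src, dst, vlan_id=None):
    """Build an Ethernet + IPv4 header; MACs are zeroed placeholders."""
    eth = bytes(12)
    if vlan_id is not None:
        eth += struct.pack("!HH", 0x8100, vlan_id)
    eth += struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 1, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return eth + ip

# RX: the stack has already stripped the tag before the capture hook.
rx = ipv4_frame("172.31.220.2", "172.31.220.1")
# TX: the tag is already inserted into the frame data at the hook.
tx = ipv4_frame("172.31.220.1", "172.31.220.2", vlan_id=3)

print(host_filter(rx, "172.31.220.2"))          # True
print(host_filter(tx, "172.31.220.2"))          # False
print(host_filter(tx, "172.31.220.2", base=4))  # True
```

Shifting all offsets by 4 bytes (`base=4`) makes the tagged TX frame match again. The `vlan` primitive in pcap filter expressions adjusts the decoding offsets for the rest of the expression in a similar way, so a filter such as 'vlan and host 172.31.220.2' may be worth trying on the affected interface.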

The same exact behaviour can be seen with any other vendor if rx-vlan-offload and tx-vlan-offload are set to off in ethtool:

# ethtool -K eth4 rxvlan off txvlan off

Now you will only see received frames.

tcpdump and VLANs are always causing trouble:

http://netoptimizer.blogspot.com/2010/09/tcpdump-vs-vlan-tags.html
https://christian-rossow.de/articles/tcpdump_filter_mixed_tagged_and_untagged_VLAN_traffic.php

Comment 13 Pablo Cascon (Netronome) 2018-10-18 08:47:10 UTC
Changing this BZ's component to 'tcpdump', as this is a tcpdump limitation.

Comment 14 Joshua Miller 2019-08-21 13:57:33 UTC
Closing per request of Netronome as this is a stale bz.