Bug 1542013

Summary: RHEL-7.5: Cannot set port mirroring onto two interfaces
Product: Red Hat Enterprise Linux 7 Reporter: Michael Burman <mburman>
Component: kernel Assignee: Ivan Vecera <ivecera>
kernel sub component: Networking QA Contact: Li Shuang <shuali>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: atragler, bhu, bugs, danken, ivecera, jhsiao, jiji, jkurik, kzhang, mleitner, myakove, network-qe, rkhan, sukulkar
Version: 7.5 Keywords: Regression
Target Milestone: pre-dev-freeze   
Target Release: 7.5   
Hardware: x86_64   
OS: Linux   
URL: git://git.engineering.redhat.com/users/ivecera/rhel7.git#bz1542013
Whiteboard:
Fixed In Version: kernel-3.10.0-854.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1544217 (view as bug list) Environment:
Last Closed: 2018-04-10 23:49:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1442258, 1533778, 1544217    
Attachments:
Description Flags
Logs none

Description Michael Burman 2018-02-05 12:01:30 UTC
Created attachment 1391467
Logs

Description of problem:
Can't run a VM with port mirroring if another VM with port mirroring is already running on the host.

If we try to run a VM with a port mirroring vNIC while another VM with port mirroring is already running on the host, it fails with:

2018-02-05 13:49:02,560+0200 ERROR (jsonrpc/1) [api] FINISH destroy error=(22, 'RTNETLINK answers: Invalid argument', ['/sbin/tc', 'filter', 'replace', 'dev', 'pm1', 'protocol', 'all', 'parent', 'ffff:', 'handle', '800::800', 'pref', '49152', 'u32', 'match', 'u8', '0', '0', 'action', 'mirred', 'egress', 'mirror', 'dev', 'vnet0']) (api:127)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 311, in destroy
    res = self.vm.destroy(gracefulAttempts)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5131, in destroy
    result = self.doDestroy(gracefulAttempts, reason)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5150, in doDestroy
    return self.releaseVm(gracefulAttempts)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5032, in releaseVm
    nic.name)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in unsetPortMirroring
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
TrafficControlException: (22, 'RTNETLINK answers: Invalid argument', ['/sbin/tc', 'filter', 'replace', 'dev', 'pm1', 'protocol', 'all', 'parent', 'ffff:', 'handle', '800::800', 'pref', '49152', 'u32', 'match', 'u8', '0', '0', 'action', 'mirred', 'egress', 'mirror', 'dev', 'vnet0'])
2018-02-05 13:49:02,577+0200 INFO  (jsonrpc/1) [api.virt] FINISH destroy return={'status': {'message': 'General Exception: ("(22, \'RTNETLINK answers: Invalid argument\', [\'/sbin/tc\', \'filter\', \'replace\', \'dev\', \'pm1\', \'protocol\', \'all\', \'parent\', \'ffff:\', \'handle\', \'800::800\', \'pref\', \'49152\', \'u32\', \'match\', \'u8\', \'0\', \'0\', \'action\', \'mirred\', \'egress\', \'mirror\', \'dev\', \'vnet0\'])",)', 'code': 100}} from=::ffff:10.35.163.149,37508 (api:52)

After the VM fails to run, it is still reported as running on the host, and a reboot is required to release it.

[root@camel-vdsa ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 9     V1                             running
 10    V2                             running

VM V2 failed to run.

Version-Release number of selected component (if applicable):
vdsm-4.20.17-1.el7ev.x86_64
kernel-3.10.0-830.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a network with a port mirroring vNIC profile and attach it to the host
2. Run VM1 with a port mirroring vNIC
3. Try to run VM2 with a port mirroring vNIC

Actual results:
Failed with tc error

Expected results:
Should work

Comment 1 Red Hat Bugzilla Rules Engine 2018-02-05 12:18:00 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Michael Burman 2018-02-05 15:11:34 UTC
This passes on el7.4 (3.10.0-693.el7.x86_64) but explodes on el7.5 (3.10.0-830.el7.x86_64).
The same iproute version (iproute-3.10.0-87.el7.x86_64) is running on both kernels.


ip link add type veth
ip link add type veth
brctl delbr pm2
brctl addbr pm2

/sbin/tc qdisc add dev pm2 ingress
/sbin/tc filter show dev pm2 parent ffff:
/sbin/tc filter replace dev pm2 protocol all parent ffff: u32 match u8 0 0 action mirred egress mirror dev veth0
/sbin/tc qdisc replace dev pm2 root prio
qd=`/sbin/tc qdisc show dev pm2 |grep '^qdisc prio ' |sed 's/qdisc prio //;s/: .*//'`

/sbin/tc filter show dev pm2 parent "$qd":
/sbin/tc filter replace dev pm2 protocol all parent "$qd": u32 match u8 0 0 action mirred egress mirror dev veth0
/sbin/ip link set dev pm2 promisc on

/sbin/tc qdisc add dev pm2 ingress || :
/sbin/tc filter show dev pm2 parent ffff:
/sbin/tc filter replace dev pm2 protocol all parent ffff: handle 800::800 pref 49152 u32 match u8 0 0 action mirred egress mirror dev veth0 action mirred egress mirror dev veth2


[root@camel-vdsa ~]# bash -ex burman.sh
+ brctl delbr pm2
+ brctl addbr pm2
+ /sbin/tc qdisc add dev pm2 ingress
+ /sbin/tc filter show dev pm2 parent ffff:
+ /sbin/tc filter replace dev pm2 protocol all parent ffff: u32 match u8 0 0 action mirred egress mirror dev veth0
+ /sbin/tc qdisc replace dev pm2 root prio
++ /sbin/tc qdisc show dev pm2
++ grep '^qdisc prio '
++ sed 's/qdisc prio //;s/: .*//'
+ qd=8010
+ /sbin/tc filter show dev pm2 parent 8010:
+ /sbin/tc filter replace dev pm2 protocol all parent 8010: u32 match u8 0 0 action mirred egress mirror dev veth0
+ /sbin/ip link set dev pm2 promisc on
+ /sbin/tc qdisc add dev pm2 ingress
RTNETLINK answers: File exists
+ :
+ /sbin/tc filter show dev pm2 parent ffff:
filter protocol all pref 49152 u32 
filter protocol all pref 49152 u32 fh 800: ht divisor 1 
filter protocol all pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? not_in_hw 
  match 00000000/00000000 at 0
        action order 1: mirred (Egress Mirror to device veth0) pipe
        index 1 ref 1 bind 1
 
+ /sbin/tc filter replace dev pm2 protocol all parent ffff: handle 800::800 pref 49152 u32 match u8 0 0 action mirred egress mirror dev veth0 action mirred egress mirror dev veth2
RTNETLINK answers: Invalid argument
We have an error talking to the kernel

Comment 3 Dan Kenigsberg 2018-02-05 16:03:26 UTC
Given the commandline-only reproducer, I'm moving the bug to RHEL.

Comment 7 Ivan Vecera 2018-02-08 13:14:56 UTC
The issue is caused by commit:

commit 24d3dc6d27eae19f422a5e216e25d3a16628d4ff
Author: Or Gerlitz <ogerlitz>
Date:   Thu Feb 16 10:31:15 2017 +0200

    net/sched: cls_u32: Reflect HW offload status
    
    U32 support for the "in hw" offloading flags.
    
    Signed-off-by: Or Gerlitz <ogerlitz>
    Reviewed-by: Amir Vadai <amir>
    Signed-off-by: David S. Miller <davem>

This commit added the TCA_CLS_FLAGS_{,NOT}_IN_HW flags to u32, but the conditional in u32_change() is too strict and makes it impossible to replace an existing filter:

static int u32_change(struct net *net, struct sk_buff *in_skb,
                      struct tcf_proto *tp, unsigned long base, u32 handle,
                      struct nlattr **tca, void **arg, bool ovr,
                      struct netlink_ext_ack *extack)
{
...
                if (n->flags != flags) {
                        NL_SET_ERR_MSG_MOD(extack, "Key node flags do not match passed flags");
                        return -EINVAL;
                }
...
}

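n->flags contains either ...IN_HW or ...NOT_IN_HW, depending on the offloading state. These flags cannot be passed from userspace, so the passed flags never contain them. As a result n->flags and flags can never be equal, and every attempt to replace an existing filter hits this check and fails with -EINVAL.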
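
To illustrate the comparison problem outside the kernel, here is a minimal userspace sketch. It is not the submitted patch. The flag values are taken from include/uapi/linux/pkt_cls.h; the two helper functions are hypothetical names introduced only for this example. The first mirrors the strict check quoted above, the second shows one possible relaxation that ignores the two kernel-only status bits when comparing.

```c
#include <errno.h>
#include <stdint.h>

/* Flag bits as defined in include/uapi/linux/pkt_cls.h. */
#define TCA_CLS_FLAGS_SKIP_HW   (1u << 0) /* may be requested by userspace */
#define TCA_CLS_FLAGS_SKIP_SW   (1u << 1) /* may be requested by userspace */
#define TCA_CLS_FLAGS_IN_HW     (1u << 2) /* status bit, set by the kernel */
#define TCA_CLS_FLAGS_NOT_IN_HW (1u << 3) /* status bit, set by the kernel */

/*
 * Hypothetical helper: the strict comparison from u32_change(), in isolation.
 * node_flags is what the existing key node carries, passed_flags is what
 * userspace sent with the replace request.
 */
int check_flags_strict(uint32_t node_flags, uint32_t passed_flags)
{
	if (node_flags != passed_flags)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: a relaxed comparison (an assumption about how the
 * check could be loosened). Only bits that differ AND are not status bits
 * count as a mismatch.
 */
int check_flags_masked(uint32_t node_flags, uint32_t passed_flags)
{
	const uint32_t status = TCA_CLS_FLAGS_IN_HW | TCA_CLS_FLAGS_NOT_IN_HW;

	if ((node_flags ^ passed_flags) & ~status)
		return -EINVAL;
	return 0;
}
```

With a node that carries only NOT_IN_HW and a replace request that passes no flags (the case in the reproducer), the strict helper returns -EINVAL while the masked one returns 0. A genuine mismatch in a userspace-settable bit such as SKIP_HW is still rejected by the masked helper.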

Upstream is affected as well, so I'm going to fix it there first.

Comment 8 Ivan Vecera 2018-02-08 15:12:21 UTC
Upstream patch submitted:

https://patchwork.ozlabs.org/patch/870905/

Comment 9 Ivan Vecera 2018-02-09 13:09:02 UTC
(In reply to Ivan Vecera from comment #8)
> Upstream patch submitted:
> 
> https://patchwork.ozlabs.org/patch/870905/

Accepted.

Comment 12 Jean-Tsung Hsiao 2018-02-09 16:26:39 UTC
Done

Comment 13 Dan Kenigsberg 2018-02-18 19:45:46 UTC
May I ask when we will have a build to test?

Comment 14 Bruno Meneguele 2018-02-19 18:36:48 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 16 Bruno Meneguele 2018-02-20 12:22:35 UTC
Patch(es) available on kernel-3.10.0-854.el7

Comment 18 Meni Yakove 2018-02-26 08:31:12 UTC
ovirt-engine-4.2.2.1-0.1.el7.noarch
kernel 3.10.0-855.el7.x86_64

Comment 19 Meni Yakove 2018-02-26 08:31:39 UTC
ovirt-engine-4.2.2.1-0.1.el7.noarch
kernel 3.10.0-855.el7.x86_64

Comment 20 errata-xmlrpc 2018-04-10 23:49:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062