Bug 1842876
Summary: | [OVN] Port range filtering sometimes does not allow traffic to the entire range | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Maysa Macedo <mdemaced>
Component: | python-networking-ovn | Assignee: | Jakub Libosvar <jlibosva>
Status: | CLOSED CURRENTRELEASE | QA Contact: | GenadiC <gcheresh>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 16.1 (Train) | CC: | apevec, awels, dalvarez, dcbw, eduen, gcheresh, itbrown, jlibosva, lhh, ltomasbo, majopela, njohnston, nunnatsa, nusiddiq, oblaut, racedoro, scohen, spower, tsmetana, wking
Target Milestone: | z2 | Keywords: | AutomationBlocker, TestBlockerForLayeredProduct, TestOnly, Triaged
Target Release: | 16.1 (Train on RHEL 8.2) | Flags: | dmellado: needinfo-
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-05-16 14:48:13 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1858878 | |
Bug Blocks: | | |
Description (Maysa Macedo, 2020-06-02 09:51:47 UTC)
*** Bug 1843053 has been marked as a duplicate of this bug. ***
*** Bug 1843061 has been marked as a duplicate of this bug. ***
*** Bug 1843062 has been marked as a duplicate of this bug. ***
*** Bug 1843063 has been marked as a duplicate of this bug. ***
*** Bug 1843066 has been marked as a duplicate of this bug. ***
*** Bug 1843067 has been marked as a duplicate of this bug. ***
*** Bug 1843068 has been marked as a duplicate of this bug. ***
*** Bug 1843069 has been marked as a duplicate of this bug. ***
*** Bug 1843070 has been marked as a duplicate of this bug. ***
*** Bug 1843071 has been marked as a duplicate of this bug. ***

We've had each of the failed tests in a separate bugzilla. Marking them as duplicates since they all seem to have the same root cause and this is a much saner way to track the fix.

We are open to assisting, but we need a must-gather or links to CI runs to investigate. A few things before we begin:

- Ceph: etcd requires fast disks in order to properly facilitate its serial workloads (fsync). Ceph has generally not been a good match for etcd because the actual storage layer is generally not SSD. We actually document that now explicitly [1].

> Message: "rpc error: code = Unavailable desc = etcdserver: leader changed"

As leader elections can be the direct result of poor disk I/O, I would like to direct all focus on storage.

[1] https://github.com/openshift/openshift-docs/pull/20939/files

*** Bug 1843802 has been marked as a duplicate of this bug. ***

I'm going to run the following metrics after running the tests again:

    histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))
    histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))
    histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
    max(etcd_server_leader_changes_seen_total)
    histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))

This will be with a setup of OSP13 + OVN + OCP4.3, without Ceph.

This sounds good, but the bug was against a cluster using Ceph, right?

Yes, it was against a cluster using Ceph. We'll spin up a cluster with that config and provide the needed info.

From an initial review of the leader election changes, this is a performance issue as expected. Because I could not immediately isolate the issue from isolated screenshots of queries, I have asked for a full prom db dump for further review.

@Itzik / @Maysa, is the issue here that:

1) etcd is seeing slow storage and thus triggering leader elections
2) etcd storage is using Ceph, a network-based storage provider
3) OpenShift networking for the cluster is provided by Kuryr on OpenStack
4) OpenStack networking itself is using the ml2/ovn network plugin

Therefore, the current thought is that ml2/ovn is not providing sufficient network performance to support etcd's latency requirements? Is that correct?

Hi Dan,

Yes for all points, with the exception of 2. Ceph is present on the cluster, but is not backing the etcd storage. While running the Network Policy tests, every now and then we saw the following error: (read tcp 10.196.1.147:47230->10.196.2.105:2380: i/o timeout). This led us to believe that under certain load the connection can time out.
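As a rough sketch (not taken from this bug), queries like the ones above can be evaluated against the cluster's Prometheus HTTP API from a shell; PROM_HOST and TOKEN below are placeholders for the monitoring route hostname and a bearer token:

    # Evaluate one of the etcd latency queries via the standard Prometheus
    # HTTP API endpoint /api/v1/query.
    # PROM_HOST and TOKEN are placeholders, not values from this bug.
    QUERY='histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))'
    curl -sk -G "https://${PROM_HOST}/api/v1/query" \
         -H "Authorization: Bearer ${TOKEN}" \
         --data-urlencode "query=${QUERY}"

Sustained p99 WAL fsync latencies above the commonly cited ~10 ms guideline would point at storage rather than the network.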
Traffic between nodes on that port is currently allowed by a security group rule covering the port range (2379-2380), which does not seem to always be enforced:

    direction='ingress', ethertype='IPv4', id='a0d98acc-8065-41d9-9686-d12265a3ed9c', port_range_max='2380', port_range_min='2379', protocol='tcp', remote_ip_prefix='10.196.0.0/16'

However, once a security group rule is created for each etcd port (2379 and 2380), the issue stops happening:

    direction='ingress', ethertype='IPv4', id='a0d98acc-8065-41d9-9686-d12265a3ed9c', port_range_max='2379', port_range_min='2379', protocol='tcp', remote_ip_prefix='10.196.0.0/16'
    direction='ingress', ethertype='IPv4', id='27301a08-2ca8-45d8-aa46-4261440be72c', port_range_max='2380', port_range_min='2380', protocol='tcp', remote_ip_prefix='10.196.0.0/16'

So this seems to be an issue related more to OVN than to the use of Ceph.

> So this seems to be an issue related more to OVN than to the use of Ceph.
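As a hedged sketch of that per-port workaround (the security group name etcd-sg is a placeholder, not taken from this bug), the two rules could be created with the standard openstack CLI:

    # Workaround sketch: one ingress rule per etcd port instead of a single
    # 2379-2380 range rule. "etcd-sg" is a placeholder security group name.
    for port in 2379 2380; do
        openstack security group rule create \
            --ingress --ethertype IPv4 --protocol tcp \
            --remote-ip 10.196.0.0/16 \
            --dst-port "${port}:${port}" \
            etcd-sg
    done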
Moving to OVN for review.
Changed the BZ title to reflect the OVN issue. The issue has been detected in ShiftOnStack, specifically on the etcd ports, causing the leader to change and Network Policy tests to fail. The existing test coverage for port range filtering is passing, so it's not 100% reproducible and still requires RCA.

Some more info that could be useful to narrow down where the problem may be:

- When the problem appears, restarting ovn-controller does not help; the problem is still there.
- We have seen this in both OSP13 and OSP16 environments, but only with the OVN backend. We have not seen the issue with ml2/ovs.
- The first time we saw this problem (on the etcd side) was a month ago (May 19th). It took us some time until we realised it was the SG range not always being enforced, and the bugzilla was moved across different groups (kuryr, etcd, and now OVN).

We finally found the root cause. The currently used OVN version recalculates conjunction IDs on changes such as port groups or port bindings. Port ranges in ACLs are implemented using conjunctions, meaning that these rules change their conjunction ID even when the change is not related to the given ACL. This causes a brief network disruption in the data plane, which triggers a leader election in the etcd cluster. I tested a newer OVN version, ovn2.13-20.06.1-2.el8fdp, and we're no longer able to reproduce the hiccup with the minimal reproducer we had. Now we're running the full OCP tests that found the issue. Worth noting that this OVN version is not tested with OSP yet, so we may hit some regressions.

This is fixed in ovn2.13-20.06.2-11, which should be part of the current compose. Thus moving to ON_QA to test it.

Ran NP tests with OCP4.5 and OCP4.6, once with the Kuryr W/A and once without (using Maysa's release image). The NP tests were run using the same seed. Also ran OCP4.6 tempest and NP tests without using the seed option.

Versions:
- OSP16 - RHOS-16.1-RHEL-8-20201007.n.0
- OCP 4.5.0-0.nightly-2020-10-25-174204
- OCP 4.6.0-0.nightly-2020-10-22-034051

For future reference, we used seed 1594215440. The Ginkgo seed option ensures we are running the tests in the same order each time. We used that particular seed because it's the one with which we saw most of the issues caused by this bug.
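As a hedged illustration of pinning the test order mentioned above (assuming the suite were driven directly by the ginkgo CLI, which may not match how the OCP test images actually invoke it; the package path is a placeholder):

    # Re-run the suite in the same order by fixing Ginkgo's randomization seed.
    # The package path is a placeholder; 1594215440 is the seed quoted above.
    ginkgo --seed=1594215440 ./test/networkpolicy/...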
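For anyone digging into the root cause described earlier in this thread, here is a hedged sketch of how the conjunction flows installed by ovn-controller for a port-range ACL might be inspected on a compute node (the options and grep patterns are illustrative; the required OpenFlow version depends on how br-int is configured):

    # Port-range ACLs are compiled into OpenFlow conjunction flows on br-int.
    # Watching these while unrelated port groups or port bindings change should
    # show the conjunction IDs being recalculated on the affected OVN version.
    ovs-ofctl -O OpenFlow13 dump-flows br-int | grep -E 'conjunction|conj_id'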