Bug 1845740

Summary: OVN: NetworkPolicy between server and client should enforce policy based on PodSelector with MatchExpressions

Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Networking
Assignee: Daniel Mellado <dmellado>
Networking sub component: ovn-kubernetes
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: aconstan
Version: 4.5
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Environment: [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on PodSelector with MatchExpressions[Feature:NetworkPolicy]
Last Closed: 2020-06-18 12:50:16 UTC
Type: Bug

Description W. Trevor King 2020-06-09 23:10:11 UTC
test:
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client should enforce policy based on PodSelector with MatchExpressions[Feature:NetworkPolicy] 

is failing frequently in CI, see search results:
https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network%5C%5D+NetworkPolicy+%5C%5BLinuxOnly%5C%5D+NetworkPolicy+between+server+and+client+should+enforce+policy+based+on+PodSelector+with+MatchExpressions%5C%5BFeature%3ANetworkPolicy%5C%5D
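
For context, the test exercises a NetworkPolicy that selects the allowed client pods via a podSelector with matchExpressions rather than matchLabels, and then verifies that a pod matching the expression can reach the server while one that does not match cannot.  A minimal sketch of that kind of policy using client-go types (the policy name and pod-name labels below are illustrative, not necessarily the exact fixtures the e2e test creates):

  // Sketch only: builds (but does not create) a policy of the kind the test applies.
  package main

  import (
      "fmt"

      networkingv1 "k8s.io/api/networking/v1"
      metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  )

  func main() {
      policy := &networkingv1.NetworkPolicy{
          ObjectMeta: metav1.ObjectMeta{Name: "allow-client-a-via-match-expressions"},
          Spec: networkingv1.NetworkPolicySpec{
              // Target the server pod...
              PodSelector: metav1.LabelSelector{
                  MatchLabels: map[string]string{"pod-name": "server"},
              },
              // ...and only admit ingress from pods whose pod-name label is in {client-a}.
              Ingress: []networkingv1.NetworkPolicyIngressRule{{
                  From: []networkingv1.NetworkPolicyPeer{{
                      PodSelector: &metav1.LabelSelector{
                          MatchExpressions: []metav1.LabelSelectorRequirement{{
                              Key:      "pod-name",
                              Operator: metav1.LabelSelectorOpIn,
                              Values:   []string{"client-a"},
                          }},
                      },
                  }},
              }},
          },
      }
      fmt.Printf("%+v\n", policy.Spec)
  }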

Stats from the past 24h:

$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?maxAge=24h&type=junit&name=release-openshift-&search=%5C%5Bsig-network%5C%5D+NetworkPolicy+%5C%5BLinuxOnly%5C%5D+NetworkPolicy+between+server+and+client+should+enforce+policy+based+on+PodSelector+with+MatchExpressions%5C%5BFeature%3ANetworkPolicy%5C%5D' | grep 'failures match'
release-openshift-origin-installer-e2e-aws-ovn-network-stress-4.5 - 3 runs, 100% failed, 67% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.5 - 6 runs, 100% failed, 33% of failures match
release-openshift-ocp-installer-e2e-openstack-4.3 - 1 runs, 100% failed, 100% of failures match

Picking one of the jobs from that query, [1] is 4.5.0-0.nightly-2020-06-09-030606 and has:

  fail [k8s.io/kubernetes/test/e2e/network/network_policy.go:65]: Unexpected error:
    <*errors.errorString | 0xc0001e09a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
  occurred

among many, many other issues going on in that cluster.  Skimming through the build log turned up:

Jun 09 03:46:20.934 I ns/openshift-kube-apiserver-operator deployment/kube-apiserver-operator reason/OperatorStatusChanged Status for clusteroperator/kube-apiserver changed: Degraded message changed from "NodeControllerDegraded: All master nodes are ready\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-150-247.us-east-2.compute.internal container \"kube-apiserver\" is not ready: PodInitializing: \nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-150-247.us-east-2.compute.internal container \"kube-apiserver-cert-regeneration-controller\" is not ready: PodInitializing: \nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-150-247.us-east-2.compute.internal container \"kube-apiserver-cert-syncer\" is not ready: PodInitializing: \nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-150-247.us-east-2.compute.internal container \"kube-apiserver-insecure-readyz\" is not ready: PodInitializing: " to "NodeControllerDegraded: All master nodes are ready\nStaticPodsDegraded: pod/kube-apiserver-ip-10-0-150-247.us-east-2.compute.internal container \"kube-apiserver\" is not ready: unknown reason"
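
Separately, the "timed out waiting for the condition" string in the failure above is just the generic error returned by the k8s.io/apimachinery wait helpers when a polled condition never becomes true, so by itself it doesn't point at a specific cause.  A minimal sketch of how it surfaces (the condition function here is a hypothetical stand-in for the e2e connectivity check):

  package main

  import (
      "fmt"
      "time"

      "k8s.io/apimachinery/pkg/util/wait"
  )

  func main() {
      // Poll a condition that never succeeds; once the timeout elapses,
      // PollImmediate returns wait.ErrWaitTimeout.
      err := wait.PollImmediate(2*time.Second, 10*time.Second, func() (bool, error) {
          return false, nil // stubbed-out connectivity check
      })
      fmt.Println(err) // prints: timed out waiting for the condition
  }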

I don't know what the root cause is.  There are quite a few other bugs that mention this test.  For example, bug 1842876 was triggered by etcd leader elections (which don't show up in the build log from [1]), and bug 1823460 had "Error getting container logs: the server could not find the requested resource" as the error condition.  But I couldn't find a bug report that mentions this test with this particular error message, so I'm filing this new bug.  Please close it as a dup if I'm just missing an existing one.

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.5/1392

Comment 2 Daniel Mellado 2020-06-18 12:50:16 UTC
Hi Trevor, yep, this is indeed a dup of bug 1845724. At first I thought about keeping this one in POST, but since they share the same root cause I'll just close this one to avoid tracking both.

Thanks!

*** This bug has been marked as a duplicate of bug 1845724 ***