Description of problem:

The attached YAML file (includes namespace, svc, StatefulSet, and NetworkPolicy) results in 3 Pods that cannot always communicate with each other, even though the NetworkPolicy should not apply to the Pods. If you add Ingress under policyTypes and an empty Ingress rules section to the NetworkPolicy, the issue DOES NOT occur.

$ oc get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          31s   10.131.1.31   worker-1.jocolema4.lab.upshift.rdu2.redhat.com   <none>           <none>
web-1   1/1     Running   0          25s   10.128.2.27   worker-0.jocolema4.lab.upshift.rdu2.redhat.com   <none>           <none>
web-2   1/1     Running   0          8s    10.131.1.32   worker-1.jocolema4.lab.upshift.rdu2.redhat.com   <none>           <none>

TEST 1: Curl to self succeeds (the HTTP 404 is the expected response, since /asdf does not exist):

$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-0.httpd:8080/asdf
404

TEST 2: Curl from web-0 -> web-1 times out:

$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-1.httpd:8080/asdf
command terminated with exit code 124

TEST 3: Curl from web-0 -> web-2 works:

$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-2.httpd:8080/asdf
404

TEST 4: After deleting all NetworkPolicies, everything works:

$ oc delete networkpolicy --all
$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-0.httpd:8080/asdf
404
$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-1.httpd:8080/asdf
404
$ oc exec -it web-0 -- timeout 5 curl -s -o /dev/null -w "%{http_code}\n" web-2.httpd:8080/asdf
404

Additionally, the SDN logs are showing this repeated message, which I believe is related:

I1209 01:24:01.738089 4106341 pod.go:508] CNI_ADD hsts/web-0 got IP 10.131.1.97, ofport 1875
I1209 01:24:01.763042 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:02.282660 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:02.929075 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:03.730945 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:04.736159 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:05.977063 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:07.524868 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:09.454298 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:11.861189 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
I1209 01:24:14.867488 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address
E1209 01:24:14.867721 4106341 networkpolicy.go:311] Error syncing OVS flows for VNID: timed out waiting for the condition

Version-Release number of selected component (if applicable): 4.6.4

How reproducible: Always with attached YAML contents.
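For reference, a minimal sketch of the workaround described above. This is a hypothetical reconstruction (the attached YAML is authoritative; the name and labels here mirror the non-matching egress policy shown in a later comment), with Ingress added under policyTypes together with an empty ingress rules section:

$ oc apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bad-np            # illustrative name; use the policy from the attachment
spec:
  podSelector:
    matchLabels:
      never-gonna: match  # selects no pods, so the policy should have no effect
  policyTypes:
  - Egress
  - Ingress               # adding Ingress here...
  egress:
  - {}
  ingress: []             # ...plus an empty ingress rules section avoids the issue
EOF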
> Additionally, the SDN logs are showing this repeated message, which I believe is related:
> I1209 01:24:01.738089 4106341 pod.go:508] CNI_ADD hsts/web-0 got IP 10.131.1.97, ofport 1875

To clarify the logs vs. the `oc get pods` output in my last comment: the logs always show the IP address of the Pod. The log excerpt I shared above was saved to my clipboard from an earlier run, which is why the IPs differ.
Created attachment 1737777 [details]
YAML containing reproducer project
I have been able to reproduce the error logs with the attached YAML, and have posted a PR. Even with the error logs, I did not see a connectivity issue between any of the pods in my local setup with 4.6.4; I consistently got a 404 response (expected) when doing a curl from the web-0 pod to the other pods. It's possible that the error caused by attempting to program a flow with an empty IP address prevented some other flows from being installed under certain conditions. I'll verify with some other team members.
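As a hedged illustration of how one malformed flow could block others (these are hypothetical flows, not the SDN's actual ones, and br0 is the openshift-sdn bridge): ovs-ofctl add-flows parses its entire input before sending anything to the switch, so a single invalid match aborts the whole batch. That is consistent with the "-:2:" in the log, i.e. line 2 of stdin failed to parse:

$ ovs-ofctl add-flows br0 - <<'EOF'
ip, nw_src=10.131.1.31, actions=drop
ip, nw_src=0/0, actions=drop
EOF
ovs-ofctl: -:2: 0/0: invalid IP address

Neither flow is installed, including the valid one on line 1.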
> I did not see a connectivity issue between any of the pods in my local setup with 4.6.4.

Apologies for the late reply, but please check that the Pods are running on different nodes when you attempt to reproduce the problem.
Checked this issue on 4.7.0-0.nightly-2021-01-07-181010. The `invalid IP address` logs are no longer seen, but when a NetworkPolicy with type Egress is created, the pods cannot access each other.

Steps:

1. Create namespace z3 and create test pods:

$ oc get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
test-rc-7bx7s   1/1     Running   0          15m   10.129.3.3   zzhao108-zk7kr-compute-1   <none>           <none>
test-rc-pktzp   1/1     Running   0          15m   10.129.3.2   zzhao108-zk7kr-compute-1   <none>           <none>

2. Access each pod from the other; both work and return "Hello OpenShift!":

$ oc exec test-rc-7bx7s -- curl 10.129.3.3:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    17  100    17    0     0  17000      0 --:--:-- --:--:-- --:--:-- 17000
Hello OpenShift!

$ oc exec test-rc-7bx7s -- curl 10.129.3.2:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    17  100    17    0     0  17000      0 --:--:-- --:--:-- --:--:-- 17000
Hello OpenShift!

3. Create a NetworkPolicy with egress type that does not match any pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bad-np
spec:
  egress:
  - {}
  podSelector:
    matchLabels:
      never-gonna: match
  policyTypes:
  - Egress

4. Access again; this time pod1 cannot access pod2:

$ oc exec test-rc-7bx7s -- curl --connect-timeout 4 10.129.3.2:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
curl: (28) Connection timed out after 4001 milliseconds
command terminated with exit code 28

$ oc get netnamespace z3
NAME   NETID     EGRESS IPS
z3     4671645
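The exact pod YAML used above isn't pasted here (it is requested in a later comment); as a stand-in, a minimal sketch assuming the well-known openshift/hello-openshift image, which answers on port 8080 with "Hello OpenShift!":

$ oc create -n z3 -f - <<'EOF'
apiVersion: v1
kind: ReplicationController
metadata:
  name: test-rc
spec:
  replicas: 2
  selector:
    name: test-pods
  template:
    metadata:
      labels:
        name: test-pods
    spec:
      containers:
      - name: test-pod
        image: openshift/hello-openshift   # assumption: any image serving on 8080 would do
        ports:
        - containerPort: 8080
EOF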
Created attachment 1745586 [details]
sdn ovs openflow
Hi Zhanqi,

Can you please attach the YAML files that you used to recreate this? I really like the "Hello OpenShift!" server you have.

Thanks
> Additionally, the SDN logs are showing this repeated message, which I believe is related:
>
> I1209 01:24:01.763042 4106341 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: -:2: 0/0: invalid IP address

Yes, there was a bug recently introduced in the NetworkPolicy code. None of the other specifics here are relevant; the buggy NetworkPolicy code would result in bad rules regardless of what you were doing.

*** This bug has been marked as a duplicate of bug 1914284 ***
Sorry, no; there's another bug here.
Verified this bug on 4.7.0-0.nightly-2021-01-14-211319
@zzhao This bug has the FailedQA flag set. Perhaps you could clear this flag when you get a chance? Thanks!
Ah, I have cleared the FailedQA flag. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633