Description of problem:
Using a load balancer (or nodeport) type service with the new openshift-sdn hack `local-with-fallback` annotation breaks source IP preservation for Ingress traffic originating from pods in the cluster. For example, from a pod with podIP `10.128.0.11`, without the `local-with-fallback` annotation, curling a simple echo pod exposed via a route echoes back `x-forwarded-for:10.128.0.11`. With the `local-with-fallback` annotation enabled, the following header is observed: `x-forwarded-for:10.128.0.1`. The last octet of the podIP seems to always be changed to `1` incorrectly.

Version-Release number of selected component (if applicable):
OCP 4.8 only

How reproducible:
100%

Steps to Reproduce:
1. oc new-project test
2. oc apply -f test/extended/testdata/router/router-http-echo-server.yaml
3. oc delete route router-http-echo
4. oc expose service router-http-echo (do this to get a route under the default ingress)
5. rsh into any pod in the cluster
6. curl the route host name from step 4
7. observe the incorrect podIP echoed back

To observe the behavior without the annotation:
1. remove the `local-with-fallback` annotation from the default Ingress (set localWithFallback: "false" in unsupportedConfigOverrides on the default ingresscontroller)
2. curl the echo pod from any cluster pod
3. observe the correct x-forwarded-for pod IP

This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1871939#c14.

I have reproduced this on GCP, but I'm assuming all platforms would be affected.
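As a hedged illustration of the symptom described above, the following Python sketch classifies an echoed `x-forwarded-for` value against the client pod's address. The function names are hypothetical and not part of openshift-sdn or the router. The `.1` address is treated simply as the first host address of the pod's subnet, which is an assumption about where the rewritten address comes from.

```python
import ipaddress


def parse_forwarded_for(echo_body: str) -> str:
    """Return the first x-forwarded-for value found in echoed request headers."""
    for line in echo_body.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-forwarded-for":
            # The header may carry a comma-separated chain; take the first hop.
            return value.split(",")[0].strip()
    raise ValueError("no x-forwarded-for header in echo output")


def classify_source_ip(pod_cidr: str, echo_body: str) -> str:
    """Compare the echoed client address with the pod address.

    pod_cidr is the pod address with its prefix length, e.g. "10.128.0.11/23".
    The prefix length is an input supplied by the caller (assumption: it is
    read from the pod's interface, it is not stated in the description).
    """
    iface = ipaddress.ip_interface(pod_cidr)
    seen = ipaddress.ip_address(parse_forwarded_for(echo_body))
    if seen == iface.ip:
        return "preserved"
    # Assumption: the rewritten address is the first host address of the subnet.
    if seen == iface.network.network_address + 1:
        return "rewritten-to-first-host"
    return "other"
```

With the addresses from the description and an assumed /23 pod subnet, `classify_source_ip("10.128.0.11/23", "x-forwarded-for:10.128.0.1")` returns `"rewritten-to-first-host"`, while the same call with `x-forwarded-for:10.128.0.11` returns `"preserved"`.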
Some CI runs hitting this issue, if it helps at all:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4994/pull-ci-openshift-installer-master-e2e-azure/1404292225227034624
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/26226/pull-ci-openshift-origin-master-e2e-gcp/1404491270570643457
This is showing up quite a bit in CI. Sippy shows it happened in 46 out of 356 runs (about 13%).
Verified this bug on 4.9.0-0.nightly-2021-06-22-005403

$ oc exec hello-2sbd6 -- curl router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   455    0   455    0     0  11167      0 --:--:-- --:--:-- --:--:-- 13382
GET / HTTP/1.1
user-agent: curl/7.52.1
accept: */*
host: router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
x-forwarded-host: router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
x-forwarded-port: 80
x-forwarded-proto: http
forwarded: for=10.131.0.20;host=router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com;proto=http
x-forwarded-for: 10.131.0.20

[zzhao@dhcp-0-105 ~]$ oc exec hello-2sbd6 -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 0a:58:0a:83:00:14 brd ff:ff:ff:ff:ff:ff
    inet 10.131.0.20/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c66:aff:fe21:c7a/64 scope link
       valid_lft forever preferred_lft forever
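The verification above amounts to comparing the address in the echoed `forwarded` and `x-forwarded-for` headers with the pod's global IPv4 address from `ip a`. A minimal sketch of that comparison in Python follows, assuming both command outputs have been captured as multi-line strings. The helper names are hypothetical and the regular expressions only cover the output shapes shown in this comment.

```python
import re

# Matches lines such as "inet 10.131.0.20/23 brd 10.131.1.255 scope global eth0".
INET_GLOBAL_RE = re.compile(r"^\s*inet (\d+(?:\.\d+){3})/\d+\b.*\bscope global\b")


def global_ipv4_addrs(ip_a_output):
    """Collect IPv4 addresses with global scope from `ip a` output."""
    addrs = []
    for line in ip_a_output.splitlines():
        match = INET_GLOBAL_RE.match(line)
        if match:
            addrs.append(match.group(1))
    return addrs


def source_ip_preserved(ip_a_output, curl_output):
    """Return True if both forwarding headers report one of the pod's addresses."""
    pod_ips = set(global_ipv4_addrs(ip_a_output))
    flags = re.MULTILINE | re.IGNORECASE
    xff = re.search(r"^x-forwarded-for:\s*(\S+)", curl_output, flags)
    fwd = re.search(r"^forwarded:\s*for=([^;\s]+)", curl_output, flags)
    if xff is None or fwd is None:
        raise ValueError("forwarding headers not found in echo output")
    return xff.group(1) in pod_ips and fwd.group(1) in pod_ips
```

For the output in this comment, the pod address `10.131.0.20` appears in both headers, so the function would return `True`.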
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759