1971808 – New `local-with-fallback` service annotation does not preserve source IP

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Dan Winship
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1972864

Reported:	2021-06-14 19:40 UTC by Stephen Greene
Modified:	2021-10-18 17:34 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Clones:	1972864
Environment:	[sig-network][Feature:Router] The HAProxy router should set Forwarded headers appropriately
Last Closed:	2021-10-18 17:34:12 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift sdn pull 320	0	None	Merged	Bug 1971808: fix local-with-fallback	2022-04-19 13:54:50 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:34:38 UTC

Description Stephen Greene 2021-06-14 19:40:35 UTC

Description of problem:

Using a load balancer (or nodeport) type service with the new openshift-sdn hack `local-with-fallback` annotation breaks source IP preservation for Ingress traffic originating from pods in the cluster.

For example, from a pod with pod IP `10.128.0.11` and without the `local-with-fallback` annotation, curling a simple echo pod exposed via a route echoes back `x-forwarded-for:10.128.0.11`.

With the `local-with-fallback` annotation enabled, the following header is observed:
`x-forwarded-for:10.128.0.1`. The last octet of the podIP seems to always be changed to `1` incorrectly.
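
One way to check for this symptom programmatically is to parse the echoed request and compare `x-forwarded-for` with the client pod's IP. The Python below is a minimal sketch and not part of the original report: the function names are hypothetical, the sample response bodies are abridged illustrations built from the addresses quoted above, and the /23 prefix length is an assumption about the per-node pod subnet size.

```python
import ipaddress


def parse_echoed_headers(body: str) -> dict:
    """Parse 'name: value' lines from the echo server's response body."""
    headers = {}
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        # Skip the request line and anything that is not a plain header name.
        if sep and " " not in name.strip():
            headers[name.strip().lower()] = value.strip()
    return headers


def source_ip_preserved(body: str, pod_ip: str) -> bool:
    """True if the first x-forwarded-for entry equals the client pod IP."""
    xff = parse_echoed_headers(body).get("x-forwarded-for", "")
    return xff.split(",")[0].strip() == pod_ip


def is_first_host_of_subnet(observed: str, pod_ip: str, prefix_len: int = 23) -> bool:
    """True if `observed` is the first host address of the pod's subnet.

    prefix_len=23 is an assumption, not something stated in the description.
    """
    subnet = ipaddress.ip_network(f"{pod_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(observed) == subnet.network_address + 1


# Abridged, illustrative bodies using the addresses from this report.
without_annotation = "GET / HTTP/1.1\nx-forwarded-for: 10.128.0.11\n"
with_annotation = "GET / HTTP/1.1\nx-forwarded-for: 10.128.0.1\n"

print(source_ip_preserved(without_annotation, "10.128.0.11"))
print(source_ip_preserved(with_annotation, "10.128.0.11"))
print(is_first_host_of_subnet("10.128.0.1", "10.128.0.11"))
```

If the last check holds, the rewritten address is the first host address of the pod's subnet, which would be consistent with the traffic being source-NATed to a node-local address rather than the last octet being corrupted. The report itself does not confirm the cause.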


Version-Release number of selected component (if applicable):
OCP 4.8 only

How reproducible:
100%

Steps to Reproduce:
1. oc new-project test
2. oc apply -f test/extended/testdata/router/router-http-echo-server.yaml
3. oc delete route router-http-echo
4. oc expose service router-http-echo (do this to get a route under default ingress)
5. Open a shell (for example with `oc rsh`) in any pod in the cluster
6. curl the route host name from step 4
7. observe the incorrect pod IP echoed back

To observe behavior without the annotation:
1. remove the `local-with-fallback` annotation from the default Ingress (set localWithFallback: "false" in unsupportedConfigOverrides on the default ingresscontroller)
2. curl the echo pod from any cluster pod
3. observe the correct x-forwarded-for pod IP
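
The override in step 1 can be expressed as a JSON merge patch. The sketch below only builds and prints an `oc patch` command line; it does not contact a cluster. The helper names are hypothetical, and the `openshift-ingress-operator` namespace and the `spec.unsupportedConfigOverrides` field path are assumptions not spelled out in this report, so verify them against your cluster before running the printed command.

```python
import json
import shlex


def build_override_patch(local_with_fallback: bool) -> str:
    """Return a JSON merge patch setting localWithFallback as a string value."""
    value = "true" if local_with_fallback else "false"
    patch = {"spec": {"unsupportedConfigOverrides": {"localWithFallback": value}}}
    return json.dumps(patch)


def build_patch_command(local_with_fallback: bool,
                        name: str = "default",
                        namespace: str = "openshift-ingress-operator") -> str:
    """Return a shell-quoted `oc patch` command line (namespace is assumed)."""
    args = [
        "oc", "-n", namespace, "patch", f"ingresscontroller/{name}",
        "--type=merge", "-p", build_override_patch(local_with_fallback),
    ]
    return " ".join(shlex.quote(a) for a in args)


print(build_patch_command(False))
```

The value is emitted as the string "false" rather than a boolean because that is how the step above quotes it.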


This is related to https://bugzilla.redhat.com/show_bug.cgi?id=1871939#c14.
I have reproduced this on GCP, but I'm assuming all platforms would be affected.

Comment 1 Stephen Greene 2021-06-14 19:44:18 UTC

Some CI runs hitting this issue, in case it helps:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4994/pull-ci-openshift-installer-master-e2e-azure/1404292225227034624

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/26226/pull-ci-openshift-origin-master-e2e-gcp/1404491270570643457

Comment 3 Stephen Benjamin 2021-06-16 14:49:47 UTC

This is showing up quite a bit in CI. Sippy shows that it happened in 46 out of 356 runs.
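
For scale, the Sippy figures above work out to a failure rate of roughly 13%. A quick check of the arithmetic, using only the two numbers quoted in this comment:

```python
failures, runs = 46, 356  # figures quoted from Sippy above
rate = failures / runs
print(f"{rate:.1%}")  # prints 12.9%
```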

Comment 5 zhaozhanqi 2021-06-22 04:21:15 UTC

Verified this bug on 4.9.0-0.nightly-2021-06-22-005403


$ oc exec hello-2sbd6 -- curl router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   455    0   455    0     0  11167      0 --:--:-- --:--:-- --:--:-- 13382
GET / HTTP/1.1
user-agent: curl/7.52.1
accept: */*
host: router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
x-forwarded-host: router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com
x-forwarded-port: 80
x-forwarded-proto: http
forwarded: for=10.131.0.20;host=router-http-echo-default.apps.ci-ln-ymcf7w2-002ac.ci.azure.devcluster.openshift.com;proto=http
x-forwarded-for: 10.131.0.20

$ oc exec hello-2sbd6 -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP 
    link/ether 0a:58:0a:83:00:14 brd ff:ff:ff:ff:ff:ff
    inet 10.131.0.20/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c66:aff:fe21:c7a/64 scope link 
       valid_lft forever preferred_lft forever
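
The verification above amounts to confirming that the echoed `x-forwarded-for` value equals the pod's global IPv4 address on `eth0`. A minimal Python sketch of that comparison follows. The function names are hypothetical, and the `ip a` sample is abridged from the output in this comment.

```python
import ipaddress
import re

# Matches IPv4 'inet' lines that carry 'scope global' in `ip a` output.
_INET_GLOBAL = re.compile(
    r"^\s*inet (\d+\.\d+\.\d+\.\d+)/(\d+)\b.*\bscope global\b", re.M
)


def global_ipv4_addresses(ip_a_output: str) -> list:
    """Return the global-scope IPv4 interfaces found in `ip a` output."""
    return [
        ipaddress.ip_interface(f"{addr}/{prefix}")
        for addr, prefix in _INET_GLOBAL.findall(ip_a_output)
    ]


def header_matches_pod_ip(x_forwarded_for: str, ip_a_output: str) -> bool:
    """True if the header value is one of the pod's global IPv4 addresses."""
    return x_forwarded_for.strip() in {
        str(iface.ip) for iface in global_ipv4_addresses(ip_a_output)
    }


# Abridged from the `ip a` output above.
ip_a = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    inet 127.0.0.1/8 scope host lo
3: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    inet 10.131.0.20/23 brd 10.131.1.255 scope global eth0
"""

print(header_matches_pod_ip("10.131.0.20", ip_a))
```

The loopback address is excluded because its scope is `host`, so only the pod network address is compared against the header.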

Comment 8 errata-xmlrpc 2021-10-18 17:34:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
