Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1871939

Summary:	[sig-network][Feature:Router] The HAProxy router should set Forwarded headers appropriately
Product:	OpenShift Container Platform
Component:	Networking
Networking sub component:	router
Status:	CLOSED DUPLICATE
Severity:	high
Priority:	low
Version:	4.6
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2021-07-01 16:53:11 UTC
Reporter:	Antonio Ojea <aojeagar>
Assignee:	Stephen Greene <sgreene>
QA Contact:	Hongan Li <hongli>
CC:	aos-bugs, bperkins, ccoleman, jchaloup, mfisher, mmasters, wking
Doc Type:	If docs needed, set a value
Environment:	[sig-network][Feature:Router] The HAProxy router should set Forwarded headers appropriately
Type:	Bug

Description Antonio Ojea 2020-08-24 16:19:54 UTC

The test

[sig-network][Feature:Router] The HAProxy router should set Forwarded headers appropriately

is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network%5C%5D%5C%5BFeature%3ARouter%5C%5D+The+HAProxy+router+should+set+Forwarded+headers+appropriately


Failure example:
[sig-network][Feature:Router] The HAProxy router should expose prometheus metrics for a route [Suite:openshift/conformance/parallel] (53s)
fail [github.com/openshift/origin@/test/extended/router/metrics.go:219]: Expected
    <float64>: 1095
to be >=
    <float64>: 1767

Comment 7 Stephen Greene 2020-10-01 15:22:26 UTC

I’m adding UpcomingSprint because I was occupied with fixing bugs of higher
priority/severity, developing new features with higher priority, or developing
new features to improve stability at a macro level. I will revisit this bug
next sprint.

Comment 8 Jan Chaloupka 2020-10-12 14:43:37 UTC

Occurring in 4.5 as well: https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=4.5&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-network%5C%5D%5C%5BFeature%3ARouter%5C%5D+The+HAProxy+router+should+set+Forwarded+headers+appropriately

I don't see the original error message for 4.7, so I'm posting one for 4.5:

```
fail [github.com/openshift/origin/test/extended/router/headers.go:102]: Unexpected error:
    <*errors.errorString | 0xc0028c8f30>: {
        s: "host command failed: error running /usr/bin/kubectl --server=https://api.ci-op-jht456ms-daaf9.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/var/run/secrets/ci.openshift.io/multi-stage/kubeconfig exec --namespace=e2e-test-router-headers-7cn7d execpod -- /bin/sh -x -c \n\t\tset -e\n\t\tSTOP=$(($(date '+%s') + 180))\n\t\twhile [ $(date '+%s') -lt $STOP ]; do\n\t\t\trc=0\n\t\t\tcode=$( curl -k -s -m 5 -o /dev/null -w '%{http_code}\\n' --header 'Host: 172.30.125.224' \"http://172.30.125.224:1936/healthz\" ) || rc=$?\n\t\t\tif [[ \"${rc:-0}\" -eq 0 ]]; then\n\t\t\t\techo $code\n\t\t\t\tif [[ $code -eq 200 ]]; then\n\t\t\t\t\texit 0\n\t\t\t\tfi\n\t\t\t\tif [[ $code -ne 503 ]]; then\n\t\t\t\t\texit 1\n\t\t\t\tfi\n\t\t\telse\n\t\t\t\techo \"error ${rc}\" 1>&2\n\t\t\tfi\n\t\t\tsleep 1\n\t\tdone\n\t\t:\nCommand stdout:\n\nstderr:\nUnable to connect to the server: dial tcp 35.196.249.78:6443: i/o timeout\n\nerror:\nexit status 1\n",
    }
    host command failed: error running /usr/bin/kubectl --server=https://api.ci-op-jht456ms-daaf9.origin-ci-int-gce.dev.openshift.com:6443 --kubeconfig=/var/run/secrets/ci.openshift.io/multi-stage/kubeconfig exec --namespace=e2e-test-router-headers-7cn7d execpod -- /bin/sh -x -c 
    		set -e
    		STOP=$(($(date '+%s') + 180))
    		while [ $(date '+%s') -lt $STOP ]; do
    			rc=0
    			code=$( curl -k -s -m 5 -o /dev/null -w '%{http_code}\n' --header 'Host: 172.30.125.224' "http://172.30.125.224:1936/healthz" ) || rc=$?
    			if [[ "${rc:-0}" -eq 0 ]]; then
    				echo $code
    				if [[ $code -eq 200 ]]; then
    					exit 0
    				fi
    				if [[ $code -ne 503 ]]; then
    					exit 1
    				fi
    			else
    				echo "error ${rc}" 1>&2
    			fi
    			sleep 1
    		done
    		:
    Command stdout:
    
    stderr:
    Unable to connect to the server: dial tcp 35.196.249.78:6443: i/o timeout
    
    error:
    exit status 1
    
occurred
```

Feel free to ignore it if it's not relevant.
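
For reference, the per-iteration decision made by the polling loop in the log above can be sketched as a standalone shell function. This is a minimal sketch, not code from openshift/origin: the function name `classify_probe` and the `ready`/`retry`/`fail` labels are hypothetical and exist only for illustration. It takes the curl exit status and the HTTP status code and prints what the loop would do next.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift/origin): mirrors the decision
# made on each iteration of the polling loop quoted in the log above.
#   $1 = curl exit status
#   $2 = HTTP status code reported by curl's -w '%{http_code}'
classify_probe() {
    rc=$1
    code=$2
    if [ "$rc" -ne 0 ]; then
        echo retry   # curl itself failed; the loop logs "error $rc" and retries
    elif [ "$code" -eq 200 ]; then
        echo ready   # /healthz answered 200; the loop exits 0
    elif [ "$code" -eq 503 ]; then
        echo retry   # router not ready yet; the loop sleeps 1s and retries
    else
        echo fail    # any other status code; the loop exits 1
    fi
}

classify_probe 0 200
classify_probe 0 503
classify_probe 28 000
classify_probe 0 404
```

In the quoted log, none of these branches appears to have run: the `Unable to connect to the server` message on stderr comes from kubectl timing out against the API server on port 6443, and the output shows no `sh -x` trace lines from the script itself.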

Comment 12 Clayton Coleman 2021-01-28 18:03:22 UTC

The original two causes of this failure no longer happen. The new occurrence is a different issue from the one described in the most recent comment, and it is addressed by https://bugzilla.redhat.com/show_bug.cgi?id=1921857

Comment 14 Clayton Coleman 2021-06-14 14:37:15 UTC

This is failing in ~50% of GCP runs (and apparently in a lot of Azure runs, maybe 25-30%).

Is this test assuming something that only works on AWS?

Raising to high because this is blocking a lot of CI runs and is happening in 4.8.

Comment 15 Stephen Greene 2021-06-14 19:53:25 UTC

(In reply to Clayton Coleman from comment #14)
> This is failing in ~50% of GCP runs (and apparently in a lot of Azure runs,
> maybe 25-30%).
> 
> Is this test assuming something that only works on AWS?
> 
> Raising to high because this is blocking a lot of CI runs and is happening
> in 4.8.

The recent spike in failures is being tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1971808

Comment 16 Miciah Dashiel Butler Masters 2021-07-01 16:53:11 UTC

Some specific instances of this issue have been resolved (bug 1971808, bug 1921857), and the general issue is being tracked in bug 1802311, so I am marking this report as a duplicate of bug 1802311.

*** This bug has been marked as a duplicate of bug 1802311 ***