Description of problem:

The Conformance "should set Forwarded headers appropriately" router test is disabled because of an undiagnosed issue in 4.x. Either fix the test and re-enable it, or propose deleting the test.
From the original skip comment:

// check who can update TemplateInstances. Via Update(), spec updates
// should be rejected (with the exception of spec.metadata fields used
// by the garbage collector, not tested here). Status updates should be
// silently ignored. Via UpdateStatus(), spec updates should be
// silently ignored. Status should only be updatable by a user with
// update access to that endpoint. In practice this is intended only to
// be the templateinstance controller and system:admin.
Moving the test to the broken-test skip tag in test/extended/util/test: https://github.com/openshift/origin/pull/23814
*** Bug 1748764 has been marked as a duplicate of this bug. ***
This test is fine on both GCP and Azure - the routing must be different on AWS. Do we want to enable this test again, but with an explicit skip for AWS?
(In reply to Andrew McDermott from comment #4) > This test is fine on both GCP and Azure - the routing must be different on > AWS. Do we want to enable this test again, but with an explicit skip for AWS? I think for me to have an opinion here I would need a root cause analysis of the failure on AWS.
(In reply to Dan Mace from comment #5)
> (In reply to Andrew McDermott from comment #4)
> > This test is fine on both GCP and Azure - the routing must be different on
> > AWS. Do we want to enable this test again, but with an explicit skip for AWS?
>
> I think for me to have an opinion here I would need a root cause analysis of
> the failure on AWS.

TL;DR - on AWS the route to the router/loadbalancer goes out of the cluster (and back in again), whereas on GCP and Azure it does not, so the origin IP address ends up being the pod's IP address, which is what the test asserts.

###
### GCP
###

On GCP I see the following:

KUBECONFIG=~/amcdermo-gcp-kubeconfig oc get service -n openshift-ingress -o yaml router-default

status:
  loadBalancer:
    ingress:
    - ip: 35.223.10.172

# traceroute 35.223.10.172
traceroute to 35.223.10.172 (35.223.10.172), 30 hops max, 60 byte packets
 1  172.10.223.35.bc.googleusercontent.com (35.223.10.172)  0.073 ms  0.027 ms  0.019 ms

# ip route get 35.223.10.172
local 35.223.10.172 dev lo src 35.223.10.172 uid 0
    cache <local>

The test runs like this:

STEP: making a request and reading back the echoed headers
Oct 16 20:53:39.932: INFO: Running '/usr/bin/kubectl --server=https://api.amcdermo.gcp.devcluster.openshift.com:6443 --kubeconfig=/home/aim/go-projects/openshift-installer/src/github.com/openshift/installer/amcdermo-gcp/auth/kubeconfig exec --namespace=e2e-test-router-headers-n7s5n execpod -- /bin/sh -x -c
set -e
payload=$( curl -s --header 'Host: router-headers.example.com' "http://35.223.10.172" ) || rc=$?
if [[ "${rc:-0}" -eq 0 ]]; then
  printf "${payload}"
  exit 0
else
  echo "error ${rc}" 1>&2
  exit 1
fi
'

I changed the e2e test to sleep forever so that I could rsh to the execpod and run the test manually:

$ oc get pods --all-namespaces -o wide | grep test-
e2e-test-router-headers-n7s5n   execpod                     1/1   Running     0   75m   10.131.0.33   amcder-s82x6-w-a-lg9jj.c.openshift-gce-devel.internal   <none>   <none>
e2e-test-router-headers-n7s5n   router-http-echo-1-deploy   0/1   Completed   0   75m   10.131.0.34   amcder-s82x6-w-a-lg9jj.c.openshift-gce-devel.internal   <none>   <none>
e2e-test-router-headers-n7s5n   router-http-echo-1-l5vq5    1/1   Running     0   75m   10.129.2.13   amcder-s82x6-w-c-zxhbz.c.openshift-gce-devel.internal   <none>   <none>

$ oc rsh -n e2e-test-router-headers-n7s5n execpod
/ # curl -s --header "Host: router-headers.example.com" http://35.223.10.172
GET / HTTP/1.1
User-Agent: curl/7.61.1
Accept: */*
Host: router-headers.example.com
X-Forwarded-Host: router-headers.example.com
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Forwarded: for=10.131.0.33;host=router-headers.example.com;proto=http;proto-version=""
X-Forwarded-For: 10.131.0.33

and "10.131.0.33" is the expected result, as it matches the pod IP above.
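The `Forwarded` header echoed above follows RFC 7239 (semicolon-separated `key=value` pairs, with `for=` carrying the client address as the proxy saw it). A minimal sketch of extracting that value, using the header string from the GCP transcript above:

```python
import re

# Forwarded header exactly as echoed by the backend in the GCP run above.
forwarded = 'for=10.131.0.33;host=router-headers.example.com;proto=http;proto-version=""'

# RFC 7239: the "for" parameter holds the client address; it may optionally
# be quoted, so allow (and strip) a surrounding pair of double quotes.
match = re.search(r'for=("?)([^;"]+)\1', forwarded)
client_ip = match.group(2) if match else None
print(client_ip)  # 10.131.0.33 -- matches the execpod's pod IP in the GCP case
```

This is the value the test compares against the execpod IP; the AWS transcript below shows why that comparison fails there.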
###
### AWS
###

And doing this on AWS I see:

KUBECONFIG=~/amcdermo-aws-kubeconfig oc get service -n openshift-ingress -o yaml router-default

status:
  loadBalancer:
    ingress:
    - hostname: aec9b866ef7ea435397684e196095c95-245284306.us-east-1.elb.amazonaws.com

# dig +short aec9b866ef7ea435397684e196095c95-245284306.us-east-1.elb.amazonaws.com
18.213.226.39

# ip r
default via 10.0.128.1 dev ens3
default via 10.0.128.1 dev ens3 proto dhcp metric 100
10.0.128.0/20 dev ens3 proto kernel scope link src 10.0.131.234
10.0.128.0/20 dev ens3 proto kernel scope link src 10.0.131.234 metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.128.1      0.0.0.0         UG    0      0        0 ens3
0.0.0.0         10.0.128.1      0.0.0.0         UG    100    0        0 ens3
10.0.128.0      0.0.0.0         255.255.240.0   U     0      0        0 ens3
10.0.128.0      0.0.0.0         255.255.240.0   U     100    0        0 ens3
10.128.0.0      0.0.0.0         255.252.0.0     U     0      0        0 tun0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 tun0

# ip r get 18.213.226.39
18.213.226.39 via 10.0.128.1 dev ens3 src 10.0.131.234 uid 0

# traceroute 18.213.226.39
traceroute to 18.213.226.39 (18.213.226.39), 30 hops max, 60 byte packets
 1  ip-10-0-11-92.ec2.internal (10.0.11.92)  0.701 ms  0.641 ms  0.601 ms
 2  216.182.231.106 (216.182.231.106)  13.045 ms  216.182.229.82 (216.182.229.82)  5.698 ms  216.182.226.20 (216.182.226.20)  16.894 ms

So access to the router goes outside of the cluster and back in again. Whois 216.182.226.20?
NetRange:       216.182.224.0 - 216.182.239.255
CIDR:           216.182.224.0/20
NetName:        AMAZON-AES

So given that the test effectively runs as:

# curl -s --header "Host: router-headers.example.com" "http://aec9b866ef7ea435397684e196095c95-245284306.us-east-1.elb.amazonaws.com"

it doesn't matter where we run it from, because:

# oc rsh netshoot
# curl -s --header "Host: router-headers.example.com" "http://aec9b866ef7ea435397684e196095c95-245284306.us-east-1.elb.amazonaws.com"
GET / HTTP/1.1
User-Agent: curl/7.65.1
Accept: */*
Host: router-headers.example.com
X-Forwarded-Host: router-headers.example.com
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Forwarded: for=35.170.15.97;host=router-headers.example.com;proto=http;proto-version=""
X-Forwarded-For: 35.170.15.97

# dig -x 35.170.15.97
;; ANSWER SECTION:
97.15.170.35.in-addr.arpa. 60 IN PTR ec2-35-170-15-97.compute-1.amazonaws.com.

And 35.170.15.97 is an elastic IP ("amcdermo-bswxd-eip-us-east-1c").

###
### Conclusion
###

From a testing point of view, should this test assert that the origin in the X-Forwarded-For header is actually the pod's IP address, or simply that an X-Forwarded-For header (and the others) is present, without considering its value? Or is the routing on AWS either borked, or working as expected?
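On the Conclusion question above, one middle ground is a relaxed assertion: require that X-Forwarded-For is present and holds a well-formed, non-loopback IP, and only demand an exact pod-IP match on platforms where traffic is known to stay in-cluster. A minimal sketch of that idea (the function name and shape are hypothetical, not the actual test code; the sample value is the NATted elastic IP from the AWS transcript):

```python
import ipaddress

def check_forwarded_headers(headers, expect_exact_ip=None):
    """Relaxed check: X-Forwarded-For must exist and hold a valid,
    non-loopback IP. Only when expect_exact_ip is given (platforms
    where the connection stays in-cluster) is an exact match required."""
    value = headers.get("X-Forwarded-For")
    if value is None:
        return False
    # The header may be a comma-separated chain; the first entry is the client.
    client = value.split(",")[0].strip()
    try:
        ip = ipaddress.ip_address(client)
    except ValueError:
        return False
    if ip.is_loopback:
        return False
    return expect_exact_ip is None or client == expect_exact_ip

# AWS case from the transcript: the NATted elastic IP, not the pod IP.
aws_headers = {"X-Forwarded-For": "35.170.15.97"}
print(check_forwarded_headers(aws_headers))                                 # True
print(check_forwarded_headers(aws_headers, expect_exact_ip="10.131.0.33"))  # False
```

The relaxed form passes on AWS while still catching a missing or garbage header; the strict form reproduces the current assertion.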
Also see: https://github.com/openshift/origin/pull/23688/commits/769f92152edd6e002902d9fc7e0c33960040d0c7
And also see: https://github.com/kubernetes/kubernetes/issues/57250
https://github.com/openshift/origin/pull/24093
OpenStack CI is now failing the HAProxy router header test after https://github.com/openshift/origin/pull/24093 re-enabled it. https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/400/build-log.txt
(In reply to Martin André from comment #11)
> OpenStack CI is now failing the HAProxy router header test after
> https://github.com/openshift/origin/pull/24093 re-enabled it.
>
> https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-4.3/400/build-log.txt

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1772125
Verified running openshift/conformance/parallel on an AWS cluster:

started: (15/1659/2119) "[Conformance][Area:Networking][Feature:Router] The HAProxy router should set Forwarded headers appropriately [Suite:openshift/conformance/parallel/minimal]"
passed: (1m3s) 2019-11-16T00:08:49 "[Conformance][Area:Networking][Feature:Router] The HAProxy router should set Forwarded headers appropriately [Suite:openshift/conformance/parallel/minimal]"

4.3.0-0.nightly-2019-11-13-233341
I am extremely concerned that this test is not verifying that the source IP is preserved. That's a key feature and behavior of the router (it's why we enabled the PROXY protocol), and it should work on AWS. We need to find a way to consistently test the feature as part of this e2e test.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062
(In reply to Clayton Coleman from comment #14)
> I am extremely concerned that this test is not verifying that source ip is
> preserved. That's a key feature and behavior of the router (it's why we
> enabled proxy protocol) and it should work on AWS.
>
> We need to find a way to consistently test the feature as part of this e2e
> test.

The test case as written checks that if a pod within the cluster connects to a service within the cluster via the router, then the client IP is preserved. I assume that what we actually care about is what happens when a client *outside the cluster* connects to the router? This test is *not* testing that.

In the GCP/Azure case, when the CloudProvider sets a loadbalancer on a service, it updates service.Status.LoadBalancer to carry the load balancer's IP. Kube-proxy sees this and creates a short-circuit rule redirecting traffic directly from the load balancer IP to the service IP. So when a pod in a GCP or Azure cluster tries to connect to a cloud loadbalancer, as in this test case, the connection never actually reaches the cloud load balancer; it is intercepted by iptables rules on the node and forwarded directly to the service endpoint. So you are not testing the client-IP-preserving behavior of the load balancer, you're testing the client-IP-preserving behavior of kube-proxy.

With AWS, the CloudProvider does not set the loadbalancer IP on service.Status.LoadBalancer, so kube-proxy can't create the short-circuit rules. When a pod tries to connect to a load balancer, the connection leaves the node, gets NATted to the node IP, hits the cloud load balancer, and gets redirected back to the cluster from there. Assuming the cloud load balancer preserves the client IP correctly, the destination pod should see in X-Forwarded-For the node IP of the execpod rather than its pod IP.
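The mechanism described above means the expected origin can be predicted from the shape of service.Status.LoadBalancer: an ingress *IP* (GCP/Azure) means kube-proxy short-circuits node-locally and the pod IP survives, while a *hostname* only (AWS) means the connection hairpins through the cloud LB and gets NATted. A sketch of that decision logic (helper name and inputs are hypothetical; `nat_ip` stands in for whatever address the connection is SNATted to once it leaves the node):

```python
def expected_xff(lb_ingress, pod_ip, nat_ip):
    """Predict the X-Forwarded-For value the backend should see,
    per the kube-proxy short-circuit behaviour described above.
    lb_ingress mirrors service.status.loadBalancer.ingress entries."""
    has_ip = any("ip" in entry for entry in lb_ingress)
    # IP published (GCP/Azure): kube-proxy intercepts node-locally,
    # client (pod) IP preserved. Hostname only (AWS): the connection
    # leaves the node and is NATted before reaching the backend.
    return pod_ip if has_ip else nat_ip

# Status shapes and addresses taken from the GCP and AWS transcripts above.
gcp = [{"ip": "35.223.10.172"}]
aws = [{"hostname": "aec9b866ef7ea435397684e196095c95-245284306.us-east-1.elb.amazonaws.com"}]
print(expected_xff(gcp, "10.131.0.33", "35.170.15.97"))  # 10.131.0.33
print(expected_xff(aws, "10.131.0.33", "35.170.15.97"))  # 35.170.15.97
```

This is only a model of the behaviour observed in the transcripts, not something the test currently does; but it suggests the test could pick its expected value per platform instead of hard-coding the pod IP.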
It would be better to have the test connect to the router directly from the openshift-tests process, rather than connecting from a pod inside the cluster being tested. Then you'd always be hitting the cloud load balancer. The problem with this would be figuring out what client IP is actually expected in that case...
Yeah, this would be like the upgrade availability tests. We probably need a rule that if the test process can't access that, the test is meaningless and should be excluded from whatever weird environment that is (private clusters need to be tested via a tunnel anyway).