Description of problem:
On an OpenShiftSDN cluster created on GCP with automatic egressIP assignment configured, curling an external ipecho service returns the node's IP address instead of the configured egressIP address.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-23-153012   True        False         6h20m   Cluster version is 4.10.0-0.nightly-2021-12-23-153012

$ oc get node
NAME                                                        STATUS   ROLES    AGE     VERSION
jechen-1223b-dkcdl-master-0.c.openshift-qe.internal         Ready    master   6h38m   v1.22.1+6859754
jechen-1223b-dkcdl-master-1.c.openshift-qe.internal         Ready    master   6h38m   v1.22.1+6859754
jechen-1223b-dkcdl-master-2.c.openshift-qe.internal         Ready    master   6h38m   v1.22.1+6859754
jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal   Ready    worker   6h31m   v1.22.1+6859754
jechen-1223b-dkcdl-worker-b-mqqn6.c.openshift-qe.internal   Ready    worker   6h30m   v1.22.1+6859754

How reproducible:

Steps to Reproduce:
1. Get the node annotation
Annotations:  cloud.network.openshift.io/egress-ipconfig: [{"interface":"nic0","ifaddr":{"ipv4":"10.0.128.0/17"},"capacity":{"ip":10}}]
              csi.volume.kubernetes.io/nodeid:

2. Patch the hostsubnets with egressCIDRs
$ oc patch hostsubnet jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal --type=merge -p '{"egressCIDRs":["10.0.128.0/17"]}'
hostsubnet.network.openshift.io/jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal patched

$ oc patch hostsubnet jechen-1223b-dkcdl-worker-b-mqqn6.c.openshift-qe.internal --type=merge -p '{"egressCIDRs":["10.0.128.0/17"]}'
hostsubnet.network.openshift.io/jechen-1223b-dkcdl-worker-b-mqqn6.c.openshift-qe.internal patched

3. Patch the project with an egressIP
$ oc new-project test

$ oc patch netnamespace test --type=merge -p '{"egressIPs":["10.0.128.101"]}'
netnamespace.network.openshift.io/test patched

$ oc get hostsubnet
NAME                                                        HOST                                                        HOST IP      SUBNET          EGRESS CIDRS        EGRESS IPS
jechen-1223b-dkcdl-master-0.c.openshift-qe.internal         jechen-1223b-dkcdl-master-0.c.openshift-qe.internal         10.0.0.7     10.130.0.0/23
jechen-1223b-dkcdl-master-1.c.openshift-qe.internal         jechen-1223b-dkcdl-master-1.c.openshift-qe.internal         10.0.0.6     10.128.0.0/23
jechen-1223b-dkcdl-master-2.c.openshift-qe.internal         jechen-1223b-dkcdl-master-2.c.openshift-qe.internal         10.0.0.5     10.129.0.0/23
jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal   jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal   10.0.128.2   10.131.0.0/23   ["10.0.128.0/17"]   ["10.0.128.101"]
jechen-1223b-dkcdl-worker-b-mqqn6.c.openshift-qe.internal   jechen-1223b-dkcdl-worker-b-mqqn6.c.openshift-qe.internal   10.0.128.3   10.128.2.0/23   ["10.0.128.0/17"]

Installed the ipecho service on an int_svc instance that was created with the GCP cluster.
ssh to the int_svc instance:
sudo yum install docker
sudo systemctl start docker
sudo docker run --name ipecho -d -p 8888:80 docker.io/aosqe/ip-echo

Add port 8888 as an allowed port in the firewall rules.

4. Create test pods, and curl the ipecho service
$ oc create -f /home/jechen/automation-work/verification-tests/testdata/networking/list_for_pods.json
replicationcontroller/test-rc created

$ oc get all
NAME                READY   STATUS    RESTARTS   AGE
pod/test-rc-md9b4   1/1     Running   0          5h38m
pod/test-rc-nmvkz   1/1     Running   0          5h38m

NAME                            DESIRED   CURRENT   READY   AGE
replicationcontroller/test-rc   2         2         2       5h38m

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/test-service   ClusterIP   172.30.98.163   <none>        27017/TCP   5h38m

$ oc rsh test-rc-md9b4
~ $ curl 10.0.0.2:8888
10.0.128.2

$ oc debug node/jechen-1223b-dkcdl-worker-a-qd45f.c.openshift-qe.internal
Starting pod/jechen-1223b-dkcdl-worker-a-qd45fcopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.2
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# curl 10.0.0.2:8888
10.0.128.2

Actual results:
curl to the ipecho service returned the host IP address.

Expected results:
curl to the ipecho service should return the egressIP address 10.0.128.101.

Additional info:
Manual egressIP assignment on an OpenShift-SDN cluster on GCP has the same problem.
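For anyone reproducing this, it may help to first rule out a simple configuration mistake by confirming that the requested egressIP lies inside the egressCIDR taken from the node annotation. The following is only a minimal sketch in plain POSIX shell; the helper names `ip_to_int` and `ip_in_cidr` are made up for illustration and are not part of `oc` or OpenShift, and the sketch assumes well-formed IPv4 input without leading zeros.

```shell
# Hypothetical helpers (not part of oc): IPv4-only CIDR membership check.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 lies inside CIDR $2 (e.g. 10.0.128.0/17), else 1.
ip_in_cidr() {
  ip=$1
  net=${2%/*}        # network part, before the slash
  prefix=${2#*/}     # prefix length, after the slash
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Example with the values from this report:
#   ip_in_cidr 10.0.128.101 10.0.128.0/17 && echo "egressIP is inside egressCIDR"
```

With the values in this report the check passes, so the egressIP itself is a valid choice for the 10.0.128.0/17 range.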
@jechen I have made changes to the PR. Please let me know if the issue still occurs and, if so, share the reproduction steps.
@pdiak To test your PR before it is merged, I have to use cluster-bot to build a cluster, but cluster-bot does not give me an external VM instance where I can install the ipecho service to verify egressIP. Normally, I use Jenkins to build a GCP cluster and have Jenkins build an external VM instance along with it, then install the ipecho service on that instance. I have not found a way to have cluster-bot build both a cluster and an external VM instance, so I think I will have to wait until the PR is merged before I can test it.
Verified in 4.10.0-0.nightly-2022-01-15-092722

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-15-092722   True        False         10m     Cluster version is 4.10.0-0.nightly-2022-01-15-092722

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0115d-zmlrk-master-0.c.openshift-qe.internal         Ready    master   30m   v1.23.0+60f5a1c
jechen-0115d-zmlrk-master-1.c.openshift-qe.internal         Ready    master   30m   v1.23.0+60f5a1c
jechen-0115d-zmlrk-master-2.c.openshift-qe.internal         Ready    master   30m   v1.23.0+60f5a1c
jechen-0115d-zmlrk-worker-a-khvzf.c.openshift-qe.internal   Ready    worker   20m   v1.23.0+60f5a1c
jechen-0115d-zmlrk-worker-b-4pf7t.c.openshift-qe.internal   Ready    worker   20m   v1.23.0+60f5a1c

$ oc patch hostsubnet jechen-0115d-zmlrk-worker-a-khvzf.c.openshift-qe.internal --type=merge -p '{"egressCIDRs":["10.0.128.0/17"]}'
hostsubnet.network.openshift.io/jechen-0115d-zmlrk-worker-a-khvzf.c.openshift-qe.internal patched

$ oc new-project test
Now using project "test" on server "https://api.jechen-0115d.qe.gcp.devcluster.openshift.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=k8s.gcr.io/e2e-test-images/agnhost:2.33 -- /agnhost serve-hostname

$ oc patch netnamespace test --type=merge -p '{"egressIPs":["10.0.128.101"]}'
netnamespace.network.openshift.io/test patched

$ oc get hostsubnet
NAME                                                        HOST                                                        HOST IP      SUBNET          EGRESS CIDRS        EGRESS IPS
jechen-0115d-zmlrk-master-0.c.openshift-qe.internal         jechen-0115d-zmlrk-master-0.c.openshift-qe.internal         10.0.0.5     10.129.0.0/23
jechen-0115d-zmlrk-master-1.c.openshift-qe.internal         jechen-0115d-zmlrk-master-1.c.openshift-qe.internal         10.0.0.6     10.130.0.0/23
jechen-0115d-zmlrk-master-2.c.openshift-qe.internal         jechen-0115d-zmlrk-master-2.c.openshift-qe.internal         10.0.0.7     10.128.0.0/23
jechen-0115d-zmlrk-worker-a-khvzf.c.openshift-qe.internal   jechen-0115d-zmlrk-worker-a-khvzf.c.openshift-qe.internal   10.0.128.2   10.131.0.0/23   ["10.0.128.0/17"]   ["10.0.128.101"]
jechen-0115d-zmlrk-worker-b-4pf7t.c.openshift-qe.internal   jechen-0115d-zmlrk-worker-b-4pf7t.c.openshift-qe.internal   10.0.128.3   10.128.2.0/23

# create test project and test pods
$ oc create -f ./verification-tests/testdata/networking/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created

$ oc get pod
NAME            READY   STATUS    RESTARTS   AGE
test-rc-7m5bh   1/1     Running   0          9m18s
test-rc-hv69h   1/1     Running   0          9m18s

# curl the ip echo service from inside of test pod
$ oc rsh test-rc-7m5bh
~ $ curl 10.0.0.2:8888
10.0.128.101~ $      <----- egressIP address is returned correctly
$ exit

# remove the egressIP, then curl the ip echo service from inside of test pods
$ oc patch netnamespace test --type=merge -p '{"egressIPs":[]}'
netnamespace.network.openshift.io/test patched

$ oc rsh test-rc-7m5bh
~ $ curl 10.0.0.2:8888
10.0.128.3~ $
~ $ exit

$ oc rsh test-rc-hv69h
~ $ curl 10.0.0.2:8888
10.0.128.2~ $
~ $ exit

# added egressIP back, curl ip echo service from inside of test pods
$ oc patch netnamespace test --type=merge -p '{"egressIPs":["10.0.128.101"]}'
netnamespace.network.openshift.io/test patched

$ oc rsh test-rc-7m5bh
~ $ curl 10.0.0.2:8888
10.0.128.101~ $      <----- egressIP address is returned correctly
~ $ exit

$ oc rsh test-rc-hv69h
~ $ curl 10.0.0.2:8888
10.0.128.101~ $      <----- egressIP address is returned correctly
~ $ exit
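The manual comparison in the verification above could be scripted roughly as follows. This is a sketch, not the test suite's actual check: `check_source_ip` is a hypothetical helper name, and the usage line assumes `curl` is available in the test pod (as it was in this run) and that the ipecho service replies with only the caller's address. Whitespace is stripped because the service's response has no trailing newline, as the `10.0.128.101~ $` output shows.

```shell
# Hypothetical helper: compare the address echoed by the ipecho service
# with the expected source IP.
#   $1 = expected IP (the egressIP, or a node IP once the egressIP is removed)
#   $2 = raw response body from the ipecho service
check_source_ip() {
  expected=$1
  # Strip whitespace, since the response may or may not end with a newline.
  actual=$(printf '%s' "$2" | tr -d '[:space:]')
  if [ "$actual" = "$expected" ]; then
    echo "PASS: source IP is $actual"
  else
    echo "FAIL: expected $expected, got $actual"
    return 1
  fi
}

# Possible usage against a live cluster (pod name and addresses from this run):
#   check_source_ip 10.0.128.101 "$(oc rsh test-rc-7m5bh curl -s 10.0.0.2:8888)"
```

Running the same check from each test pod covers the case where the pod is scheduled on a node other than the one hosting the egressIP.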
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056