Description of problem:
NodePort port not accessible

Version-Release number of selected component (if applicable):
OCP 4.8.20

How reproducible:

$ oc -n ui-nprd get services -o wide
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP                                   PORT(S)          AGE     SELECTOR
docker-registry      ClusterIP      10.201.219.240   <none>                                        5000/TCP         24d     app=registry
docker-registry-lb   LoadBalancer   10.201.252.253   internal-xxxxxx.xx-xxxx-1.elb.amazonaws.com   5000:30779/TCP   3d22h   app=registry
docker-registry-np   NodePort       10.201.216.26    <none>                                        5000:32428/TCP   3d16h   app=registry

$ oc debug node/ip-xxx.ca-central-1.compute.internal
Starting pod/ip-xxx.ca-central-1computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.81.23.96
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# nc -vz 10.81.23.96 32428
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.

In a newly created namespace, the same deployment works:

[RHEL7:> oc project
Using project "test-c1" on server "https://api.xx.xx.xxxx.xx.xx:6443".
[RHEL7:- ~/tmp]> oc port-forward service/docker-registry-np 5000:5000
Forwarding from 127.0.0.1:5000 -> 5000
[1]+  Stopped                 oc4 port-forward service/docker-registry-np 5000:5000
[RHEL7: ~/tmp]> bg %1
[1]+ oc4 port-forward service/docker-registry-np 5000:5000 &
[RHEL7: ~/tmp]> nc -v localhost 5000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:5000.
Handling connection for 5000
[RHEL7: ~/tmp]> kill %1
[RHEL7: ~/tmp]>
[1]+  Terminated              oc4 port-forward service/docker-registry-np 5000:5000

[RHEL7: ~/tmp]> oc get services
NAME                 TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
docker-registry-np   NodePort   10.201.224.174   <none>        5000:31793/TCP   68s

[RHEL7: ~/tmp]> oc get pods -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP            NODE                                   NOMINATED NODE   READINESS GATES
registry-75b7c7fd94-rx29j   1/1     Running   0          7m5s   10.201.1.29   ip-xxx.ca-central-1.compute.internal   <none>           <none>

[RHEL7: ~/tmp]> oc debug node/ip-xxx.ca-central-1.compute.internal
Starting pod/ip-xxxca-central-1computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.81.23.87
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# nc -v 10.81.23.87 31793
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.81.23.87:31793.

Actual results:
- Working in newly created namespaces
- Not working in previously created namespaces

Expected results:
- It should work in all namespaces.

Additional info:
- This cluster was upgraded from 4.7.x to 4.8, and then OVN was manually enabled.
- The issue was happening in all namespaces, but after restarting the ovnkube-master-xxxx pods, only the newly created namespaces work.
Hi @npinaeva, exactly: there is an EgressFirewall present on the affected namespaces. When the customer removes this EgressFirewall, the NodePort connection works fine. Can you help us better understand why this happened? Thanks.
Hi @npinaeva, I was able to reproduce this scenario on my cluster (IPI 4.8.18). The cluster is using Gateway Mode: local. Here are the steps to reproduce it:

1- oc new-project hello

2- oc new-app --docker-image=docker.io/openshift/hello-openshift --labels='app=hello-openshift' -n hello

3- cat <<EOF | oc create -f -
apiVersion: v1
kind: Service
metadata:
  name: lb
spec:
  ports:
  - name: lb
    port: 8080
  loadBalancerIP:
  type: LoadBalancer
  selector:
    app: hello-openshift
EOF

4- cat <<EOF | oc create -f -
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      dnsName: www.test.com
  - type: Allow
    to:
      cidrSelector: 172.30.0.0/16
  - type: Allow
    to:
      cidrSelector: 10.128.0.0/14
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF

5- The NodePort is unreachable externally and also from all nodes in the cluster; consequently, the AWS ELB status is OutOfService.

6- If I add the 100.64.0.0/16 CIDR (the OVN GW local switch IP range) to the EgressFirewall as shown below, the AWS ELB status becomes InService, and the NodePort is reachable externally and from all nodes in the cluster.

cat <<EOF | oc create -f -
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      dnsName: www.test.com
  - type: Allow
    to:
      cidrSelector: 172.30.0.0/16
  - type: Allow
    to:
      cidrSelector: 10.128.0.0/14
  - type: Allow
    to:
      cidrSelector: 100.64.0.0/16
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF
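The behavior in steps 5 and 6 is consistent with first-match rule evaluation: NodePort traffic hairpinned through the node is seen with a source in the 100.64.0.0/16 OVN GW local switch range, which falls through the service/cluster-network Allow rules and hits the final Deny. A minimal sketch of that evaluation (dnsName rules omitted; 100.64.0.2 is an illustrative address from that range, not one captured from this cluster):

```python
import ipaddress

# Simplified first-match evaluation of EgressFirewall cidrSelector rules.
# An unmatched packet is allowed (the EgressFirewall default behavior).
def egress_verdict(src_ip, rules):
    ip = ipaddress.ip_address(src_ip)
    for action, cidr in rules:
        if ip in ipaddress.ip_network(cidr):
            return action
    return "Allow"

original = [
    ("Allow", "172.30.0.0/16"),   # service network
    ("Allow", "10.128.0.0/14"),   # cluster network
    ("Deny",  "0.0.0.0/0"),
]
# Workaround: allow the OVN GW local switch range before the final Deny.
with_workaround = original[:2] + [("Allow", "100.64.0.0/16")] + original[2:]

print(egress_verdict("100.64.0.2", original))         # Deny
print(egress_verdict("100.64.0.2", with_workaround))  # Allow
```

This mirrors why adding the 100.64.0.0/16 Allow rule restores NodePort reachability in step 6.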
Here's the output showing how to test the NodePort from the node directly.

---> Before the EgressFirewall:

$ oc get svc
NAME              TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)             AGE
hello-openshift   ClusterIP      172.30.175.164   <none>                                                                   8080/TCP,8888/TCP   16s
lb                LoadBalancer   172.30.73.43     a04a3409e7cec408eb8746845f87bfdc-642313472.us-east-2.elb.amazonaws.com   8080:32064/TCP      3s

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-130-205.us-east-2.compute.internal   Ready    master   48m   v1.21.6+bb8d50a
ip-10-0-140-74.us-east-2.compute.internal    Ready    worker   38m   v1.21.6+bb8d50a
ip-10-0-188-79.us-east-2.compute.internal    Ready    master   48m   v1.21.6+bb8d50a
ip-10-0-190-185.us-east-2.compute.internal   Ready    worker   38m   v1.21.6+bb8d50a
ip-10-0-194-5.us-east-2.compute.internal     Ready    worker   39m   v1.21.6+bb8d50a
ip-10-0-215-32.us-east-2.compute.internal    Ready    master   47m   v1.21.6+bb8d50a

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.140.74:32064.

---> After applying the EgressFirewall:

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.

---> After applying the EgressFirewall with the 100.64.0.0/16 Allow (OVN logical CIDR):

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.140.74:32064.
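The nc checks above boil down to a TCP connect attempt with a timeout; a minimal sketch of the same probe (the host/port shown in the comment are from this reproduction and used here only as an illustrative call):

```python
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connection, roughly what `nc -v <host> <port>` does.

    Returns True if the connect succeeds, False on refusal or timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example call from a debug pod's chroot (illustrative):
# probe("10.0.140.74", 32064)
```

Running this in a loop across all node IPs would reproduce the per-node reachability matrix shown above without needing nc on the host.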
I just tried on OCP 4.9.18 with Gateway Mode: shared; the result is exactly the same.
This is a bug in our EgressFirewall implementation, and we need some time to fix it (it actually requires fixes for two different components, so it may not be very fast). While we're working on a bug fix, can we suggest adding

- type: Allow
  to:
    cidrSelector: 100.64.0.0/16

as a workaround for this customer?
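For clarity, here is a sketch of where that workaround rule would sit in the customer's EgressFirewall; the surrounding rules are abbreviated, and any existing Allow rules would stay as they are:

```yaml
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  # ...existing Allow rules...
  - type: Allow
    to:
      cidrSelector: 100.64.0.0/16   # workaround: OVN GW local switch range
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
```

The Allow must appear before the final Deny, since rules are evaluated in order and the first match wins.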
Related OVN bug: https://bugzilla.redhat.com/show_bug.cgi?id=2057426
Verified the fix in 4.13.0-0.nightly-2023-01-17-152326.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-01-17-152326   True        False         75m     Cluster version is 4.13.0-0.nightly-2023-01-17-152326

1. Create the test namespace and a NodePort service:

$ oc new-project test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
namespace/test labeled

$ cat pod_httpserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-world
    image: gcr.io/google-samples/node-hello:1.0
    ports:
    - containerPort: 8080
      protocol: TCP

$ oc apply -f pod_httpserver.yaml
pod/hello-pod created

$ cat svc_nodeport.yaml
kind: Service
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  ports:
  - name: http
    port: 27017
    protocol: TCP
    nodePort: 30000
    targetPort: 8080
  selector:
    name: hello-pod
  type: NodePort

$ oc apply -f svc_nodeport.yaml
service/hello-pod created

$ oc get all -owide
NAME            READY   STATUS              RESTARTS   AGE   IP       NODE                                         NOMINATED NODE   READINESS GATES
pod/hello-pod   0/1     ContainerCreating   0          14s   <none>   ip-10-0-128-209.us-east-2.compute.internal   <none>           <none>

NAME                TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE   SELECTOR
service/hello-pod   NodePort   172.30.10.132   <none>        27017:30000/TCP   6s    name=hello-pod

2. Before applying the EgressFirewall rule, curl to the NodePort service from the external bootstrap node gets a reply:

[core@ip-10-0-31-153 ~]$ curl 10.0.128.209:30000
Hello Kubernetes!

3. After applying the EgressFirewall rule, curl to the NodePort service from the external bootstrap node still gets a reply:

$ cat egressfirewall_denyall.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0

$ oc apply -f egressfirewall_denyall.yaml
egressfirewall.k8s.ovn.org/default created

[core@ip-10-0-31-153 ~]$ curl 10.0.128.209:30000
Hello Kubernetes!

==> verified the fix
Correction for an error in comment #18: step 3 should read: after applying the egressfirewall rule, curl to the NodePort service from the external bootstrap node got a reply.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:1326