Bug 2047299
| Summary: | nodeport not reachable port connection timeout | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Erik Lalancette <elalance> |
| Component: | Networking | Assignee: | Nadia Pinaeva <npinaeva> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | high | CC: | elalance, fgrosjea, jcaamano, jechen, npinaeva, sdodson, trozet |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.13.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-17 22:46:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Erik Lalancette
2022-01-27 14:38:19 UTC
Hi @npinaeva, exactly: they have an EgressFirewall present in the namespace. When the customer removes this EgressFirewall, the NodePort connection works fine. Can you help us better understand why this happened? Thanks.

Hi @npinaeva, I was able to reproduce this scenario on my cluster (IPI 4.8.18).
The cluster is using Gateway Mode: local
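For reference, one way to confirm which gateway mode an OVN-Kubernetes cluster is using is sketched below. This is only a rough check: the ovnkube-config ConfigMap name and its [gateway] section are assumptions based on typical OVN-Kubernetes deployments, and the exact layout can differ between OCP releases.

# Print the gateway section of the OVN-Kubernetes config; "mode" should show local or shared.
oc -n openshift-ovn-kubernetes get configmap ovnkube-config -o yaml | grep -A 2 '\[gateway\]'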
Here are the steps to reproduce it.
1- oc new-project hello
2- oc new-app --docker-image=docker.io/openshift/hello-openshift --labels='app=hello-openshift' -n hello
3- cat <<EOF | oc create -f -
apiVersion: v1
kind: Service
metadata:
  name: lb
spec:
  ports:
  - name: lb
    port: 8080
  loadBalancerIP:
  type: LoadBalancer
  selector:
    app: hello-openshift
EOF
4- cat <<EOF | oc create -f -
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      dnsName: www.test.com
  - type: Allow
    to:
      cidrSelector: 172.30.0.0/16
  - type: Allow
    to:
      cidrSelector: 10.128.0.0/14
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF
5- The NodePort is unreachable from outside the cluster and also from all nodes inside the cluster; as a result, the AWS ELB status is OutOfService.
6- If I add the 100.64.0.0/16 CIDR (the OVN local gateway switch IP range) to the egressfirewall as shown below, the AWS ELB status becomes InService and the NodePort is reachable from outside and from all nodes inside the cluster (see the sketch after this manifest for a way to check this address range on a node).
cat <<EOF | oc create -f -
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      dnsName: www.test.com
  - type: Allow
    to:
      cidrSelector: 172.30.0.0/16
  - type: Allow
    to:
      cidrSelector: 10.128.0.0/14
  - type: Allow
    to:
      cidrSelector: 100.64.0.0/16
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF
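A plausible explanation, consistent with the workaround above: traffic that reaches the pod through the NodePort and the OVN gateway router is SNATed to an address in OVN's join subnet (100.64.0.0/16 by default), so the pod's replies are addressed to that subnet and the final deny-all rule drops them. A rough way to see the join-subnet address assigned to a node is sketched below; the annotation name is an assumption based on how OVN-Kubernetes annotates nodes in this version range and may differ in other releases.

# Show the per-node gateway-router (join switch) address, typically within 100.64.0.0/16.
oc get node ip-10-0-140-74.us-east-2.compute.internal -o yaml | grep gateway-router-lrp-ifaddr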
Here's the output showing how to test the NodePort from the node directly.

---> Before the egressfirewall

$ oc get svc
NAME              TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)             AGE
hello-openshift   ClusterIP      172.30.175.164   <none>                                                                    8080/TCP,8888/TCP   16s
lb                LoadBalancer   172.30.73.43     a04a3409e7cec408eb8746845f87bfdc-642313472.us-east-2.elb.amazonaws.com    8080:32064/TCP      3s

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-130-205.us-east-2.compute.internal   Ready    master   48m   v1.21.6+bb8d50a
ip-10-0-140-74.us-east-2.compute.internal    Ready    worker   38m   v1.21.6+bb8d50a
ip-10-0-188-79.us-east-2.compute.internal    Ready    master   48m   v1.21.6+bb8d50a
ip-10-0-190-185.us-east-2.compute.internal   Ready    worker   38m   v1.21.6+bb8d50a
ip-10-0-194-5.us-east-2.compute.internal     Ready    worker   39m   v1.21.6+bb8d50a
ip-10-0-215-32.us-east-2.compute.internal    Ready    master   47m   v1.21.6+bb8d50a

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.140.74:32064.

---> After applying the egressfirewall

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.

---> After applying the egressfirewall with the 100.64.0.0/16 Allow (OVN logical CIDR)

$ oc debug node/ip-10-0-140-74.us-east-2.compute.internal
Starting pod/ip-10-0-140-74us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.140.74
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# nc -v 10.0.140.74 32064
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.140.74:32064.

I just tried on OCP 4.9.18 with Gateway Mode: shared and the result is exactly the same.

This is a bug in our EgressFirewall implementation and we need some time to fix it (it actually requires fixes for two different components, so it may not be very fast).
While we're working on a bug fix, can we suggest adding
- type: Allow
  to:
    cidrSelector: 100.64.0.0/16
as a workaround for this customer?
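For clarity, a complete workaround manifest might look like the sketch below. It is only an illustration: it assumes the reproducer's "hello" namespace, omits the customer's existing Allow rules, and relies on EgressFirewall rules being evaluated in order, so the 100.64.0.0/16 Allow must appear before the final deny-all.

cat <<EOF | oc apply -f -
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: hello   # assumed namespace from the reproducer; adjust as needed
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 100.64.0.0/16   # OVN join-switch range; keep this above the deny-all
  # ... keep any existing Allow rules here ...
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
EOF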
Related OVN bug: https://bugzilla.redhat.com/show_bug.cgi?id=2057426

Verified the fix in 4.13.0-0.nightly-2023-01-17-152326
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.nightly-2023-01-17-152326 True False 75m Cluster version is 4.13.0-0.nightly-2023-01-17-152326
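Since EgressFirewall is an OVN-Kubernetes feature, it may also be worth confirming the cluster's network type before testing; a small sketch using the standard cluster network config object:

# Confirm the cluster is running OVN-Kubernetes (expected output: OVNKubernetes).
oc get network.config/cluster -o jsonpath='{.status.networkType}'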
1. Create the test namespace and the NodePort service
$ oc new-project test
$ oc label ns test security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged --overwrite
namespace/test labeled
$ cat pod_httpserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  containers:
  - name: hello-world
    image: gcr.io/google-samples/node-hello:1.0
    ports:
    - containerPort: 8080
      protocol: TCP
$ oc apply -f pod_httpserver.yaml
pod/hello-pod created
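It can help to wait for the pod to become Ready before exercising the NodePort, since the service will not answer until the backend is up; a small sketch using the standard oc wait command:

# Block until hello-pod reports Ready (or fail after 2 minutes).
oc wait --for=condition=Ready pod/hello-pod --timeout=120s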
$ cat svc_nodeport.yaml
kind: Service
apiVersion: v1
metadata:
  name: hello-pod
  labels:
    name: hello-pod
spec:
  ports:
  - name: http
    port: 27017
    protocol: TCP
    nodePort: 30000
    targetPort: 8080
  selector:
    name: hello-pod
  type: NodePort
$ oc apply -f svc_nodeport.yaml
service/hello-pod created
$ oc get all -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/hello-pod 0/1 ContainerCreating 0 14s <none> ip-10-0-128-209.us-east-2.compute.internal <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/hello-pod NodePort 172.30.10.132 <none> 27017:30000/TCP 6s name=hello-pod
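The address curled in the next step is the InternalIP of the node running hello-pod (10.0.128.209 in this run); one way to look it up with a standard jsonpath query:

# Print the InternalIP of the node hosting hello-pod.
oc get node ip-10-0-128-209.us-east-2.compute.internal \
  -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'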
2. Before applying the egressfirewall rule, curling the NodePort service from the external bootstrap node gets a reply
[core@ip-10-0-31-153 ~]$ curl 10.0.128.209:30000
Hello Kubernetes!
3. After applying the egressfirewall rule, curling the NodePort service from the external bootstrap node still gets a reply
$ cat egressfirewall_denyall.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
$ oc apply -f egressfirewall_denyall.yaml
egressfirewall.k8s.ovn.org/default created
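Before retrying the curl, one can check that the rules were actually programmed; a hedged sketch (the exact status message wording may vary between releases):

# The status should indicate the EgressFirewall rules were applied successfully.
oc get egressfirewall default -n test -o yaml | grep -A 2 'status:'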
[core@ip-10-0-31-153 ~]$ curl 10.0.128.209:30000
Hello Kubernetes!
==> verified the fix
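As a supplementary check mirroring the original report (where node-to-NodePort traffic also timed out), the same test could be repeated from a cluster node's host network; a hedged sketch:

# Hit the NodePort from the node itself via a debug pod on the host.
oc debug node/ip-10-0-128-209.us-east-2.compute.internal -- chroot /host curl -s 10.0.128.209:30000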
Correction for an error in comment #18: step 3 should read "after applying the egressfirewall rule, curl to the NodePort service from the external bootstrap node got a reply."

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326