Bug 2100601
| Summary: | Update CNO to allow EgressIP node reachability check | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mohamed Mahmoud <mmahmoud> |
| Component: | Networking | Assignee: | Mohamed Mahmoud <mmahmoud> |
| Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | huirwang, jechen |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.12.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-10 11:19:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Mohamed Mahmoud
2022-06-23 18:53:05 UTC
Verified in 4.12.0-0.nightly-2022-07-20-030220
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.0-0.nightly-2022-07-20-030220 True False 3h4m Cluster version is 4.12.0-0.nightly-2022-07-20-030220
$ oc label node jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal labeled
$ oc label node jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal labeled
$ oc new-project test
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101 jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal 10.0.128.101
$ oc edit networks.operator.openshift.io -oyaml
$ oc get networks.operator.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: Network
  metadata:
    annotations:
      networkoperator.openshift.io/ovn-cluster-initiator: 10.0.0.5
    creationTimestamp: "2022-07-20T21:15:45Z"
    generation: 92
    name: cluster
    resourceVersion: "41478"
    uid: a3642a14-7e54-4402-87b2-52d6a221477a
  spec:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    defaultNetwork:
      ovnKubernetesConfig:
        egressIPConfig:
          reachabilityTotalTimeoutSeconds: 10
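
For anyone repeating this verification, the same field can probably be set non-interactively instead of through `oc edit`. The sketch below only builds the JSON merge patch for the path shown in the output above. The helper name `egressip_timeout_patch` is hypothetical, and the `oc patch` call is left commented out because it assumes a live cluster and cluster-admin rights.

```shell
# Build a JSON merge patch that sets reachabilityTotalTimeoutSeconds.
# egressip_timeout_patch is a hypothetical helper name, not part of oc or CNO.
egressip_timeout_patch() {
  timeout="$1"
  # Reject anything that is not a non-negative integer.
  case "$timeout" in
    ''|*[!0-9]*)
      echo "timeout must be a non-negative integer" >&2
      return 1
      ;;
  esac
  # The field path mirrors the spec section of the Network operator config.
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"egressIPConfig":{"reachabilityTotalTimeoutSeconds":%s}}}}}\n' "$timeout"
}

# Example against a live cluster (assumed, not run here):
# oc patch networks.operator.openshift.io cluster --type=merge \
#   -p "$(egressip_timeout_patch 20)"
```

Generating the patch separately makes it easy to inspect the payload before applying it.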
$ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled
$ oc get -n openshift-cluster-version deployments/cluster-version-operator
NAME READY UP-TO-DATE AVAILABLE AGE
cluster-version-operator 1/1 1 1 3h27m
$ oc debug node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal
Warning: would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jechen-0720d-l56lv-worker-a-cv5d9copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4#
sh-4.4#
sh-4.4#
sh-4.4# shutdown
Shutdown scheduled for Wed 2022-07-20 22:25:06 UTC, use 'shutdown -c' to cancel.
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101 jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal 10.0.128.101
$ date
Wed Jul 20 18:25:33 EDT 2022
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101 jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal 10.0.128.101
$ date
Wed Jul 20 18:25:37 EDT 2022
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101 jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal 10.0.128.101
$ date
Wed Jul 20 18:25:40 EDT 2022
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101
$ date
Wed Jul 20 18:25:45 EDT 2022
$ oc get egressip
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS
egressip1 10.0.128.101 jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal 10.0.128.101
Measured the time it took to fail over to the second egress node: about 10s.
# Repeated the steps above with reachabilityTotalTimeoutSeconds increased to 20.
Measured the time it took to fail over to the other egress node: about 20s.
==> Verified reachabilityTotalTimeoutSeconds
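
The manual `oc get egressip` / `date` polling above could be scripted roughly as follows. This is a minimal sketch and not the procedure used for this verification. The function names are hypothetical, and the jsonpath assumes the EgressIP status exposes `status.items[].node`. The timer starts when the function is called, so it should be started right as the node goes down.

```shell
# Print the node currently hosting the first assigned address of egressip1.
# Assumes the EgressIP object from this test and a logged-in oc session.
current_egress_node() {
  oc get egressip egressip1 -o jsonpath='{.status.items[0].node}'
}

# wait_for_failover OLD_NODE [GET_NODE_CMD] [MAX_SECONDS]
# Polls once per second until the reported node is non-empty and differs
# from OLD_NODE, then prints the elapsed seconds. GET_NODE_CMD is a
# hypothetical hook so the loop can be exercised without a cluster.
wait_for_failover() {
  old_node="$1"
  get_node="${2:-current_egress_node}"
  max_wait="${3:-120}"
  start=$(date +%s)
  while :; do
    # An unassigned EgressIP yields an empty string (errors are ignored).
    node=$("$get_node" 2>/dev/null)
    elapsed=$(( $(date +%s) - start ))
    if [ -n "$node" ] && [ "$node" != "$old_node" ]; then
      echo "$elapsed"
      return 0
    fi
    if [ "$elapsed" -ge "$max_wait" ]; then
      echo "no failover within ${max_wait}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage would be along the lines of `wait_for_failover jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal`. The result has one-second granularity at best.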
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.