Bug 2100536 - Update API to allow EgressIP node reachability check
Summary: Update API to allow EgressIP node reachability check
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Mohamed Mahmoud
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-23 15:59 UTC by Mohamed Mahmoud
Modified: 2023-01-09 17:51 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:19:12 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift api pull 1210 0 None open Bug 2100536: Update API to config EgressIP timeout 2022-06-23 16:03:52 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 11:19:22 UTC

Description Mohamed Mahmoud 2022-06-23 15:59:31 UTC
Description of problem:

Add a new API field for EgressIP that allows changing the node reachability check timeout.
The default is 1 second; users who want to disable the node reachability check can set the value to 0.
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 7 jechen 2022-07-21 00:51:27 UTC
Verified in 4.12.0-0.nightly-2022-07-20-030220

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-20-030220   True        False         3h4m    Cluster version is 4.12.0-0.nightly-2022-07-20-030220


$ oc label node jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal labeled


$ oc label node jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal labeled

$ oc new-project test


$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101


$ oc edit networks.operator.openshift.io -oyaml

$ oc get networks.operator.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: Network
  metadata:
    annotations:
      networkoperator.openshift.io/ovn-cluster-initiator: 10.0.0.5
    creationTimestamp: "2022-07-20T21:15:45Z"
    generation: 92
    name: cluster
    resourceVersion: "41478"
    uid: a3642a14-7e54-4402-87b2-52d6a221477a
  spec:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    defaultNetwork:
      ovnKubernetesConfig:
        egressIPConfig:
          reachabilityTotalTimeoutSeconds: 10
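
As a non-interactive alternative to `oc edit`, the same field could probably be set with a merge patch. The sketch below is only an illustration: the helper names `egressip_timeout_patch` and `set_egressip_timeout` are hypothetical, and it assumes the field path shown in the YAML above (spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig.reachabilityTotalTimeoutSeconds) on the Network object named "cluster".

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of oc.

# Print a JSON merge-patch body that sets reachabilityTotalTimeoutSeconds.
egressip_timeout_patch() {
  local seconds="$1"
  # Accept only non-negative integers (0 disables the check per the description).
  case "$seconds" in
    ''|*[!0-9]*)
      echo "timeout must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"egressIPConfig":{"reachabilityTotalTimeoutSeconds":%s}}}}}\n' "$seconds"
}

# Apply the patch to the cluster Network operator config.
set_egressip_timeout() {
  local patch
  # Stop before calling oc if the value was rejected.
  patch="$(egressip_timeout_patch "$1")" || return 1
  oc patch networks.operator.openshift.io cluster --type=merge -p "$patch"
}
```

For example, `set_egressip_timeout 10` would request the 10 second timeout used in this test, and `set_egressip_timeout 0` would request that the check be disabled.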


$ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled

$ oc get -n openshift-cluster-version deployments/cluster-version-operator
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   1/1     1            1           3h27m


$ oc debug node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal
Warning: would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jechen-0720d-l56lv-worker-a-cv5d9copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# 
sh-4.4# 
sh-4.4# 
sh-4.4# shutdown
Shutdown scheduled for Wed 2022-07-20 22:25:06 UTC, use 'shutdown -c' to cancel.


$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101

$ date
Wed Jul 20 18:25:33 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:37 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:40 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip1   10.0.128.101                   
$ date
Wed Jul 20 18:25:45 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal   10.0.128.101


Measured the time it took to fail over to the second egress node: about 10s.

Repeated the steps above with reachabilityTotalTimeoutSeconds increased to 20.

Measured the time it took to fail over to the other egress node: about 20s.

==> Verified reachabilityTotalTimeoutSeconds
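
The manual `oc get egressip` / `date` loop above could be automated along the following lines. This is a rough sketch, not what was run for this verification: `wait_for_failover` is a hypothetical helper, and it assumes the EgressIP object exposes its assigned nodes under `.status.items[*].node`. The elapsed time is counted from when the helper starts, not from the moment the node actually goes down.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of oc.
# Usage: wait_for_failover NAME OLD_NODE [INTERVAL_SECONDS] [MAX_WAIT_SECONDS]
# Polls the EgressIP object NAME until it is assigned to a node other than
# OLD_NODE, then prints how long the wait took.
wait_for_failover() {
  local name="$1" old_node="$2" interval="${3:-1}" max_wait="${4:-120}"
  local start now node
  start="$(date +%s)"
  while :; do
    # Assumed status layout; empty output means no node is currently assigned.
    node="$(oc get egressip "$name" -o jsonpath='{.status.items[*].node}' 2>/dev/null)"
    now="$(date +%s)"
    if [ -n "$node" ] && [ "$node" != "$old_node" ]; then
      echo "failover to $node after $((now - start))s"
      return 0
    fi
    # Give up once the maximum wait has elapsed.
    if [ $((now - start)) -ge "$max_wait" ]; then
      echo "no failover within ${max_wait}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For this test it would be started right before shutting down the node, for example `wait_for_failover egressip1 jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal`.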

Comment 8 errata-xmlrpc 2022-08-10 11:19:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

