Bug 2100601 - Update CNO to allow EgressIP node reachability check
Summary: Update CNO to allow EgressIP node reachability check
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Mohamed Mahmoud
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-23 18:53 UTC by Mohamed Mahmoud
Modified: 2023-09-18 04:40 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:19:12 UTC
Target Upstream Version:
Embargoed:



Links
Github openshift/cluster-network-operator pull 1498 (open): Bug 2100601: Update CNO to config EgressIP timeout for ovnk (last updated 2022-06-23 18:57:35 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 11:19:22 UTC)

Description Mohamed Mahmoud 2022-06-23 18:53:05 UTC
Description of problem:

Cluster Network Operator (CNO) changes to configure the EgressIP node reachability check timeout for OVN-Kubernetes.

Comment 6 jechen 2022-07-21 00:49:21 UTC
Verified in 4.12.0-0.nightly-2022-07-20-030220

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-20-030220   True        False         3h4m    Cluster version is 4.12.0-0.nightly-2022-07-20-030220


$ oc label node jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal labeled


$ oc label node jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal labeled

$ oc new-project test


$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created
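
The contents of ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml are not shown in this comment. For orientation, an EgressIP object of roughly this shape could be generated as below. This is a hypothetical sketch: the builder function is illustrative, and the `team: red` namespace selector is only guessed from the file name.

```python
import json


def egressip_manifest(name: str, egress_ips: list, namespace_labels: dict) -> dict:
    """Build an EgressIP object (k8s.ovn.org/v1) as a plain dict.

    EgressIP is cluster-scoped, so metadata carries no namespace. Only
    egressIPs and namespaceSelector are set; optional fields such as
    podSelector are left out.
    """
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressIP",
        "metadata": {"name": name},
        "spec": {
            "egressIPs": list(egress_ips),
            "namespaceSelector": {"matchLabels": dict(namespace_labels)},
        },
    }


if __name__ == "__main__":
    # Hypothetical values: the IP matches the output in this comment,
    # the team=red label is guessed from the file name.
    manifest = egressip_manifest("egressip1", ["10.0.128.101"], {"team": "red"})
    print(json.dumps(manifest, indent=2))
```

Piping the printed JSON into `oc create -f -` would create the object. The selected namespace would also need a matching label (for example `oc label namespace test team=red`), which this transcript does not show.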

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101


$ oc edit networks.operator.openshift.io -oyaml

$ oc get networks.operator.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: Network
  metadata:
    annotations:
      networkoperator.openshift.io/ovn-cluster-initiator: 10.0.0.5
    creationTimestamp: "2022-07-20T21:15:45Z"
    generation: 92
    name: cluster
    resourceVersion: "41478"
    uid: a3642a14-7e54-4402-87b2-52d6a221477a
  spec:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    defaultNetwork:
      ovnKubernetesConfig:
        egressIPConfig:
          reachabilityTotalTimeoutSeconds: 10
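
The `oc edit` step above changes the field interactively. A scripted equivalent could apply a JSON merge patch with `oc patch --type=merge`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the 0 to 60 second range check reflects the assumed validation bounds of the Network operator API, not anything stated in this report.

```python
import json
import subprocess


def egressip_timeout_patch(seconds: int) -> str:
    """Build a JSON merge patch setting the EgressIP reachability timeout.

    The path mirrors the spec shown above:
    spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig.reachabilityTotalTimeoutSeconds
    """
    # Assumed valid range; confirm against the operator API for your release.
    if not 0 <= seconds <= 60:
        raise ValueError("reachabilityTotalTimeoutSeconds outside assumed range 0-60")
    patch = {
        "spec": {
            "defaultNetwork": {
                "ovnKubernetesConfig": {
                    "egressIPConfig": {"reachabilityTotalTimeoutSeconds": seconds}
                }
            }
        }
    }
    return json.dumps(patch)


def apply_timeout(seconds: int) -> None:
    # Runs: oc patch networks.operator.openshift.io cluster --type=merge -p '<patch>'
    subprocess.run(
        ["oc", "patch", "networks.operator.openshift.io", "cluster",
         "--type=merge", "-p", egressip_timeout_patch(seconds)],
        check=True,
    )
```

A merge patch only touches the keys it names, so the rest of spec.defaultNetwork is left as it is; `apply_timeout(20)` would correspond to the second run described at the end of this comment.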


$ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled

$ oc get -n openshift-cluster-version deployments/cluster-version-operator
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
cluster-version-operator   1/1     1            1           3h27m


$ oc debug node/jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal
Warning: would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/jechen-0720d-l56lv-worker-a-cv5d9copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.128.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# shutdown
Shutdown scheduled for Wed 2022-07-20 22:25:06 UTC, use 'shutdown -c' to cancel.


$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101

$ date
Wed Jul 20 18:25:33 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:37 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal   10.0.128.101
$ date
Wed Jul 20 18:25:40 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip1   10.0.128.101                   
$ date
Wed Jul 20 18:25:45 EDT 2022
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0720d-l56lv-worker-b-pr8t6.c.openshift-qe.internal   10.0.128.101


Measured the time it took for the egress IP to fail over to the second egress node: about 10s.

# Repeated the test above with reachabilityTotalTimeoutSeconds increased to 20.

Measured the time it took to fail over to the other egress node: about 20s.

==> Verified reachabilityTotalTimeoutSeconds
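
The failover times above were measured by hand, by alternating `date` and `oc get egressip`. A polling helper could automate that measurement. This sketch assumes the EgressIP status exposes the hosting node at status.items[*].node (assumed to back the ASSIGNED NODE column); the function names are hypothetical, and the clock and sleep are injectable so the logic can be checked without a cluster.

```python
import subprocess
import time
from typing import Callable, Optional


def assigned_node(egressip: str) -> str:
    """Return the node currently hosting the egress IP, or "" if unassigned.

    Assumes OVN-Kubernetes reports the node under status.items[*].node;
    only the first reported node is returned.
    """
    result = subprocess.run(
        ["oc", "get", "egressip", egressip,
         "-o", "jsonpath={.status.items[*].node}"],
        capture_output=True, text=True, check=True,
    )
    nodes = result.stdout.split()
    return nodes[0] if nodes else ""


def measure_failover(
    get_node: Callable[[], str],
    original: str,
    poll_interval: float = 1.0,
    timeout: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the egress IP lands on a node other than `original`.

    Returns seconds elapsed since polling started, or None on timeout.
    """
    start = clock()
    while clock() - start < timeout:
        node = get_node()
        # "" means unassigned (between nodes); keep waiting in that case.
        if node and node != original:
            return clock() - start
        sleep(poll_interval)
    return None
```

For example, started right after the egress node goes down: `measure_failover(lambda: assigned_node("egressip1"), original="jechen-0720d-l56lv-worker-a-cv5d9.c.openshift-qe.internal")`. The result includes the polling granularity, so with the default 1 s interval it is only accurate to about a second.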

Comment 8 errata-xmlrpc 2022-08-10 11:19:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 9 Red Hat Bugzilla 2023-09-18 04:40:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

