Bug 1910533 - [OVN] It takes about 5 minutes for EgressIP failover to work
Summary: [OVN] It takes about 5 minutes for EgressIP failover to work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Alexander Constantinescu
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 1920482
Reported: 2020-12-24 09:25 UTC by huirwang
Modified: 2021-02-24 15:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1920482
Environment:
Last Closed: 2021-02-24 15:48:24 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 422 | 0 | None | closed | Bug 1910533: Configure GARP for egress IP re-assignment | 2021-02-16 15:45:25 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:49:18 UTC

Description huirwang 2020-12-24 09:25:30 UTC
Description of problem:
It takes about 5 minutes for EgressIP failover to work.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-21-131655 

How reproducible:
Sometimes

Steps to Reproduce:
1. Label two nodes as EgressIP nodes.
2. Create an EgressIP object:
oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2020-12-24T08:58:02Z"
    generation: 2
    managedFields:
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:egressIPs: {}
          f:namespaceSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:team: {}
          f:podSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:team: {}
      manager: oc
      operation: Update
      time: "2020-12-24T08:58:02Z"
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:items: {}
      manager: ovnkube
      operation: Update
      time: "2020-12-24T08:58:02Z"
    name: egressip2
    resourceVersion: "551266"
    uid: 2561ea84-af86-4f08-a085-3e0eabac235b
  spec:
    egressIPs:
    - 172.31.249.203
    - 172.31.249.202
    namespaceSelector:
      matchLabels:
        team: red
    podSelector:
      matchLabels:
        team: blue
  status:
    items:
    - egressIP: 172.31.249.203
      node: huirwang-470-rgw66-master-1
    - egressIP: 172.31.249.202
      node: huirwang-470-rgw66-master-2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3. Create namespace hrw and pods in it, then label the namespace and the pods to match the matchLabels above.

4. From one pod, repeatedly access an external host. Meanwhile, stop the kubelet service on node huirwang-470-rgw66-master-1, which makes the node NotReady.

oc rsh -n hrw test-rc-gcw4v
~ $ while true; do date; curl 172.31.249.80:9095 --connect-timeout 2;sleep 2;done

172.31.249.203Thu Dec 24 09:03:51 UTC 2020
172.31.249.203Thu Dec 24 09:03:53 UTC 2020
172.31.249.203Thu Dec 24 09:03:55 UTC 2020
172.31.249.203Thu Dec 24 09:03:57 UTC 2020
172.31.249.203Thu Dec 24 09:03:59 UTC 2020
172.31.249.203Thu Dec 24 09:04:01 UTC 2020
172.31.249.203Thu Dec 24 09:04:03 UTC 2020
172.31.249.203Thu Dec 24 09:04:05 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:09 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:04:13 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:18 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:22 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:04:26 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:30 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:34 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:38 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:42 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:46 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:50 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:04:54 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:04:58 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:05:02 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:05:06 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:05:10 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:05:14 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:05:18 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:22 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:26 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:30 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:34 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:38 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:42 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:46 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:50 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:54 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:05:58 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:02 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:06 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:10 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:14 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:18 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:22 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:26 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:30 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:34 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:38 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:42 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:46 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:50 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:06:54 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:06:58 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:02 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:06 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:10 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:14 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:18 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:22 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:26 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:30 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:34 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:07:38 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:07:42 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:07:46 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:07:50 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:07:54 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:07:58 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:02 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:06 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:10 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:08:14 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:18 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:22 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:26 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:30 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:34 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:38 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:08:42 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:08:46 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:50 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:54 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:08:58 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:09:02 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:06 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:10 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:14 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:18 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:22 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:26 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:30 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:34 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:38 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:42 UTC 2020
curl: (28) Connection timed out after 2001 milliseconds
Thu Dec 24 09:09:46 UTC 2020
curl: (28) Connection timed out after 2000 milliseconds
Thu Dec 24 09:09:50 UTC 2020
172.31.249.203Thu Dec 24 09:09:52 UTC 2020
172.31.249.203Thu Dec 24 09:09:54 UTC 2020
172.31.249.203Thu Dec 24 09:09:56 UTC 2020
172.31.249.203Thu Dec 24 09:09:58 UTC 2020
172.31.249.203Thu Dec 24 09:10:00 UTC 2020
172.31.249.203Thu Dec 24 09:10:02 UTC 2020
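
The outage window in the transcript above can be measured rather than read off by eye. The following is a minimal Python sketch, not part of the original report. It assumes the exact loop shown in step 4 (date is printed before each curl, so every result belongs to the timestamp printed just before it) and English day and month names in the date output. The name outage_window is hypothetical.

```python
import re
from datetime import datetime

# One token per event in the transcript: a `date` line, a curl failure,
# or a successful response (the server echoes back the source IP).
TOKEN = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} UTC \d{4})"
    r"|(?P<fail>curl: \(\d+\))"
    r"|(?P<ok>\d{1,3}(?:\.\d{1,3}){3})"
)


def outage_window(log):
    """Return (start of first failed request, start of first request that
    succeeded afterwards). Either value is None if it never occurs."""
    pending = None      # timestamp of the request whose result comes next
    first_fail = None
    for match in TOKEN.finditer(log):
        if match.lastgroup == "ts":
            pending = datetime.strptime(
                match.group("ts"), "%a %b %d %H:%M:%S UTC %Y"
            )
        elif pending is None:
            continue    # result with no preceding timestamp, ignore it
        elif match.lastgroup == "fail":
            if first_fail is None:
                first_fail = pending
            pending = None
        else:           # successful response
            if first_fail is not None:
                return first_fail, pending
            pending = None
    return first_fail, None
```

On the transcript above, the first failed request is the one started at 09:04:05 and the first recovered request is the one started at 09:09:50, an outage of about 5 minutes 45 seconds.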




Actual results:

The EgressIP object is updated almost immediately, but connectivity is slow to recover: it takes more than 5 minutes before requests succeed again.

oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2020-12-24T08:58:02Z"
    generation: 5
    managedFields:
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:egressIPs: {}
          f:namespaceSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:team: {}
          f:podSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:team: {}
      manager: oc
      operation: Update
      time: "2020-12-24T08:58:02Z"
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:items: {}
      manager: ovnkube
      operation: Update
      time: "2020-12-24T08:58:02Z"
    name: egressip2
    resourceVersion: "553045"
    uid: 2561ea84-af86-4f08-a085-3e0eabac235b
  spec:
    egressIPs:
    - 172.31.249.203
    - 172.31.249.202
    namespaceSelector:
      matchLabels:
        team: red
    podSelector:
      matchLabels:
        team: blue
  status:
    items:
    - egressIP: 172.31.249.203
      node: huirwang-470-rgw66-master-2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
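
The status above lists a node for only one of the two requested egress IPs. The following minimal Python sketch flags such gaps. It assumes the same listing has been fetched as JSON (oc get egressip -o json) and parsed into a dict; the name unassigned_egress_ips is hypothetical and not part of any OpenShift tooling. The inline example data is copied from the output above.

```python
def unassigned_egress_ips(egressip_list):
    """Map each EgressIP object name to the spec IPs missing from its status."""
    result = {}
    for item in egressip_list.get("items") or []:
        wanted = item.get("spec", {}).get("egressIPs") or []
        status_items = item.get("status", {}).get("items") or []
        assigned = {entry["egressIP"] for entry in status_items}
        missing = [ip for ip in wanted if ip not in assigned]
        if missing:
            result[item["metadata"]["name"]] = missing
    return result


# Trimmed copy of the object shown after failover.
after_failover = {
    "items": [
        {
            "metadata": {"name": "egressip2"},
            "spec": {"egressIPs": ["172.31.249.203", "172.31.249.202"]},
            "status": {
                "items": [
                    {
                        "egressIP": "172.31.249.203",
                        "node": "huirwang-470-rgw66-master-2",
                    }
                ]
            },
        }
    ]
}

print(unassigned_egress_ips(after_failover))
```

For the object above this reports 172.31.249.202 under egressip2 as having no node in the status.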


Expected results:

Connectivity should recover shortly after the EgressIP is reassigned.

Additional info:
I cannot reproduce it every time, but it has happened about 2-3 times.

Comment 2 Alexander Constantinescu 2021-01-06 15:59:35 UTC
FYI: Upstream PR: https://github.com/ovn-org/ovn-kubernetes/pull/1939
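
The linked fix is titled "Configure GARP for egress IP re-assignment". For readers unfamiliar with the term, the Python sketch below builds a gratuitous ARP request frame: a broadcast ARP whose sender and target IP are the same address, which lets neighbors refresh the MAC address they cache for that IP. It only illustrates the packet layout and is not how the ovn-kubernetes change is implemented. The function name build_garp and the MAC address are made-up examples.

```python
import socket
import struct


def build_garp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP request for ip."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC, EtherType ARP.
    frame = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, request (1).
    frame += struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender hardware and protocol address.
    frame += hw + addr
    # Target hardware address left as zeros; target IP equals sender IP,
    # which is what makes the ARP "gratuitous".
    frame += b"\x00" * 6 + addr
    return frame


# Hypothetical MAC; the IP is the egress IP from this report.
frame = build_garp("02:00:00:00:00:01", "172.31.249.203")
print(len(frame), frame.hex())
```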

Comment 7 errata-xmlrpc 2021-02-24 15:48:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

