Bug 2111246 - Unable to assign nodes for EgressIP even if the egress-assignable label is set
Summary: Unable to assign nodes for EgressIP even if the egress-assignable label is set
Keywords:
Status: CLOSED DUPLICATE of bug 2105657
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ben Bennett
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-26 19:58 UTC by Will Russell
Modified: 2022-07-27 13:50 UTC
CC: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-27 13:50:37 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Will Russell 2022-07-26 19:58:32 UTC
Description of problem:
A MachineSet is defined with the egress-assignable label set in its spec:

spec:
  lifecycleHooks: {}
  metadata:
    labels:
      k8s.ovn.org/egress-assignable: ''

Scaling up the MachineSet deploys a properly labeled node. However, OVN does not recognize the new node as a valid egress host, and it reports that no nodes are available for egress assignment until the ovnkube-master pods are restarted.
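For reference, the label can be verified on the new node after the scale-up with a standard label selector (a hypothetical check, not part of the original report; requires cluster access):

```shell
# List nodes carrying the egress-assignable label.
# The label value is the empty string, so the selector ends with '='.
oc get nodes -l k8s.ovn.org/egress-assignable= -o name
```

If the new node appears here but the EgressIP stays unassigned, the problem is on the OVN side rather than in the MachineSet labeling.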

Version-Release number of selected component (if applicable):
OCP 4.10.20


How reproducible:
Every time

Steps to Reproduce:
~~~
I deleted each ovnkube-master pod, starting with the pod on master-0, then master-1, then master-2, and checked the EgressIP after each one. After I restarted the pod on master-2, the EgressIP was assigned to the worker node.


$ oc get egressips -A
NAME                   EGRESSIPS      ASSIGNED NODE                  ASSIGNED EGRESSIPS
egress-tst             10.201.35.69   <*>-worker-mgc6q               10.201.35.69
I repeated the previous steps that reproduced the error:


- I scaled the worker MachineSet up to 2

- I waited for the new worker node to become Ready

- I marked the node holding the EgressIP for deletion using the machine annotation machine.openshift.io/cluster-api-delete-machine = true

- I scaled the worker MachineSet down to 1

- The EgressIP was no longer assigned to any node


$ oc get egressips -A
NAME                   EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egress-tst             10.201.35.69
~~~

Actual results:
Nodes are not detected as valid egressIP hosts when they are deployed as part of a MachineSet rather than given a manual label update after creation.
Restarting the ovnkube-master pods is necessary to trigger egressIP allocation.
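The pod-restart workaround described above can be sketched as follows (assuming the standard openshift-ovn-kubernetes namespace and the app=ovnkube-master pod label; run against a live cluster, one pod at a time as in the report):

```shell
# Delete each ovnkube-master pod in turn, waiting for deletion to complete,
# and check the EgressIP assignment after each restart.
for pod in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master -o name); do
  oc -n openshift-ovn-kubernetes delete "$pod" --wait
  oc get egressips -A
done
```

In the reproduction above, the EgressIP was only re-assigned after the last of the three master pods had been restarted.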


Expected results:
A newly scaled-up node should be treated with the same weight as a fresh label or a pod restart. This could be a race condition: the node's condition may be checked too early, marked failed, and never re-checked (speculation).

Additional info:
See the attached case; SOS reports for all nodes are available along with a must-gather, and the next comment contains detailed validation steps confirming that everything else is as expected.

See a previous BZ for 4.8 that is very similar to this one (errata):
https://bugzilla.redhat.com/show_bug.cgi?id=1942856

