Bug 1871856

Summary: ovn-kubernetes egress-ip controller doesn't seem to handle ovn losses
Product: OpenShift Container Platform Reporter: Casey Callendrello <cdc>
Component: NetworkingAssignee: Alexander Constantinescu <aconstan>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: unspecified CC: aconstan, bbennett, ricarril
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-09 10:02:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Casey Callendrello 2020-08-24 13:10:13 UTC
All ovn-kubernetes control loops need to be able to handle the loss of the ovn northd. This means their reconciliation loop cannot short-circuit if, for example, annotations exist on the kubernetes object.

The EgressIP controller contains this comment:

> If the status is set at this point, then we know it's valid from syncEgressIP and we have no assignment to do.
> Just initialize all watchers (which should not re-create any already existing items in the OVN DB)

which implies that it does not correctly handle the loss of nbdb.

Please verify the code and, if necessary, ensure that nbdb entries are recreated as necessary.

Comment 1 Ricardo Carrillo Cruz 2020-08-24 14:11:07 UTC
Assigning to you Casey, if you want someone else to work it just give me a ping on Slack.

Comment 2 Alexander Constantinescu 2020-08-26 13:25:30 UTC
I don't follow on how any of that is a bug? Has it been observed? 

The comment cited:

> If the status is set at this point, then we know it's valid from syncEgressIP and we have no assignment to do.
> Just initialize all watchers (which should not re-create any already existing items in the OVN DB)

refers to the fact that we should short-circuit `assignEgressIPs` if the status is already valid when we reboot (as to avoid useless re-assignment). The nbdb re-creation has nothing to do with any of that. The database entities will be created/re-created when the egress IP matches a pod / namespace.

To be clear, that comment only affects the assignment procedure, it makes no reference to the DB creation/re-creation (which should work as it does for all other API resources).

Comment 3 Alexander Constantinescu 2020-08-26 13:36:49 UTC
Ahh, now I follow on where the confusion comes from. 

EgressIP has not been enabled for OCP yet. Any EgressIP creation on OCP has no impact on OVN yet, I will enable this the coming week. I've been holding off waiting for a new OVN version which was critical for egress IP to work. This has been done with PR: https://github.com/ovn-org/ovn-kubernetes/pull/1616 so now PR: https://github.com/ovn-org/ovn-kubernetes/pull/1567 should bring in the proper stuff. Once that is rebased downstream, egress IP will be enabled.

Comment 6 Alexander Constantinescu 2020-09-09 10:02:52 UTC
Closing this as this was clearly incorrectly created and should not be tested by QE.