Bug 1732609

Summary: openshift-machine-api nodelink-controller stuck in tight loop
Product: OpenShift Container Platform Reporter: Michael Gugino <mgugino>
Component: Cloud Compute Assignee: Alberto <agarcial>
Status: CLOSED DEFERRED QA Contact: Jianwei Hou <jhou>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: agarcial
Target Milestone: --- Keywords: Reopened
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-02-21 09:39:08 UTC Type: Bug

Description Michael Gugino 2019-07-23 21:17:42 UTC
Description of problem:

The nodelink controller never rests. Even with a very small number of nodes and machines, it constantly emits dozens of log lines every second, indefinitely.

We should figure out why it's burning up so much CPU time and possibly have it trigger actions on node/machine update events instead.

Comment 1 Alberto 2019-07-24 07:29:25 UTC
This is by design at the moment. It already triggers actions on node/machine update events, which happen very frequently because of node heartbeat status updates. Once we make machines "fire and forget", we can let the nodelink controller stop watching nodes and only watch machines, which will reduce the reconciliation cadence.
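
To illustrate why heartbeats dominate the reconcile count, here is a minimal simulation sketch. It is not the nodelink controller's code: the event classes, `simulate`, `count_reconciles`, and the 10-second heartbeat period are all assumptions chosen for illustration. It only counts how many reconciles a controller would run when it watches nodes versus when it watches machines alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEvent:
    """Hypothetical, simplified node update event."""
    name: str
    heartbeat_only: bool  # True when only heartbeat status fields changed


@dataclass(frozen=True)
class MachineEvent:
    """Hypothetical, simplified machine update event."""
    name: str


def simulate(nodes=3, seconds=60, heartbeat_period=10):
    """Build an event stream: one machine update per machine,
    plus one heartbeat update per node every heartbeat_period seconds.
    The period is an illustrative assumption, not a measured value."""
    events = [MachineEvent(f"machine-{i}") for i in range(nodes)]
    for _ in range(0, seconds, heartbeat_period):
        for i in range(nodes):
            events.append(NodeEvent(f"node-{i}", heartbeat_only=True))
    return events


def count_reconciles(events, watch_nodes=True, skip_heartbeats=False):
    """Count how many events would trigger a reconcile."""
    count = 0
    for ev in events:
        if isinstance(ev, NodeEvent):
            if not watch_nodes:
                continue  # controller no longer watches nodes at all
            if skip_heartbeats and ev.heartbeat_only:
                continue  # filter out heartbeat-only updates
        count += 1
    return count


events = simulate()
print(count_reconciles(events))                        # watching nodes and machines
print(count_reconciles(events, watch_nodes=False))     # watching machines only
print(count_reconciles(events, skip_heartbeats=True))  # nodes watched, heartbeats filtered
```

In this toy model, the node heartbeats account for most of the reconciles, so dropping the node watch (or filtering heartbeat-only updates) leaves only the machine updates. A real cluster will have different rates.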

Comment 2 Alberto 2020-02-21 09:39:08 UTC
I'm closing this for now as per https://bugzilla.redhat.com/show_bug.cgi?id=1732609#c1.
In the near future we'll hopefully drop the nodelink controller completely, move the linking logic into the machine controller, and probably slightly reduce its resync period.