Bug 1741066

Summary: MCO operation seems to modify node resources, resulting in the node becoming unschedulable
Product: OpenShift Container Platform
Reporter: ge liu <geliu>
Component: Machine Config Operator
Assignee: Kirsten Garrison <kgarriso>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Micah Abbott <miabbott>
Severity: high
Priority: unspecified
Version: 4.2.0
CC: chuyu, jialiu, kgarriso, xxia
Keywords: Regression
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2019-08-27 16:58:24 UTC

Comment 1 Stephen Greene 2019-08-15 21:42:41 UTC
Can you provide additional information about this bug? Where was the cluster deployed? How were the kube node resources modified?

Could you also provide a complete must-gather (i.e., run the must-gather binary without any operator parameters)? The must-gather tarball you provided is missing some important information, namely the `openshift-machine-config-operator` namespace folder.

Comment 2 Stephen Greene 2019-08-16 22:02:38 UTC
How long was the affected master reporting as unschedulable? I have tried to recreate this bug several times and have noticed that once the created MC has been successfully rolled out to all of the masters, the master nodes are schedulable again. During the upgrade process, masters are marked as unschedulable one at a time to allow the upgrades to happen. If you can provide the complete must-gather logs, I can confirm whether the master node was reporting unschedulable because it was in the process of updating. If you are seeing master nodes marked as unschedulable even after the new MC has been successfully rolled out, please let us know.

Comment 3 ge liu 2019-08-19 09:46:53 UTC
The test environment has already been reclaimed and I can't recreate this issue now, so I can't provide more information. It happened several times last week. The cluster was deployed on AWS. Once a node entered the unschedulable state, it did not recover, even after several hours.

Comment 4 Kirsten Garrison 2019-08-27 16:58:24 UTC
As we aren't able to reproduce this, please reopen the bug and provide full logs if you encounter it again.