Bug 1882101

Summary: machine-api-operator fails to deploy due to security constraint
Product: OpenShift Container Platform Reporter: Michael Gugino <mgugino>
Component: kube-apiserver Assignee: David Eads <deads>
Status: CLOSED DUPLICATE QA Contact: Ke Wang <kewang>
Severity: high Docs Contact:
Priority: high    
Version: 4.6 CC: aos-bugs, deads, mfojtik, wking, xxia, yanyang
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-09-30 14:49:08 UTC Type: Bug

Description Michael Gugino 2020-09-23 19:08:10 UTC
Some CI runs are failing on Azure, such as: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-4.6/1308571357872656384

The first indication is that there are no worker nodes.  Further investigation shows that the machine-controller is not deployed because the machine-api-operator itself is not deployed.  Checking artifacts/deployment.json reveals:

{
    "lastTransitionTime": "2020-09-23T01:15:10Z",
    "lastUpdateTime": "2020-09-23T01:15:10Z",
    "message": "Deployment does not have minimum availability.",
    "reason": "MinimumReplicasUnavailable",
    "status": "False",
    "type": "Available"
},
{
    "lastTransitionTime": "2020-09-23T01:15:10Z",
    "lastUpdateTime": "2020-09-23T01:15:10Z",
    "message": "pods \"machine-api-operator-5c99d74d58-\" is forbidden: unable to validate against any security context constraint: []",
    "reason": "FailedCreate",
    "status": "True",
    "type": "ReplicaFailure"
},
{
    "lastTransitionTime": "2020-09-23T01:25:11Z",
    "lastUpdateTime": "2020-09-23T01:25:11Z",
    "message": "ReplicaSet \"machine-api-operator-5c99d74d58\" has timed out progressing.",
    "reason": "ProgressDeadlineExceeded",
    "status": "False",
    "type": "Progressing"
}
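
To triage an artifact like this quickly, the conditions quoted above can be filtered for unhealthy states. The following is a minimal Python sketch, not something taken from the CI tooling: the names `UNHEALTHY`, `failing_conditions`, and `conditions_for` are hypothetical, and the layout assumed for deployment.json (a Kubernetes List object with an `items` array) is an assumption that should be checked against the actual artifact.

```python
import json

# Condition type -> status value treated as unhealthy.
# Assumed mapping, based only on the three conditions quoted above.
UNHEALTHY = {
    "Available": "False",
    "Progressing": "False",
    "ReplicaFailure": "True",
}


def failing_conditions(conditions):
    """Return (type, reason, message) for each condition in an unhealthy state."""
    return [
        (c["type"], c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("type") in UNHEALTHY and UNHEALTHY[c["type"]] == c.get("status")
    ]


def conditions_for(path, name):
    """Load the status conditions of the named deployment from a JSON file.

    Assumes the file is a List object: {"items": [{"metadata": ..., "status": ...}]}.
    Returns an empty list if the deployment is not found.
    """
    with open(path) as f:
        doc = json.load(f)
    for item in doc.get("items", []):
        if item.get("metadata", {}).get("name") == name:
            return item.get("status", {}).get("conditions", [])
    return []


# The three conditions from the report, trimmed to the fields used here.
SAMPLE = [
    {
        "type": "Available",
        "status": "False",
        "reason": "MinimumReplicasUnavailable",
        "message": "Deployment does not have minimum availability.",
    },
    {
        "type": "ReplicaFailure",
        "status": "True",
        "reason": "FailedCreate",
        "message": 'pods "machine-api-operator-5c99d74d58-" is forbidden: '
                   "unable to validate against any security context constraint: []",
    },
    {
        "type": "Progressing",
        "status": "False",
        "reason": "ProgressDeadlineExceeded",
        "message": 'ReplicaSet "machine-api-operator-5c99d74d58" has timed out progressing.',
    },
]

if __name__ == "__main__":
    for ctype, reason, message in failing_conditions(SAMPLE):
        print(f"{ctype}: {reason}: {message}")
```

Against a real run, something like `failing_conditions(conditions_for("artifacts/deployment.json", "machine-api-operator"))` would surface the FailedCreate message directly, provided the file layout matches the assumption above.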


Many operators seem broken, including kube-apiserver:
Operator unavailable (StaticPods_ZeroNodesActive): StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 2

Unsure what the root cause is.

For reference, the initial 3 master machines are created by the installer and join the bootstrap cluster.  The machine-api has no control over the initial creation or configuration of the master machines.

Comment 1 David Eads 2020-09-23 19:18:15 UTC
I would like to see an [early] test that checks whether we have the events and flake.  I think this may happen more often on Azure: something about the LB, perhaps?

Also, I think I see logs that indicate an io timeout from CVO to internal LB.

Comment 3 David Eads 2020-09-30 14:49:08 UTC

*** This bug has been marked as a duplicate of bug 1883458 ***