Bug 1733271
| Summary: | Machine-controller not creating Nodes for all machines | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Naveen Malik <nmalik> |
| Component: | Cloud Compute | Assignee: | Alberto <agarcial> |
| Status: | CLOSED NOTABUG | QA Contact: | Jianwei Hou <jhou> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.z | CC: | agarcial, jeder, walters |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-06 12:43:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Naveen Malik
2019-07-25 14:49:25 UTC
Created attachment 1593429: nodelink-controller log
Created attachment 1593430: machine-controller log
Created attachment 1593431: controller-manager log
This is probably similar to https://bugzilla.redhat.com/show_bug.cgi?id=1723955; look at `oc get csr`. This is expected when the AWS limit is reached. You'll get a Prometheus alert due to the mismatch between nodes and machines, and once the instance is created, you'll need to approve the pending CSRs manually; this is by design.

If anything, we could try to store the timestamp of when the instance was actually created, so that the machine approver will consider the CSR legitimate. Bumping to 4.3 to consider that further.

Since we introduced machine phases, this should be reflected in the machine phase as Provisioning/Failed, giving more meaningful output in addition to the alerts. Also, multiple fixes were merged for the machine approver, which now tolerates a bigger timeout. I'm closing this; please reopen if it is still relevant.
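The timestamp idea above can be illustrated as a time-window check. This is a minimal Python sketch, not the machine approver's actual code: the function name `csr_in_window`, the `CLOCK_SKEW` and `MAX_DELTA` values, and the optional `instance_created` argument are all hypothetical. It accepts a CSR only if it was created within a window around a reference time, and uses the instance's actual creation time as that reference when one has been recorded.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical tolerances for illustration only; not the approver's real values.
CLOCK_SKEW = timedelta(seconds=10)  # allow a CSR slightly before the reference time
MAX_DELTA = timedelta(hours=2)      # latest a CSR may arrive after the reference time


def csr_in_window(
    csr_created: datetime,
    machine_created: datetime,
    instance_created: Optional[datetime] = None,
    clock_skew: timedelta = CLOCK_SKEW,
    max_delta: timedelta = MAX_DELTA,
) -> bool:
    """Return True if the CSR falls inside [reference - clock_skew, reference + max_delta].

    The reference is the recorded instance creation time when available,
    otherwise the Machine object's creation time.
    """
    reference = instance_created if instance_created is not None else machine_created
    return reference - clock_skew <= csr_created <= reference + max_delta


if __name__ == "__main__":
    utc = timezone.utc
    machine = datetime(2019, 1, 1, 10, 0, tzinfo=utc)   # Machine object created
    instance = datetime(2019, 1, 1, 13, 0, tzinfo=utc)  # instance delayed by a cloud limit
    csr = datetime(2019, 1, 1, 13, 5, tzinfo=utc)       # kubelet submits its CSR

    print(csr_in_window(csr, machine))                             # False: 3h05m after the Machine
    print(csr_in_window(csr, machine, instance_created=instance))  # True: 5m after the instance
```

In this hypothetical scenario the CSR is rejected when measured against the Machine timestamp but accepted when measured against the instance timestamp; widening `max_delta` is the other lever, comparable to the bigger timeout the approver now tolerates.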
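The node/machine mismatch that triggers the alert, together with the phase-based view mentioned in the last comment, can be sketched by listing machines that have no linked Node and grouping them by phase. The `Machine` dataclass and both helper functions are hypothetical, simplified stand-ins rather than the Machine API types; in a real cluster this information would come from the Machine objects' status.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Machine:
    """Hypothetical, simplified view of a Machine; not the real API type."""

    name: str
    phase: str                      # e.g. "Provisioning", "Running", "Failed"
    node_ref: Optional[str] = None  # name of the linked Node, None if not linked yet


def machines_without_nodes(machines: List[Machine]) -> List[str]:
    """Names of machines with no linked Node, i.e. the node/machine mismatch."""
    return [m.name for m in machines if m.node_ref is None]


def unlinked_by_phase(machines: List[Machine]) -> Dict[str, List[str]]:
    """Group unlinked machines by phase so Provisioning and Failed are distinguishable."""
    grouped: Dict[str, List[str]] = {}
    for m in machines:
        if m.node_ref is None:
            grouped.setdefault(m.phase, []).append(m.name)
    return grouped


if __name__ == "__main__":
    fleet = [
        Machine("worker-a", "Running", node_ref="ip-10-0-1-10"),
        Machine("worker-b", "Provisioning"),
        Machine("worker-c", "Failed"),
    ]
    print(machines_without_nodes(fleet))  # ['worker-b', 'worker-c']
    print(unlinked_by_phase(fleet))       # {'Provisioning': ['worker-b'], 'Failed': ['worker-c']}
```

Grouping by phase separates machines that are still waiting on an instance from machines that have failed outright, which a bare node count cannot do.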