Bug 1965034
| Summary: | Ignition fails to get machine config for worker nodes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Arjun Naik <anaik> |
| Component: | Installer | Assignee: | aos-install |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | behoward, dornelas, gshereme, jaharrin, jerzhang, jligon, kbater, mrussell, mstaeble, nstielau, slowrie |
| Version: | 4.8 | Keywords: | ServiceDeliveryBlocker |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-06-21 17:32:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Arjun Naik
2021-05-26 15:24:16 UTC
Based on the machine-config-server logs, none of the logs had any requests from worker nodes (I think the one line of "Pool worker requested by address:" was your manual curl). This indicates that the worker nodes were unable to reach the MCS endpoint. The console logs also seem to point to this, which could mean a networking issue of some sort. I'm not very familiar with STS credentials, so maybe we need some other eyes on this. Moving to the CoreOS team to see if they have more insight from Ignition's perspective.

From the logs posted, Ignition is receiving a 500 Internal Server Error when querying the MCS for the config and is correctly retrying. Is there an actual hard-stop error that occurs, or is it a timeout with this loop occurring? I'm quite surprised that there would be no logs in the MCS when it's consistently returning a 500 Internal Server Error. Is there a case where that can occur, @jerzhang?

The MCS does have situations where it can return a StatusInternalServerError, but it would also be accompanied by error logs in the MCS. At the very least it would have logged the request before throwing the StatusInternalServerError, which made me suspect that the node wasn't able to actually reach the server. I'm not aware of another situation in which the MCS would return StatusInternalServerError outside of those scenarios.

STS is a new, alternative style of OCP installation. Instead of supplying an AWS access key and secret access key and storing them in the cluster, the customer passes in role names to use. It's tech preview in 4.7: https://docs.openshift.com/container-platform/4.7/authentication/managing_cloud_provider_credentials/cco-mode-sts.html

Sending in 4 STS IAM roles works. We see the problem when sending in 5. Any chance there's some truncation happening somewhere?

Can you please include full logs & a sanitized version of the Ignition config?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
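
The thread turns on whether the worker's requests were answered with an HTTP 500 or never reached the MCS at all. A minimal Python sketch of how the two cases could be told apart from a host on the same network is below. It is not the code Ignition or the MCS runs; `classify_fetch`, `probe`, and `http_fetch` are hypothetical helper names, and the hostname in the usage comment is a placeholder. Port 22623 and the `/config/worker` path are the usual MCS endpoint, but check them against your cluster.

```python
import time
import urllib.error
import urllib.request


def classify_fetch(fetch):
    """Run one fetch attempt and classify the outcome.

    `fetch` is a zero-argument callable that returns an HTTP status code,
    or raises HTTPError / URLError / OSError like urllib does.
    """
    try:
        status = fetch()
    except urllib.error.HTTPError as err:
        # Something on the path answered with an HTTP error status.
        return f"http-{err.code}"
    except (urllib.error.URLError, OSError):
        # No HTTP response: DNS, routing, refused connection, timeout,
        # or a TLS failure (for example an untrusted cluster CA).
        return "unreachable"
    return "ok" if 200 <= status < 300 else f"http-{status}"


def probe(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Retry `fetch` up to `attempts` times; return the outcome of each try."""
    results = []
    for i in range(attempts):
        outcome = classify_fetch(fetch)
        results.append(outcome)
        if outcome == "ok":
            break
        if i < attempts - 1:
            sleep(delay)
    return results


def http_fetch(url, timeout=10, context=None):
    """Build a fetch callable for `url`; `context` is an optional ssl.SSLContext."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    return fetch


# Usage (placeholder hostname, assumed for illustration only):
#   probe(http_fetch("https://api-int.example.com:22623/config/worker"))
```

A run of `http-500` results means some server or intermediary on the path answered, so the next step would be comparing timestamps against the MCS logs. A run of `unreachable` results points at networking or TLS instead.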