Bug 2104657
| Summary: | OpenShift private cluster fails to install due to missing worker nodes on ASH | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Gahagan <mgahagan> |
| Component: | Installer | Assignee: | OCP Installer <ocp-installer> |
| Installer sub component: | openshift-installer | QA Contact: | Gaoyun Pei <gpei> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | high | ||
| Priority: | medium | CC: | mimccune, miyadav, padillon, talessio, zhsun |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.12.z | ||
| Hardware: | x86_64 | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-03-09 01:23:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2060508 | ||
Description
Mike Gahagan
2022-07-06 19:39:05 UTC
Full Status message from oc describe:
Status:
Conditions:
Last Transition Time: 2022-07-06T13:26:47Z
Status: True
Type: Drainable
Last Transition Time: 2022-07-06T13:26:47Z
Message: Instance has not been created
Reason: InstanceNotCreated
Severity: Warning
Status: False
Type: InstanceExists
Last Transition Time: 2022-07-06T13:26:47Z
Status: True
Type: Terminable
Error Message: failed to reconcile machine "mgahagan220706-ff6dd-worker-mtcazs-c6mr4": network.LoadBalancersClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Network/loadBalancers/mgahagan220706-ff6dd' under resource group 'mgahagan220706-ff6dd-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
Error Reason: InvalidConfiguration
Last Updated: 2022-07-06T13:27:42Z
Phase: Failed
Provider Status:
Conditions:
Last Transition Time: 2022-07-06T13:27:21Z
Message: failed to create nic mgahagan220706-ff6dd-worker-mtcazs-c6mr4-nic for machine mgahagan220706-ff6dd-worker-mtcazs-c6mr4: unable to create VM network interface: load balancer mgahagan220706-ff6dd not found: network.LoadBalancersClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Network/loadBalancers/mgahagan220706-ff6dd' under resource group 'mgahagan220706-ff6dd-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
Reason: MachineCreationFailed
Status: True
Type: MachineCreated
Metadata:
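For triage, the resource that Azure reported as missing can be pulled out of the error text above. The following is a minimal Python sketch, assuming the message keeps the `The Resource '...' under resource group '...' was not found` wording quoted in this report. The helper name `missing_resource` is hypothetical and is not part of any OpenShift or Azure tooling.

```python
import re

# Matches the Azure Resource Manager "ResourceNotFound" wording quoted in the
# status output above. The exact wording is an assumption taken from this bug.
_NOT_FOUND = re.compile(
    r"The Resource '(?P<type>[^']+)/(?P<name>[^'/]+)' "
    r"under resource group '(?P<group>[^']+)' was not found"
)


def missing_resource(message):
    """Return (resource_type, name, resource_group), or None if no match."""
    match = _NOT_FOUND.search(message)
    if match is None:
        return None
    return match.group("type"), match.group("name"), match.group("group")


# Excerpt of the error message from the machine status in this report.
msg = (
    'Status=404 Code="ResourceNotFound" Message="The Resource '
    "'Microsoft.Network/loadBalancers/mgahagan220706-ff6dd' under resource "
    "group 'mgahagan220706-ff6dd-rg' was not found.\""
)

print(missing_resource(msg))
```

For the message in this report, the extracted resource is a load balancer named `mgahagan220706-ff6dd` in resource group `mgahagan220706-ff6dd-rg`, which is the same load balancer named in the `MachineCreationFailed` condition.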
Talking with the cloud team, we need to do more investigation around the "Internal" publishing option; it's possible we missed a case when adding the implementation for the create machine logic. We are trying to figure out how many users this might affect. @mgahagan, do you have any information about the prevalence of private clusters, or perhaps a little more information about this deployment method? We aren't sure that we are testing the private cluster option thoroughly and want to understand more about this use case.

I know that private clusters are quite a common request on Azure public cloud as well as other cloud providers, but I have not heard of any requests for private clusters on ASH specifically. In the case of ASH it appears that the entire environment is private, so I'm not sure what the use case for an internal-only API is since the whole cloud is essentially "private". When creating private clusters on Azure public I still see that an additional load balancer is created, but there are no rules assigned to it. I'm not sure if that's helpful in debugging the issue we are seeing here.

Thanks Mike, it's helpful for us in building a little more context around the issue. We discussed this issue again during our team standup and we would like to reach out to our ASH contacts to understand a little more about the differences between public Azure and ASH in these scenarios; we had seen an issue related to availability sets in the past when we encountered this. We are going to try and replicate this as well.

Is there a must-gather associated with this issue? Or, if someone can reproduce it, could they provide a must-gather (perhaps via Google Drive)?

Having reviewed the attached bug, I suspect this is an issue with the input the installer is providing to Machine API. Based on the knowledge we have so far, if this has never worked, this wouldn't break any existing user and, therefore, should not be considered a blocker for this release. We will try to investigate during the next sprint, though a must-gather would help us reach a conclusion more quickly, as our access to ASH environments is limited.

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9367