Bug 2060508

Summary: Deployment of private cluster fails on Azure Stack Hub
Product: OpenShift Container Platform Reporter: bmangoen
Component: Installer Assignee: Patrick Dillon <padillon>
Installer sub component: openshift-installer QA Contact: Mike Gahagan <mgahagan>
Status: CLOSED DEFERRED Docs Contact:
Severity: high    
Priority: high CC: fgrosjea, mgahagan, padillon, sdodson
Version: 4.10-rc2   
Target Milestone: ---   
Target Release: 4.11.z   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-03-09 01:14:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 2104657    
Bug Blocks:    
Description Flags
openshift_install.log none

Description bmangoen 2022-03-03 16:26:31 UTC
Created attachment 1864040 [details]


$ openshift-install version
openshift-install 4.10.0-rc.2
built from commit 1ddc64b523042f450f21cc45f1150d29cb01ecc1
release image quay.io/openshift-release-dev/ocp-release@sha256:da16f451dddad3ca26ecd7c5c16423fb3658d7883dadc108977675edf80c696a
release architecture amd64

Platform: Azure Stack Hub

During a private cluster installation that uses an existing vnet, Terraform fails when `publish: Internal` is set in install-config.yaml.

It looks like there are no default values for the following variables: https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/bootstrap/variables.tf#L1-L9
Defaults are set for the same variables on Azure (https://github.com/openshift/installer/blob/release-4.10/data/data/azure/bootstrap/variables.tf#L1-L11).

Here are some type issues (`[]` instead of `null`):
* https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/bootstrap/main.tf#L102  
* https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/cluster/master/master.tf#L21

Comment 1 Patrick Dillon 2022-03-03 20:46:29 UTC
Note that api and *.apps DNS records are not being created with private clusters.

For the api record, the installer (during a private install) should create it but point it at the internal load balancer.

For *.apps we may need to create a BZ with the ingress operator.

Comment 5 dnastaci 2022-03-15 14:46:51 UTC
We are hitting what seems to be a similar problem with the IPI installers and a private cluster in 4.10.3, which makes sense given that the current target release for the bug is 4.11.0.

Our master workers are all stuck in the initialization cycle, with this message:

> [ 2392.938646] ignition[833]: GET error: Get "https://api-int.test41.stackpoc.cloudpak-bringup.com:22623/config/master": dial tcp: lookup api-int.test41.stackpoc.cloudpak-bringup.com on no such host

In the UPI process, this address would be created as part of the user-provisioned resources, but I don't see anywhere in the IPI instructions how to create the corresponding infrastructure resources. I went as far as going back to the UPI instructions and executing the steps under "Creating networking and load balancing components in Azure Stack Hub": https://docs.openshift.com/container-platform/4.10/installing/installing_azure_stack_hub/installing-azure-stack-hub-user-infra.html#installation-creating-azure-dns_installing-azure-stack-hub-user-infra

That created the address, but there is nothing that will put the workers behind the availability set or load balancer created by those instructions.

Comment 6 dnastaci 2022-03-15 14:50:46 UTC
Regarding the target release, 4.11, is there anything that could be done via docs for 4.10?

I think private clusters may be quite common for experimentation purposes or even initial installation followed by creation of public DNS entries.

Comment 7 Matthew Staebler 2022-03-15 15:24:09 UTC
(In reply to dnastaci from comment #6)
> Regarding the target release, 4.11, is there anything that could be done via
> docs for 4.10?

This bug fix will be backported to a 4.10 z-stream release.

Comment 8 Mike Gahagan 2022-03-15 20:54:31 UTC
I tried this with 4.11.0-0.nightly-2022-03-15-060211 and I only got DNS records for api-int and *.apps. This eventually causes the installation to fail because the installer tries to reach the api hostname, which has no DNS entry.
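
For anyone re-testing, a minimal sketch of a check for which of the three cluster names resolve from a host inside the vnet. This is not part of the installer. The function name `check_cluster_dns`, the `resolve` parameter, and the use of `console-openshift-console` as a representative host under *.apps are assumptions made for illustration.

```python
import socket
from typing import Callable, Dict


def check_cluster_dns(
    cluster_domain: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    app_host: str = "console-openshift-console",  # assumed sample host under *.apps
) -> Dict[str, bool]:
    """Report whether the api, api-int and a sample *.apps name resolve.

    cluster_domain is "<cluster name>.<base domain>".
    """
    names = [
        f"api.{cluster_domain}",
        f"api-int.{cluster_domain}",
        f"{app_host}.apps.{cluster_domain}",
    ]
    results: Dict[str, bool] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = True
        except OSError:
            # socket.gaierror (e.g. "no such host") is a subclass of OSError
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical cluster domain; replace with your own.
    for name, ok in check_cluster_dns("mycluster.example.com").items():
        print(f"{name}: {'resolves' if ok else 'NO DNS RECORD'}")
```

With the behaviour described in this comment, the api name would be expected to report no record while the other two resolve.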

Comment 9 Matthew Staebler 2022-03-15 21:51:31 UTC
The BZ is linked to the wrong PR. It should be linked to https://github.com/openshift/installer/pull/5691, which has not yet merged.

Comment 10 Matthew Staebler 2022-03-15 22:50:46 UTC
(In reply to Matthew Staebler from comment #9)
> The BZ is linked to the wrong PR. It should be linked to
> https://github.com/openshift/installer/pull/5691, which has not yet merged.

Ugh. I'm mixed up on this. The PRs were correctly linked. The issue in this BZ is that Terraform fails. Creation of the API DNS records is covered in https://bugzilla.redhat.com/show_bug.cgi?id=2061549.

Comment 11 Mike Gahagan 2022-03-16 14:29:42 UTC
I'll re-test this once the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2061549 lands

Comment 22 Shiftzilla 2023-03-09 01:14:11 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.