Bug 2060508 - Deployment of private cluster fails on Azure Stack Hub
Summary: Deployment of private cluster fails on Azure Stack Hub
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.10-rc2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.11.z
Assignee: Patrick Dillon
QA Contact: Mike Gahagan
Depends On: 2104657
Reported: 2022-03-03 16:26 UTC by bmangoen
Modified: 2023-03-09 01:14 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2023-03-09 01:14:11 UTC
Target Upstream Version:

Attachments
openshift_install.log (96.37 KB, text/plain)
2022-03-03 16:26 UTC, bmangoen

System ID: Github openshift installer pull 5678
Private: 0
Priority: None
Status: Merged
Summary: Bug 2060508: azurestack: fix backend address pools for internal publishing
Last Updated: 2022-03-15 22:55:35 UTC

Description bmangoen 2022-03-03 16:26:31 UTC
Created attachment 1864040


$ openshift-install version
openshift-install 4.10.0-rc.2
built from commit 1ddc64b523042f450f21cc45f1150d29cb01ecc1
release image quay.io/openshift-release-dev/ocp-release@sha256:da16f451dddad3ca26ecd7c5c16423fb3658d7883dadc108977675edf80c696a
release architecture amd64

Platform: Azure Stack Hub

When installing a private cluster into an existing VNet, Terraform fails if `publish: Internal` is set in install-config.yaml.

It looks like the following variables have no default value: https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/bootstrap/variables.tf#L1-L9
The equivalent variables do have defaults for Azure: https://github.com/openshift/installer/blob/release-4.10/data/data/azure/bootstrap/variables.tf#L1-L11

There are also type mismatches where `[]` is used instead of `null`:
* https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/bootstrap/main.tf#L102
* https://github.com/openshift/installer/blob/release-4.10/data/data/azurestack/cluster/master/master.tf#L21

Comment 1 Patrick Dillon 2022-03-03 20:46:29 UTC
Note that api and *.apps DNS records are not being created with private clusters.

During a private install, the installer should still create the api record, but point it at the internal load balancer.

For *.apps, we may need to open a BZ against the ingress operator.

Comment 5 dnastaci 2022-03-15 14:46:51 UTC
We are hitting what seems to be a similar problem with the IPI installer and a private cluster in 4.10.3, which makes sense given that the current target release for this bug is 4.11.0.

Our master nodes are all stuck in the initialization cycle with this message:

> [ 2392.938646] ignition[833]: GET error: Get "https://api-int.test41.stackpoc.cloudpak-bringup.com:22623/config/master": dial tcp: lookup api-int.test41.stackpoc.cloudpak-bringup.com on no such host

In the UPI process, this address would be created as part of the user-provisioned resources, but I don't see anywhere in the IPI instructions how to create the corresponding infrastructure resources. I went as far as going back to the UPI instructions and executing the steps under "Creating networking and load balancing components in Azure Stack Hub": https://docs.openshift.com/container-platform/4.10/installing/installing_azure_stack_hub/installing-azure-stack-hub-user-infra.html#installation-creating-azure-dns_installing-azure-stack-hub-user-infra

That created the api-int record, but nothing places the workers behind the availability set or load balancer that those instructions create.
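A quick way to confirm the failure mode in the log above is to check whether the api-int name resolves from a machine that uses the same DNS resolver as the cluster nodes. The Python sketch below is a hypothetical helper, not part of the installer: it assumes the standard OpenShift naming `api-int.<cluster_name>.<base_domain>`, takes port 22623 and the `/config/master` path from the Ignition log line, and the function names `api_int_host`, `mcs_url`, and `resolves` are made up for illustration.

```python
"""Hypothetical pre-flight check for the Ignition lookup failure above.

Assumes the standard OpenShift naming api-int.<cluster_name>.<base_domain>;
port 22623 and the /config/<role> path are taken from the log line.
"""
import socket


def api_int_host(cluster_name: str, base_domain: str) -> str:
    """Internal API hostname that the nodes must be able to resolve."""
    return f"api-int.{cluster_name}.{base_domain}"


def mcs_url(cluster_name: str, base_domain: str, role: str = "master") -> str:
    """Machine config server URL that Ignition fetches (as in the log)."""
    return f"https://{api_int_host(cluster_name, base_domain)}:22623/config/{role}"


def resolves(host: str) -> bool:
    """True if the system resolver returns at least one address for host."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        # Lookup failures such as "no such host" in the Ignition log land here.
        return False
```

For the cluster in this comment, `resolves(api_int_host("test41", "stackpoc.cloudpak-bringup.com"))` would be expected to return `False` while the record is missing.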

Comment 6 dnastaci 2022-03-15 14:50:46 UTC
Regarding the target release, 4.11, is there anything that could be done via docs for 4.10?

I think private clusters may be quite common for experimentation, or even for an initial installation that is later followed by creating public DNS entries.

Comment 7 Matthew Staebler 2022-03-15 15:24:09 UTC
(In reply to dnastaci from comment #6)
> Regarding the target release, 4.11, is there anything that could be done via
> docs for 4.10?

This bug fix will be backported to a 4.10 z-stream release.

Comment 8 Mike Gahagan 2022-03-15 20:54:31 UTC
I tried this with 4.11.0-0.nightly-2022-03-15-060211 and only got DNS records for api-int and *.apps. This eventually causes the installation to fail, because the installer tries to reach the api hostname, which has no DNS entry.
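The gap observed in this test can be expressed as a set difference between the records a private install is expected to have and the records actually created. The Python sketch below is a hypothetical illustration, not installer code: per Comment 1, api should point at the internal load balancer in a private install; api-int pointing at the internal load balancer and *.apps being served by ingress are the usual OpenShift roles. The names `EXPECTED_PRIVATE_RECORDS`, `fqdn`, and `missing_records`, and the placeholder cluster name and base domain, are assumptions for illustration.

```python
"""Hypothetical comparison of observed vs. expected DNS records.

Assumes the standard OpenShift naming <record>.<cluster_name>.<base_domain>.
Targets restate Comment 1 (api -> internal LB in a private install) plus the
usual api-int and *.apps roles; this is not installer code.
"""
from __future__ import annotations

EXPECTED_PRIVATE_RECORDS = {
    "api": "internal load balancer",
    "api-int": "internal load balancer",
    "*.apps": "ingress router",
}


def fqdn(record: str, cluster_name: str, base_domain: str) -> str:
    """Join a record label with the cluster's DNS suffix."""
    return f"{record}.{cluster_name}.{base_domain}"


def missing_records(observed: set[str]) -> list[str]:
    """Expected record labels that are absent from observed, sorted."""
    return sorted(EXPECTED_PRIVATE_RECORDS.keys() - observed)


if __name__ == "__main__":
    # Records reported in this comment for the 4.11 nightly run.
    observed = {"api-int", "*.apps"}
    for label in missing_records(observed):
        target = EXPECTED_PRIVATE_RECORDS[label]
        # prints: missing: api.mycluster.example.com -> internal load balancer
        print(f"missing: {fqdn(label, 'mycluster', 'example.com')} -> {target}")
```

Keeping the expected records in one table makes it easy to extend the check if the ingress side (*.apps) turns out to need separate handling.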

Comment 9 Matthew Staebler 2022-03-15 21:51:31 UTC
The BZ is linked to the wrong PR. It should be linked to https://github.com/openshift/installer/pull/5691, which has not yet merged.

Comment 10 Matthew Staebler 2022-03-15 22:50:46 UTC
(In reply to Matthew Staebler from comment #9)
> The BZ is linked to the wrong PR. It should be linked to
> https://github.com/openshift/installer/pull/5691, which has not yet merged.

Ugh, I'm mixed up on this. The PRs were correctly linked. The issue in this BZ is that Terraform fails; creation of the API DNS records is covered in https://bugzilla.redhat.com/show_bug.cgi?id=2061549.

Comment 11 Mike Gahagan 2022-03-16 14:29:42 UTC
I'll retest this once the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2061549 lands.

Comment 22 Shiftzilla 2023-03-09 01:14:11 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

