Bug 1977352 - [4.8.0] [SNO] No DNS to cluster API from assisted-installer-controller
Summary: [4.8.0] [SNO] No DNS to cluster API from assisted-installer-controller
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Igal Tsoiref
QA Contact: Udi Kalifon
URL:
Whiteboard: AI-Team-Core
Depends On: 1976769
Blocks: 1977292
Reported: 2021-06-29 13:58 UTC by Igal Tsoiref
Modified: 2023-09-15 01:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1976769
Environment:
Last Closed: 2021-07-27 23:13:47 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 2123 | 0 | None | closed | OCPBUGSM-31519: Adding api.<cluster-name>.<domain> entry to sno dnsmasq configuration | 2021-06-29 13:59:00 UTC
Github | openshift assisted-service pull 2126 | 0 | None | open | Bug 1977352: Adding api.<cluster-name>.<domain> entry to sno dnsmasq configuration | 2021-06-30 09:26:04 UTC
Red Hat Bugzilla | 1976769 | 1 | high | CLOSED | [master] [SNO] No DNS to cluster API from assisted-installer-controller | 2022-08-28 08:47:34 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:14:11 UTC

Description Igal Tsoiref 2021-06-29 13:58:54 UTC
+++ This bug was initially created as a clone of Bug #1976769 +++

Description of problem:
Link to the cluster - https://cloud.redhat.com/openshift/assisted-installer/clusters/28c7e3d1-90ae-47bc-9c59-ad9dc1260160

The assisted installer controller fails to resolve the cluster API hostname during SNO installation.
The only DNS entries configured in dnsmasq are:
1.  api-int...
2.  *.apps...
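To make the gap concrete, here is a minimal sketch of a probe that reports which cluster hostnames resolve from wherever it runs. It is not part of the controller; `check_names` is a hypothetical helper, and the hostnames in the usage note are placeholders, not the affected cluster's real names.

```python
import socket
from typing import Callable, Dict, Iterable


def check_names(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Return a mapping of hostname -> True if it resolves, False otherwise.

    The resolver is injectable so the logic can be exercised without
    touching real DNS.
    """
    results: Dict[str, bool] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = True
        except OSError:
            # socket.gaierror (the "no such host" case) is a subclass of OSError.
            results[name] = False
    return results
```

Called with something like `check_names(["api.mycluster.example.com", "api-int.mycluster.example.com"])` on an affected node, the expectation from this report is that the `api.` name maps to `False` while `api-int.` maps to `True`.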

This causes two different issues:
1. The controller fails to apply custom manifests (OLM manifests):

time="2021-06-27T20:30:03Z" level=error msg="Failed to apply manifest file." error="failed executing bash [-c oc --kubeconfig=/tmp/controller-custom-manifests-114984170/kubeconfig-noingress apply -f /tmp/controller-custom-manifests-114984170/custom_manifests.yaml], Error exit status 1, LastOutput \"... -114984170/custom_manifests.yaml\": Get \"https://api.pc-openshift.hokd.pro-crafting.com:6443/api?timeout=32s\": dial tcp: lookup api.pc-openshift.hokd.pro-crafting.com on 136.243.34.170:53: no such host\""

This issue causes an installation failure.

2. The controller fails to run must-gather:
 
time="2021-06-27T20:46:50Z" level=info msg="failed executing bash [-c cd /tmp/controller-must-gather-logs-680691528 && oc --kubeconfig=/tmp/controller-must-gather-logs-680691528/kubeconfig-noingress adm must-gather --image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8f4cc4b4c95cfebdb701f8de519a0d5ac38111b4f173913fcf61956655072d65]
... (lots of forbidden errors)
Unable to connect to the server: dial tcp: lookup api.pc-openshift.hokd.pro-crafting.com on 136.243.34.170:53: no such host\""

Version-Release number of selected component (if applicable):


How reproducible:
100% if you enable LSO or CNV when installing SNO

Steps to Reproduce:
1. Install SNO from here https://cloud.redhat.com/openshift/assisted-installer/clusters
2. Enable CNV and LSO
3.

Actual results:
Although the CVO status is available and the OCP installation completed successfully, the failure to apply the OLM manifests led to a timeout that failed the installation.

Events:
6/27/2021, 11:38:09 PM	
error
 Host static.170.34.243.136.clients.your-server.de: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
6/27/2021, 11:38:03 PM	
critical
 Failed installing cluster pc-openshift. Reason: timed out
6/27/2021, 11:38:03 PM	Updated status of cluster pc-openshift to error
6/27/2021, 11:24:02 PM	Cluster version status: available message: Done applying 4.8.0-rc.0

Expected results:

Installation success

Additional info:

--- Additional comment from lgamliel on 20210629T10:16:01

When did we allow CNV/LSO on SNO?
We exposed it in the UI on 23/06/2021.

When did we deploy the release with this change:  https://github.com/openshift/assisted-installer/pull/271
v1.0.21.3

What is our success rate (should be 0) when installing SNO with CNV/LSO since the above release?

Comment 3 Udi Kalifon 2021-07-05 07:51:59 UTC
In /etc/dnsmasq.d/single-node.conf I see:

address=/apps.titan...redhat.com/192.168.123.119
address=/api-int.titan...redhat.com/192.168.123.119
address=/api.titan...redhat.com/192.168.123.119

Shouldn't the first line start with a *? For example:
address=/*.apps.titan...redhat.com/192.168.123.119
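On the wildcard question: as far as I understand dnsmasq's documented behavior, `address=/<domain>/<ip>` already answers for the domain and every name beneath it, so a leading `*` should not be required. The sketch below models that suffix matching on whole labels. It is a simplification, not dnsmasq's actual code: it handles only one domain per line, and the domain and IP in the usage note are placeholders.

```python
from typing import Dict, Optional


def parse_address_lines(conf_text: str) -> Dict[str, str]:
    """Map each domain in 'address=/<domain>/<ip>' lines to its IP."""
    table: Dict[str, str] = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line.startswith("address=/"):
            continue
        # 'address=/a.example.com/192.0.2.10' -> ('a.example.com', '192.0.2.10')
        domain, ip = line[len("address=/"):].rsplit("/", 1)
        table[domain.lower()] = ip
    return table


def lookup(table: Dict[str, str], name: str) -> Optional[str]:
    """Return the IP of the longest configured domain that is a
    whole-label suffix of name, or None if nothing matches."""
    labels = name.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None
```

With a table built from `address=/apps.sno.example.com/192.0.2.10` and `address=/api-int.sno.example.com/192.0.2.10`, a lookup of `console.apps.sno.example.com` matches the `apps` entry, while `api.sno.example.com` matches nothing. Under this model the `api-int` entry never covers `api`, which is why the separate `api` line was needed.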

Comment 5 Udi Kalifon 2021-07-09 09:11:19 UTC
Verified that the exposed routes under *.apps are reachable.

Comment 8 errata-xmlrpc 2021-07-27 23:13:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 9 Red Hat Bugzilla 2023-09-15 01:10:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days