Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1977352

Summary: [4.8.0] [SNO] No DNS to cluster API from assisted-installer-controller
Product: OpenShift Container Platform
Reporter: Igal Tsoiref <itsoiref>
Component: assisted-installer
Sub component: Installer
Assignee: Igal Tsoiref <itsoiref>
QA Contact: Udi Kalifon <ukalifon>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: high
CC: alazar, aos-bugs, asegurap, ercohen, itsoiref, lgamliel, mko
Version: 4.8
Keywords: Triaged
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: AI-Team-Core
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1976769
Environment:
Last Closed: 2021-07-27 23:13:47 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1976769
Bug Blocks: 1977292

Description Igal Tsoiref 2021-06-29 13:58:54 UTC
+++ This bug was initially created as a clone of Bug #1976769 +++

Description of problem:
Link to the cluster - https://cloud.redhat.com/openshift/assisted-installer/clusters/28c7e3d1-90ae-47bc-9c59-ad9dc1260160

The assisted-installer controller fails to resolve the cluster API in the case of an SNO installation.
The only DNS entries configured in dnsmasq are:
1.  api-int...
2.  *.apps...
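
For illustration (not from the bug itself): the failing lookups below are for the external `api.` name, which the single-node dnsmasq config would also need to answer. A hypothetical entry, using the cluster domain that appears in the logs and `<node-ip>` as a placeholder:

```conf
# Hypothetical /etc/dnsmasq.d/single-node.conf entry; <node-ip> is a placeholder
address=/api.pc-openshift.hokd.pro-crafting.com/<node-ip>
```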

This causes two different issues:
1. The controller fails to apply custom manifests (OLM manifests):

time="2021-06-27T20:30:03Z" level=error msg="Failed to apply manifest file." error="failed executing bash [-c oc --kubeconfig=/tmp/controller-custom-manifests-114984170/kubeconfig-noingress apply -f /tmp/controller-custom-manifests-114984170/custom_manifests.yaml], Error exit status 1, LastOutput \"... -114984170/custom_manifests.yaml\": Get \"https://api.pc-openshift.hokd.pro-crafting.com:6443/api?timeout=32s\": dial tcp: lookup api.pc-openshift.hokd.pro-crafting.com on 136.243.34.170:53: no such host\""

This issue causes an installation failure.

2. The controller fails to run must-gather:
 
time="2021-06-27T20:46:50Z" level=info msg="failed executing bash [-c cd /tmp/controller-must-gather-logs-680691528 && oc --kubeconfig=/tmp/controller-must-gather-logs-680691528/kubeconfig-noingress adm must-gather --image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8f4cc4b4c95cfebdb701f8de519a0d5ac38111b4f173913fcf61956655072d65]
... (lots of "forbidden" errors)
Unable to connect to the server: dial tcp: lookup api.pc-openshift.hokd.pro-crafting.com on 136.243.34.170:53: no such host\""
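
As a quick way to spot this condition on a node, one could check the dnsmasq config directly. This is only a sketch, assuming the config layout shown later in comment 3; the domain is taken from the logs above, while the file path and the IP (192.0.2.1) are placeholders:

```shell
# Sketch: does the single-node dnsmasq config define the external API name?
# Recreate the failing state described in this bug (api-int and apps only).
CONF=single-node.conf
cat > "$CONF" <<'EOF'
address=/api-int.pc-openshift.hokd.pro-crafting.com/192.0.2.1
address=/apps.pc-openshift.hokd.pro-crafting.com/192.0.2.1
EOF

# The pattern anchors on "address=/api." so "api-int." does not match.
if grep -q '^address=/api\.' "$CONF"; then
  echo "api entry present"
else
  echo "api entry missing"
fi
```

On a real node the file to inspect would be `/etc/dnsmasq.d/single-node.conf`; an "api entry missing" result matches the "no such host" errors in the logs.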

Version-Release number of selected component (if applicable):


How reproducible:
100% if you enable LSO or CNV when installing SNO

Steps to Reproduce:
1. Install SNO from here https://cloud.redhat.com/openshift/assisted-installer/clusters
2. Enable CNV and LSO

Actual results:
While the CVO status is "available" and the OCP installation completed successfully, the failure to apply the OLM manifests led to a timeout that failed the installation.

Events:
6/27/2021, 11:38:09 PM	
error
 Host static.170.34.243.136.clients.your-server.de: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
6/27/2021, 11:38:03 PM	
critical
 Failed installing cluster pc-openshift. Reason: timed out
6/27/2021, 11:38:03 PM	Updated status of cluster pc-openshift to error
6/27/2021, 11:24:02 PM	Cluster version status: available message: Done applying 4.8.0-rc.0

Expected results:

Installation success

Additional info:

--- Additional comment from lgamliel on 20210629T10:16:01

When did we allow CNV/LSO on SNO?
We exposed it in the UI on 23/06/2021.

When did we deploy the release with this change? https://github.com/openshift/assisted-installer/pull/271
v1.0.21.3

What is our success rate (it should be 0) when installing SNO with CNV/LSO since the above release?

Comment 3 Udi Kalifon 2021-07-05 07:51:59 UTC
In /etc/dnsmasq.d/single-node.conf I see:

address=/apps.titan...redhat.com/192.168.123.119
address=/api-int.titan...redhat.com/192.168.123.119
address=/api.titan...redhat.com/192.168.123.119

Shouldn't the first line start with a *? For example:
address=/*.apps.titan...redhat.com/192.168.123.119
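
For context on the question above: assuming stock dnsmasq behavior, no wildcard is needed. The `address=/<domain>/<ip>` directive answers queries for the domain itself and for every name under it, so the first line already covers the `*.apps` routes:

```conf
# dnsmasq semantics (sketch, assuming stock dnsmasq):
# address=/apps.titan...redhat.com/192.168.123.119
# resolves apps.titan...redhat.com itself and any name beneath it,
# e.g. console-openshift-console.apps.titan...redhat.com
```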

Comment 5 Udi Kalifon 2021-07-09 09:11:19 UTC
Verified that the exposed routes under *.apps are reachable.

Comment 8 errata-xmlrpc 2021-07-27 23:13:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 9 Red Hat Bugzilla 2023-09-15 01:10:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.