Bug 1686358

Summary: [OSP] Bootstrap and api instance can not resolve internal osp with default subnet dns_nameservers
Product: OpenShift Container Platform Reporter: weiwei jiang <wjiang>
Component: Installer Assignee: Eric Duen <eduen>
Installer sub component: openshift-installer QA Contact: weiwei jiang <wjiang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: bleanhar
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
CLOSED / CURRENTRELEASE
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:27:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description weiwei jiang 2019-03-07 10:20:28 UTC
Description of problem:
After openshift-install provisions all OSP resources, the api and bootstrap instances hang at the boot stage.

(openstack) console log show wjiang-ocp-4s47r-bootstrap
[*     ] A start job is running for Ignition (disks) (1min 42s / no limit)
[  105.174332] ignition[456]: GET https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/wjiang-ocp.shiftstack.com/load-balancer.ign?temp_url_sig=e553e9b920b2de5396d27c376bf3b56bfb525842&temp_url_expires=1551954402: attempt #22
[  105.382646] ignition[456]: GET error: Get https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/wjiang-ocp.shiftstack.com/load-balancer.ign?temp_url_sig=e553e9b920b2de5396d27c376bf3b56bfb525842&temp_url_expires=1551954402: dial tcp: lookup rhos-d.infra.prod.upshift.rdu2.redhat.com on 128.31.24.11:53: server misbehaving
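
When triaging a log like the one above, it can help to pull the failing hostname and the resolver address out of the error line. The following is a minimal sketch, assuming the error keeps the "lookup <host> on <server>:<port>: <reason>" shape seen in this log; the helper name parse_lookup_failure is hypothetical and not part of the installer or Ignition.

```python
import re

# Matches the resolver error fragment seen in the console log:
#   "lookup <host> on <server>:<port>: <reason>"
# Assumption: the server is an IPv4 address, as in this report.
LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+) on (?P<server>[\d.]+):(?P<port>\d+): (?P<reason>.+)$"
)


def parse_lookup_failure(line):
    """Return host, server, port and reason from a failed-lookup line, or None."""
    match = LOOKUP_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    return info


# Error fragment copied from the bootstrap console log in this report.
line = (
    "dial tcp: lookup rhos-d.infra.prod.upshift.rdu2.redhat.com "
    "on 128.31.24.11:53: server misbehaving"
)
print(parse_lookup_failure(line))
```

The extracted server value is the nameserver the instance actually queried, which can then be compared with the dns_nameservers configured on the subnet.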


Version-Release number of the following components:
➜  installer git:(master) ✗ bin/openshift-install version                                      
bin/openshift-install unreleased-master-524-g13a752ea0fcae927cba6795782f87ffa332d5b75


How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with openshift-installer on OSP.
2. After all OSP resources are provisioned, check the api and bootstrap boot logs.

Actual results:
(openstack) console log show wjiang-ocp-4s47r-bootstrap
[*     ] A start job is running for Ignition (disks) (1min 42s / no limit)
[  105.174332] ignition[456]: GET https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/wjiang-ocp.shiftstack.com/load-balancer.ign?temp_url_sig=e553e9b920b2de5396d27c376bf3b56bfb525842&temp_url_expires=1551954402: attempt #22
[  105.382646] ignition[456]: GET error: Get https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/wjiang-ocp.shiftstack.com/load-balancer.ign?temp_url_sig=e553e9b920b2de5396d27c376bf3b56bfb525842&temp_url_expires=1551954402: dial tcp: lookup rhos-d.infra.prod.upshift.rdu2.redhat.com on 128.31.24.11:53: server misbehaving

Expected results:
The subnet's dns_nameservers should be customizable in install-config.yaml.


Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 weiwei jiang 2019-03-11 02:33:32 UTC
Workaround: change the hard-coded subnet dns_nameservers as follows:

[openshift@dhcp-140-70 installer]$ git diff
diff --git a/data/data/openstack/topology/private-network.tf b/data/data/openstack/topology/private-network.tf
index aaf3badfa..93ce3b174 100644
--- a/data/data/openstack/topology/private-network.tf
+++ b/data/data/openstack/topology/private-network.tf
@@ -21,7 +21,7 @@ resource "openstack_networking_subnet_v2" "service" {
   ip_version      = 4
   network_id      = "${openstack_networking_network_v2.openshift-private.id}"
   tags            = ["openshiftClusterID=${var.cluster_id}"]
-  dns_nameservers = ["1.1.1.1", "208.67.222.222"]
+  dns_nameservers = ["10.72.17.5"]
 }
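
One plausible reading of the diff above is that the hard-coded defaults are public resolvers, while the replacement sits in a private address range and so is more likely to know internal-only hostnames. The sketch below only classifies the addresses from the diff using Python's standard ipaddress module; the helper name classify_nameservers is hypothetical, and the classification alone does not prove that a given resolver can answer for the OSP endpoint.

```python
import ipaddress


def classify_nameservers(servers):
    """Map each nameserver address to 'private' or 'public'."""
    return {
        server: "private" if ipaddress.ip_address(server).is_private else "public"
        for server in servers
    }


# Values taken from the removed and added lines of the diff above.
defaults = ["1.1.1.1", "208.67.222.222"]
workaround = ["10.72.17.5"]

print(classify_nameservers(defaults))
print(classify_nameservers(workaround))
```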

Comment 2 Flavio Percoco 2019-03-13 11:33:33 UTC
This has been fixed upstream. We're improving this fix to allow for setting custom DNS names if needed: https://github.com/openshift/installer/pull/1386

Comment 5 weiwei jiang 2019-08-01 06:21:06 UTC
Checked with 4.2.0-0.nightly-2019-07-31-162901; this is no longer an issue.

Comment 6 Eric Duen 2019-08-05 18:57:29 UTC
Returning to QE to close out since BZ has been validated to work on a nightly.

Comment 7 weiwei jiang 2019-08-06 08:19:15 UTC
Verified on 4.2.0-0.nightly-2019-08-05-223032; moving to VERIFIED.

Comment 8 errata-xmlrpc 2019-10-16 06:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922