Bug 1638525 - [3.10] Validation of static pod fails due to inconsistent names
Summary: [3.10] Validation of static pod fails due to inconsistent names
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.10.z
Assignee: Michael Gugino
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On: 1614904
Blocks:
 
Reported: 2018-10-11 19:36 UTC by Scott Dodson
Modified: 2018-11-16 09:00 UTC
CC: 28 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1614904
Environment:
Last Closed: 2018-11-11 16:39:26 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHSA-2018:2709, last updated 2018-11-11 16:40:16 UTC

Comment 3 Johnny Liu 2018-10-18 10:34:15 UTC
Tested this bug with the following scenarios using openshift-ansible-3.10.59-1.git.0.f9ba890.el7.noarch, and both PASS.

Scenario #1:
Try to install a new 3.10 cluster with openshift_kubelet_name_override set; the install should fail.

PLAY [Fail openshift_kubelet_name_override for new hosts] **********************

TASK [Gathering Facts] *********************************************************
Thursday 18 October 2018  17:01:59 +0800 (0:00:00.078)       0:00:00.078 ****** 
ok: [host-8-253-129.host.centralci.eng.rdu2.redhat.com]

TASK [Fail when openshift_kubelet_name_override is defined] ********************
Thursday 18 October 2018  17:02:00 +0800 (0:00:00.608)       0:00:00.687 ****** 
fatal: [host-8-253-129.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
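
For reference, the guard exercised here boils down to a conditional fail task. A minimal sketch, assuming a generic host group (this is not the exact openshift-ansible play):

- hosts: nodes                        # assumed group name
  gather_facts: true
  tasks:
  - name: Fail when openshift_kubelet_name_override is defined
    fail:
      msg: "openshift_kubelet_name_override Cannot be defined for new hosts"
    when: openshift_kubelet_name_override is defined

The failure message in the sketch is the same string reported by the task in the log above.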


Scenario #2:
Install a new 3.10 cluster without openshift_kubelet_name_override set; the cluster is running on OSP without the cloud provider enabled and with a short hostname. PASS.


[root@qe-jialiu310-merrn-1 ~]# oc get node
NAME                   STATUS    ROLES            AGE       VERSION
qe-jialiu310-merrn-1   Ready     compute,master   1h        v1.10.0+b81c8f8
[root@qe-jialiu310-merrn-1 ~]# hostname
qe-jialiu310-merrn-1
[root@qe-jialiu310-merrn-1 ~]# hostname -f
qe-jialiu310-merrn-1.int.1018-mi1.qe.rhcloud.com
[root@qe-jialiu310-merrn-1 ~]# ls /etc/origin/cloudprovider/
[root@qe-jialiu310-merrn-1 ~]# oc get po -n kube-system
NAME                                      READY     STATUS    RESTARTS   AGE
master-api-qe-jialiu310-merrn-1           1/1       Running   0          1h
master-controllers-qe-jialiu310-merrn-1   1/1       Running   0          1h
master-etcd-qe-jialiu310-merrn-1          1/1       Running   0          1h
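
The manual checks above all hinge on one thing: the node name the kubelet registered matches the hostname the installer derives, so the static pod name suffixes line up. A minimal scripted version of that comparison, as a sketch (host group assumed, simplified to a single-node cluster):

- hosts: masters                      # assumed group name
  tasks:
  - name: Record the node name the kubelet registered
    command: oc get node -o jsonpath='{.items[0].metadata.name}'
    register: node_name
    changed_when: false

  - name: Verify the node name matches the short hostname
    assert:
      that:
        - node_name.stdout == ansible_hostname
      msg: "Node name and hostname differ; static pod names will not match what the installer expects"

In this run both values are qe-jialiu310-merrn-1, which is why the kube-system static pods carry that suffix.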

TASK [Gather Cluster facts] ****************************************************
Thursday 18 October 2018  17:11:53 +0800 (0:00:00.094)       0:01:03.113 ****** 
changed: [host-8-253-129.host.centralci.eng.rdu2.redhat.com] => {"ansible_facts": {"openshift": {"common": {"all_hostnames": ["172.16.122.72", "host-8-253-129.host.centralci.eng.rdu2.redhat.com", "qe-jialiu310-merrn-1.int.1018-mi1.qe.rhcloud.com"], "config_base": "/etc/origin", "dns_domain": "cluster.local", "generate_no_proxy_hosts": true, "hostname": "qe-jialiu310-merrn-1.int.1018-mi1.qe.rhcloud.com", "internal_hostnames": ["172.16.122.72", "qe-jialiu310-merrn-1.int.1018-mi1.qe.rhcloud.com"], "ip": "172.16.122.72", "kube_svc_ip": "172.30.0.1", "portal_net": "172.30.0.0/16", "public_hostname": "host-8-253-129.host.centralci.eng.rdu2.redhat.com", "public_ip": "172.16.122.72", "raw_hostname": "qe-jialiu310-merrn-1"}, "current_config": {}}}, "changed": true, "failed": false}

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
Thursday 18 October 2018  17:18:14 +0800 (0:00:00.025)       0:05:18.400 ****** 
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (59 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (58 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (57 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (56 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (55 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (54 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (53 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (52 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (51 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (50 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (49 retries left).
FAILED - RETRYING: Wait for all control plane pods to become ready (48 retries left).

ok: [host-8-253-129.host.centralci.eng.rdu2.redhat.com] => (item=etcd) => {"attempts": 14, "changed": false, "failed": false, "item": "etcd", "results": {"cmd": "/usr/bin/oc get pod master-etcd-qe-jialiu310-merrn-1 -o json -n kube-system", "results": 
.....
.....
.....
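
The retries above come from an until/retries loop around an oc get pod call whose pod name is built from the node name; that is exactly why an inconsistent openshift_kubelet_name_override used to break this step. A minimal sketch of the pattern (host group assumed, readiness simplified to the pod phase, retry delay assumed; not the exact openshift_control_plane task):

- hosts: masters                      # assumed group name
  tasks:
  - name: Wait for all control plane pods to become ready
    # The pod name embeds the node name; simplified here to the short
    # hostname, as in the run above.
    command: >
      oc get pod master-{{ item }}-{{ ansible_hostname }}
      -n kube-system -o jsonpath='{.status.phase}'
    register: pod_phase
    until: pod_phase.stdout == "Running"
    retries: 60
    delay: 5
    changed_when: false
    with_items:
      - etcd
      - api
      - controllers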

Comment 4 Scott Dodson 2018-10-18 12:35:59 UTC
The fix is in openshift-ansible-3.10.58-1 and later.

Comment 5 Johnny Liu 2018-10-22 10:00:38 UTC
Besides comment 3, ran some additional testing with openshift-ansible-3.10.60-1.git.0.7e781a5.el7.noarch, and all PASS. The cloud provider toggle referenced below is sketched after the list.

cluster install on OSP 10 without cloudprovider enabled + short hostname, PASS.
cluster install on OSP 10 with cloudprovider enabled + short hostname, PASS.
cluster install on GCP without cloudprovider enabled + short hostname, PASS.
cluster install on GCP with cloudprovider enabled + short hostname, PASS.
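
"With cloudprovider enabled" above refers to the openshift-ansible cloud provider inventory variables. A minimal group_vars sketch (variable names follow the openshift-ansible convention; the kind, endpoint, and credentials are placeholders, not values from these runs):

# Hypothetical group_vars snippet for the "with cloudprovider enabled" rows;
# endpoint and credentials are placeholders.
openshift_cloudprovider_kind: openstack            # gce for the GCP rows
openshift_cloudprovider_openstack_auth_url: https://osp.example.com:13000/v3
openshift_cloudprovider_openstack_username: demo-user
openshift_cloudprovider_openstack_password: demo-password
openshift_cloudprovider_openstack_tenant_name: demo-tenant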

Comment 7 errata-xmlrpc 2018-11-11 16:39:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2709

