Bug 1629726 - [3.10] control-plane pod names are taken from OpenStack's metadata although cloud provider is not configured
Summary: [3.10] control-plane pod names are taken from OpenStack's metadata although cloud provider is not configured
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Vadim Rutkovsky
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On: 1623335
Blocks:
 
Reported: 2018-09-17 12:14 UTC by Scott Dodson
Modified: 2019-01-10 09:27 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: OpenStack metadata was used even though the cloud provider was unset.
Consequence: Unexpected hostnames were used by the Ansible playbook.
Fix: Metadata from the cloud provider is no longer used when the cloud provider is not set.
Result: Installation proceeds and uses the `hostname -f` output regardless of the underlying cloud.
Clone Of: 1623335
Environment:
Last Closed: 2019-01-10 09:27:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
installation log with inventory file embedded for 3.10 (670.43 KB, text/plain), 2018-09-26 13:46 UTC, Johnny Liu
installation log with inventory file embedded for 3.11 (1.50 MB, text/plain), 2018-09-26 13:46 UTC, Johnny Liu


Links
Red Hat Product Errata RHBA-2019:0026, last updated 2019-01-10 09:27:16 UTC

Comment 1 Vadim Rutkovsky 2018-09-21 09:47:07 UTC
Fix is available in openshift-ansible-3.10.48-1
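
The Doc Text above summarizes the fix: only consult cloud provider metadata for the hostname when a cloud provider is actually configured. A minimal Python sketch of that selection logic (function and parameter names here are illustrative, not the real openshift-ansible API):

import subprocess

def gather_hostname(cloudprovider=None, metadata=None):
    """Return the hostname the playbooks should use for this node."""
    if cloudprovider and metadata:
        # A cloud provider is configured: trust its metadata
        # (e.g. the OpenStack 'local-hostname' field).
        return metadata.get("local-hostname")
    # No cloud provider configured: fall back to `hostname -f` output,
    # regardless of the underlying cloud the instance happens to run on.
    return subprocess.check_output(["hostname", "-f"], text=True).strip()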

Comment 2 Johnny Liu 2018-09-26 13:43:14 UTC
Re-tested this bug with openshift-ansible-3.10.50-1.git.0.96a93c5.el7.noarch: FAIL.

For 3.10, I tried the same scenarios as in https://bugzilla.redhat.com/show_bug.cgi?id=1623335#c16

Scenario #1:
cluster running on OSP without cloudprovider enabled + a short hostname: FAIL

[root@qe-jialiu310bug-mrre-1 ~]# hostname
qe-jialiu310bug-mrre-1
[root@qe-jialiu310bug-mrre-1 ~]# hostname -f
qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com
[root@qe-jialiu310bug-mrre-1 ~]# oc get po -n kube-system
NAME                                        READY     STATUS    RESTARTS   AGE
master-api-qe-jialiu310bug-mrre-1           1/1       Running   0          5m
master-controllers-qe-jialiu310bug-mrre-1   1/1       Running   0          5m
master-etcd-qe-jialiu310bug-mrre-1          1/1       Running   0          5m

TASK [Gather Cluster facts] ****************************************************
Wednesday 26 September 2018  18:05:15 +0800 (0:00:00.083)       0:03:23.254 *** 
changed: [host-8-249-48.host.centralci.eng.rdu2.redhat.com] => {"ansible_facts": {"openshift": {"common": {"all_hostnames": ["host-8-249-48.host.centralci.eng.rdu2.redhat.com", "172.16.122.44", "qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com"], "config_base": "/etc/origin", "dns_domain": "cluster.local", "generate_no_proxy_hosts": true, "hostname": "qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com", "internal_hostnames": ["172.16.122.44", "qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com"], "ip": "172.16.122.44", "kube_svc_ip": "172.30.0.1", "portal_net": "172.30.0.0/16", "public_hostname": "host-8-249-48.host.centralci.eng.rdu2.redhat.com", "public_ip": "172.16.122.44", "raw_hostname": "qe-jialiu310bug-mrre-1"}, "current_config": {}}}, "changed": true, "failed": false}

Installation failed at "Wait for all control plane pods to become ready" due to an inconsistent pod name: the playbook polls for a pod named after the FQDN, while the static pods were created with the short hostname (see the sketch after the task output below).

failed: [host-8-249-48.host.centralci.eng.rdu2.redhat.com] (item=api) => {"attempts": 60, "changed": false, "failed": true, "item": "api", "results": {"cmd": "/usr/bin/oc get pod master-api-qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-api-qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com\" not found\n", "stdout": ""}, "state": "list"}

failed: [host-8-249-48.host.centralci.eng.rdu2.redhat.com] (item=controllers) => {"attempts": 60, "changed": false, "failed": true, "item": "controllers", "results": {"cmd": "/usr/bin/oc get pod master-controllers-qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-controllers-qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com\" not found\n", "stdout": ""}, "state": "list"}
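
The mismatch can be illustrated with a small sketch (hypothetical pod-name derivation; it assumes the static pod is named after the node's short hostname while the wait task polls for a name built from the `hostname -f` fact, matching the logs above):

import subprocess

def run(*cmd):
    return subprocess.check_output(list(cmd), text=True).strip()

# Name the kubelet actually gave the static pod (node name = short hostname):
actual_pod = "master-api-" + run("hostname")
# e.g. master-api-qe-jialiu310bug-mrre-1

# Name the playbook waits for (gathered hostname fact = FQDN):
expected_pod = "master-api-" + run("hostname", "-f")
# e.g. master-api-qe-jialiu310bug-mrre-1.int.0926-hg7.qe.rhcloud.com

# The two names differ, so `oc get pod <expected_pod>` returns NotFound and
# the "Wait for all control plane pods to become ready" task times out.
assert actual_pod != expected_pod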

Scenario #2:
cluster running on OSP without cloudprovider enabled + an FQDN hostname: PASS

[root@qe-jialiu310bug2-mrre-1 ~]# hostname
qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com
[root@qe-jialiu310bug2-mrre-1 ~]# hostname -f
qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com
[root@qe-jialiu310bug2-mrre-1 ~]# oc get po -n kube-system
NAME                                                                     READY     STATUS    RESTARTS   AGE
master-api-qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com           1/1       Running   0          56m
master-controllers-qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com   1/1       Running   0          56m
master-etcd-qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com          1/1       Running   0          56m

TASK [Gather Cluster facts] ****************************************************
Wednesday 26 September 2018  18:04:43 +0800 (0:00:00.083)       0:00:11.229 *** 

ok: [host-8-241-132.host.centralci.eng.rdu2.redhat.com] => {"ansible_facts": {"openshift": {"common": {"all_hostnames": ["172.16.122.22", "host-8-241-132.host.centralci.eng.rdu2.redhat.com", "qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com"], "config_base": "/etc/origin", "dns_domain": "cluster.local", "generate_no_proxy_hosts": true, "hostname": "qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com", "internal_hostnames": ["172.16.122.22", "qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com"], "ip": "172.16.122.22", "kube_svc_ip": "172.30.0.1", "portal_net": "172.30.0.0/16", "public_hostname": "host-8-241-132.host.centralci.eng.rdu2.redhat.com", "public_ip": "172.16.122.22", "raw_hostname": "qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com"}, "current_config": {"roles": ["node"]}, "node": {"bootstrapped": false, "nodename": "qe-jialiu310bug2-mrre-1.int.0926-hsu.qe.rhcloud.com", "sdn_mtu": "1450"}}}, "changed": false, "failed": false}

Scenario #3:
cluster running on OSP with cloudprovider enabled + a short hostname: PASS
[root@qe-jialiu310bug1-mrre-1 ~]# hostname
qe-jialiu310bug1-mrre-1
[root@qe-jialiu310bug1-mrre-1 ~]# hostname -f
qe-jialiu310bug1-mrre-1.int.0926-0vu.qe.rhcloud.com
[root@qe-jialiu310bug1-mrre-1 ~]# oc get po -n kube-system
NAME                                         READY     STATUS    RESTARTS   AGE
master-api-qe-jialiu310bug1-mrre-1           1/1       Running   0          1h
master-controllers-qe-jialiu310bug1-mrre-1   1/1       Running   0          1h
master-etcd-qe-jialiu310bug1-mrre-1          1/1       Running   0          1h


TASK [Gather Cluster facts] ****************************************************
Wednesday 26 September 2018  18:04:35 +0800 (0:00:00.081)       0:03:14.084 *** 

changed: [host-8-254-58.host.centralci.eng.rdu2.redhat.com] => {"ansible_facts": {"openshift": {"common": {"all_hostnames": ["10.8.254.58", "qe-jialiu310bug1-mrre-1", "172.16.122.87", "host-8-254-58.host.centralci.eng.rdu2.redhat.com"], "cloudprovider": "openstack", "config_base": "/etc/origin", "dns_domain": "cluster.local", "generate_no_proxy_hosts": true, "hostname": "qe-jialiu310bug1-mrre-1", "internal_hostnames": ["qe-jialiu310bug1-mrre-1", "172.16.122.87"], "ip": "172.16.122.87", "kube_svc_ip": "172.30.0.1", "portal_net": "172.30.0.0/16", "public_hostname": "host-8-254-58.host.centralci.eng.rdu2.redhat.com", "public_ip": "10.8.254.58", "raw_hostname": "qe-jialiu310bug1-mrre-1"}, "current_config": {}, "provider": {"metadata": {"availability_zone": "nova", "devices": [], "ec2_compat": {"ami-id": "ami-0000b2df", "ami-launch-index": "0", "ami-manifest-path": "FIXME", "block-device-mapping": {"ami": "vda", "root": "/dev/vda"}, "hostname": "qe-jialiu310bug1-mrre-1", "instance-action": "none", "instance-id": "i-00a2b1f2", "instance-type": "m1.medium", "local-hostname": "qe-jialiu310bug1-mrre-1", "local-ipv4": "172.16.122.87", "placement": {"availability-zone": "nova"}, "public-hostname": "qe-jialiu310bug1-mrre-1", "public-ipv4": "10.8.254.58", "public-keys/": "0=libra", "reservation-id": "r-omh8e6eq", "security-groups": "default"}, "hostname": "qe-jialiu310bug1-mrre-1", "keys": [{"data": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUq7W38xCZ9WGSWCvustaMGMT04tRohw6AKGzI7P7xql5lhCAReyt72n9qWQRZsE1YiCSQuTfXI1oc8NpSM7+lMLwj12G8z3I1YT31JHr9LLYg/XIcExkzfBI920CaS82VqmKOpI9+ARHSJBdIbKRI0f5Y+u4xbc5UzKCJX8jcKGG7nEiw8zm+cvAlfOgssMK+qJppIbVcb2iZNTsw5i2aX6FDMyC+b17DQHzBGpNbhZYxuoERZVRcnYctgIzuo6fD60gniX0fVvrchlOnubB1sRYbloP2r6UE22w/dpLKOFE5i7CA0ZzNBERZ94cIKumIH9MiJs1a6bMe89VOjjNV", "name": "libra", "type": "ssh"}], "launch_index": 0, "name": "qe-jialiu310bug1-mrre-1", "project_id": "e0fa85b6a06443959d2d3b497174bed6", "uuid": "7f70e1b4-e34f-4f0e-830c-0d2f70c52d73"}, "name": "openstack", "network": {"hostname": "qe-jialiu310bug1-mrre-1", "interfaces": [], "ip": "172.16.122.87", "ipv6_enabled": false, "public_hostname": "10.8.254.58", "public_ip": "10.8.254.58"}, "zone": "nova"}}}, "changed": true, "failed": false}


For scenario #1, I re-ran the install with the openshift-ansible-3.11.15-1.git.0.31c5933.el7.noarch build; 3.11 does not reproduce this issue.
[root@qe-jialiu311bug-mrre-1 ~]# oc version
oc v3.11.15
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-jialiu311bug-mrre-1.int.0926-4gc.qe.rhcloud.com:8443
openshift v3.11.15
kubernetes v1.11.0+d4cacc0
[root@qe-jialiu311bug-mrre-1 ~]# hostname
qe-jialiu311bug-mrre-1
[root@qe-jialiu311bug-mrre-1 ~]# hostname -f
qe-jialiu311bug-mrre-1.int.0926-4gc.qe.rhcloud.com
[root@qe-jialiu311bug-mrre-1 ~]# oc get po -n kube-system
NAME                                        READY     STATUS    RESTARTS   AGE
master-api-qe-jialiu311bug-mrre-1           1/1       Running   0          1h
master-controllers-qe-jialiu311bug-mrre-1   1/1       Running   0          1h
master-etcd-qe-jialiu311bug-mrre-1          1/1       Running   0          1h


The failure in scenario #1 may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1614904

Comment 3 Johnny Liu 2018-09-26 13:46:16 UTC
Created attachment 1487247 [details]
installation log with inventory file embedded for 3.10

Comment 4 Johnny Liu 2018-09-26 13:46:53 UTC
Created attachment 1487248 [details]
installation log with inventory file embedded for 3.11

Comment 5 Johnny Liu 2018-09-29 06:42:42 UTC
I am not sure whether scenario #1 is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1614904; if it is, we could move this bug to VERIFIED and track the scenario #1 issue in BZ#1614904.

Comment 6 Johnny Liu 2018-10-23 02:34:35 UTC
Scenario #1 will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1614904, so moving this bug to ON_QA.

Comment 7 Johnny Liu 2018-10-23 02:35:57 UTC
Per comments 2, 5, and 6, moving this bug to VERIFIED.

Comment 8 Johnny Liu 2018-10-23 02:38:19 UTC
BZ#1614904 has been changed to track 3.11; BZ#1638525 is a newly cloned bug tracking 3.10.

Comment 10 errata-xmlrpc 2019-01-10 09:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0026

