Bug 1454321 - Ansible playbook fails due to an incorrect openshift-master.kubeconfig
Summary: Ansible playbook fails due to an incorrect openshift-master.kubeconfig
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Andrew Butcher
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1462276 1462280 1462282 1462283
 
Reported: 2017-05-22 13:08 UTC by Vladislav Walek
Modified: 2017-12-19 02:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, installation would fail in multi-master environments in which the load balanced API was listening on a different port than that of the OpenShift API/console. We now account for this difference and ensure the master loopback client config is configured to interact with the local master.
Clone Of:
Cloned To: 1462276 1462280 1462282 1462283
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHEA-2017:1716
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 09:02:50 UTC

Description Vladislav Walek 2017-05-22 13:08:21 UTC
Description of problem:

Hello,
The customer is running the playbook to install OpenShift from RPMs on a brand-new environment. Unfortunately, the installation fails on:

2017-05-21 05:44:33,986 p=2 u=root |  fatal: [master1.example.com]: FAILED! => {
    "changed": false,
    "cmd": [
        "oc",
        "create",
        "-f",
        "/usr/share/openshift/hosted",
        "--config=/tmp/openshift-ansible-aaaaa/admin.kubeconfig",
        "-n",
        "openshift"
    ],
    "delta": "0:00:40.250837",
    "end": "2017-05-21 05:44:28.101373",
    "failed": true,
    "failed_when_result": true,
    "rc": 1,
    "start": "2017-05-21 05:43:47.850536",
    "warnings": []
}

STDERR:

Error from server: templates "logging-deployer-account-template" is forbidden: not yet ready to handle request
Error from server: templates "logging-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/metrics-deployer.yaml": templates "metrics-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/registry-console.yaml": templates "registry-console" is forbidden: not yet ready to handle request

After investigation, we found that the underlying problem is that when the master starts, it logs errors such as:
" .... User \"system:anonymous\" cannot get ... "

The customer uses an F5 load balancer with two master URLs, public and private. The private load balancer is configured for the OpenShift masters.

We found that the issue is in openshift-master.kubeconfig: when the certificates are recreated, the kubeconfig is created as well, but it is then modified during the installation. The kubeconfig has three servers and three contexts configured:

- public.loadbalancer.example.com:443
- private.loadbalancer.example.com:443
- master1.example.com:8443

For some reason, when the current-context is set to "default/master1.example.com:8443/system:openshift-master", the master shows the system:anonymous error; when it is set to "default/private.loadbalancer.example.com:443/system:openshift-master", the master works.
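
For reference, the active context can be checked and switched with oc directly. A sketch, assuming the default RPM install path for the loopback kubeconfig; both context names are taken from the customer's file described above:

# show which context the loopback config currently uses
oc config current-context --config=/etc/origin/master/openshift-master.kubeconfig
# the workaround that made the master work: switch to the private LB context
oc config use-context 'default/private.loadbalancer.example.com:443/system:openshift-master' --config=/etc/origin/master/openshift-master.kubeconfig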

The problem lies in how the kubeconfig is modified during the installation. Because the customer's installation ran for a long time, the difference is visible: the kubeconfig was modified one hour after the certificates were generated.

I will attach the logs afterwards.

The nodes were not installed. 

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.4.1.12
openshift-ansible-playbooks-3.4.74-1.git.0.6542413.el7.noarch

How reproducible:
Reproduced on the customer environment by changing the context: after switching to the private load balancer context, the installation worked.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 13 Ryan Howe 2017-05-23 18:02:41 UTC
We create the openshift-master.kubeconfig file here. 

https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master_certificates/tasks/main.yml#L71-L86


# oc adm create-api-client-config --certificate-authority  ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=public.master.com:443 --master=local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

Then the context is set here

https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master/tasks/set_loopback_context.yml#L22

It uses loopback_context_name, which sets the context to a user that references the wrong port:

https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_facts/library/openshift_facts.py#L677
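
That task effectively runs oc config set-context/use-context against the master kubeconfig, roughly like this (a sketch, not the exact task; the concrete values come from openshift_facts, and the bug is that the port baked into these names can differ from the port the local master actually listens on):

# rough shape of what set_loopback_context.yml runs; values are illustrative
oc config set-context default/master1-example-com:8443/system:openshift-master --cluster=master1-example-com:8443 --namespace=default --user=system:openshift-master/master1-example-com:8443 --config=/etc/origin/master/openshift-master.kubeconfig
oc config use-context default/master1-example-com:8443/system:openshift-master --config=/etc/origin/master/openshift-master.kubeconfig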


My issue currently is that I cannot reproduce this with manual steps. The command above creates a user with "name: 'system:openshift-master/:'"

I must be missing some option.

Comment 14 Ryan Howe 2017-05-23 18:15:15 UTC
oc adm create-api-client-config --certificate-authority  ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=https://public.master.com:443 --master=https://local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

This will set the user name correctly.
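
With the scheme included, the generated openshift-master.kubeconfig should carry entries along these lines (a sketch derived from the flags above; the dashed names follow oc's sanitized host:port form):

<-snip->
contexts:
- context:
    cluster: local-master-com:8443
    namespace: default
    user: system:openshift-master/local-master-com:8443
  name: default/local-master-com:8443/system:openshift-master
current-context: default/local-master-com:8443/system:openshift-master
users:
- name: system:openshift-master/local-master-com:8443
<-snip->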

Comment 23 Gaoyun Pei 2017-06-15 07:21:57 UTC
Verified this bug with openshift-ansible-roles-3.6.109-1.git.0.256e658.el7.noarch

1. Prepare an haproxy load balancer that listens on 443 while the backend masters listen on 8443
[root@openshift-133 ~]# hostname
openshift-133.test.com
[root@openshift-133 ~]# tail /etc/haproxy/haproxy.cfg -n 12
frontend  atomic-openshift-api
    bind *:443
    default_backend atomic-openshift-api
    mode tcp
    option tcplog

backend atomic-openshift-api
    balance source
    mode tcp
    server      master0 192.168.2.148:8443 check
    server      master1 192.168.2.149:8443 check
    server      master2 192.168.2.150:8443 check
[root@openshift-133 ~]# iptables -nL |grep 443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:8443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:443
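
A quick optional check that haproxy owns the frontend port before running the installer (ss ships with iproute on RHEL 7):

ss -tlnp | grep ':443 '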

2. Configure inventory file like below
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
<-snip->
openshift_master_cluster_method=native
openshift_master_cluster_hostname=openshift-133.test.com
openshift_master_cluster_public_hostname=openshift-133.test.com

openshift_master_console_port=8443
openshift_master_api_port=8443

openshift_master_api_url=https://openshift-133.test.com:443
openshift_master_console_url=https://openshift-133.test.com:443/console
openshift_master_public_api_url=https://openshift-133.test.com:443
openshift_master_public_console_url=https://openshift-133.test.com:443/console
<-snip->

[masters]
openshift-126.test.com openshift_public_hostname=openshift-126.test.com openshift_hostname=openshift-126.test.com
openshift-139.test.com openshift_public_hostname=openshift-139.test.com openshift_hostname=openshift-139.test.com
openshift-128.test.com openshift_public_hostname=openshift-128.test.com openshift_hostname=openshift-128.test.com

[nodes]
openshift-152.test.com 

[etcd]
openshift-126.test.com  
openshift-139.test.com 
openshift-128.test.com 

3. Run installation playbook
The installation succeeded without error, and the OCP cluster is working well.
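
A quick check that the load-balanced API answers on the frontend port (assuming the master's /healthz endpoint is reachable anonymously):

curl -k https://openshift-133.test.com:443/healthz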

4. Check openshift-master.kubeconfig on 3 masters
The users referenced in openshift-master.kubeconfig all point to the local master with the correct port; for example, on the first master openshift-126.test.com:
[root@qe-gpei-36-ha-1-master-etcd-1 ~]# hostname
openshift-126.test.com
[root@qe-gpei-36-ha-1-master-etcd-1 ~]# cat /etc/origin/master/openshift-master.kubeconfig
<-snip->
- cluster:
    certificate-authority-data: <-snip->
    server: https://openshift-126.test.com:8443
  name: openshift-126-test-com:8443
contexts:
- context:
    cluster: openshift-126-test-com:8443
    namespace: default
    user: system:openshift-master/openshift-126-test-com:8443
  name: default/openshift-126-test-com:8443/system:openshift-master
current-context: default/openshift-126-test-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/openshift-126-test-com:8443
<-snip->
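
A direct way to confirm on each master that the loopback config authenticates as the master user; this should print system:openshift-master, matching the user entry above:

oc whoami --config=/etc/origin/master/openshift-master.kubeconfig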

5. Stop the controllers service on two of the three masters in turn; each of the three masters continued to work well.
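
For reference, the failover check amounts to stopping the controllers unit on two masters at a time and then rotating (unit name as installed by the native-HA RPM playbooks):

# run on two of the three masters, verify the cluster, then restart before rotating
systemctl stop atomic-openshift-master-controllers
systemctl start atomic-openshift-master-controllers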

Comment 25 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

