Bug 1454321
| Summary: | Ansible playbook fails due to the incorrect openshift-master.kubeconfig | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vladislav Walek <vwalek> | |
| Component: | Installer | Assignee: | Andrew Butcher <abutcher> | |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.4.1 | CC: | abutcher, aos-bugs, jokerman, mmccomas, rhowe, tcarlin, vwalek | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | Previously, installation would fail in multi-master environments in which the load-balanced API was listening on a different port than that of the OpenShift API/console. We now account for this difference and ensure the master loopback client config is configured to interact with the local master. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1462276 1462280 1462282 1462283 | Environment: | |
| Last Closed: | 2017-08-10 05:25:32 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1462276, 1462280, 1462282, 1462283 | |||
We create the openshift-master.kubeconfig file here:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master_certificates/tasks/main.yml#L71-L86

# oc adm create-api-client-config --certificate-authority ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=public.master.com:443 --master=local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

The context is then set here:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master/tasks/set_loopback_context.yml#L22

using loopback_context_name, which points the context at a user name that references the wrong port:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_facts/library/openshift_facts.py#L677

My issue currently is that I cannot reproduce this with manual steps. The command above creates a user with "name: 'system:openshift-master/:'", so I must be missing some option.

# oc adm create-api-client-config --certificate-authority ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=https://public.master.com:443 --master=https://local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

This sets the user name correctly.
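To make the difference visible, the two invocations above can be pointed at separate --client-dir directories (for example test-noscheme and test-scheme, placeholder names) and the generated user entries compared:

# grep 'name:' test-noscheme/openshift-master.kubeconfig
# grep 'name:' test-scheme/openshift-master.kubeconfig

The first shows the malformed "system:openshift-master/:" user, while the second carries the host and port, in the same style as the entries in the verified kubeconfig further below (e.g. system:openshift-master/local-master-com:8443).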
Verify this bug with openshift-ansible-roles-3.6.109-1.git.0.256e658.el7.noarch:

1. Prepare an haproxy load balancer that listens on 443 while the backend masters listen on 8443
[root@openshift-133 ~]# hostname
openshift-133.test.com
[root@openshift-133 ~]# tail /etc/haproxy/haproxy.cfg -n 12
frontend atomic-openshift-api
bind *:443
default_backend atomic-openshift-api
mode tcp
option tcplog
backend atomic-openshift-api
balance source
mode tcp
server master0 192.168.2.148:8443 check
server master1 192.168.2.149:8443 check
server master2 192.168.2.150:8443 check
[root@openshift-133 ~]# iptables -nL |grep 443
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:8443
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443
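As a quick extra check on the load-balancer host (not part of the recorded steps; ss comes from iproute), the listener split can be confirmed:

# ss -tlnp | grep haproxy

Only the haproxy frontend bound to *:443 should show up here; the 8443 backends live on the master hosts.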
2. Configure the inventory file as below
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
<-snip->
openshift_master_cluster_method=native
openshift_master_cluster_hostname=openshift-133.test.com
openshift_master_cluster_public_hostname=openshift-133.test.com
openshift_master_console_port=8443
openshift_master_api_port=8443
openshift_master_api_url=https://openshift-133.test.com:443
openshift_master_console_url=https://openshift-133.test.com:443/console
openshift_master_public_api_url=https://openshift-133.test.com:443
openshift_master_public_console_url=https://openshift-133.test.com:443/console
<-snip->
[masters]
openshift-126.test.com openshift_public_hostname=openshift-126.test.com openshift_hostname=openshift-126.test.com
openshift-139.test.com openshift_public_hostname=openshift-139.test.com openshift_hostname=openshift-139.test.com
openshift-128.test.com openshift_public_hostname=openshift-128.test.com openshift_hostname=openshift-128.test.com
[nodes]
openshift-152.test.com
[etcd]
openshift-126.test.com
openshift-139.test.com
openshift-128.test.com
3. Run installation playbook
The installation completes successfully without errors, and the OCP cluster works well.
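As an extra spot-check that is not part of the recorded steps (hostname and backend IP taken from the haproxy configuration above), both the load-balanced 443 endpoint and a backend master's 8443 endpoint should answer the master health probe with "ok":

# curl -k https://openshift-133.test.com/healthz
# curl -k https://192.168.2.148:8443/healthz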
4. Check openshift-master.kubeconfig on 3 masters
The users referenced in openshift-master.kubeconfig all point to the local master with the correct port; for example, on the first master openshift-126.test.com:
[root@qe-gpei-36-ha-1-master-etcd-1 ~]# hostname
openshift-126.test.com
[root@qe-gpei-36-ha-1-master-etcd-1 ~]# cat /etc/origin/master/openshift-master.kubeconfig
<-snip->
- cluster:
certificate-authority-data: <-snip->
server: https://openshift-126.test.com:8443
name: openshift-126-test-com:8443
contexts:
- context:
cluster: openshift-126-test-com:8443
namespace: default
user: system:openshift-master/openshift-126-test-com:8443
name: default/openshift-126-test-com:8443/system:openshift-master
current-context: default/openshift-126-test-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/openshift-126-test-com:8443
<-snip->
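As an additional check (a sketch, not part of the recorded verification), the loopback client config shown above can be exercised directly; it should authenticate as the master user rather than as system:anonymous:

# oc whoami --config=/etc/origin/master/openshift-master.kubeconfig

Expected output: system:openshift-master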
5. Stop the controllers service on 2 of the 3 masters in turn; each of the 3 masters continues to work well.
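A sketch of step 5, assuming the native-HA unit name atomic-openshift-master-controllers and the default admin kubeconfig path: stop the controllers on two of the masters in turn and confirm from each master that the cluster still responds.

# systemctl stop atomic-openshift-master-controllers
# oc get nodes --config=/etc/origin/master/admin.kubeconfig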
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
Description of problem:

Hello, the customer is running the playbook to install OpenShift on a brand new environment, from RPM. Unfortunately, the installation fails on:

2017-05-21 05:44:33,986 p=2 u=root | fatal: [master1.example.com]: FAILED! => {
    "changed": false,
    "cmd": [
        "oc", "create", "-f", "/usr/share/openshift/hosted",
        "--config=/tmp/openshift-ansible-aaaaa/admin.kubeconfig",
        "-n", "openshift"
    ],
    "delta": "0:00:40.250837",
    "end": "2017-05-21 05:44:28.101373",
    "failed": true,
    "failed_when_result": true,
    "rc": 1,
    "start": "2017-05-21 05:43:47.850536",
    "warnings": []
}

STDERR:
Error from server: templates "logging-deployer-account-template" is forbidden: not yet ready to handle request
Error from server: templates "logging-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/metrics-deployer.yaml": templates "metrics-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/registry-console.yaml": templates "registry-console" is forbidden: not yet ready to handle request

After investigation, we found that the underlying problem is that when the master starts, it logs errors like:

" .... User \"system:anonymous\" cannot get ... "

The customer uses an F5 load balancer with two master URLs, public and private. The private load balancer is configured for the OpenShift masters. We found that the issue is in openshift-master.kubeconfig: when the certificates are recreated, the kubeconfig is also created, but it is then modified during the installation. The kubeconfig has 3 servers and 3 contexts configured:

- public.loadbalancer.example.com:443
- private.loadbalancer.example.com:443
- master1.example.com:8443

For some reason, when the current-context is set to "default/master1.example.com:8443/system:openshift-master" the master shows the system:anonymous errors; when it is set to "default/private.loadbalancer.example.com:443/system:openshift-master" the master works.

The problem is the modification of the kubeconfig during the installation. Because the customer installation ran for a long time, the difference is visible: the kubeconfig was modified 1 hour after the certificates were generated. I will attach the logs afterwards. The nodes were not installed.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.4.1.12
openshift-ansible-playbooks-3.4.74-1.git.0.6542413.el7.noarch

How reproducible:
Reproduced on the customer environment by changing the context - after changing to the private load balancer context, the installation worked.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
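A minimal sketch of the context switch used to unblock the customer environment (the context name below is the one from the example above; the exact names should first be listed with get-contexts, and the master services restarted afterwards so the new current-context is picked up):

# KUBECONFIG=/etc/origin/master/openshift-master.kubeconfig oc config get-contexts
# KUBECONFIG=/etc/origin/master/openshift-master.kubeconfig oc config use-context default/private.loadbalancer.example.com:443/system:openshift-master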