Description of problem:

Hello,

The customer is running the playbook to install OpenShift on a brand new environment, from RPM. Unfortunately, the installation fails on:

2017-05-21 05:44:33,986 p=2 u=root |  fatal: [master1.example.com]: FAILED! => {
    "changed": false,
    "cmd": [
        "oc",
        "create",
        "-f",
        "/usr/share/openshift/hosted",
        "--config=/tmp/openshift-ansible-aaaaa/admin.kubeconfig",
        "-n",
        "openshift"
    ],
    "delta": "0:00:40.250837",
    "end": "2017-05-21 05:44:28.101373",
    "failed": true,
    "failed_when_result": true,
    "rc": 1,
    "start": "2017-05-21 05:43:47.850536",
    "warnings": []
}

STDERR:

Error from server: templates "logging-deployer-account-template" is forbidden: not yet ready to handle request
Error from server: templates "logging-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/metrics-deployer.yaml": templates "metrics-deployer-template" is forbidden: not yet ready to handle request
Error from server: error when creating "/usr/share/openshift/hosted/registry-console.yaml": templates "registry-console" is forbidden: not yet ready to handle request

After investigation, we found that the underlying problem is that when the master starts, it logs errors such as:

" .... User \"system:anonymous\" cannot get ... "

The customer uses an F5 load balancer and two master URLs, one public and one private. The private load balancer is configured for the OpenShift masters.

We found that the issue is in openshift-master.kubeconfig: when the certificates are recreated, the kubeconfig is created as well, but it is then modified later during the installation. The kubeconfig has 3 servers and 3 contexts configured:

- public.loadbalancer.example.com:443
- private.loadbalancer.example.com:443
- master1.example.com:8443

For some reason, when the current-context is set to "default/master1.example.com:8443/system:openshift-master", the master shows the system:anonymous errors; when it is set to "default/private.loadbalancer.example.com:443/system:openshift-master", the master works.

The problem is in how the kubeconfig is modified during the installation. Because the customer's installation ran for a long time, the difference is visible in the timestamps: the kubeconfig was modified one hour after the certificates were generated.

I will attach the logs afterwards. The nodes were not installed.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.4.1.12
openshift-ansible-playbooks-3.4.74-1.git.0.6542413.el7.noarch

How reproducible:
Reproduced on the customer environment by changing the context (a sketch of the commands follows this report): after switching to the private load balancer context, the installation worked.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
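For completeness, a minimal sketch of the workaround mentioned above, assuming the loopback kubeconfig is in its default location /etc/origin/master/openshift-master.kubeconfig and using the private load balancer context name quoted in the description; the service names below are the usual HA master units and may differ in other setups:

# KUBECONFIG=/etc/origin/master/openshift-master.kubeconfig oc config use-context default/private.loadbalancer.example.com:443/system:openshift-master
# systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers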
We create the openshift-master.kubeconfig file here:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master_certificates/tasks/main.yml#L71-L86

# oc adm create-api-client-config --certificate-authority ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=public.master.com:443 --master=local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

Then the context is set here:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_master/tasks/set_loopback_context.yml#L22

using loopback_context_name, which sets the context name to a user with the wrong port referenced:
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_facts/library/openshift_facts.py#L677

My issue currently is that I cannot reproduce this with manual steps. The command above creates a user with "name: 'system:openshift-master/:'". I must be missing some option.
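For reference, a minimal way to inspect what the command above generated, assuming create-api-client-config writes the client config as test/openshift-master.kubeconfig under the --client-dir:

# KUBECONFIG=test/openshift-master.kubeconfig oc config view

The users section of the output shows the malformed "system:openshift-master/:" name mentioned above.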
oc adm create-api-client-config --certificate-authority ca.crt --client-dir=test --groups="system:masters,system:openshift-master" --public-master=https://public.master.com:443 --master=https://local.master.com:8443 --signer-cert=ca.crt --signer-key=ca.key --signer-serial=ca.serial.txt --user=system:openshift-master --basename=openshift-master

This will set the user name correctly, presumably because without the scheme the --master value does not parse into a host and port, so the generated name collapses to "system:openshift-master/:".
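A quick way to confirm the fix, assuming the corrected command was run against the same --client-dir as in the previous comment:

# KUBECONFIG=test/openshift-master.kubeconfig oc config view | grep 'system:openshift-master/'

The user and context names should now carry the host and port (with the dots converted to dashes, as in the verification output in the next comment) instead of the bare "system:openshift-master/:".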
Verified this bug with openshift-ansible-roles-3.6.109-1.git.0.256e658.el7.noarch

1. Prepare an haproxy load balancer that listens on 443 while the backend masters listen on 8443

[root@openshift-133 ~]# hostname
openshift-133.test.com
[root@openshift-133 ~]# tail /etc/haproxy/haproxy.cfg -n 12
frontend atomic-openshift-api
    bind *:443
    default_backend atomic-openshift-api
    mode tcp
    option tcplog

backend atomic-openshift-api
    balance source
    mode tcp
    server master0 192.168.2.148:8443 check
    server master1 192.168.2.149:8443 check
    server master2 192.168.2.150:8443 check
[root@openshift-133 ~]# iptables -nL |grep 443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:8443
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:443

2. Configure the inventory file like below

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
<-snip->
openshift_master_cluster_method=native
openshift_master_cluster_hostname=openshift-133.test.com
openshift_master_cluster_public_hostname=openshift-133.test.com
openshift_master_console_port=8443
openshift_master_api_port=8443
openshift_master_api_url=https://openshift-133.test.com:443
openshift_master_console_url=https://openshift-133.test.com:443/console
openshift_master_public_api_url=https://openshift-133.test.com:443
openshift_master_public_console_url=https://openshift-133.test.com:443/console
<-snip->

[masters]
openshift-126.test.com openshift_public_hostname=openshift-126.test.com openshift_hostname=openshift-126.test.com
openshift-139.test.com openshift_public_hostname=openshift-139.test.com openshift_hostname=openshift-139.test.com
openshift-128.test.com openshift_public_hostname=openshift-128.test.com openshift_hostname=openshift-128.test.com

[nodes]
openshift-152.test.com

[etcd]
openshift-126.test.com
openshift-139.test.com
openshift-128.test.com

3. Run the installation playbook

The installation is successful without error, and the OCP cluster is working well.

4. Check openshift-master.kubeconfig on the 3 masters

The users referenced in openshift-master.kubeconfig all point to the local master with the correct port, for example on the first master, openshift-126.test.com:

[root@qe-gpei-36-ha-1-master-etcd-1 ~]# hostname
openshift-126.test.com
[root@qe-gpei-36-ha-1-master-etcd-1 ~]# cat /etc/origin/master/openshift-master.kubeconfig
<-snip->
- cluster:
    certificate-authority-data: <-snip->
    server: https://openshift-126.test.com:8443
  name: openshift-126-test-com:8443
contexts:
- context:
    cluster: openshift-126-test-com:8443
    namespace: default
    user: system:openshift-master/openshift-126-test-com:8443
  name: default/openshift-126-test-com:8443/system:openshift-master
current-context: default/openshift-126-test-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/openshift-126-test-com:8443
<-snip->

5. Stop 2 of the 3 masters' controllers services in turn; each of the 3 masters keeps working well.
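As an additional spot check (not part of the original verification steps), the context that the loopback kubeconfig currently points to can be read directly on each master:

[root@qe-gpei-36-ha-1-master-etcd-1 ~]# KUBECONFIG=/etc/origin/master/openshift-master.kubeconfig oc config current-context
default/openshift-126-test-com:8443/system:openshift-master

On every master this should be its own hostname with port 8443, matching the current-context in the file shown above.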
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716