Bug 1467775
Summary: Unable to perform initial IP allocation check for container OCP
Product: OpenShift Container Platform
Component: Cluster Version Operator
Status: CLOSED DEFERRED
Severity: high
Priority: high
Version: 3.6.0
Target Milestone: ---
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Reporter: Anping Li <anli>
Assignee: Scott Dodson <sdodson>
QA Contact: Anping Li <anli>
CC: aos-bugs, cynthia.devaraj, jfoots, jokerman, mmccomas, rhowe, sdodson, wmeng
Keywords: Reopened
Doc Type: Bug Fix
Doc Text:
  Cause: The upgrade playbooks improperly modified /etc/origin/master/openshift-master.kubeconfig with the intent of correcting an error in environments provisioned in 3.5 and earlier.
  Consequence: Under certain circumstances this created API server errors.
  Fix: The process for updating the kubeconfig file has been updated to handle missing contexts.
  Result: The kubeconfig should be updated properly in all situations.
Bug Blocks: 1636238, 1636558, 1636559
Last Closed: 2019-02-28 14:57:02 UTC
Type: Bug
Description
Anping Li
2017-07-05 05:45:03 UTC
Created attachment 1294447 [details]
The inventory and logs
Hit the same issue twice with the same inventory file:

1. Install OCP 3.5 and upgrade to OCP 3.6.133 with openshift-ansible v3.6.133. (logs-20170705053310-upgrade)
2. Install OCP 3.6 with openshift-ansible 3.6.96. (logs-20170704145436-config)
I'd be very surprised if this was an installer issue, but let's verify we can reproduce it and then see if we can get some better logs.

Couldn't reproduce :-( Started a fresh 3-node cluster at image tag v3.5.5.24 (as specified in your inventory) and then ran an upgrade with image tag v3.6.133 (per your log). The initial install landed docker-1.12.6-32.git88a4867.el7.x86_64, upgraded manually to newer docker-1.12.6-32.git88a4867.el7.x86_64. Rebooted. No changes. All atomic-openshift services are running and happy. Maybe it's just a fluke?

At least 5 times. Both jiajliu and I had hit this issue. I will try to reproduce it and leave the Env.

Closed it, for I couldn't reproduce it. I will reopen it when I hit the same issue.

Reopening bug as multiple clusters have hit this issue recently. Clusters installed with 3.6 or earlier, running the playbooks to upgrade the control plane with either the 3.6 or 3.7 playbooks:

# ansible-playbook -i </path/to/inventory/file> \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade_control_plane.yml

# ansible-playbook -i </path/to/inventory/file> \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade_control_plane.yml

end up with the API service failing due to a change in the openshift-master.kubeconfig file: the users.name entries change to use the openshift_cluster_hostname, but the user set in the current context still points to a users.name based on the local master's hostname, and that user is not present in the users section.
Example:

BEFORE:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://master03.ocp36.example.com:8443
  name: master03-ocp36-example-com:8443
contexts:
- context:
    cluster: master03-ocp36-example-com:8443
    namespace: default
    user: system:openshift-master/master03-ocp36-example-com:8443
  name: default/master03-ocp36-example-com:8443/system:openshift-master
current-context: default/master03-ocp36-example-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/master03-ocp36-example-com:8443
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

AFTER:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://master03.ocp36.example.com:8443
  name: master03-ocp36-example-com:8443
- cluster:
    certificate-authority-data: REDACTED
    server: https://cluster.ocp36.example.com:8443
  name: cluster-ocp36-example-com:8443
contexts:
- context:
    cluster: master03-ocp36-example-com:8443
    namespace: default
    user: system:openshift-master/master03-ocp36-example-com:8443
  name: default/master03-ocp36-example-com:8443/system:openshift-master
- context:
    cluster: cluster-ocp36-example-com:8443
    namespace: default
    user: system:openshift-master/cluster-ocp36-example-com:8443
  name: default/cluster-ocp36-example-com:8443/system:openshift-master
current-context: default/master03-ocp36-example-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/cluster-ocp36-example-com:8443
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Further research points to this happening on clusters installed around the 3.2 timeframe. It was related to, and was supposed to fix, an issue with the loopback kubeconfig; I believe it left the kubeconfig in a state where the upgrade causes this issue to be hit. https://bugzilla.redhat.com/show_bug.cgi?id=1306011

The BEFORE example above is incorrect; I will try to get an example. The root issue is with the following role.
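The broken state above, where current-context resolves to a user entry that no longer exists, is mechanical enough to check for. Below is a minimal Python sketch of such a consistency check, assuming the kubeconfig has already been loaded into a plain dict; the dict literal mirrors the AFTER example, and `find_dangling_user` is a hypothetical helper, not part of oc or openshift-ansible:

```python
# Minimal sketch: detect a kubeconfig whose current context points at a
# user that is missing from the "users" section (the AFTER state above).

def find_dangling_user(config):
    """Return the missing user name, or None if the config is consistent."""
    contexts = {c["name"]: c["context"] for c in config.get("contexts", [])}
    users = {u["name"] for u in config.get("users", [])}
    current = contexts.get(config.get("current-context"))
    if current is None:
        return config.get("current-context")  # the context itself is missing
    user = current["user"]
    return None if user in users else user

# The broken AFTER state: current-context still uses the master hostname,
# but only the cluster-hostname user survives in "users".
broken = {
    "current-context": "default/master03-ocp36-example-com:8443/system:openshift-master",
    "contexts": [
        {"name": "default/master03-ocp36-example-com:8443/system:openshift-master",
         "context": {"user": "system:openshift-master/master03-ocp36-example-com:8443"}},
        {"name": "default/cluster-ocp36-example-com:8443/system:openshift-master",
         "context": {"user": "system:openshift-master/cluster-ocp36-example-com:8443"}},
    ],
    "users": [
        {"name": "system:openshift-master/cluster-ocp36-example-com:8443"},
    ],
}

print(find_dangling_user(broken))
# -> system:openshift-master/master03-ocp36-example-com:8443
```

Against the BEFORE example, the same check would return None, since the single user entry matches the user in the current context.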
roles/openshift_master/tasks/set_loopback_context.yml

The context is changed using a user that is not present. To fix this we need to add a task that sets credentials when set_loopback_cluster is changed.

3.6 -> 3.9 playbooks: roles/openshift_master/tasks/set_loopback_context.yml
3.10+ playbooks: roles/openshift_control_plane/tasks/set_loopback_context.yml

ADD TASK:

- command: >
    {{ openshift.common.client_binary }} config set-credentials
    --client-certificate={{ openshift_master_config_dir }}/openshift-master.crt
    --client-key={{ openshift_master_config_dir }}/openshift-master.key
    --embed-certs=true
    {{ openshift.master.loopback_user }}
    --config={{ openshift_master_loopback_config }}
  when: set_loopback_cluster | changed
  register: set_loopback_credentials

With https://github.com/openshift/openshift-ansible/pull/10325 merged this does in fact appear to work correctly. I've crafted a kubeconfig where items were capitalized as reported in the customer's logs, then run an upgrade. First, the full kubeconfig.
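As an illustration of what that set-credentials task accomplishes at the data level, here is a rough Python sketch that re-adds the missing user entry to a dict-shaped kubeconfig. `set_credentials` is a hypothetical helper used only to mirror the transformation, and the certificate values are placeholders; the actual fix is the `oc config set-credentials` command in the task above:

```python
def set_credentials(config, user_name, cert_data, key_data):
    """Mimic the effect of `oc config set-credentials --embed-certs`:
    add or replace the named entry in the "users" list so that any
    context referencing that user name resolves again."""
    entry = {"name": user_name,
             "user": {"client-certificate-data": cert_data,
                      "client-key-data": key_data}}
    users = [u for u in config.get("users", []) if u["name"] != user_name]
    users.append(entry)
    config["users"] = users
    return config

# Placeholder state mirroring the broken AFTER example: only the
# cluster-hostname user is present, the loopback user is missing.
config = {"users": [
    {"name": "system:openshift-master/cluster-ocp36-example-com:8443",
     "user": {"client-certificate-data": "REDACTED",
              "client-key-data": "REDACTED"}},
]}
set_credentials(config,
                "system:openshift-master/master03-ocp36-example-com:8443",
                "REDACTED", "REDACTED")
print([u["name"] for u in config["users"]])  # both users now present
```

Running the task only `when: set_loopback_cluster | changed` keeps the operation idempotent: the credentials are rewritten only in the same run that touched the loopback cluster entry.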
[root@ose3-master ~]# oc config view --config /etc/origin/master/openshift-master.kubeconfig
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://OSE3-MASTER.example.com:8443
  name: OSE3-MASTER-example-com:8443
- cluster:
    certificate-authority-data: REDACTED
    server: https://ose3-master.example.com:8443
  name: ose3-master-example-com:8443
contexts:
- context:
    cluster: OSE3-MASTER-example-com:8443
    namespace: default
    user: system:openshift-master/OSE3-MASTER-example-com:8443
  name: default/OSE3-MASTER-example-com:8443/system:openshift-master
- context:
    cluster: ose3-master-example-com:8443
    namespace: default
    user: system:openshift-master/ose3-master-example-com:8443
  name: default/ose3-master-example-com:8443/system:openshift-master
current-context: default/ose3-master-example-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/OSE3-MASTER-example-com:8443
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
- name: system:openshift-master/ose3-master-example-com:8443
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Now the minified version, which only shows the current context.

[root@ose3-master ~]# oc config view --config /etc/origin/master/openshift-master.kubeconfig --minify
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://ose3-master.example.com:8443
  name: ose3-master-example-com:8443
contexts:
- context:
    cluster: ose3-master-example-com:8443
    namespace: default
    user: system:openshift-master/ose3-master-example-com:8443
  name: default/ose3-master-example-com:8443/system:openshift-master
current-context: default/ose3-master-example-com:8443/system:openshift-master
kind: Config
preferences: {}
users:
- name: system:openshift-master/ose3-master-example-com:8443
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Now, verify that it works.
[root@ose3-master ~]# oc get nodes --config /etc/origin/master/openshift-master.kubeconfig
NAME                      STATUS    AGE       VERSION
ose3-master.example.com   Ready     50m       v1.7.6+a08f5eeb62
ose3-node1.example.com    Ready     22m       v1.7.6+a08f5eeb62
ose3-node2.example.com    Ready     50m       v1.7.6+a08f5eeb62

*** Bug 1636238 has been marked as a duplicate of this bug. ***

There appear to be no active cases related to this bug. As such, we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please reopen this bug if you feel it was closed in error or a new active case is attached.