Bug 1588142
| Summary: | node scaleup fails - Node approval failed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Master | Assignee: | Jordan Liggitt <jliggitt> |
| Status: | CLOSED WORKSFORME | QA Contact: | Wang Haoran <haowang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | aos-bugs, hongli, jmencak, jokerman, mfojtik, mifiedle, mmccomas, sdodson, vlaad |
| Target Milestone: | --- | | |
| Target Release: | 3.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-06-12 03:40:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1448412
ansible log with -vvv
I have successfully scaled up nodes on an existing cluster and have not reproduced this bug. Based on the logs, the 'Get CSRs' task should always report something, yet its stdout is blank. This indicates that the OpenShift API was unresponsive for some reason. The other scaleup attempts shown in the log suggest the cluster might not be healthy, in which case scaleup would not succeed.

Moving to the master team based on the assessment that the API server goes unresponsive while we're attempting to sign the required CSRs. AFAIK it's also known that the time to sign a static number of CSRs is affected by the overall number of nodes in the cluster. https://docs.google.com/spreadsheets/d/1eg4_nLJuBr8Es04gdI-GSHM26eoQ-cVWAPvNOgqSDrA/edit#gid=0 Jiri Mencak has also observed the same behavior and may be able to provide more detail.

I am not able to reproduce this issue; I tried scaleup a few times. Not sure if something changed recently. Please close if needed.

Nothing changed recently in that area, but I'm not able to reproduce it either.
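The symptom discussed in the comments (the command exits with rc 0 but prints nothing) can be checked mechanically. The following is a minimal sketch, assuming the task result is available as JSON with the same `rc` and `stdout` keys that appear in the -vvv output; the helper name `csr_listing_is_blank` is hypothetical and is not part of openshift-ansible.

```python
import json


def csr_listing_is_blank(task_result):
    """Return True when the command exited 0 but produced no stdout.

    Hypothetical helper: it only inspects the 'rc' and 'stdout' keys of an
    Ansible command-module result, as seen in the -vvv log.
    """
    return task_result.get("rc") == 0 and not task_result.get("stdout", "").strip()


# Trimmed copy of the 'Get CSRs' result reported in this bug.
sample = json.loads("""
{
  "cmd": ["oc", "describe", "csr", "--config=/etc/origin/master/admin.kubeconfig"],
  "rc": 0,
  "stderr": "",
  "stdout": ""
}
""")

print(csr_listing_is_blank(sample))
```

A blank listing with rc 0 is only a hint; it does not by itself prove that the API server was unresponsive.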
Description of problem:
Scaleup is failing with the following error:

    changed: [ec2-54-149-214-42.us-west-2.compute.amazonaws.com] => {
        "changed": true,
        "cmd": [
            "oc",
            "describe",
            "csr",
            "--config=/etc/origin/master/admin.kubeconfig"
        ],
        "delta": "0:00:00.127132",
        "end": "2018-06-06 18:28:51.602703",
        "failed": false,
        "invocation": {
            "module_args": {
                "_raw_params": "oc describe csr --config=/etc/origin/master/admin.kubeconfig",
                "_uses_shell": false,
                "chdir": null,
                "creates": null,
                "executable": null,
                "removes": null,
                "stdin": null,
                "warn": true
            }
        },
        "rc": 0,
        "start": "2018-06-06 18:28:51.475571",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "",
        "stdout_lines": []
    }

    TASK [Report approval errors] ***************************************************************************************************************
    task path: /usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/join.yml:41
    fatal: [ec2-54-149-214-42.us-west-2.compute.amazonaws.com]: FAILED! => {
        "changed": false,
        "failed": true,
        "msg": "Node approval failed"
    }

Version-Release number of the following components:

    rpm -q openshift-ansible
    openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch

    rpm -q ansible
    ansible-2.4.4.0-1.el7ae.noarch

    ansible --version
    ansible 2.4.4.0
      config file = /etc/ansible/ansible.cfg
      configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python2.7/site-packages/ansible
      executable location = /usr/bin/ansible
      python version = 2.7.5 (default, May 4 2018, 09:38:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-34)]

Steps to Reproduce:
1. create a 3.10.0-0.60 cluster
2. run node scaleup playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-node/scaleup.yml

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:
Scaleup should succeed

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
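When trying to reproduce the failure by hand, it may help to poll the CSR listing a few times rather than rely on a single call. The sketch below is an illustration only, not what the scaleup playbook does: the function name `wait_for_csr_output`, the retry count, and the delay are assumptions, and the default runner simply reuses the `oc describe csr` command from the log above.

```python
import subprocess
import time


def wait_for_csr_output(run=None, attempts=5, delay=2.0):
    """Poll until the CSR listing is non-empty.

    Returns the listing text, or None if every attempt came back blank.
    'run' is any zero-argument callable returning the command's stdout;
    it is injectable so the loop can be exercised without a cluster.
    """
    if run is None:
        def run():
            # Same command as the 'Get CSRs' task shown in the description.
            proc = subprocess.run(
                ["oc", "describe", "csr",
                 "--config=/etc/origin/master/admin.kubeconfig"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
            )
            return proc.stdout

    for attempt in range(attempts):
        out = run()
        if out.strip():
            return out
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

If the listing stays blank across all attempts, that is consistent with the assessment in the comments that the API server was not answering, though other causes are possible.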