Bug 1586010
| Summary: | Redeploy cert playbook fail at TASK [Wait for node to be ready] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Gaoyun Pei <gpei> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED DEFERRED | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | aos-bugs, jokerman, mmccomas, rteague |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-19 20:54:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The cert re-deploy playbooks are likely broken pretty badly. We'll try to address these in a 0-day. There appear to be no active cases related to this bug, so we're closing it in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.
Description of problem:
Run the redeploy-certificates.yml playbook against an OCP 3.10 cluster; the playbook may fail as below:

PLAY [Restart nodes] ***********************************************************
...
TASK [Restart docker] **********************************************************
changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "failed": false, "name": "docker", "state": "started", ...

TASK [Wait for master API to come back online] *********************************
ok: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": false, "elapsed": 25, "failed": false, "path": null, "port": 8443, "search_regex": null, "state": "started"}

TASK [restart node] ************************************************************
changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": true, "failed": false, "name": "atomic-openshift-node", "state": "started", "status": ...

TASK [Wait for node to be ready] ***********************************************
fatal: [ec2-52-90-247-129.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "The conditional check 'node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True' failed. The error was: error while evaluating conditional (node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True): 'dict object' has no attribute 'results'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.retry

Sometimes the master-api service is still not available when "TASK [Wait for node to be ready]" runs, because restarting docker also restarts the master-api/controllers services. We may need a "verify API server" task before the node is restarted; a sketch of such a guard follows below.

Version-Release number of the following components:
openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch

How reproducible:
50% (3 failures in 6 attempts)

Steps to Reproduce:
1. Run the OpenShift cert redeploy playbook:
   ansible-playbook -i host/310 -vvv /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml

Actual results:
Playbook fails as shown in the Description. Logging into the master afterwards and running "/usr/bin/oc get node ip-172-18-11-10.ec2.internal -o json -n default" works.

Expected results:
Playbook completes without errors.

Additional info:
The Ansible inventory file and the full log with "-vvv" can be found in the attachments.
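A minimal sketch of the guard suggested above, assuming a restart play shaped like the one in the log: wait for the master API port (8443 here) to accept connections before restarting the node service, then retry the readiness check on the command's return code instead of dereferencing the registered result while the API may still be down. The task names, the oo_first_master group, the openshift.node.nodename fact, the kubeconfig path, and the retry counts are illustrative assumptions, not the actual openshift-ansible code:

```yaml
# Illustrative sketch only -- not the actual openshift-ansible tasks.
# Assumes the master API listens on 8443 and that admin.kubeconfig
# exists at the default path on the first master.

- name: Verify master API is available before restarting the node
  wait_for:
    host: "{{ groups.oo_first_master.0 }}"
    port: 8443
    state: started
    timeout: 600
  delegate_to: "{{ groups.oo_first_master.0 }}"

- name: Restart node service
  service:
    name: atomic-openshift-node
    state: restarted

- name: Wait for node to be ready
  command: >
    /usr/bin/oc get node {{ openshift.node.nodename }}
    --config=/etc/origin/master/admin.kubeconfig
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
  delegate_to: "{{ groups.oo_first_master.0 }}"
  register: node_ready
  # Retrying on the return code sidesteps the reported failure mode:
  # the original conditional dereferences node_output.results.results
  # even when the API call itself has not yet succeeded.
  until: node_ready.rc == 0 and node_ready.stdout == 'True'
  retries: 30
  delay: 10
```

Retrying the whole oc call rather than a parsed JSON structure matches the observation in "Actual results": oc get node works once the API settles, and the original conditional only fails because it is evaluated while the registered dict has no results key yet.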