Bug 1586010 - Redeploy cert playbook fails at TASK [Wait for node to be ready]
Summary: Redeploy cert playbook fails at TASK [Wait for node to be ready]
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Scott Dodson
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-05 10:06 UTC by Gaoyun Pei
Modified: 2018-11-19 20:54 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 20:54:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1585978 0 high CLOSED Redeploy openshift ca playbook failed 2021-02-22 00:41:40 UTC

Internal Links: 1585978

Description Gaoyun Pei 2018-06-05 10:06:35 UTC
Description of problem:
Running the redeploy-certificates.yml playbook against an OCP 3.10 cluster may fail as below:

PLAY [Restart nodes] *******************************************************************************
...

TASK [Restart docker] *******************************************************************************************************************************************************
changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "failed": false, "name": "docker", "state": "started", ...


TASK [Wait for master API to come back online] ******************************************************************************************************************************
ok: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": false, "elapsed": 25, "failed": false, "path": null, "port": 8443, "search_regex": null, "state": "started"}

TASK [restart node] *********************************************************************************************************************************************************
changed: [ec2-52-90-247-129.compute-1.amazonaws.com] => {"changed": true, "failed": false, "name": "atomic-openshift-node", "state": "started", "status": ...

TASK [Wait for node to be ready] ********************************************************************************************************************************************
fatal: [ec2-52-90-247-129.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "The conditional check 'node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True' failed. The error was: error while evaluating conditional (node_output.results.returncode == 0 and node_output.results.results[0].status.conditions | selectattr('type', 'match', '^Ready$') | map(attribute='status') | join | bool == True): 'dict object' has no attribute 'results'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.retry



Sometimes the master-api service is still not available when "TASK [Wait for node to be ready]" runs, because restarting docker also triggers a restart of the master-api/controllers services. So we may need a "verify API server" task before restarting the node.
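The suggestion above could be sketched roughly as below. This is a hypothetical illustration, not the actual openshift-ansible code: the host/group variables (`openshift.common.hostname`, `openshift.node.nodename`, `groups.oo_first_master`) and the use of the plain `command` module instead of whatever module the playbook actually registers `node_output` from are assumptions, and port 8443 is taken from the log above.

```yaml
# Hypothetical task: wait for the master API port before restarting the node,
# since the docker restart also restarts the master-api service.
- name: Verify API server is back online
  wait_for:
    host: "{{ openshift.common.hostname }}"   # assumed variable
    port: 8443                                # master API port seen in the log
    delay: 5
    timeout: 300

# The "Wait for node to be ready" check could also guard against the
# 'dict object has no attribute results' error by retrying until the
# oc call itself succeeds before inspecting its output:
- name: Wait for node to be ready
  command: >
    /usr/bin/oc get node {{ openshift.node.nodename }} -o json -n default
  register: node_output
  delegate_to: "{{ groups.oo_first_master.0 }}"  # assumed group name
  until:
    - node_output.rc == 0
    - (node_output.stdout | from_json).status.conditions
      | selectattr('type', 'match', '^Ready$')
      | map(attribute='status') | join | bool
  retries: 30
  delay: 10
```

Because the `until` list first checks `node_output.rc == 0`, a transient API outage just causes another retry instead of the conditional-evaluation error shown in the traceback.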


Version-Release number of the following components:
openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch


How reproducible:
50% (3 failures in 6 attempts)

Steps to Reproduce:
1.Run openshift cert redeploy playbook
ansible-playbook -i host/310 -vvv /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml


Actual results:
The playbook fails as in the Description. After logging into the master and running "/usr/bin/oc get node ip-172-18-11-10.ec2.internal -o json -n default" manually, the command works.


Expected results:


Additional info:
The Ansible inventory file and the full log with "-vvv" can be found in the attachments.

Comment 3 Scott Dodson 2018-06-05 15:13:50 UTC
The cert re-deploy playbooks are likely broken pretty badly. We'll try to address these in a 0-day.

Comment 5 Russell Teague 2018-11-19 20:54:09 UTC
There appear to be no active cases related to this bug. As such we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.

