Bug 1586008 - "Wait for all control plane pods to become ready" task failed when etcd is not co-located with master
Summary: "Wait for all control plane pods to become ready" task failed when etcd is no...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-05 09:56 UTC by Johnny Liu
Modified: 2018-07-23 13:14 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 17:03:40 UTC
Target Upstream Version:
Embargoed:


Attachments
installation log with inventory file embedded (1.96 MB, text/plain)
2018-06-05 09:56 UTC, Johnny Liu

Description Johnny Liu 2018-06-05 09:56:16 UTC
Created attachment 1447786 [details]
installation log with inventory file embedded

Description of problem:
In commit 93a2fcd9, the following code was merged:
- name: Wait for all control plane pods to become ready
  oc_obj:
    state: list
    kind: pod
    name: "master-{{ item }}-{{ openshift.node.nodename | lower }}"
    namespace: kube-system
 <--snip-->
  retries: 60
  delay: 5
  with_items:
  - "{{ 'etcd' if inventory_hostname in groups['oo_etcd_to_config'] else omit }}"
  - api
  - controllers

When etcd is not co-located with the masters, "omit" does not drop the loop item; it expands to an invalid placeholder value, and the installation fails.
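
A minimal standalone playbook (illustrative only, not part of openshift-ansible; the hostname in the message is made up) shows why: Ansible only honours "omit" when it is the entire value of a module option, so as a with_items entry it is rendered as the literal __omit_place_holder__<hash> string that appears in the actual results below.

- hosts: localhost
  gather_facts: false
  tasks:
  - name: Show how omit behaves as a with_items entry
    debug:
      msg: "pod name would be master-{{ item }}-node1.example.com"
    # the first item is NOT omitted; it renders as __omit_place_holder__<hash>
    with_items:
    - "{{ omit }}"
    - api
    - controllers

Running this prints the placeholder string for the first item instead of skipping it.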

Version-Release number of the following components:
openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Define the etcd group and the master group so that they are not co-located.
2. Trigger the installation.
3.

Actual results:
TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
<--snip-->
FAILED - RETRYING: Wait for all control plane pods to become ready (1 retries left).
failed: [ec2-54-146-104-127.compute-1.amazonaws.com] (item=__omit_place_holder__ee64c3e15fab51456d4a1fd4ea054f1dc27b73d6) => {"attempts": 60, "changed": false, "failed": true, "item": "__omit_place_holder__ee64c3e15fab51456d4a1fd4ea054f1dc27b73d6", "results": {"cmd": "/usr/local/bin/oc get pod master-__omit_place_holder__ee64c3e15fab51456d4a1fd4ea054f1dc27b73d6-ip-172-18-6-122.ec2.internal -o json -n kube-system", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): pods \"master-__omit_place_holder__ee64c3e15fab51456d4a1fd4ea054f1dc27b73d6-ip-172-18-6-122.ec2.internal\" not found\n", "stdout": ""}, "state": "list"}
<--snip-->


Expected results:
The installation should pass.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Scott Dodson 2018-06-05 12:51:17 UTC
Need to filter for etcd hosts that are also masters.

Comment 3 Scott Dodson 2018-06-05 17:33:31 UTC
Correction, need to stop using omit.
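
A minimal sketch of that direction (a reading of the fix, not the actual upstream patch): drop omit, emit an empty string for masters that do not run etcd, and skip the empty item with a when: guard. The elided lines stand for the same content as in the snippet in the description, and the "Conditional result was False" skips in comment 5 are consistent with this shape.

- name: Wait for all control plane pods to become ready
  oc_obj:
    state: list
    kind: pod
    name: "master-{{ item }}-{{ openshift.node.nodename | lower }}"
    namespace: kube-system
 <--snip-->
  retries: 60
  delay: 5
  # skip the empty item on masters that do not run etcd
  when: item != ''
  with_items:
  - "{{ 'etcd' if inventory_hostname in groups['oo_etcd_to_config'] else '' }}"
  - api
  - controllers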

Comment 5 Johnny Liu 2018-06-07 03:31:08 UTC
Verified this bug with openshift-ansible-3.10.0-0.63.0.git.0.961c60d.el7.noarch, and PASS.

TASK [openshift_control_plane : Wait for all control plane pods to become ready] ***
Wednesday 06 June 2018  22:11:11 -0400 (0:00:00.061)       0:26:28.607 ******** 
skipping: [ec2-52-90-178-192.compute-1.amazonaws.com] => (item=)  => {"changed": false, "item": "", "skip_reason": "Conditional result was False", "skipped": true}
skipping: [ec2-75-101-190-167.compute-1.amazonaws.com] => (item=)  => {"changed": false, "item": "", "skip_reason": "Conditional result was False", "skipped": true}
skipping: [ec2-54-242-184-100.compute-1.amazonaws.com] => (item=)  => {"changed": false, "item": "", "skip_reason": "Conditional result was False", "skipped": true}
<--snip-->

Only the api and controllers pods are checked now; the etcd pod is skipped.

