Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1587860

Summary:	Failed to verify etcd cluster healthy while upgrade cri-o based environment
Product:	OpenShift Container Platform	Reporter:	Gan Huang <ghuang>
Component:	Installer	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED DUPLICATE	QA Contact:	Johnny Liu <jialiu>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.10.0	CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-06-06 18:33:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Gan Huang 2018-06-06 08:19:09 UTC

Description of problem:
Failed to verify that the etcd cluster is healthy while upgrading a CRI-O based environment.

TASK [etcd : Verify cluster is healthy] ****************************************
<--snip-->
FAILED - RETRYING: Verify cluster is healthy (1 retries left).

fatal: [qe-ghuang-master-etcd-1.0606-2o9.qe.rhcloud.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["/usr/local/bin/master-exec", "etcd", "etcd", "etcdctl", "--cert-file", "/etc/etcd/peer.crt", "--key-file", "/etc/etcd/peer.key", "--ca-file", "/etc/etcd/ca.crt", "-C", "https://qe-ghuang-master-etcd-1:2379", "cluster-health"], "delta": "0:00:00.039182", "end": "2018-06-06 03:39:06.764653", "failed": true, "rc": 0, "start": "2018-06-06 03:39:06.725471", "stderr": "Component etcd is stopped or not running", "stderr_lines": ["Component etcd is stopped or not running"], "stdout": "", "stdout_lines": []}
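The failure above is easy to misread. master-exec exits with rc 0 but prints "Component etcd is stopped or not running" to stderr and nothing to stdout, so the task keeps retrying until all 30 attempts are used. The Python sketch below models that loop. It is a hypothetical reconstruction, not the actual openshift-ansible task definition. It assumes that success is judged by whether stdout contains the string "cluster is healthy" (which `etcdctl cluster-health` prints on success), not by the exit code.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CommandResult:
    rc: int
    stdout: str
    stderr: str


def verify_cluster_healthy(
    run: Callable[[], CommandResult],
    retries: int = 30,
    delay: float = 0.0,
) -> Tuple[bool, int, Optional[CommandResult]]:
    """Hypothetical model of an Ansible retries/until loop.

    ASSUMPTION: the check passes only when stdout contains
    "cluster is healthy"; a zero exit code on its own is not enough.
    Returns (healthy, attempts_used, last_result).
    """
    result: Optional[CommandResult] = None
    for attempt in range(1, retries + 1):
        result = run()
        if "cluster is healthy" in result.stdout:
            return True, attempt, result
        if attempt < retries:
            time.sleep(delay)
    return False, retries, result


# Simulate the master-exec output captured in the log:
# rc 0, empty stdout, error message on stderr.
def stopped_etcd() -> CommandResult:
    return CommandResult(rc=0, stdout="", stderr="Component etcd is stopped or not running")


healthy, attempts, last = verify_cluster_healthy(stopped_etcd)
print(healthy, attempts, last.rc)  # False 30 0
```

Because the wrapper reports success (rc 0) while producing no etcdctl output, the real problem only surfaces as "retries exhausted". The root cause is in the stderr message.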


Version-Release number of the following components:
openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. Trigger a 3.9 RPM installation with CRI-O enabled
2. Upgrade to 3.10


Actual results:
The upgrade failed at task "Verify cluster is healthy".

Expected results:

Additional info:
All containers are now managed by CRI-O, so the docker CLI cannot be used to manage them.

Comment 1 Gan Huang 2018-06-06 16:01:36 UTC

The issue is that the script /usr/local/bin/master-exec only works with Docker containers.

In this case, the static pods were created through the CRI-O interface, so they cannot be managed with the docker CLI (docker exec, docker ps, etc.).
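One way to read Comment 1 is that both the container lookup and the exec step need a CRI-aware path when CRI-O is the runtime. The Python sketch below only builds the command lines for each runtime and executes nothing. The helper names are hypothetical. It assumes the kubelet-set label `io.kubernetes.container.name` can be filtered through both `docker ps --filter` and `crictl ps --label`, and the `crictl` flags should be checked against the installed version. It is an illustration only, not taken from the actual openshift-ansible fix.

```python
from typing import List

# Hypothetical helpers: build argv lists only; nothing is executed here.
# ASSUMPTION: kubelet labels each container with io.kubernetes.container.name,
# and that label can be filtered through both the docker CLI and crictl.


def find_container_argv(runtime: str, container_name: str) -> List[str]:
    """Command that prints the IDs of running containers with this name."""
    label = f"io.kubernetes.container.name={container_name}"
    if runtime == "docker":
        return ["docker", "ps", "-q", "--filter", f"label={label}"]
    if runtime == "cri-o":
        # Containers started through CRI-O are not visible to `docker ps`.
        return ["crictl", "ps", "-q", "--label", label]
    raise ValueError(f"unsupported container runtime: {runtime}")


def exec_argv(runtime: str, container_id: str, cmd: List[str]) -> List[str]:
    """Command that runs `cmd` inside the given container."""
    if runtime == "docker":
        return ["docker", "exec", container_id, *cmd]
    if runtime == "cri-o":
        # -s (--sync) runs the command synchronously and returns its output.
        return ["crictl", "exec", "-s", container_id, *cmd]
    raise ValueError(f"unsupported container runtime: {runtime}")


# The etcdctl invocation from the failing task.
health_cmd = [
    "etcdctl", "--cert-file", "/etc/etcd/peer.crt", "--key-file", "/etc/etcd/peer.key",
    "--ca-file", "/etc/etcd/ca.crt", "-C", "https://qe-ghuang-master-etcd-1:2379",
    "cluster-health",
]
print(find_container_argv("cri-o", "etcd"))
print(exec_argv("cri-o", "<container-id>", health_cmd)[:4])
```

Keeping the runtime choice in one place means a wrapper like master-exec would only need to detect the runtime once, rather than hard-coding docker for both lookup and exec.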

Comment 2 Scott Dodson 2018-06-06 18:33:19 UTC

I believe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1572440

*** This bug has been marked as a duplicate of bug 1572440 ***