Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1587860

Summary: Failed to verify that the etcd cluster is healthy while upgrading a cri-o based environment
Product: OpenShift Container Platform
Reporter: Gan Huang <ghuang>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED DUPLICATE
QA Contact: Johnny Liu <jialiu>
Severity: high
Priority: high
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-06-06 18:33:19 UTC
Type: Bug

Description Gan Huang 2018-06-06 08:19:09 UTC
Description of problem:
Failed to verify that the etcd cluster is healthy while upgrading a cri-o based environment.

TASK [etcd : Verify cluster is healthy] ****************************************
<--snip-->
FAILED - RETRYING: Verify cluster is healthy (1 retries left).

fatal: [qe-ghuang-master-etcd-1.0606-2o9.qe.rhcloud.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["/usr/local/bin/master-exec", "etcd", "etcd", "etcdctl", "--cert-file", "/etc/etcd/peer.crt", "--key-file", "/etc/etcd/peer.key", "--ca-file", "/etc/etcd/ca.crt", "-C", "https://qe-ghuang-master-etcd-1:2379", "cluster-health"], "delta": "0:00:00.039182", "end": "2018-06-06 03:39:06.764653", "failed": true, "rc": 0, "start": "2018-06-06 03:39:06.725471", "stderr": "Component etcd is stopped or not running", "stderr_lines": ["Component etcd is stopped or not running"], "stdout": "", "stdout_lines": []}


Version-Release number of the following components:
openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. Run an RPM-based 3.9 installation with cri-o enabled
2. Upgrade to 3.10


Actual results:
The upgrade failed at task "Verify cluster is healthy".

Expected results:
The etcd cluster health check passes and the upgrade completes.

Additional info:
All containers should now be managed by cri-o, so the docker CLI can't be used to manage them.

Comment 1 Gan Huang 2018-06-06 16:01:36 UTC
The issue is that the script /usr/local/bin/master-exec only works with docker containers.

In this case, the static pods were created through the cri-o interface, so they cannot be managed with the docker CLI (docker exec, docker ps, etc.).
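A minimal sketch, in Python, of the distinction described above: a wrapper that execs into a component container has to call a CLI that can actually see that container, which is `crictl` for cri-o and `docker` only for docker-managed containers. The function `exec_argv`, the `RUNTIME_CLI` table, and the runtime names "docker"/"crio" are hypothetical; this is not the contents of master-exec nor the fix tracked in bug 1572440, and it assumes the container ID has already been looked up.

```python
from typing import List, Sequence

# Hypothetical mapping from runtime name to the CLI that can see its containers.
# "docker" and "crio" are assumed labels, not values read from openshift-ansible.
RUNTIME_CLI = {
    "docker": "docker",  # containers created through the docker runtime
    "crio": "crictl",    # containers created through the cri-o (CRI) interface
}


def exec_argv(runtime: str, container_id: str, command: Sequence[str]) -> List[str]:
    """Return the argv that runs `command` inside an already-located container."""
    try:
        cli = RUNTIME_CLI[runtime]
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime!r}") from None
    # Both `docker exec` and `crictl exec` take the container ID, then the command.
    return [cli, "exec", container_id, *command]


if __name__ == "__main__":
    # The container ID is a placeholder, not one taken from the failed run.
    print(exec_argv("crio", "0123abcd", ["etcdctl", "cluster-health"]))
```

Locating the container ID is the other half of the problem: on a cri-o host that lookup would also have to go through crictl (for example `crictl ps`) rather than `docker ps`.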

Comment 2 Scott Dodson 2018-06-06 18:33:19 UTC
I believe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1572440

*** This bug has been marked as a duplicate of bug 1572440 ***