Bug 1587860 - Failed to verify etcd cluster healthy while upgrade cri-o based environment
Summary: Failed to verify etcd cluster healthy while upgrade cri-o based environment
Keywords:
Status: CLOSED DUPLICATE of bug 1572440
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-06 08:19 UTC by Gan Huang
Modified: 2018-06-06 18:33 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-06 18:33:19 UTC
Target Upstream Version:
Embargoed:



Description Gan Huang 2018-06-06 08:19:09 UTC
Description of problem:
Verifying etcd cluster health fails while upgrading a cri-o based environment.

TASK [etcd : Verify cluster is healthy] ****************************************
<--snip-->
FAILED - RETRYING: Verify cluster is healthy (1 retries left).

fatal: [qe-ghuang-master-etcd-1.0606-2o9.qe.rhcloud.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["/usr/local/bin/master-exec", "etcd", "etcd", "etcdctl", "--cert-file", "/etc/etcd/peer.crt", "--key-file", "/etc/etcd/peer.key", "--ca-file", "/etc/etcd/ca.crt", "-C", "https://qe-ghuang-master-etcd-1:2379", "cluster-health"], "delta": "0:00:00.039182", "end": "2018-06-06 03:39:06.764653", "failed": true, "rc": 0, "start": "2018-06-06 03:39:06.725471", "stderr": "Component etcd is stopped or not running", "stderr_lines": ["Component etcd is stopped or not running"], "stdout": "", "stdout_lines": []}
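
The result above carries "rc": 0 with an empty stdout and the error text on stderr, so an exit-code check alone would not flag it. The following is a minimal sketch of a stricter check, not the retry condition of the Ansible task (which is not shown in this report). The function name `etcd_reports_healthy` is hypothetical, and the sketch assumes that the etcdctl v2 `cluster-health` command prints "cluster is healthy" on success.

```python
def etcd_reports_healthy(result):
    """Return True only if the command exited 0 AND printed a healthy verdict.

    `result` is a dict shaped like the Ansible task result in this report.
    Assumption: etcdctl v2 `cluster-health` prints "cluster is healthy".
    """
    if result.get("rc") != 0:
        return False
    return "cluster is healthy" in result.get("stdout", "")


# Fields copied from the failed task result in this report.
failed_result = {
    "rc": 0,
    "stdout": "",
    "stderr": "Component etcd is stopped or not running",
}

print(etcd_reports_healthy(failed_result))  # prints False
```

Checking stdout for the positive verdict, rather than only the exit code, means a wrapper that swallows the failure still counts as unhealthy.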


Version-Release number of the following components:
openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. Run a 3.9 RPM-based installation with cri-o enabled
2. Upgrade to 3.10


Actual results:
The upgrade failed at the task "Verify cluster is healthy".

Expected results:

Additional info:
All containers should now be managed by cri-o, so the docker CLI can no longer be used to manage them.

Comment 1 Gan Huang 2018-06-06 16:01:36 UTC
The issue is that the script (/usr/local/bin/master-exec) only works against docker containers.

In this case, the static pods were created through the cri-o interface, so they cannot be managed with the docker CLI (docker exec, docker ps, etc.).
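
To illustrate the point, here is a rough sketch of how an exec helper could pick its CLI based on the container runtime. It is not the contents of master-exec (which are not shown in this report) and not the fix tracked in bug 1572440. The function names, the runtime strings "docker" and "crio", and the container ID "abc123" are hypothetical; the sketch also assumes that containers started by the kubelet under docker carry the `io.kubernetes.container.name` label and that `crictl` is installed on cri-o hosts.

```python
import shlex

# Assumption: label the kubelet sets on docker-managed containers.
NAME_LABEL = "io.kubernetes.container.name"


def find_container_cmd(runtime, name):
    """Build the command that lists the ID of the running container `name`."""
    if runtime == "docker":
        return ["docker", "ps", "-q", "--filter", f"label={NAME_LABEL}={name}"]
    if runtime == "crio":
        return ["crictl", "ps", "-q", "--name", name]
    raise ValueError(f"unsupported runtime: {runtime}")


def exec_cmd(runtime, container_id, argv):
    """Build the command that runs `argv` inside the given container."""
    cli = {"docker": "docker", "crio": "crictl"}
    if runtime not in cli:
        raise ValueError(f"unsupported runtime: {runtime}")
    return [cli[runtime], "exec", container_id, *argv]


# "abc123" stands in for an ID returned by the lookup command.
print(shlex.join(find_container_cmd("crio", "etcd")))
print(shlex.join(exec_cmd("crio", "abc123", ["etcdctl", "cluster-health"])))
```

The sketch only builds argument lists, so it can be inspected without either runtime installed; a real helper would run the lookup first and pass the resulting ID to the exec command.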

Comment 2 Scott Dodson 2018-06-06 18:33:19 UTC
I believe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1572440

*** This bug has been marked as a duplicate of bug 1572440 ***

