Description of problem:
Failed to verify that the etcd cluster is healthy while upgrading a CRI-O-based environment.

TASK [etcd : Verify cluster is healthy] ****************************************
<--snip-->
FAILED - RETRYING: Verify cluster is healthy (1 retries left).
fatal: [qe-ghuang-master-etcd-1.0606-2o9.qe.rhcloud.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["/usr/local/bin/master-exec", "etcd", "etcd", "etcdctl", "--cert-file", "/etc/etcd/peer.crt", "--key-file", "/etc/etcd/peer.key", "--ca-file", "/etc/etcd/ca.crt", "-C", "https://qe-ghuang-master-etcd-1:2379", "cluster-health"], "delta": "0:00:00.039182", "end": "2018-06-06 03:39:06.764653", "failed": true, "rc": 0, "start": "2018-06-06 03:39:06.725471", "stderr": "Component etcd is stopped or not running", "stderr_lines": ["Component etcd is stopped or not running"], "stdout": "", "stdout_lines": []}

Version-Release number of the following components:
openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch.rpm

How reproducible:
always

Steps to Reproduce:
1. Trigger a 3.9 RPM installation with CRI-O enabled
2. Upgrade to 3.10

Actual results:
The upgrade failed at task "Verify cluster is healthy".

Expected results:

Additional info:
All containers should now be managed by CRI-O, so the docker CLI can't be used to manage them.
The issue is that the script /usr/local/bin/master-exec only works against Docker containers. In this case, the static pods were created through the CRI-O interface, so they cannot be managed with the docker CLI (docker exec, docker ps, etc.).
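To illustrate the runtime mismatch, here is a minimal Python sketch of how a wrapper in the spirit of master-exec could choose between the docker CLI and crictl (the CRI CLI that talks to CRI-O) when locating and exec'ing into a static-pod container. It only builds the command lines and does not run them. The function names and the "docker"/"crio" runtime strings are hypothetical, and this is not the actual master-exec script or the fix tracked in bug 1572440. It assumes the container carries the io.kubernetes.container.name label that kubelet attaches to pod containers.

```python
"""Hypothetical sketch: build docker or crictl argv for a static-pod container."""
import shlex
from typing import List, Sequence

# Hypothetical mapping from runtime name to the CLI that manages its containers.
_RUNTIME_CLI = {"docker": "docker", "crio": "crictl"}


def _cli_for(runtime: str) -> str:
    try:
        return _RUNTIME_CLI[runtime]
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime!r}") from None


def lookup_argv(runtime: str, container_name: str) -> List[str]:
    """Return argv that prints the ID of the latest container whose
    io.kubernetes.container.name label matches container_name."""
    label = f"io.kubernetes.container.name={container_name}"
    cli = _cli_for(runtime)
    if cli == "docker":
        # docker filters by label via --filter label=key=value
        return [cli, "ps", "-l", "-q", "--filter", f"label={label}"]
    # crictl filters by label via --label key=value
    return [cli, "ps", "-l", "-q", "--label", label]


def exec_argv(runtime: str, container_id: str, command: Sequence[str]) -> List[str]:
    """Return argv that runs command inside container_id with the runtime's CLI."""
    return [_cli_for(runtime), "exec", container_id, *command]


if __name__ == "__main__":
    # "<container-id>" is a placeholder for the ID printed by the lookup command.
    print(shlex.join(lookup_argv("crio", "etcd")))
    print(shlex.join(exec_argv("crio", "<container-id>", ["etcdctl", "cluster-health"])))
```

Keying the choice on the runtime, rather than always shelling out to docker, is what lets the same health check reach containers created through CRI-O.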
I believe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1572440

*** This bug has been marked as a duplicate of bug 1572440 ***