Bug 1625534

Summary: Upgrade of OCP with standalone etcd_container fails because the wrong etcd command is used
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas, wmeng
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An incorrect etcdctl command was used during the etcd backup when etcd runs as a system container.
Consequence: The etcd backup failed during the upgrade.
Fix: The etcd system container is now identified correctly.
Result: The upgrade succeeds with etcd running in a system container.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-10 09:03:58 UTC
Type: Bug

Description liujia 2018-09-05 07:22:01 UTC
Description of problem:
An upgrade of OCP with a standalone etcd_container failed because the etcdctl command was invoked incorrectly for an etcd_container service: it was run directly on the host rather than inside the container.
TASK [etcd : Generate etcd backup] *********************************************
task path: /usr/share/ansible/openshift-ansible/roles/etcd/tasks/backup/backup.yml:47
Wednesday 05 September 2018  06:51:02 +0000 (0:00:01.539)       0:04:00.845 *** 
fatal: [x]: FAILED! => {"changed": false, "cmd": "etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd//openshift-backup-pre-upgrade-20180905065054", "msg": "[Errno 2] No such file or directory", "rc": 2}

I checked the standalone etcd host: etcd is running as a container, and etcdctl is not installed on the host itself. The etcd backup command should therefore be run through “docker exec etcd_container”.
[root@ip-172-18-8-100 ~]# docker ps
CONTAINER ID        IMAGE                                   COMMAND             CREATED             STATUS              PORTS               NAMES
9ad4be1d5032        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"     About an hour ago   Up About an hour                        etcd_container
[root@ip-172-18-8-100 ~]# rpm -qa|grep etcd
[root@ip-172-18-8-100 ~]# etcdctl
-bash: etcdctl: command not found
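The fix suggested above (wrapping the backup in "docker exec etcd_container" when etcd is containerized) can be sketched as follows. This is a minimal Python illustration, not the openshift-ansible change that actually resolved this bug: the constant and helper names are hypothetical, and it assumes that etcdctl ships inside the etcd image and that the data and backup directories are bind-mounted at the same paths inside the container.

```python
import shlex
import subprocess
from typing import List

# Hypothetical constant: the container name seen in the `docker ps` output above.
ETCD_CONTAINER = "etcd_container"


def running_container_names() -> str:
    """Return the names of running containers, one per line (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def etcd_runs_in_container(names: str, container: str = ETCD_CONTAINER) -> bool:
    """True if `container` is one of the whitespace-separated container names."""
    return container in names.split()


def backup_command(data_dir: str, backup_dir: str, containerized: bool,
                   container: str = ETCD_CONTAINER) -> List[str]:
    """Build the `etcdctl backup` argv, wrapped in `docker exec` when needed."""
    cmd = ["etcdctl", "backup",
           f"--data-dir={data_dir}", f"--backup-dir={backup_dir}"]
    if containerized:
        # etcdctl is not installed on the host, so run it inside the container.
        # Assumption: data_dir and backup_dir are bind-mounted at the same paths.
        cmd = ["docker", "exec", container] + cmd
    return cmd


# Example using the paths from the failing task; the container list is a stub.
cmd = backup_command(
    "/var/lib/etcd/",
    "/var/lib/etcd//openshift-backup-pre-upgrade-20180905065054",
    etcd_runs_in_container("etcd_container\n"),
)
print(shlex.join(cmd))
```

With the stubbed container list this prints the backup command from the failing task, prefixed with "docker exec etcd_container". On a real host, running_container_names() would supply the container list instead of the stub string; when etcd is not containerized, the command falls back to calling etcdctl directly on the host.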


Version-Release number of the following components:
ansible-2.6.3-1.el7ae.noarch
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.9 as a Docker containerized installation with standalone etcd on RHEL.
2. Upgrade v3.9 to v3.10; etcd still runs as a container after the upgrade.
3. Upgrade v3.10 to v3.11

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Scott Dodson 2018-11-01 14:17:58 UTC
There have been some recent changes to the etcd upgrade playbooks. Can this please be tested with the latest 3.11 code?

Comment 4 ge liu 2018-11-05 10:04:21 UTC
Verified with: ansible-2.6.6-1.el7ae.noarch, openshift-ansible-3.11.39-1.git.0.fe42b3b.el7.noarch

1. Install OCP v3.9 as a Docker containerized installation with standalone etcd on RHEL.
2. Upgrade v3.9 to v3.10.
3. Upgrade v3.10 to v3.11 (after changing openshift_release to v3.11 in the inventory file): the upgrade succeeded.

Comment 6 errata-xmlrpc 2019-01-10 09:03:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0024