Bug 1625534 - Fail to upgrade ocp with standalone etcd_container due to wrong etcd command used
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Vadim Rutkovsky
QA Contact: ge liu
Depends On:
Reported: 2018-09-05 07:22 UTC by liujia
Modified: 2019-01-10 09:04 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An incorrect etcdctl command was used during the etcd backup for system containers.
Consequence: The etcd backup failed during the upgrade.
Fix: The etcd system container is now identified correctly.
Result: The upgrade succeeds with etcd running in a system container.
Clone Of:
Last Closed: 2019-01-10 09:03:58 UTC
Target Upstream Version:


System                  | ID             | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata  | RHBA-2019:0024 | 0       | None     | None   | None    | 2019-01-10 09:04:05 UTC

Description liujia 2018-09-05 07:22:01 UTC
Description of problem:
Upgrading OCP with a standalone etcd_container failed because the etcdctl command was run directly on the host, although etcd runs as the etcd_container service.
TASK [etcd : Generate etcd backup] *********************************************
task path: /usr/share/ansible/openshift-ansible/roles/etcd/tasks/backup/backup.yml:47
Wednesday 05 September 2018  06:51:02 +0000 (0:00:01.539)       0:04:00.845 *** 
fatal: [x]: FAILED! => {"changed": false, "cmd": "etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd//openshift-backup-pre-upgrade-20180905065054", "msg": "[Errno 2] No such file or directory", "rc": 2}

On the standalone etcd host, etcd runs as a container and etcdctl is not installed on the host (see below). The backup command should therefore be run inside the container, prefixed with “docker exec etcd_container”.
[root@ip-172-18-8-100 ~]# docker ps
CONTAINER ID        IMAGE                                   COMMAND             CREATED             STATUS              PORTS               NAMES
9ad4be1d5032        registry.access.redhat.com/rhel7/etcd   "/usr/bin/etcd"     About an hour ago   Up About an hour                        etcd_container
[root@ip-172-18-8-100 ~]# rpm -qa|grep etcd
[root@ip-172-18-8-100 ~]# etcdctl
-bash: etcdctl: command not found
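A minimal Python sketch of the behaviour requested above, for illustration only. It is not the openshift-ansible fix, and the helper names etcd_runs_in_container and build_etcd_backup_cmd are hypothetical. It assumes the container is named etcd_container, that detection works on the output of `docker ps --format '{{.Names}}'` (one name per line), and that /var/lib/etcd is mounted at the same path inside the container, so the backup directory from the failing task is valid there too.

```python
import shlex


def etcd_runs_in_container(docker_names_output, container_name="etcd_container"):
    # Hypothetical helper: docker_names_output is assumed to be the text printed by
    # `docker ps --format '{{.Names}}'`, i.e. one running container name per line.
    return container_name in docker_names_output.split()


def build_etcd_backup_cmd(data_dir, backup_dir, container_name=None):
    # Hypothetical helper, not the openshift-ansible implementation.
    # Builds the etcd v2 backup command used by the failing task.
    cmd = ["etcdctl", "backup", f"--data-dir={data_dir}", f"--backup-dir={backup_dir}"]
    if container_name:
        # etcdctl is not installed on the host, so run it inside the container.
        # Assumes data_dir and backup_dir are mounted at the same paths in the container.
        cmd = ["docker", "exec", container_name] + cmd
    return cmd


if __name__ == "__main__":
    names = "etcd_container\n"  # the container name docker ps reported on the affected host
    container = "etcd_container" if etcd_runs_in_container(names) else None
    argv = build_etcd_backup_cmd(
        "/var/lib/etcd/",
        "/var/lib/etcd//openshift-backup-pre-upgrade-20180905065054",
        container,
    )
    print(shlex.join(argv))
```

Run as a script, it prints the containerized form of the command from the failed task: docker exec etcd_container etcdctl backup --data-dir=/var/lib/etcd/ --backup-dir=/var/lib/etcd//openshift-backup-pre-upgrade-20180905065054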

Version-Release number of the following components:

How reproducible:

Steps to Reproduce:
1. Install OCP v3.9 as Docker containers with standalone etcd on RHEL.
2. Upgrade v3.9 to v3.10; etcd still runs as a container after the upgrade.
3. Upgrade v3.10 to v3.11

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Scott Dodson 2018-11-01 14:17:58 UTC
There were some recent changes related to etcd upgrade playbooks, can this please be tested with the latest 3.11 code?

Comment 4 ge liu 2018-11-05 10:04:21 UTC
Verified with: ansible-2.6.6-1.el7ae.noarch, openshift-ansible-3.11.39-1.git.0.fe42b3b.el7.noarch

1. Install OCP v3.9 as Docker containers with standalone etcd on RHEL.
2. Upgrade v3.9 to v3.10.
3. Upgrade v3.10 to v3.11 (after changing openshift_release to v3.11 in the inventory file): the upgrade succeeded.

Comment 6 errata-xmlrpc 2019-01-10 09:03:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

