Bug 1572016 - Error during 'Copy etcd v3 data store' (3.10->3.10+ upgrade)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.10.0
Assignee: Vadim Rutkovsky
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-26 00:46 UTC by Justin Pierce
Modified: 2018-07-30 19:14 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:14:14 UTC
Target Upstream Version:


Attachments
openshift-ansible output detailing error (2.13 KB, text/plain)
2018-04-26 00:46 UTC, Justin Pierce


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:14:34 UTC

Description Justin Pierce 2018-04-26 00:46:42 UTC
Created attachment 1426943
openshift-ansible output detailing error

Description of problem:
During an upgrade of OCP v3.10.0-0.27.0 -> 0.29.0, the upgrade failed while trying to back up etcd data. See the attachment for the Ansible playbook output.

Version-Release number of the following components:
v3.10.0-0.29.0

Comment 2 Dan Mace 2018-04-30 13:15:54 UTC
Fix for this SEEMS like it might be straightforward:

https://github.com/openshift/openshift-ansible/blob/HEAD/roles/etcd/tasks/backup/backup.yml#L67

The `{{ l_etcd_backup_dir }}/member/snap/` directory needs to exist before the referenced `cp` command is executed. I traced through the code and history and haven't yet figured out how the current code worked to begin with except by accident (given the apparent lack of explicit code to create the directory), but I wasn't yet able to set up an upgrade test to really investigate deeply. I'm probably missing something.
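The ordering described above (create the destination directory, then copy the snapshot file) can be sketched in Python. This is only an illustration of the idea, not the playbook's actual task: the function name `backup_etcd_v3_db` and its parameters are hypothetical, and `shutil.copy2` is only a rough stand-in for `cp -a` on a single file.

```python
import os
import shutil
import tempfile


def backup_etcd_v3_db(data_dir: str, backup_dir: str) -> str:
    """Copy <data_dir>/member/snap/db into <backup_dir>/member/snap/.

    Hypothetical helper for illustration; returns the path of the copy.
    """
    src = os.path.join(data_dir, "member", "snap", "db")
    dest_dir = os.path.join(backup_dir, "member", "snap")
    # Create the destination first. Copying into a directory that does not
    # exist is the failure mode discussed in this comment.
    os.makedirs(dest_dir, exist_ok=True)
    # copy2 keeps timestamps and permission bits (it does not keep
    # ownership, so it is not a full equivalent of `cp -a`).
    return shutil.copy2(src, dest_dir)


if __name__ == "__main__":
    # Exercise the helper against a throwaway directory tree.
    with tempfile.TemporaryDirectory() as tmp:
        snap = os.path.join(tmp, "member", "snap")
        os.makedirs(snap)
        with open(os.path.join(snap, "db"), "wb") as fh:
            fh.write(b"fake snapshot")
        print(backup_etcd_v3_db(tmp, os.path.join(tmp, "backup")))
```

Calling `os.makedirs` with `exist_ok=True` keeps the step idempotent, so a rerun of the backup against an existing backup directory does not fail.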

Comment 3 Scott Dodson 2018-05-02 12:34:20 UTC
The problem is that the playbook attempts to run the static pod backup method on a host that has not been converted to use static pods. We need to convert the host to use static pods and clean up the logic that determines whether the host is running static pods. Ideally that logic would rely on the host's actual configuration rather than on inventory variables, as it's clear we cannot trust inventory state to capture the whole picture.
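A minimal sketch of choosing the backup method from the host's own state rather than from inventory variables might look like the following. The manifest path `/etc/origin/node/pods/etcd.yaml`, the function names, and the method labels are assumptions made for illustration; none of them are taken from this report or confirmed against the eventual fix.

```python
import os

# Assumption: a host converted to static pods has an etcd pod manifest at
# this path. The path is illustrative and not taken from this bug report.
STATIC_POD_MANIFEST = "/etc/origin/node/pods/etcd.yaml"


def etcd_runs_as_static_pod(manifest_path: str = STATIC_POD_MANIFEST) -> bool:
    """Return True if the host itself shows an etcd static pod manifest."""
    return os.path.isfile(manifest_path)


def choose_backup_method(manifest_path: str = STATIC_POD_MANIFEST) -> str:
    """Pick a backup method from what is on disk, not from inventory.

    The returned labels are hypothetical names for the two code paths.
    """
    if etcd_runs_as_static_pod(manifest_path):
        return "static_pod"
    return "system_service"
```

The point of the design is that the decision is derived from a fact observable on the host at run time, so a stale or incomplete inventory cannot send an unconverted host down the static pod path.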

Comment 4 Scott Dodson 2018-05-04 17:54:47 UTC
https://github.com/openshift/openshift-ansible/pull/8239 should resolve this.

Comment 5 Vadim Rutkovsky 2018-05-07 09:24:31 UTC
Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 6 liujia 2018-05-14 09:27:07 UTC
Verified on openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

Upgrade from OCP 3.10.0-0.29.0 -> 0.41.0

TASK [etcd : Copy etcd v3 data store] ***************************************************************************************************************************************
changed: [x] => {"changed": true, "cmd": ["cp", "-a", "/var/lib/etcd//member/snap/db", "/var/lib/etcd//openshift-backup-pre-upgrade-20180514045129/member/snap/"], "delta": "0:00:00.132804", "end": "2018-05-14 04:51:45.360001", "rc": 0, "start": "2018-05-14 04:51:45.227197", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Comment 8 errata-xmlrpc 2018-07-30 19:14:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

