Bug 1572016

Summary: Error during 'Copy etcd v3 data store' (3.10->3.10+ upgrade)
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Installer
Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, dmace, jokerman, mmccomas, wmeng
Target Milestone: ---
Keywords: OpsBlocker
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-30 19:14:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description: openshift-ansible output detailing error
  Flags: none

Description Justin Pierce 2018-04-26 00:46:42 UTC
Created attachment 1426943
openshift-ansible output detailing error

Description of problem:
During an upgrade of OCP from v3.10.0-0.27.0 to v3.10.0-0.29.0, the upgrade failed while trying to back up etcd data. See the attachment for the Ansible playbook output.

Version-Release number of the following components:
v3.10.0-0.29.0

Comment 2 Dan Mace 2018-04-30 13:15:54 UTC
Fix for this SEEMS like it might be straightforward:

https://github.com/openshift/openshift-ansible/blob/HEAD/roles/etcd/tasks/backup/backup.yml#L67

The `{{ l_etcd_backup_dir }}/member/snap/` directory needs to exist before the referenced `cp` command is executed. I traced through the code and its history and haven't yet figured out how the current code ever worked, except by accident, given the apparent lack of explicit code to create the directory. I haven't yet been able to set up an upgrade test to investigate more deeply, so I'm probably missing something.
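To illustrate the ordering described above, here is a minimal Python sketch, not the playbook's actual code. The function name `backup_etcd_v3_db` and its parameters are hypothetical; it only shows that the `member/snap/` target directory is created before the copy is attempted.

```python
import os
import shutil


def backup_etcd_v3_db(etcd_data_dir, backup_dir):
    """Copy member/snap/db into backup_dir, creating the target directory first.

    Hypothetical helper for illustration only; the real task is an Ansible
    `cp -a` command in roles/etcd/tasks/backup/backup.yml.
    """
    src = os.path.join(etcd_data_dir, "member", "snap", "db")
    dest_dir = os.path.join(backup_dir, "member", "snap")
    # Create the backup tree up front; without it the copy has no valid target directory.
    os.makedirs(dest_dir, exist_ok=True)
    # copy2 preserves timestamps and permission bits, loosely similar to `cp -a`.
    return shutil.copy2(src, dest_dir)
```

Calling `backup_etcd_v3_db("/var/lib/etcd", "/var/lib/etcd/openshift-backup-pre-upgrade-<timestamp>")` would mirror the paths seen in the playbook output, though whether a missing directory is the real root cause is exactly what is in question here.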

Comment 3 Scott Dodson 2018-05-02 12:34:20 UTC
The problem is that it's attempting to run the static pod backup method on a host that has not been converted to use static pods. We need to convert the host to use static pods and clean up the logic that determines whether the host is running static pods or not. Ideally, that logic should rely on the host's actual configuration rather than on inventory variables, as it's clear we cannot trust inventory state to capture the whole picture.
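As a rough sketch of what "relying on the host's actual configuration" could look like, the Python below picks a backup method by inspecting the filesystem instead of reading an inventory variable. Everything in it is an assumption for illustration: the manifest directory `/etc/origin/node/pods`, the file name `etcd.yaml`, and the function names are not taken from this bug or from the linked fix.

```python
import os

# Assumed location of static pod manifests on the host; not confirmed by this bug report.
ASSUMED_POD_MANIFEST_DIR = "/etc/origin/node/pods"


def etcd_runs_as_static_pod(manifest_dir=ASSUMED_POD_MANIFEST_DIR,
                            manifest_name="etcd.yaml"):
    """Return True if an etcd static pod manifest is present on this host."""
    return os.path.isfile(os.path.join(manifest_dir, manifest_name))


def choose_backup_method(manifest_dir=ASSUMED_POD_MANIFEST_DIR):
    """Pick the backup method from what is on disk, not from inventory state."""
    if etcd_runs_as_static_pod(manifest_dir):
        return "static_pod"
    return "system_service"
```

The point of the design is that the decision follows observable host state, so a host that has not yet been converted falls through to the non-static-pod path regardless of what the inventory claims.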

Comment 4 Scott Dodson 2018-05-04 17:54:47 UTC
https://github.com/openshift/openshift-ansible/pull/8239 should resolve this.

Comment 5 Vadim Rutkovsky 2018-05-07 09:24:31 UTC
Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 6 liujia 2018-05-14 09:27:07 UTC
Verified on openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

Upgrade from OCP 3.10.0-0.29.0 to 3.10.0-0.41.0:

TASK [etcd : Copy etcd v3 data store] ***************************************************************************************************************************************
changed: [x] => {"changed": true, "cmd": ["cp", "-a", "/var/lib/etcd//member/snap/db", "/var/lib/etcd//openshift-backup-pre-upgrade-20180514045129/member/snap/"], "delta": "0:00:00.132804", "end": "2018-05-14 04:51:45.360001", "rc": 0, "start": "2018-05-14 04:51:45.227197", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Comment 8 errata-xmlrpc 2018-07-30 19:14:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816