1572016 – Error during 'Copy etcd v3 data store' (3.10->3.10+ upgrade)

Summary: Error during 'Copy etcd v3 data store' (3.10->3.10+ upgrade)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Vadim Rutkovsky
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-26 00:46 UTC by Justin Pierce
Modified:	2018-07-30 19:14 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-07-30 19:14:14 UTC
Target Upstream Version:
Embargoed:

Attachments
openshift-ansible output detailing error (2.13 KB, text/plain) 2018-04-26 00:46 UTC, Justin Pierce	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:1816	0	None	None	None	2018-07-30 19:14:34 UTC

Description Justin Pierce 2018-04-26 00:46:42 UTC

Created attachment 1426943
openshift-ansible output detailing error

Description of problem:
During an upgrade of OCP from v3.10.0-0.27.0 to v3.10.0-0.29.0, the upgrade failed while trying to back up etcd data. See the attachment for the Ansible playbook output.

Version-Release number of the following components:
v3.10.0-0.29.0

Comment 2 Dan Mace 2018-04-30 13:15:54 UTC

The fix for this SEEMS like it might be straightforward:

https://github.com/openshift/openshift-ansible/blob/HEAD/roles/etcd/tasks/backup/backup.yml#L67

The `{{ l_etcd_backup_dir }}/member/snap/` directory needs to exist before the referenced `cp` command is executed. I traced through the code and its history and haven't yet figured out how the current code ever worked, except by accident, given the apparent lack of explicit code to create the directory. I haven't yet been able to set up an upgrade test to investigate deeply, so I'm probably missing something.
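
To illustrate the hypothesis in this comment, here is a minimal shell sketch of the ordering: create the `member/snap` directory under the backup location, then copy the v3 `db` file. The function name `backup_etcd_v3_db` and its two arguments are hypothetical and are not taken from openshift-ansible; the root cause identified later in this bug turned out to be a different one.

```shell
#!/bin/sh
# Sketch only: copy the etcd v3 data store into a backup directory,
# creating the destination first. backup_etcd_v3_db is a hypothetical
# helper, not part of openshift-ansible.
backup_etcd_v3_db() {
    etcd_data_dir=$1   # e.g. /var/lib/etcd
    backup_dir=$2      # plays the role of l_etcd_backup_dir

    # cp fails when the destination directory is missing, so create it first.
    mkdir -p "$backup_dir/member/snap" || return 1

    # Same copy as the "Copy etcd v3 data store" task: preserve attributes.
    cp -a "$etcd_data_dir/member/snap/db" "$backup_dir/member/snap/"
}
```

`mkdir -p` is idempotent, so the function behaves the same whether or not an earlier step already created the directory.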

Comment 3 Scott Dodson 2018-05-02 12:34:20 UTC

The problem is that it's attempting to run the static pod backup method on a host that has not been converted to use static pods. We need to convert the host to use static pods and clean up the logic that determines whether the host is running static pods. Ideally, that logic should rely not on inventory variables but on the host's actual configuration, as it's clear we cannot trust inventory state to capture the whole picture.
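
One way to base the decision on the host's actual configuration rather than on inventory variables is to check for the etcd static pod manifest on disk. The sketch below is an illustration under stated assumptions, not the logic from the eventual fix: the helper name `is_etcd_static_pod` is hypothetical, and the default manifest path `/etc/origin/node/pods/etcd.yaml` is an assumption about where a converted host keeps it.

```shell
#!/bin/sh
# Sketch only: decide from the host's filesystem, not from inventory,
# whether etcd runs as a static pod. is_etcd_static_pod is a hypothetical
# helper; the default manifest path below is an assumption.
is_etcd_static_pod() {
    manifest=${1:-/etc/origin/node/pods/etcd.yaml}

    # Succeeds (exit status 0) only if the manifest is a regular file.
    [ -f "$manifest" ]
}

# Example use (commented out so that sourcing this file has no side effects):
# if is_etcd_static_pod; then
#     echo "use the static pod backup method"
# else
#     echo "use the non-static-pod backup method"
# fi
```

Passing the path as an argument keeps the check testable and makes it easy to adjust if the manifest lives elsewhere.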

Comment 4 Scott Dodson 2018-05-04 17:54:47 UTC

https://github.com/openshift/openshift-ansible/pull/8239 should resolve this.

Comment 5 Vadim Rutkovsky 2018-05-07 09:24:31 UTC

Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 6 liujia 2018-05-14 09:27:07 UTC

Verified on openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

Upgraded from OCP 3.10.0-0.29.0 to 3.10.0-0.41.0.

TASK [etcd : Copy etcd v3 data store] ***************************************************************************************************************************************
changed: [x] => {"changed": true, "cmd": ["cp", "-a", "/var/lib/etcd//member/snap/db", "/var/lib/etcd//openshift-backup-pre-upgrade-20180514045129/member/snap/"], "delta": "0:00:00.132804", "end": "2018-05-14 04:51:45.360001", "rc": 0, "start": "2018-05-14 04:51:45.227197", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Comment 8 errata-xmlrpc 2018-07-30 19:14:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
