Created attachment 1426943
openshift-ansible output detailing error

Description of problem:
During an upgrade of OCP v3.10.0-0.27.0 -> 0.29.0, the upgrade failed while trying to back up etcd data. See attachment for ansible playbook output.

Version-Release number of the following components:
v3.10.0-0.29.0
The fix for this SEEMS like it might be straightforward:

https://github.com/openshift/openshift-ansible/blob/HEAD/roles/etcd/tasks/backup/backup.yml#L67

The `{{ l_etcd_backup_dir }}/member/snap/` directory needs to exist before the referenced `cp` command is executed. I traced through the code and its history and found no explicit code that creates the directory, so I haven't yet figured out how the current code ever worked except by accident. I also haven't been able to set up an upgrade test to investigate more deeply, so I'm probably missing something.
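To illustrate the ordering issue described above, here is a minimal Python sketch (not the playbook's actual code) that creates the `member/snap/` destination before copying the `db` file. The function name `backup_etcd_db` and its parameters are hypothetical, chosen only for this example; the `member/snap/db` layout is taken from the `cp` command in the task.

```python
import shutil
from pathlib import Path


def backup_etcd_db(data_dir: Path, backup_dir: Path) -> Path:
    """Copy the etcd v3 data store into backup_dir, creating the target first.

    Hypothetical helper: mirrors `cp -a <data_dir>/member/snap/db
    <backup_dir>/member/snap/`, but makes the destination directory
    explicitly so the copy cannot fail because it is missing.
    """
    src = data_dir / "member" / "snap" / "db"
    dest_dir = backup_dir / "member" / "snap"

    # The step that appears to be missing: ensure the destination exists.
    dest_dir.mkdir(parents=True, exist_ok=True)

    # copy2 preserves timestamps and permission bits, roughly like `cp -p`.
    return Path(shutil.copy2(src, dest_dir))
```

In the role itself, the equivalent would presumably be a directory-creation task that runs before the `cp` task.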
The problem is that it's attempting to run the static pod backup method on a host that has not been converted to use static pods. We need to convert the host to use static pods and clean up the logic that determines whether the host is running static pods or not. Ideally, that logic should not rely on inventory variables but on the host's actual configuration, as it's clear we cannot trust inventory state to capture the whole picture.
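As a rough sketch of what "relying on the host's actual configuration" could look like, the Python below decides the backup method by checking for an etcd static pod manifest on disk rather than reading an inventory variable. This is an illustration only, not what the eventual fix does. The manifest directory `/etc/origin/node/pods`, the file name `etcd.yaml`, and both function names are assumptions made for this example.

```python
from pathlib import Path

# Assumption: directory where the node looks for static pod manifests.
DEFAULT_POD_MANIFEST_DIR = Path("/etc/origin/node/pods")


def etcd_runs_as_static_pod(
    manifest_dir: Path = DEFAULT_POD_MANIFEST_DIR,
    manifest_name: str = "etcd.yaml",  # assumed manifest file name
) -> bool:
    """Return True if an etcd static pod manifest exists on this host."""
    return (manifest_dir / manifest_name).is_file()


def choose_backup_method(manifest_dir: Path = DEFAULT_POD_MANIFEST_DIR) -> str:
    """Pick a backup method from observed host state, not inventory."""
    if etcd_runs_as_static_pod(manifest_dir):
        return "static_pod"
    # Host has not been converted yet; use the non-static-pod path.
    return "host"
```

The point of the design is that the check reflects what is actually deployed on the host, so a host that was never converted cannot be sent down the static pod path by stale inventory state.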
https://github.com/openshift/openshift-ansible/pull/8239 should resolve this.
Fix is available in openshift-ansible-3.10.0-0.35.0
Verified on openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

Upgrade from ocp 3.10.0-0.29.0->0.41.0

TASK [etcd : Copy etcd v3 data store] ***************************************************************************************************************************************
changed: [x] => {"changed": true, "cmd": ["cp", "-a", "/var/lib/etcd//member/snap/db", "/var/lib/etcd//openshift-backup-pre-upgrade-20180514045129/member/snap/"], "delta": "0:00:00.132804", "end": "2018-05-14 04:51:45.360001", "rc": 0, "start": "2018-05-14 04:51:45.227197", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816