Bug 1762932

Summary: Backup on only 1 master causing issues in - openshift_certificate_expiry : Check cert expirys on host task
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Installer Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: jjerezro, rteague
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Certificates were only backed up on the first master.
Consequence: If the redeploy-certificates playbook failed during execution, it could happen that certificates were deleted on all masters which would result in the playbook failing when run again. To recover, certificates would have to be restored from backup which could be time-consuming.
Fix: Back up certificates on all masters.
Result: If certificates need to be recovered for any master, they are available in a locally generated file archive.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-18 14:52:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vladislav Walek 2019-10-17 20:04:20 UTC
Description of problem:

When running the playbook "openshift-ansible/playbooks/redeploy-certificates.yml", there is a task that creates a backup on only one master but removes the certs on all masters:
https://github.com/openshift/openshift-ansible/blob/release-3.11/playbooks/openshift-master/private/certificates-backup.yml

If the playbook fails, the next run fails on the following task because the certs are missing:
TASK [openshift_certificate_expiry : Check cert expirys on host]

To fix it, the certs must be restored; however, without a backup from the other masters this is not possible.
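
The behaviour asked for here (an archive on every master before anything is removed) could look roughly like the Python sketch below. It only illustrates the idea and is not the playbook's actual code. The helper name `backup_cert_dir`, the archive naming scheme, and the paths in the usage comment are assumptions made for this example.

```python
import os
import tarfile
import time


def backup_cert_dir(src_dir, dest_dir):
    """Archive src_dir into a timestamped .tgz under dest_dir and return its path.

    Hypothetical helper: meant to run on every master before certs are removed.
    """
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"nothing to back up: {src_dir}")
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d%H%M%S")
    archive = os.path.join(dest_dir, f"master-certs-backup-{stamp}.tgz")
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to src_dir so the archive can be unpacked anywhere.
        tar.add(src_dir, arcname=".")
    return archive


# Example call (both paths are assumptions, adjust to the actual host layout):
# backup_cert_dir("/etc/origin/master", "/root/cert-backups")
```

Raising when the source directory is missing is deliberate: a run that has nothing to back up should stop before any removal step.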

Version-Release number of the following components:
openshift ansible 3.11.117

How reproducible:
- Running the playbook multiple times before it finishes causes the issue.

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

It always fails on missing certs such as service-signer.crt, master.server.crt, etc.
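
A quick way to confirm this state on a master before re-running the playbook is to check for the files directly. The Python sketch below is only an illustration. The two file names come from this report, while `missing_certs` and the directory in the usage comment are assumptions, and the playbook checks more files than these.

```python
import os

# File names taken from this report; the full set of certs is larger.
EXPECTED_CERTS = ["service-signer.crt", "master.server.crt"]


def missing_certs(cert_dir, expected=EXPECTED_CERTS):
    """Return the expected certificate file names that are absent from cert_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(cert_dir, name))]


# Example call (the directory is an assumption about the master config location):
# print(missing_certs("/etc/origin/master"))
```

An empty list means the files named above are present; any returned name has to be restored before the expiry check can pass.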


Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 2 Russell Teague 2019-10-18 18:17:50 UTC
*** Bug 1751194 has been marked as a duplicate of this bug. ***

Comment 5 Russell Teague 2019-11-08 19:58:09 UTC
Gaoyun,
If the redeploy-certificates.yml playbook fails between removing and recreating certificates, the deleted certificates must be manually restored from the backup file created.  The changes made were to address the issue of not being able to recover files that were not backed up.  To change the code to handle failures of this type would require a significant amount of refactoring over several components.
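
For the manual restore step, one possible approach is to copy back only the files that are missing, leaving anything the playbook already recreated untouched. The Python sketch below assumes the backup is a tar archive with paths stored relative to the certificate directory. `restore_missing` is a hypothetical helper, not part of openshift-ansible, and it restores file modes but not ownership.

```python
import os
import tarfile


def restore_missing(archive, target_dir):
    """Copy regular files from archive into target_dir, skipping existing files.

    Returns the restored paths relative to target_dir. Hypothetical helper.
    """
    restored = []
    root = os.path.realpath(target_dir)
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            dest = os.path.realpath(os.path.join(root, member.name))
            # Refuse entries that would land outside target_dir.
            if os.path.commonpath([root, dest]) != root:
                continue
            if os.path.exists(dest):
                continue  # keep files that are already in place
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with tar.extractfile(member) as src, open(dest, "wb") as out:
                out.write(src.read())
            os.chmod(dest, member.mode & 0o777)
            restored.append(os.path.relpath(dest, root))
    return restored
```

Skipping existing files keeps the operation safe to repeat, which matters when the playbook failed partway and only some certificates were deleted.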

Comment 6 Gaoyun Pei 2019-11-09 15:10:42 UTC
Thanks for the heads up, Russell!

Moving this bug to VERIFIED based on Comment 4 and Comment 5. A backup of the master certificates and configs is now created on all masters during playbooks/redeploy-certificates.yml.

Comment 8 errata-xmlrpc 2019-11-18 14:52:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3817