Bug 1762932 - Backup on only 1 master causing issues in - openshift_certificate_expiry : Check cert expirys on host task
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Duplicates: 1751194
Depends On:
Blocks:
Reported: 2019-10-17 20:04 UTC by Vladislav Walek
Modified: 2019-11-18 14:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Certificates were backed up only on the first master.
Consequence: If the redeploy-certificates playbook failed partway through, certificates could already have been deleted on all masters, which caused the playbook to fail when run again. To recover, certificates had to be restored from backup, which could be time-consuming.
Fix: Certificates are now backed up on all masters.
Result: If certificates need to be recovered for any master, they are available in a locally generated file archive.
Clone Of:
Environment:
Last Closed: 2019-11-18 14:52:10 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-ansible pull 11966 | 0 | 'None' | closed | Bug 1762932: Back up certificates on all masters | 2020-09-07 16:37:17 UTC
Red Hat Product Errata | RHBA-2019:3817 | 0 | None | None | None | 2019-11-18 14:52:21 UTC

Description Vladislav Walek 2019-10-17 20:04:20 UTC
Description of problem:

When running the playbook "openshift-ansible/playbooks/redeploy-certificates.yml", there is a task that creates a backup on only one master, but the certs are then removed on all masters:
https://github.com/openshift/openshift-ansible/blob/release-3.11/playbooks/openshift-master/private/certificates-backup.yml

If the playbook fails, the next run will fail on task:
TASK [openshift_certificate_expiry : Check cert expirys on host] 
Because the certs are missing.

To fix it, the certs have to be restored; however, that is not possible for the other masters because no backup was taken on them.
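
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not the playbook's actual logic: the host names, the `MASTERS` mapping, and both helper functions are hypothetical and exist purely to show why backing up one master while removing certs on all of them leaves the other masters unrecoverable.

```python
# Toy model of the backup/remove sequence. All names are hypothetical;
# this does not reproduce the real openshift-ansible tasks.

MASTERS = {
    "master1": {"service-signer.crt", "master.server.crt"},
    "master2": {"service-signer.crt", "master.server.crt"},
    "master3": {"service-signer.crt", "master.server.crt"},
}


def run_backup_and_remove(masters, backup_on):
    """Back up certs on the hosts in `backup_on`, then remove certs on every master."""
    backups = {
        host: set(certs) for host, certs in masters.items() if host in backup_on
    }
    # The removal step runs on all masters regardless of where backups exist.
    remaining = {host: set() for host in masters}
    return backups, remaining


def unrecoverable_hosts(masters, backups):
    """Hosts whose certs were removed but that have no backup to restore from."""
    return sorted(host for host in masters if host not in backups)


# Behaviour reported in this bug: backup taken on the first master only.
backups_one, _ = run_backup_and_remove(MASTERS, {"master1"})
print(unrecoverable_hosts(MASTERS, backups_one))

# Behaviour after the fix: backup taken on every master.
backups_all, _ = run_backup_and_remove(MASTERS, set(MASTERS))
print(unrecoverable_hosts(MASTERS, backups_all))
```

In this model, a failure between the removal and the re-creation of the certs leaves every host outside `backup_on` without anything to restore.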

Version-Release number of the following components:
openshift ansible 3.11.117

How reproducible:
- running the playbook multiple times before it finishes will cause the issue.

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

It always fails on missing certs such as service-signer.crt, master.server.crt, etc.


Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 2 Russell Teague 2019-10-18 18:17:50 UTC
*** Bug 1751194 has been marked as a duplicate of this bug. ***

Comment 5 Russell Teague 2019-11-08 19:58:09 UTC
Gaoyun,
If the redeploy-certificates.yml playbook fails between removing and recreating certificates, the deleted certificates must be manually restored from the backup file created.  The changes made were to address the issue of not being able to recover files that were not backed up.  To change the code to handle failures of this type would require a significant amount of refactoring over several components.
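
As a rough illustration of the manual recovery mentioned above, the following sketch archives a directory of files and later restores only the files that have gone missing. It is a generic example under stated assumptions: the function names are hypothetical, and the archive name and layout produced by the playbook are not described in this report, so they are not reproduced here.

```python
import tarfile
from pathlib import Path


def backup_dir(src_dir, archive_path):
    """Write every regular file under src_dir into a gzip-compressed tar archive."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so they can be restored anywhere.
                tar.add(path, arcname=path.relative_to(src).as_posix())
    return archive_path


def restore_missing(archive_path, dest_dir):
    """Restore only files that are absent from dest_dir; return their names."""
    dest = Path(dest_dir)
    restored = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            rel = Path(member.name)
            # Skip entries that would escape dest_dir.
            if rel.is_absolute() or ".." in rel.parts:
                continue
            target = dest / rel
            if target.exists():
                continue  # never overwrite files that are still in place
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src_file:
                target.write_bytes(src_file.read())
            restored.append(member.name)
    return sorted(restored)
```

A caller would run `backup_dir` on each master before any removal step and `restore_missing` on the affected master after a failed run. File ownership and permissions are not preserved by this sketch and would need separate handling for real certificate files.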

Comment 6 Gaoyun Pei 2019-11-09 15:10:42 UTC
Thanks for the heads up, Russell!

Moving this bug to verified based on Comment 4 and Comment 5. The backup of master certificates and configs is now created on all masters during playbooks/redeploy-certificates.yml.

Comment 8 errata-xmlrpc 2019-11-18 14:52:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3817

