Bug 1813743

Summary: etcd:[DR] should backup and restore all static pods
Product: OpenShift Container Platform
Reporter: Sam Batschelet <sbatsche>
Component: Etcd
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.4
CC: dmace, geliu, jniu, malonso, mas-hatada, mfuruta, rh-container, skolicha
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1813744 1829452
Environment:
Last Closed: 2020-07-13 17:20:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1813744, 1829452
Attachments:
  Description: Replace master field with encryption
  Flags: none

Description Sam Batschelet 2020-03-16 00:27:12 UTC
Description of problem: Currently the DR scripts only back up etcd and kube-apiserver. Restore starts the etcd restore pod but none of the other static pods.


Actual results: Only etcd starts on the node after restore.


Expected results: All static pods start at the revision at which they were backed up.


Additional info:

Comment 1 Suresh Kolichala 2020-03-16 13:27:53 UTC
*** Bug 1812275 has been marked as a duplicate of this bug. ***

Comment 9 Maria Alonso 2020-06-10 11:35:35 UTC
Hi,

Is there any update on this?

Regards.

Comment 13 Masaki Hatada 2020-06-22 11:17:04 UTC
Dear Red Hat,

We have two questions.

* Starting with OCP4.4, several oc patch commands were added to the recovery steps in the manual to force a redeployment of the control plane components.

    Restoring to a previous cluster state
    https://docs.openshift.com/container-platform/4.4/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html#dr-restoring-cluster-state

  Was that update made specifically for this Bugzilla?

* Even with these OCP4.4 steps, we fail to recover etcd if etcd has been encrypted.
  As we reported in Case 02610044, to restore the encryption key we have to perform the following steps after running /usr/local/bin/cluster-restore.sh.

    $ tar xvf static_kuberesources_<date>.tar.gz
    $ sudo cp -p static-pod-resources/kube-apiserver-pod-26/kube-apiserver-pod.yaml /etc/kubernetes/manifests/
    $ sudo cp -pr static-pod-resources/kube-apiserver-pod-26 /etc/kubernetes/static-pod-resources/
    $ sudo systemctl restart kubelet

  Could Red Hat document the above steps in the manual?
  Or is Red Hat planning to document a better way in the manual?
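
The manual steps above could be wrapped in a small helper along the following lines. This is only a sketch, not part of the shipped DR scripts: the function names latest_revision_dir and restore_static_pod are hypothetical, the optional destination argument exists only so the copy can be tried outside a real control-plane node, and picking the highest-numbered kube-apiserver-pod-<N> directory is an assumption (the commands above name revision 26 explicitly).

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not part of cluster-restore.sh).
# Assumes the backup tarball was already extracted, e.g. with
#   tar xvf static_kuberesources_<date>.tar.gz

# Print the kube-apiserver-pod-<N> directory with the highest N under $1.
latest_revision_dir() {
    ls -d "$1"/kube-apiserver-pod-* 2>/dev/null \
        | sed 's/.*-\([0-9][0-9]*\)$/\1 &/' \
        | sort -n \
        | tail -n 1 \
        | cut -d' ' -f2-
}

# Copy the chosen revision into place.
#   $1: extracted static-pod-resources directory
#   $2: destination root (assumed default: /etc/kubernetes)
restore_static_pod() {
    src_root=$1
    dest_root=${2:-/etc/kubernetes}
    rev_dir=$(latest_revision_dir "$src_root")
    if [ -z "$rev_dir" ]; then
        echo "no kube-apiserver-pod-* revision found in $src_root" >&2
        return 1
    fi
    mkdir -p "$dest_root/manifests" "$dest_root/static-pod-resources"
    # Same two copies as the manual steps: the pod manifest, then the
    # whole revision directory with its resources.
    cp -p "$rev_dir/kube-apiserver-pod.yaml" "$dest_root/manifests/" &&
        cp -pr "$rev_dir" "$dest_root/static-pod-resources/"
}

# Example (run as root on the node, then restart the kubelet yourself):
#   restore_static_pod ./static-pod-resources && systemctl restart kubelet
```

The kubelet restart is left to the caller on purpose, so the copy can be checked before the node picks up the restored manifest.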

Best Regards,
Masaki Hatada

Comment 14 Masaki Furuta ( RH ) 2020-07-01 07:51:22 UTC
(In reply to Masaki Hatada from comment #13)

Dear Sam Batschelet (and Ge Liu),

Would you please take a look at comment 13 by Hatada-san and consider including it?
If you find any apparent problem, would you please respond to Hatada-san?

I am grateful for your help and support.

Thank you,

BR,
Masaki

Comment 15 Suresh Kolichala 2020-07-01 12:26:53 UTC
@Masaki Hatada,

I am surprised you needed to do that. The restore process does copy the manifest from the tar file. The only extra step you are doing is to restart kubelet. Do you have a cluster without these extra steps?

Thanks,
Suresh.

Comment 16 Masaki Hatada 2020-07-02 01:18:58 UTC
Dear Suresh,

Sorry, I might be wrong.
We had to recover static-pod-resources manually in OCP4.3. In OCP4.4, however, the restore script does seem to restore static-pod-resources automatically.

We will retest it and let you know the result later.

Best Regards,
Masaki Hatada

Comment 17 Masaki Hatada 2020-07-06 11:51:21 UTC
Created attachment 1700017
Replace master field with encryption

We confirmed that the steps in the OCP4.4 manual can restore etcd even when it is encrypted.
So we believe all problems are now resolved.

Comment 18 Maria Alonso 2020-07-10 12:40:42 UTC
Hi,

Do you know if this will be backported to 4.3?

Comment 20 errata-xmlrpc 2020-07-13 17:20:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409