
Bug 1419670

Summary: [DOCS] Incorrect etcd backup and restore procedure
Product: OpenShift Container Platform
Component: Documentation
Version: 3.3.0
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Jaspreet Kaur <jkaur>
Assignee: Ashley Hardin <ahardin>
QA Contact: Anping Li <anli>
Docs Contact: Vikram Goyal <vigoyal>
CC: anli, aos-bugs, bfallonf, jkaur, jokerman, mmccomas, rhowe, sttts, tstclair
Flags: sttts: needinfo+
Type: Bug
Last Closed: 2017-03-27 15:52:53 UTC
Bug Blocks: 1405338

Description Jaspreet Kaur 2017-02-06 17:31:09 UTC
Description of problem:
In a disaster recovery situation, the procedure provided in the documentation at

 https://docs.openshift.com/container-platform/3.4/admin_guide/backup_restore.html#cluster-backup 

or at  

https://docs.openshift.com/container-platform/3.3/admin_guide/backup_restore.html#cluster-backup

is not usable, because the etcd process will not start again: the db file (${ETCD_DATA_DIR}/member/snap/db) is missing from the backup.

We realize that etcd3 runs in a compatibility mode and that the procedure for restoring the v2 keys is the same, but it seems that etcd also needs that file. Backing up that file is impossible because the "snapshot" argument of the "etcdctl" command is not available, although according to the CoreOS docs it should be: https://coreos.com/etcd/docs/3.0.15/op-guide/recovery.html.

Etcd fails to start

Actual results: etcd does not start, and the documented procedure is incorrect.


Expected results: etcd should start, and the documentation needs to be updated.

Comment 1 Timothy St. Clair 2017-02-06 22:20:49 UTC
This seems like two separate issues... 

1. Update the docs on disaster recovery 
2. Determine why your etcd instance is not starting.  

To #1, iirc before an upgrade a snapshot is saved.

Comment 26 Stefan Schimanski 2017-02-23 07:53:51 UTC
*** Bug 1421072 has been marked as a duplicate of this bug. ***

Comment 31 Ashley Hardin 2017-03-03 17:17:34 UTC
Updated pull request: https://github.com/openshift/openshift-docs/pull/3827

Comment 39 Anping Li 2017-03-16 01:27:41 UTC
(In reply to Anping Li from comment #36)
> For
> https://github.com/ahardin-rh/openshift-docs/blob/5cfb6dc2ee7fcb1d15007ad85afb2998f81e6cdf/admin_guide/backup_restore.adoc#cluster-backup
> The procedure should note 'cp "$ETCD_DATA_DIR"/member/snap/db "$HOTDIR"/member/snap/db'
> when etcd 3.0.15 is used.

This was discussed in comments 2, 10, and 15.
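
A minimal sketch of the extra copy step suggested above, wrapped so that it fails cleanly when the db file does not exist. The helper name copy_snap_db is hypothetical, and the sketch assumes the v2 backup into "$HOTDIR" has already been taken:

```shell
#!/bin/bash
# Sketch only: copy the etcd v3 db file into an existing v2 backup directory.
# copy_snap_db is a hypothetical helper name, not part of etcd or OpenShift.
copy_snap_db() {
    local data_dir="$1" hot_dir="$2"
    local src="$data_dir/member/snap/db"

    # Older data directories may not contain a db file at all.
    if [ ! -f "$src" ]; then
        echo "no db file at $src; nothing to copy" >&2
        return 1
    fi

    # Make sure the target layout exists before copying.
    mkdir -p "$hot_dir/member/snap"
    cp "$src" "$hot_dir/member/snap/db"
}

# Usage (paths are assumptions; adjust to your environment):
#   etcdctl backup --data-dir "$ETCD_DATA_DIR" --backup-dir "$HOTDIR"
#   copy_snap_db "$ETCD_DATA_DIR" "$HOTDIR"
```

The existence check is a design choice for this sketch: it keeps the same script usable on hosts where the data directory has no db file.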

> For
> https://github.com/ahardin-rh/openshift-docs/blob/5cfb6dc2ee7fcb1d15007ad85afb2998f81e6cdf/admin_guide/backup_restore.adoc#cluster-restore
> The force-new-cluster and etcd service restart steps are missing. Without them,
> for single-member etcd clusters, readers also need to see
> https://docs.openshift.com/container-platform/3.4/install_config/downgrade.html#downgrading-restoring-embedded-etcd

There is a note: 'This restore operation only works for single-member etcd clusters. For multiple-member etcd clusters, see Restoring etcd.' In fact, the restore procedure that follows the note is not complete: the force-new-cluster and etcd service restart steps are missing.

Either change the note or copy the force-new-cluster and etcd service restart steps here.


> 4.c)
> mkdir $PREFIX before running openssl

Without the $PREFIX directory, the openssl command that follows will fail.
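
A rough sketch of the ordering being requested. The PREFIX value and the server.key file name below are assumptions made for illustration; the docs define their own values:

```shell
# PREFIX is an assumed example location (note the trailing slash).
PREFIX="${PREFIX:-$(mktemp -d)/etcd-certs/}"

# Create the directory first so that later commands can write into it.
mkdir -p "$PREFIX"

# Only attempt key generation when openssl is installed on this host.
# Without the mkdir above, -out would fail because the directory is missing.
if command -v openssl >/dev/null 2>&1; then
    openssl genrsa -out "${PREFIX}server.key" 2048
fi
```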

> 4.e) cp ca.crt ${PREFIX} -> cp ca/ca.crt ${PREFIX}
  This step is not necessary; drop it.

Comment 41 Anping Li 2017-03-20 01:25:29 UTC
1. https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-backup
# tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt \
    master.proxy-client.crt \
    master.proxy-client.key \
    proxyca.crt \
    proxyca.key \
    master.server.crt \
    master.server.key \
    ca.crt \
    ca.key \
    master.etcd-client.crt \
    master.etcd-client.key \
    master.etcd-ca.crt

Should be 
# tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt  

2. https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-restore-for-single-member-etcd-clusters
 
A step similar to step 4 of https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#external-etcd needs to be added.

For example:


Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:

# sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
# cat /usr/lib/systemd/system/etcd.service  | grep ExecStart

ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"



Then restart the etcd service:

# systemctl daemon-reload
# systemctl restart etcd
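
The sed expression above can be tried safely on a scratch copy before touching the real unit file. This is only a sketch: the scratch file content is an assumption about what the unit file looks like after --force-new-cluster was added.

```shell
# Work on a scratch copy so the real unit file stays untouched.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Service]
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --force-new-cluster"
EOF

# Same expression as in the proposed step: strip the option from ExecStart only.
sed -i '/ExecStart/s/ --force-new-cluster//' "$unit"

# Show the resulting ExecStart line for inspection.
grep ExecStart "$unit"
```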



3. The other part looks good

Comment 43 Anping Li 2017-03-21 10:05:29 UTC
https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-backup

 tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt \
>     master.proxy-client.crt \
>     master.proxy-client.key \
>     proxyca.crt \
>     proxyca.key \
>     master.server.crt \
>     master.server.key \
>     ca.crt \
>     ca.key \
>     master.etcd-client.crt \
>     master.etcd-client.key \
>     master.etcd-ca.crt
tar: proxyca.crt: Cannot stat: No such file or directory
tar: proxyca.key: Cannot stat: No such file or directory


1) The names of the crt and key files can vary. For example, a custom configuration can specify different names. That is why I suggested using the command 'tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt'.
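
To illustrate the point, here is a small sketch run in a scratch directory. The file names are made up for the example and do not come from a real master configuration directory:

```shell
# Scratch directory with a mix of default-style and custom-named files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ca.crt ca.key custom-named.crt custom-named.key master-config.yaml

# The globs expand to whatever .key/.crt files exist, so no name is hard-coded
# and tar never tries to stat a file that is not there.
tar cf "$workdir/certs-and-keys.tar" *.key *.crt

# List the archive contents to confirm what was captured.
tar tf "$workdir/certs-and-keys.tar"
```

With the glob form, custom-named files are picked up automatically, while unrelated files such as master-config.yaml are left out.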

Comment 45 Anping Li 2017-03-23 08:06:07 UTC
It looks good to me.

Comment 46 openshift-github-bot 2017-03-23 13:29:23 UTC
Commits pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/b38042de02d9780842dce95cfa0ef45d53b58bc6
Bug 1419670, Update backup and restore procedure

https://github.com/openshift/openshift-docs/commit/be0f62d5b5e30b5a56a061382cec07cba1909f94
Merge pull request #3827 from ahardin-rh/etcd-backup-restore

Bug 1419670, Update backup and restore procedure