Description of problem: In a disaster-recovery situation, the procedure documented at https://docs.openshift.com/container-platform/3.4/admin_guide/backup_restore.html#cluster-backup or at https://docs.openshift.com/container-platform/3.3/admin_guide/backup_restore.html#cluster-backup is not usable: the etcd process will not start again because the db file (${ETCD_DATA_DIR}/member/snap/db) is missing from the backup. We realize that etcd3 runs in a compatibility mode and that the procedure for restoring the v2 keys is the same, but it seems that etcd also needs that file. Backing up that file is impossible because the "snapshot" argument of the "etcdctl" command is not available, although it should be according to the CoreOS docs: https://coreos.com/etcd/docs/3.0.15/op-guide/recovery.html. Etcd fails to start. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Etcd does not start and the procedure is incorrect. Expected results: Etcd should start, and the documentation needs updating. Additional info:
This seems like two separate issues... 1. Update the docs on disaster recovery 2. Determine why your etcd instance is not starting. To #1, iirc before an upgrade a snapshot is saved.
*** Bug 1421072 has been marked as a duplicate of this bug. ***
Updated pull request: https://github.com/openshift/openshift-docs/pull/3827
(In reply to Anping Li from comment #36)
> For
> https://github.com/ahardin-rh/openshift-docs/blob/5cfb6dc2ee7fcb1d15007ad85afb2998f81e6cdf/admin_guide/backup_restore.adoc#cluster-backup
> Should note 'cp "$ETCD_DATA_DIR"/member/snap/db "$HOTDIR"/member/snap/db'
> when use etcd 3.0.15. It was talked at comment 2,10,15.
> For
> https://github.com/ahardin-rh/openshift-docs/blob/5cfb6dc2ee7fcb1d15007ad85afb2998f81e6cdf/admin_guide/backup_restore.adoc#cluster-restore
> No step force-new-cluster and restart etcd service. without them, for
> single-member etcd clusters, we also need to see
> https://docs.openshift.com/container-platform/3.4/install_config/downgrade.html#downgrading-restoring-embedded-etcd
There is a note: 'This restore operation only works for single-member etcd clusters. For multiple-member etcd clusters, see Restoring etcd.' In fact, the restore operation that follows the note is not complete: the force-new-cluster step and the restart of the etcd service are missing. Either change the note or copy the force-new-cluster and restart steps here.
> 4.c)
> mkdir $PREFIX before run openssl
Without the $PREFIX directory, the following command will fail.
> 4.e) cp ca.crt ${PREFIX} -> cp ca/ca.crt ${PREFIX}
This step is not necessary; drop it.
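For illustration, the extra copy step proposed for etcd 3.0.15 could be wrapped in a small helper. This is only a sketch under assumptions: the function name copy_v3_db is hypothetical, and it assumes the v2 backup has already been written to the backup directory (HOTDIR in the docs) by the documented procedure.

```shell
# Hypothetical helper: copy the etcd db file into an existing backup dir.
# $1 = etcd data dir (ETCD_DATA_DIR), $2 = backup dir (HOTDIR in the docs).
copy_v3_db() {
    data_dir=$1
    backup_dir=$2
    db="$data_dir/member/snap/db"
    # Refuse to continue if the source db file does not exist.
    if [ ! -f "$db" ]; then
        echo "no db file at $db; nothing to copy" >&2
        return 1
    fi
    # Recreate the member/snap layout in the backup, then copy the file.
    mkdir -p "$backup_dir/member/snap" &&
        cp "$db" "$backup_dir/member/snap/db"
}
```

It would be called as: copy_v3_db "$ETCD_DATA_DIR" "$HOTDIR". A non-zero return means the backup does not contain member/snap/db.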
1. https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-backup
# tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt' \
    master.proxy-client.crt \
    master.proxy-client.key \
    proxyca.crt \
    proxyca.key \
    master.server.crt \
    master.server.key \
    ca.crt \
    ca.key \
    master.etcd-client.crt \
    master.etcd-client.key \
    master.etcd-ca.crt
Should be:
# tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt
2. https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-restore-for-single-member-etcd-clusters
A step similar to step 4 of https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#external-etcd needs to be added. For example:
Verify the etcd service started correctly, then re-edit the /usr/lib/systemd/system/etcd.service file and remove the --force-new-cluster option:
# sed -i '/ExecStart/s/ --force-new-cluster//' /usr/lib/systemd/system/etcd.service
# cat /usr/lib/systemd/system/etcd.service | grep ExecStart
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
Then restart the etcd service:
# systemctl daemon-reload
# systemctl start etcd
3. The other parts look good.
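As a way to check the sed expression before touching the real unit file, the same substitution can be run without -i so that the result only goes to stdout. A minimal sketch; the function name preview_without_force_new_cluster is hypothetical and not part of the docs.

```shell
# Hypothetical helper: print a unit file with --force-new-cluster removed
# from its ExecStart line. The file itself is left unchanged.
# $1 = path to the unit file (or to a copy of it).
preview_without_force_new_cluster() {
    unit_file=$1
    # Same expression as in the proposed doc step, minus -i.
    sed '/ExecStart/s/ --force-new-cluster//' "$unit_file"
}
```

For example: preview_without_force_new_cluster /usr/lib/systemd/system/etcd.service | grep ExecStart. If the printed ExecStart line looks right, the in-place sed -i command from the step can be applied.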
https://github.com/ahardin-rh/openshift-docs/blob/240abad8bc6109fc349c6f5b76521e144f08119a/admin_guide/backup_restore.adoc#cluster-backup
tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt \
> master.proxy-client.crt \
> master.proxy-client.key \
> proxyca.crt \
> proxyca.key \
> master.server.crt \
> master.server.key \
> ca.crt \
> ca.key \
> master.etcd-client.crt \
> master.etcd-client.key \
> master.etcd-ca.crt
tar: proxyca.crt: Cannot stat: No such file or directory
tar: proxyca.key: Cannot stat: No such file or directory
1) The names of the crt and key files can vary. For example, custom certificates may use different names. That is why I suggested using the command 'tar cf /tmp/certs-and-keys-$(hostname).tar *.key *.crt'.
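The glob-only approach suggested above can be sketched as a helper that archives whatever .key and .crt files actually exist, so tar is never handed a name it cannot stat. The function name archive_certs is hypothetical, and the output path is assumed to be absolute because the helper changes directory.

```shell
# Hypothetical helper: archive every *.key and *.crt file in a directory,
# whatever the files are named.
# $1 = certificate dir, $2 = absolute path of the tar file to create.
archive_certs() {
    cert_dir=$1
    out_tar=$2
    (
        cd "$cert_dir" || exit 1
        # Build the file list from what is really present.
        set --
        for f in *.key *.crt; do
            [ -f "$f" ] && set -- "$@" "$f"
        done
        if [ "$#" -eq 0 ]; then
            echo "no .key or .crt files in $cert_dir" >&2
            exit 1
        fi
        tar cf "$out_tar" "$@"
    )
}
```

The target path from the docs would then be passed as the second argument: archive_certs <certificate directory> "/tmp/certs-and-keys-$(hostname).tar".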
It looks good to me.
Commits pushed to master at https://github.com/openshift/openshift-docs
https://github.com/openshift/openshift-docs/commit/b38042de02d9780842dce95cfa0ef45d53b58bc6
Bug 1419670, Update backup and restore procedure
https://github.com/openshift/openshift-docs/commit/be0f62d5b5e30b5a56a061382cec07cba1909f94
Merge pull request #3827 from ahardin-rh/etcd-backup-restore
Bug 1419670, Update backup and restore procedure
Content is now published: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.4/html-single/cluster_administration/#admin-guide-backup-and-restore