Bug 1259544 - [DOCS] Document backup and restore procedures
Status: CLOSED CURRENTRELEASE
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Ashley Hardin
QA Contact: Anping Li
Docs Contact: Vikram Goyal
Keywords: Reopened
Depends On:
Blocks: 1339502
Reported: 2015-09-02 21:04 EDT by Vikram Goyal
Modified: 2018-02-07 21:04 EST

Doc Type: Bug Fix
Last Closed: 2017-01-23 12:55:52 EST
Type: Bug


Attachments
A first draft procedure I used to restore from multiple master failures (39.92 KB, text/html)
2016-05-16 08:44 EDT, Dave McCormick


External Trackers
Tracker: Red Hat Knowledge Base (Solution)
ID: 1981013
Priority: None
Status: None
Summary: None
Last Updated: 2018-02-07 21:04 EST

Description Vikram Goyal 2015-09-02 21:04:56 EDT
As per this: https://github.com/openshift/openshift-docs/issues/803

And more info from the mailing lists to be added.
Comment 2 Christoph Görn 2015-09-03 03:10:50 EDT
I have a chapter on this in SysEng's OSE3 refarch; I will release it toward the end of next week. I'm happy to share it afterwards.
Comment 9 Alex Dellapenta 2016-01-25 09:09:34 EST
The Downgrading doc has been published:

https://docs.openshift.com/enterprise/latest/install_config/downgrade.html

However, I want to break out the purely backup/restore part as its own topic, which can be topic-shared or linked to from the Downgrading doc. Leaving this BZ open until that work is done.
Comment 15 Thien-Thi Nguyen 2016-05-02 13:35:34 EDT
(In reply to Alex Dellapenta from comment #9)
> However, I want to break out the purely backup/restore part as its own
> topic, which can be topic-shared or linked to from the Downgrading doc.

Hi Alex, quick question: Where in the doc set were you thinking to place the pulled-out backup/restore section?
Comment 16 Alex Dellapenta 2016-05-02 13:57:29 EDT
(In reply to Thien-Thi Nguyen from comment #15)

> Hi Alex, quick question: Where in the doc set were you thinking to place the
> pulled-out backup/restore section?

Probably just its own topic under Cluster Administration.
Comment 21 Dave McCormick 2016-05-16 08:44 EDT
Created attachment 1157894 [details]
A first draft procedure I used to restore from multiple master failures

Hi

Here's a procedure I worked out for recovering from multiple master failures by cloning the remaining good master and regenerating the configurations using Ansible.

I did hit a number of issues doing this, especially having lost the generated configs that the Ansible server had placed on a master which had failed.

Hopefully this helps somebody else in a similar situation.

regards


Dave
Comment 22 Paul Weil 2016-05-16 09:11:49 EDT
Thien-Thi Nguyen - to be clear, that card is directed at projects and not a whole cluster.  Just making sure we're on the same page since it looks like there is a mixture of cluster admin/project admin scenarios in some of the comments and attachments.

Currently the oc export command is what can be used to back up items in a project.  The linked card is to make that experience easier, along with the ability to back up the images you are using (card: https://trello.com/c/F0KjI88F/663-support-oc-export-image).  Since we don't have the final shape of the future functionality, that is about as specific as I would make any references to future (proposed but not yet targeted/committed) work.
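
As a rough illustration of the project-level export mentioned above, here is a minimal sketch. The helper name export_project, the default output file name, and the OC override variable are hypothetical and exist only for this example; the -n and -o yaml options are assumed to behave as in the OSE 3.x client. Note that "all" is a resource alias and does not cover every resource type.

```shell
#!/bin/sh
# Sketch only. OC can be overridden (for example OC=echo) to preview
# the arguments without contacting a cluster.
OC="${OC:-oc}"

# export_project is a hypothetical helper, not part of the oc client.
# It exports the objects matched by "all" in one project to a YAML file.
export_project() {
    project="$1"
    outfile="${2:-$project.backup.yaml}"
    if [ -z "$project" ]; then
        echo "usage: export_project PROJECT [OUTFILE]" >&2
        return 1
    fi
    # 'oc export' writes to standard output, so redirect it to a file.
    $OC export all -n "$project" -o yaml > "$outfile"
}
```

For example, export_project myproject /tmp/myproject.yaml would write the exported objects of a project named myproject (an assumed name) to /tmp/myproject.yaml.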
Comment 23 Thien-Thi Nguyen 2016-05-19 19:35:49 EDT
(In reply to Paul Weil from comment #22)
> that card is directed at projects and not a
> whole cluster
>
> Currently the oc export command is what can be used to backup
> items in a project.  The card linked is to make that experience
> easier along with the ability to backup the images you are using
> (card: https://trello.com/c/F0KjI88F/663-support-oc-export-image).
> Since we don't have the final shape of the future functionality
> that is about as specific as I would make any references to
> future (proposed but not yet targeted/committed) work.

Thanks for the clarification.  I will emphasize that the current
backup/restore procedures are at the cluster level.  What do you
think of: "Project-level backup/restore will be available in a
future release"?
Comment 24 Thien-Thi Nguyen 2016-05-20 13:11:53 EDT
PR: https://github.com/openshift/openshift-docs/pull/2140
Comment 25 Paul Weil 2016-05-20 13:19:41 EDT
(In reply to Thien-Thi Nguyen from comment #23)
> (In reply to Paul Weil from comment #22)
> > that card is directed at projects and not a
> > whole cluster
> >
> > Currently the oc export command is what can be used to backup
> > items in a project.  The card linked is to make that experience
> > easier along with the ability to backup the images you are using
> > (card: https://trello.com/c/F0KjI88F/663-support-oc-export-image).
> > Since we don't have the final shape of the future functionality
> > that is about as specific as I would make any references to
> > future (proposed but not yet targeted/committed) work.
> 
> Thanks for the clarification.  I will emphasize that the current
> backup/restore procedures are at the cluster level.  What do you
> think of: "Project-level backup/restore will be available in a
> future release"?

I would say that we should point them to the facility we already have (export) for project-level "backup" and note that we are continuing to enhance project-level capabilities.
Comment 33 Anping Li 2016-06-01 05:00:35 EDT
The inventory file and the files referred to by the inventory file (for example, cert files) must be present before restore, so could we add this as a prerequisite?
Comment 34 Anping Li 2016-06-01 05:56:54 EDT
According to the discussion, we have cluster-level and project-level backup and restore.
For backup:
1) etcd may be running on an independent machine, so it isn't accurate to say 'Create an etcd backup on each master'.
2) Data is synced between etcd members. Do we need to back up every etcd host? Of course it is OK to have redundant data, but for restore, take care to select one snapshot.
3) 'oc expose' is for project-level. I don't think we need to expose anything for cluster-level backup/restore.  It is better to have different sections for each.
4) If we want to back up to a storage server, we need to list all files, for example /etc/origin, /etc/sysconfig/atomic-openshift*, etc.

For restore:
5) The step 'oc process -f cluster.template' is useless.
6) Restoring a clustered etcd database is a little complicated: restore one etcd database, then sync the data to the other etcd servers. You can refer to https://docs.openshift.com/enterprise/latest/install_config/downgrade.html.
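
Point 4 above could be sketched roughly as follows. The helper name backup_config and its optional ROOT argument are hypothetical; ROOT exists only so the function can be tried against a scratch directory. The two paths are the ones named in point 4, and the list is an example, not a complete inventory of files to back up.

```shell
#!/bin/sh
# backup_config is a hypothetical helper, not part of any OpenShift tooling.
# ARCHIVE must be an absolute path, because tar runs from inside ROOT.
backup_config() {
    archive="$1"
    root="${2:-/}"
    if [ -z "$archive" ]; then
        echo "usage: backup_config ARCHIVE [ROOT]" >&2
        return 1
    fi
    # Paths are relative to ROOT; the glob is expanded inside ROOT.
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$root" && tar -czf "$archive" etc/origin etc/sysconfig/atomic-openshift* )
}
```

Storing paths relative to the root keeps the archive relocatable, so it can be unpacked into a staging directory for inspection before anything is copied back.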
Comment 35 Anping Li 2016-06-01 06:07:01 EDT
7) For an external etcd server, you needn't install etcd packages, and the etcd data dir is /var/lib/etcd.
Comment 37 Thien-Thi Nguyen 2016-06-22 10:36:04 EDT
(In reply to Anping Li from comment #33)
> The inventory files and the files referred to by inventory file (for example:
> cert files) must be present before restore. so could we add this as a
> prerequisite.

Thanks for the tip.  IIUC "inventory files" pertain to installation using ansible, which this topic doesn't address directly.  However, backup and restore of certs and keys is a generally useful step, right?  Towards that end, i've installed:

https://github.com/openshift/openshift-docs/pull/2140/commits/76cf692ce3d28cb1b76f9c19da380edff5b162fc

WDYT?  (I will address comment #34 separately.)
Comment 38 Thien-Thi Nguyen 2016-06-22 11:39:05 EDT
(In reply to Anping Li from comment #33)
> The inventory files and the files referred to by inventory file (for example:
> cert files) must be present before restore. so could we add this as a
> prerequisite.

Now there is a Prerequisites section:
https://github.com/openshift/openshift-docs/pull/2140/commits/44aa9beb5368a441e1cea6f4cc68f73f721a190b

WDYT?  (I'm still interested in WDYT re the change mentioned in comment #37.)
Comment 40 Thien-Thi Nguyen 2016-06-22 17:45:50 EDT
(In reply to Anping Li from comment #34)
Thanks for the review.

> According to the discuss, we have cluster-level and project-level backup and
> restore.

Yes, those are the two levels that are most interesting to customers.

> For backup.
> 1) The Etcd may be running on Independent machine. so it isn't accurate to
> say 'Create an etcd backup on each master'
> 2) Data are sync between etcds. Need  we back up against all etcd hosts? Of
> course it is OK to have Redundant data. but for restore, take care to select
> one snapshot.

Thanks for pointing this out.  I've installed:
https://github.com/openshift/openshift-docs/pull/2140/commits/f6b49f898a40162a47ec1182d8d651668a857516
WDYT?  (See also the answer to (6) below.)

> 3) 'oc expose' is for project-level. I don't think we need to expose
> anything for cluster-level backup/restore.  It is better to have different
> sections for each.

I think you meant to say ‘oc export’.  In this case, we are using ‘oc export --all-namespaces’, which serves as a cluster-level workalike backup mechanism.  It is the command used by many of the support/customer cases attached to this BZ, and also in the downgrading docs, which were the origin of the procedures.  I think the situation may change in the future, but for now, ‘oc export’ is unavoidable.

> 4) If we want to backup to storage server. we need list all files. For
> example: /etc/origin /etc/sysconfig/atomic-openshift* and etc.

Backup/restore for persistent storage is outside the scope of this document.  There is a short sentence to that effect at the end of the overview.  Am i missing something?

> For restore:
> 5) the step 'oc process -f cluster.template' is useless.

Is it still useless if ‘oc export’ is all we have (see answer to (3) above)?

> 6) To restore clustered Etcd database. It is a little complicated. restore
> one etcd database and sync data to other Etcd server. You can refer to
> https://docs.openshift.com/enterprise/latest/install_config/downgrade.html.

I've installed:
https://github.com/openshift/openshift-docs/pull/2140/commits/4259383bad88a8903327f2b2dc4851b54400fdae
Comment 41 Anping Li 2016-06-23 02:50:10 EDT
1) For embedded etcd, the data dir is /var/lib/origin/etcd
   For external etcd, the data dir is /var/lib/etcd

2) There is no restore command in etcdctl (etcdctl version 2.2.5). As far as I know, that command is for etcd 3.x.

[root@ha1master1 ~]# etcdctl --help
NAME:
   etcdctl - A simple command line client for etcd.

USAGE:
   etcdctl [global options] command [command options] [arguments...]
   
VERSION:
   2.2.5
   
COMMANDS:
   backup		backup an etcd directory
   cluster-health	check the health of the etcd cluster
   mk			make a new key with a given value
   mkdir		make a new directory
   rm			remove a key or a directory
   rmdir		removes the key if it is an empty directory or a key-value pair
   get			retrieve the value of a key
   ls			retrieve a directory
   set			set the value of a key
   setdir		create a new or existing directory
   update		update an existing key with a given value
   updatedir		update an existing directory
   watch		watch a key for changes
   exec-watch		watch a key for changes and exec an executable
   member		member add, remove and list subcommands
   import		import a snapshot to a cluster
   user			user add, grant and revoke subcommands
   role			role add, grant and revoke subcommands
   auth			overall auth controls
   help, h		Shows a list of commands or help for one command


3) For 'oc export all':

First: I still don't think it is necessary for cluster backup.  Could you provide more evidence?

Second: I failed to restore the template backed up by this command.
[root@ha1master1 ~]#  oc export all --exact  --as-template=cluster.template  >cluster.template
[root@ha1master1 ~]# oc process -f cluster.template 
error: bufio.Scanner: token too long
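
To make points 1 and 2 above concrete, here is a minimal sketch that picks the data dir by deployment type and calls the backup subcommand listed in the help output. The helper names etcd_data_dir and backup_etcd and the ETCDCTL override variable are hypothetical; the --data-dir and --backup-dir options are assumed to match the etcdctl 2.x backup subcommand.

```shell
#!/bin/sh
# Sketch only. ETCDCTL can be overridden (for example ETCDCTL=echo)
# to preview the arguments without touching a real data directory.
ETCDCTL="${ETCDCTL:-etcdctl}"

# Hypothetical helper: map the deployment type to the data dir
# given in point 1 above.
etcd_data_dir() {
    case "$1" in
        embedded) echo /var/lib/origin/etcd ;;
        external) echo /var/lib/etcd ;;
        *) echo "usage: etcd_data_dir embedded|external" >&2; return 1 ;;
    esac
}

# Hypothetical helper: back up the data dir for the given deployment type.
backup_etcd() {
    data_dir=$(etcd_data_dir "$1") || return 1
    backup_dir="$2"
    if [ -z "$backup_dir" ]; then
        echo "usage: backup_etcd embedded|external BACKUP_DIR" >&2
        return 1
    fi
    $ETCDCTL backup --data-dir "$data_dir" --backup-dir "$backup_dir"
}
```

For example, backup_etcd external /var/lib/etcd.bak would back up an external etcd into /var/lib/etcd.bak (an assumed destination).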
Comment 44 Thien-Thi Nguyen 2016-07-21 10:11:57 EDT
(In reply to Anping Li from comment #41)
> 1) For embedded etcd, the data dir is /var/lib/origin/etcd
>    For external etcd, the data dir is /var/lib/etcd

This is addressed in:
https://github.com/openshift/openshift-docs/pull/2140/commits/6a595d33c3d6aec6cc7175c4cb8594ca95768f91

> 2) no restore command for etcdctl (etcdctl version 2.2.5). As I know the
> command was for etcd 3.x.

Yes, you are right.  Please see:
https://github.com/openshift/openshift-docs/pull/2140/commits/0935ca37de3e9b139e47623e233adb48119e8412

> 3) For 'oc export all'.
>
> First: I still don't think it is necessary for cluster backup.
> Could you provide more evidence?

Hmm, the only evidence i have is what engineering has suggested.  It feels like a workaround because support for simple full backup/restore is really not a feature yet, as far as i can tell.  (In other words, it *is* a workaround!)

> Second:  I failed to restore template backed up by this command.
> [root@ha1master1 ~]#  oc export all --exact  --as-template=cluster.template
> >cluster.template
> [root@ha1master1 ~]# oc process -f cluster.template
> error: bufio.Scanner: token too long

Is this error reliably reproducible?

Presuming yes, i will try to reproduce the error on the docs team OSE 3.2 instance and respond in a different comment.  In the meantime, could you try again, this time w/ the ‘--raw’ option, and capture the output?  Something like:

$ oc process -f cluster.template --raw > cluster.raw

(Then attach cluster.raw to the BZ.)
Comment 45 Thien-Thi Nguyen 2016-07-21 13:20:08 EDT
(In reply to Thien-Thi Nguyen from comment #44)
> > 3) For 'oc export all'.

I've discovered that i have misunderstood the ‘--as-template’ option.  I thought it specified an output filename.  Instead, it specifies the ‘metadata.name’ of the ‘Template’ object.  Output goes to standard output, and must be redirected to a file using ‘> FILENAME’.  You mentioned this in comment #41 but i failed to notice it properly.  Sorry about the confusion, my bad!

The following change reflects this new understanding:
https://github.com/openshift/openshift-docs/pull/2140/commits/108a3f572a00b236a0580c1c32163b96c0cd2c26
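
The distinction described above could be sketched like this: the template name goes to --as-template, while the output file is chosen by shell redirection. The helper names, the OC override variable, and the combination of --all-namespaces with --as-template are assumptions for illustration; piping 'oc process' into 'oc create -f -' is one common way to create the objects and is not necessarily what the documented procedure uses.

```shell
#!/bin/sh
# Sketch only. OC can be overridden (for example OC=echo) to preview
# the arguments without contacting a cluster.
OC="${OC:-oc}"

# Hypothetical helper: export cluster objects as a Template.
# --as-template sets metadata.name of the Template object; the template
# itself is written to standard output and redirected to TEMPLATE_FILE.
export_cluster_template() {
    template_name="$1"
    template_file="$2"
    $OC export all --all-namespaces --as-template="$template_name" > "$template_file"
}

# Hypothetical helper: turn the saved template back into API objects.
create_from_template() {
    template_file="$1"
    $OC process -f "$template_file" | $OC create -f -
}
```

Passing the same file path to both helpers keeps the export step and the create step in agreement about where the template lives.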
Comment 46 Anping Li 2016-07-22 06:20:06 EDT
oc process -f mycluster.template.yaml --raw works. 

It is better to use the same file name in steps 1 & 2.

1. Create a template for all cluster API objects: 
2. Create the API objects for the cluster:
Comment 47 Jaspreet Kaur 2016-07-25 01:55:45 EDT
Please note in the backup & restore documentation [1] which resources can be exported and which cannot, as per [2]:

[1] https://github.com/tnguyen-rh/openshift-docs/blob/4259383bad88a8903327f2b2dc4851b54400fdae/admin_guide/backup_restore.adoc

[2] https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113
Comment 48 Thien-Thi Nguyen 2016-07-25 03:40:33 EDT
(In reply to Anping Li from comment #46)
> oc process -f mycluster.template.yaml --raw works.
>
> It is better to use same file named in 1 & 2.
>
> 1. Create a template for all cluster API objects:
> 2. Create the API objects for the cluster:

Please see:
https://github.com/openshift/openshift-docs/pull/2140/commits/190a6b79df1811cde840e5e5637319e820cbfeb7
Comment 50 Thien-Thi Nguyen 2016-07-25 06:30:03 EDT
(In reply to Jaspreet Kaur from comment #47)
> Please make a note in backup & restore documentation which all resources can
> be exported and which will not as per:
> https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113

I've started to list the object types that i have been able to verify experimentally:
https://github.com/openshift/openshift-docs/pull/2140/commits/5116b5a90166af7a87838a6a7bbeb6bb92dcccbb

Am i going in the right direction?
Comment 52 Kenjiro Nakayama 2016-07-25 20:24:41 EDT
I got feedback from one of our customers and added it as a GitHub comment. Could you please check it?
https://github.com/openshift/openshift-docs/pull/2140/files#r72169023
Comment 53 Kenjiro Nakayama 2016-07-25 20:44:24 EDT
Regarding comment #52, I have set needinfo on Anping Li.
Comment 54 Thien-Thi Nguyen 2016-07-26 09:25:18 EDT
(In reply to Kenjiro Nakayama from comment #52)
> I got a feedback from one of the customer and added it to the github
> comment. Could you please check it?
> https://github.com/openshift/openshift-docs/pull/2140/files#r72169023

Thank-you for the feedback.  Please see:
https://github.com/openshift/openshift-docs/pull/2140#discussion_r72246931
Comment 55 Anping Li 2016-07-26 21:08:10 EDT
It looks good to me
Comment 64 Ashley Hardin 2016-08-22 15:50:03 EDT
Work was continued in https://github.com/openshift/openshift-docs/pull/2689
Comment 67 Ashley Hardin 2016-12-08 11:37:35 EST
This is addressed in https://github.com/openshift/openshift-docs/pull/3310

QE verified in https://bugzilla.redhat.com/show_bug.cgi?id=1389517
