Bug 787184

Summary: Devise a disaster recovery plan (or process)
Product: Red Hat Satellite
Reporter: James Laska <jlaska>
Component: Docs User Guide
Assignee: Dan Macpherson <dmacpher>
Status: CLOSED ERRATA
QA Contact: Og Maciel <omaciel>
Severity: low
Docs Contact:
Priority: high
Version: 6.0.1
CC: achan, bkearney, cpelland, dgoodwin, dmacpher, gkhachik, inecas, jturner, lbrindle, lzap, omaciel, snansi
Target Milestone: Unspecified
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
CloudForms 1.1 System Engine User Guide now contains documentation for Backup and Recovery for System Engine.
Story Points: ---
Clone Of:
: 799020 (view as bug list)
Environment:
Last Closed: 2012-12-04 19:41:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1425213

Description James Laska 2012-02-03 12:32:38 UTC
== Description of problem ==

This bug is intended to track the process of documenting a disaster recovery plan for CloudForms System Engine.

On Thu, 2012-02-02 at 18:13 -0500, Todd Warner wrote:
The elements of an upgrade are...
> 1. ability to backup : restore
> 2. yum update (the bits)
> 3. the underlying OS
> 4. schema upgrade scripts
> 
> RHN Satellite has a process for three scenarios:
> 1. minor upgrade: It's an update (backup and then update the RPMs)
> 2. an upgrade that involves the OS: backup, then blow away box, install OS, then install Satellite, then do #1 or #3
> 3. upgrade that involves schema update: backup, update bits, upgrade schema, flip services back on
> 
> That's it in a nutshell. In time for GA we need...
> * a disaster recovery plan/process (this is not necessarily the same as a backup : restore process)
> 
> After GA, we need to work towards a...
> * Backup and Restore process
> * An upgrade process
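
The three RHN Satellite scenarios quoted above can be written down as ordered step lists. This is only a rough sketch for discussion: the step names and the `plan` helper are hypothetical labels taken from the wording of the mail, not commands from any Satellite or Katello tool.

```python
# Hypothetical step labels paraphrasing the quoted mail; not real commands.
SCENARIOS = {
    "minor": ["backup", "update RPMs"],
    "os": ["backup", "wipe host", "install OS", "install Satellite"],
    "schema": ["backup", "update RPMs", "upgrade schema", "restart services"],
}


def plan(scenario, follow_up=None):
    """Return the ordered steps for a scenario.

    The OS scenario ends with "then do #1 or #3", so it takes a
    follow-up scenario ("minor" or "schema") whose steps are appended.
    """
    steps = list(SCENARIOS[scenario])
    if scenario == "os":
        if follow_up not in ("minor", "schema"):
            raise ValueError("OS upgrade must be followed by 'minor' or 'schema'")
        steps += SCENARIOS[follow_up]
    return steps


if __name__ == "__main__":
    for step in plan("os", follow_up="schema"):
        print(step)
```

Every scenario starts with a backup, which is why the backup and restore procedure is the piece the other processes depend on.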

Comment 2 Lukas Zapletal 2012-02-07 11:38:36 UTC
Wrong component? Setting owner to James.

Comment 3 James Laska 2012-02-07 12:11:15 UTC
(In reply to comment #2)
> Wrong component? Setting owner to James.

Hi Lukas!  Please don't reassign a bug to the reporter if you are unsure of the component (it'll just get lost).  Resetting to bkearney for now, we can re-evaluate when a more appropriate owner has been identified.  Thanks!

Comment 8 Lukas Zapletal 2012-03-20 09:22:16 UTC
Hey James/Andy, any news here? I would love to move this off my plate ;-)

Comment 9 James Laska 2012-03-20 20:44:24 UTC
From cloud-program ... it sounds like Lana suggests that adding this to the release notes is appropriate for v1.0.

I'm requesting release notes review using the release_notes flag.

Comment 10 James Laska 2012-03-20 20:44:25 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
see comment#5

Comment 11 James Laska 2012-03-20 20:47:22 UTC
Brian Hamrick directed me to a really good version of what a disaster recovery procedure should look like ...

https://docs.redhat.com/docs/en-US/Red_Hat_Network_Satellite/5.4/html/Deployment_Guide/sect-Getting_Started_Guide-Satellite_Operation_Guidance-Backup_and_Restore_Routines.html

Can we work towards this for CloudForms System Engine (short-term and long-term)?

Comment 12 Lukas Zapletal 2012-03-21 08:37:29 UTC
Ok I will rewrite our wiki page with more details in the chapter 3.2 "style".

Comment 13 Lukas Zapletal 2012-03-21 16:34:50 UTC
The work is basically done. Tomorrow I would like to test the host1->host2 scenario, but the content should not change much unless I find any issues.

https://fedorahosted.org/katello/wiki/GuideServerBackups

Comment 17 Lukas Zapletal 2012-03-27 14:43:47 UTC
Okay, QA doesn't have resources for this, so they will verify it afterwards. Changing component to the doc team. The process is documented here:

https://fedorahosted.org/katello/wiki/GuideServerBackups

Comment 18 Lukas Zapletal 2012-03-27 14:46:45 UTC
There should be an errata advisory associated with that BZ

Comment 28 Jeff Weiss 2012-04-16 17:50:23 UTC
I ran through the documented procedure on the Katello wiki and made some corrections.  Those corrections need to be taken into the SE docs, so if you've already copied it, check the version diffs here:
https://fedorahosted.org/katello/wiki/GuideServerBackups?action=diff&version=18&old_version=16

I also turned the procedure into 2 bash scripts in the katello source (in src/script/backup.sh and src/script/restore.sh).  They are not polished yet but seem good enough for the community to start using instead of the manual steps.

There was one issue running the restore procedure, though: my already-registered client was getting 403s in yum.  I think it may be the same symptom as another CRL issue I had earlier, so I'm going to request info from Ivan to see if restoring a several-day-old CRL might cause this.

Since this bug WAS in ON_QA when I started, and now it's in ASSIGNED, I'm not sure what to do with it now.

Comment 29 Ivan Necas 2012-04-17 06:41:17 UTC
@jeff - yes, this very probably causes the issue, since the CRL is no longer valid. One option is not to restore the CRL, but then certs that should be invalid will be accepted until the next CRL generation. There are two other ways this could be handled (both on the Candlepin side):

  1. CP extends the validity of the CRL to a period longer than one day (if possible)
  2. CP provides an API or CLI call to regenerate the CRL


@devan - would one of these changes be acceptable in CP?
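
To confirm that a restored CRL really has expired, its nextUpdate field can be compared with the current time. The following is a minimal sketch, assuming the CRL is a PEM file and that the `openssl` command is installed. The function names are hypothetical, and the path passed to `read_next_update` is whatever location the deployment actually uses.

```python
import subprocess
from datetime import datetime, timezone


def parse_next_update(line):
    """Parse a line like 'nextUpdate=Apr 13 06:00:00 2012 GMT' (openssl output)."""
    value = line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    return parsed.replace(tzinfo=timezone.utc)


def crl_is_stale(next_update, now=None):
    """Return True when the CRL's nextUpdate time has already passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= next_update


def read_next_update(crl_path):
    """Ask openssl for the nextUpdate field of the CRL at crl_path."""
    result = subprocess.run(
        ["openssl", "crl", "-in", crl_path, "-noout", "-nextupdate"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_next_update(result.stdout)
```

A restore script could call `crl_is_stale(read_next_update(path))` and warn the administrator that clients may be rejected until the CRL is regenerated.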

Comment 30 Lukas Zapletal 2012-04-17 13:37:11 UTC
FYI candlepinschema -> candlepin

There is an ongoing effort to change the db name to "candlepinschema" to keep database names consistent, but unfortunately it did not make it into 1.0. Good catch.

https://bugzilla.redhat.com/show_bug.cgi?id=805436

Comment 31 Jeff Weiss 2012-04-19 13:39:08 UTC
Devan, can you let us know if there's a way to regen the CRL immediately, or do we have to doc our restore procedure to say "Clients will not be able to get content until the CRL is regenerated automatically (max 24 hours)"?

Comment 32 Devan Goodwin 2012-04-30 13:34:17 UTC
I cannot find any way to regenerate the CRL from the API, but adding such a call would be quite possible assuming we can sort out the authentication.

Comment 33 Lukas Zapletal 2012-05-02 07:43:45 UTC
I think it would be a good idea to add it on the Candlepin side; on the Katello side we would only need a new CLI command (it is not important to add this to the UI). Something like

katello admin crl_regen

(I assume CRL regeneration can only be done globally, not per owner.)

Bryan?

Comment 34 Bryan Kearney 2012-05-02 18:55:10 UTC
Devan, doesn't /crl regenerate the list?

Comment 35 Devan Goodwin 2012-05-03 11:16:32 UTC
My apologies, I think you're right, it's only a GET method, and it calls createCRL, which is just an alias for updateCRL. My bad, I did not consider the GET method might be updating it.

Guys, try GET /crl as a super admin and see if this helps.
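
A rough sketch of what such a call could look like from a script follows. Everything except the `/crl` path is an assumption: the base URL, the use of HTTP Basic authentication, and the helper names are placeholders, so adjust them to however the Candlepin instance is actually exposed and secured.

```python
import base64
import urllib.request


def build_crl_request(base_url, username, password):
    """Build a GET request for <base_url>/crl with a Basic auth header.

    base_url, username and password are placeholders for the
    deployment's Candlepin URL and super admin credentials.
    """
    url = base_url.rstrip("/") + "/crl"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_crl(base_url, username, password):
    """Send the request and return the response body as bytes."""
    request = build_crl_request(base_url, username, password)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Whether the GET actually refreshes the CRL on disk still needs to be verified against a restored system, as suggested above.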

Comment 37 Lukas Zapletal 2012-05-15 08:30:28 UTC
Since this bug has been implemented by Dan, and so as not to confuse others, I created a new BZ for the CRL regeneration issue: https://bugzilla.redhat.com/show_bug.cgi?id=821644

Comment 42 Og Maciel 2012-10-04 16:39:03 UTC
Verified using:

* candlepin-0.7.8-1.el6cf.noarch
* candlepin-selinux-0.7.8-1.el6cf.noarch
* candlepin-tomcat6-0.7.8-1.el6cf.noarch
* katello-1.1.12-12.el6cf.noarch
* katello-all-1.1.12-12.el6cf.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.1.8-1.el6cf.noarch
* katello-cli-1.1.8-6.el6cf.noarch
* katello-cli-common-1.1.8-6.el6cf.noarch
* katello-common-1.1.12-12.el6cf.noarch
* katello-configure-1.1.9-6.el6cf.noarch
* katello-glue-candlepin-1.1.12-12.el6cf.noarch
* katello-glue-pulp-1.1.12-12.el6cf.noarch
* katello-qpid-broker-key-pair-1.0-1.noarch
* katello-qpid-client-key-pair-1.0-1.noarch
* katello-selinux-1.1.1-1.el6cf.noarch
* pulp-1.1.12-1.el6cf.noarch
* pulp-common-1.1.12-1.el6cf.noarch
* pulp-selinux-server-1.1.12-1.el6cf.noarch

Comment 43 Lana Brindley 2012-11-19 02:34:20 UTC
This documentation has now been dropped to translation ahead of publication. For any further issues, please open a new bug.

LKB

Comment 44 errata-xmlrpc 2012-12-04 19:41:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1543.html

Comment 45 Mike McCune 2013-08-16 18:21:29 UTC
Getting rid of the 6.0.0 version since it doesn't exist.