Red Hat Bugzilla – Bug 787184
Devise a disaster recovery plan (or process)
Last modified: 2017-02-20 16:13:24 EST
== Description of problem ==
This bug is intended to track the process of documenting a disaster recovery plan for CloudForms System Engine.
On Thu, 2012-02-02 at 18:13 -0500, Todd Warner wrote:
The elements of an upgrade are...
> 1. ability to backup : restore
> 2. yum update (the bits)
> 3. the underlying OS
> 4. schema upgrade scripts
> RHN Satellite has a process for three scenarios:
> 1. minor upgrade: It's an update (backup and then update the RPMs)
> 2. an upgrade that involves the OS: backup, then blow away box, install OS, then install Satellite, then do #1 or #3
> 3. upgrade that involves schema update: backup, update bits, upgrade schema, flip services back on
> That's it in a nutshell. In time for GA we need...
> * a disaster recovery plan/process (this is not necessarily the same as a backup : restore process)
> After GA, we need to work towards a...
> * Backup and Restore process
> * An upgrade process
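The schema-upgrade scenario (#3) quoted above can be sketched as a single script. The command names (`katello-service`, `katello-upgrade`), package globs, and backup paths below are assumptions for illustration, not anything from this bug; the script defaults to a dry run that only prints each step:

```shell
#!/bin/bash
# Hedged sketch of scenario 3 (schema-changing upgrade). Command names,
# package globs, and paths are assumptions, not taken from this bug.
# DRY_RUN defaults to 1, so running this only prints the steps.
set -euo pipefail
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi; }

run katello-service stop                                        # flip services off
run tar czf /root/se-backup.tar.gz /etc/katello /etc/candlepin  # backup (paths assumed)
run yum -y update 'katello*'                                    # update the bits
run katello-upgrade                                             # schema upgrade scripts (name assumed)
run katello-service start                                       # flip services back on
```

With DRY_RUN=0 the commands would actually execute; as written it is only a step-by-step echo of the plan.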
Wrong component? Setting owner to James.
(In reply to comment #2)
> Wrong component? Setting owner to James.
Hi Lukas! Please don't reassign a bug to the reporter if you are unsure of the component (it'll just get lost). Resetting to bkearney for now, we can re-evaluate when a more appropriate owner has been identified. Thanks!
Hey James/Andy, any news here? I would love to move this off my plate ;-)
From firstname.lastname@example.org ... it sounds like Lana suggests adding this to the release notes is appropriate for v1.0.
I'm requesting release notes review using the release_notes flag
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Brian Hamrick directed me to a really good version of what a disaster recovery procedure should look like ...
Can we work towards this for CloudForms System Engine (short-term and long-term)?
OK, I will rewrite our wiki page with more details, in the style of chapter 3.2.
The work is basically done; tomorrow I would like to test the host1->host2 scenario, but the content should not change much unless I find any issues.
Okay, QA doesn't have resources for this; they will verify it afterwards. Changing component to the documentation team. The process is documented here:
There should be an errata advisory associated with that BZ
I ran through the documented procedure on the Katello wiki and made some corrections. Those corrections need to be taken into the SE docs, so if you've already copied it, check the version diffs here:
I also turned the procedure into 2 bash scripts in the katello source (in src/script/backup.sh and src/script/restore.sh). They are not polished yet but seem good enough for the community to start using instead of the manual steps.
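For anyone who hasn't opened those scripts, the backup/restore pairing works roughly along these lines. This is only a toy round trip with temp-dir stand-ins; the actual src/script/backup.sh and restore.sh in the katello source are the authoritative versions, and the real procedure presumably covers the databases as well:

```shell
#!/bin/bash
# Toy round trip showing the backup/restore pairing only; the real
# src/script/backup.sh and restore.sh in the katello source are authoritative.
# Paths here are temp-dir stand-ins, not actual Katello locations.
set -e
src=$(mktemp -d)                              # stands in for a config/data dir
backup=$(mktemp -d)                           # stands in for the backup target
echo "pulp config" > "$src/pulp.conf"
tar czf "$backup/config.tar.gz" -C "$src" .   # backup: archive the directory
rm -f "$src/pulp.conf"                        # simulate the disaster
tar xzf "$backup/config.tar.gz" -C "$src"     # restore: unpack into place
cat "$src/pulp.conf"                          # → pulp config
```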
There was one issue running the restore procedure though: my already-registered client was getting 403s in yum. I think it may be the same symptom as another CRL issue I had earlier, so I'm going to request info from Ivan to see if restoring a several-day-old CRL might cause this.
Since this bug WAS in ON_QA when I started, and now it's in ASSIGNED, I'm not sure what to do with it now.
@jeff - yes, this is very probably the cause of the issue, since the CRL is no longer valid. One option is not to restore the CRL at all, but then certs that should be invalid will be accepted until the next CRL generation. There are two other ways this could be handled (both on the Candlepin side):
1. CP extends the validity of CRL to longer period than one day (if possible)
2. CP provides an API or CLI call to regenerate the CRL
@devan - would one of these changes be acceptable in CP?
FYI candlepinschema -> candlepin
There is an ongoing effort to change the db name to "candlepinschema" to keep database names consistent, but it did not make it into 1.0, unfortunately. Good catch.
Devan, can you let us know if there's a way to regenerate the CRL immediately, or do we have to state in our restore procedure that "Clients will not be able to get content until the CRL is regenerated automatically (max 24 hours)"?
I cannot find any way to regenerate the CRL from the API, but adding such a call would be quite possible assuming we can sort out the authentication.
I think it would be a good idea to add it on the Candlepin side; on the Katello side we would only need a new CLI command (not important to add this to the UI). Something like
katello admin crl_regen
(I assume CRL regeneration can only be done globally - not per owner.)
Devan, doesn't /crl regenerate the list?
My apologies, I think you're right: it's only a GET method, and it calls createCRL, which is just an alias for updateCRL. My bad - I did not consider that the GET method might be updating it.
Guys try GET /crl as a super admin and see if this helps.
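In case it helps anyone following along, a minimal sketch of that call; the host, port, and credentials below are placeholder assumptions for a local deployment, not values from this bug:

```shell
# Hypothetical invocation - host, port, and credentials are deployment-specific
# assumptions. Per the comments above, GET /crl as a super admin calls
# updateCRL internally, so fetching the CRL also refreshes it.
curl -ks -u admin:changeme "https://localhost:8443/candlepin/crl" \
  || echo "Candlepin not reachable at localhost:8443"
```

On a live Candlepin the response would be the CRL itself; the fallback echo just keeps the sketch harmless when no server is listening.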
Since this bug has been implemented by Dan, and to avoid confusing others, I created a new BZ for the CRL regeneration issue: https://bugzilla.redhat.com/show_bug.cgi?id=821644
This documentation has now been dropped to translation ahead of publication. For any further issues, please open a new bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
getting rid of 6.0.0 version since that doesn't exist