Bug 787184
Summary: | Devise a disaster recovery plan (or process) | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | James Laska <jlaska> | |
Component: | Docs User Guide | Assignee: | Dan Macpherson <dmacpher> | |
Status: | CLOSED ERRATA | QA Contact: | Og Maciel <omaciel> | |
Severity: | low | Docs Contact: | ||
Priority: | high | |||
Version: | 6.0.1 | CC: | achan, bkearney, cpelland, dgoodwin, dmacpher, gkhachik, inecas, jturner, lbrindle, lzap, omaciel, snansi | |
Target Milestone: | Unspecified | Keywords: | Triaged | |
Target Release: | Unused | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
CloudForms 1.1 System Engine User Guide now contains documentation for Backup and Recovery for System Engine.
|
Story Points: | --- | |
Clone Of: | ||||
: | 799020 | Environment: | ||
Last Closed: | 2012-12-04 19:41:53 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1425213 |
Wrong component? Setting owner to James.

(In reply to comment #2)
> Wrong component? Setting owner to James.

Hi Lukas! Please don't reassign a bug to the reporter if you are unsure of the component (it'll just get lost). Resetting to bkearney for now, we can re-evaluate when a more appropriate owner has been identified. Thanks!

Hey James/Andy, any news here? I would love to move this off my plate ;-)

From cloud-program ... it sounds like Lana suggests adding this to the release notes is appropriate for v1.0. I'm requesting release notes review using the release_notes flag.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: see comment#5

Brian Hamrick directed me to a really good version of what a disaster recovery procedure should look like ... https://docs.redhat.com/docs/en-US/Red_Hat_Network_Satellite/5.4/html/Deployment_Guide/sect-Getting_Started_Guide-Satellite_Operation_Guidance-Backup_and_Restore_Routines.html Can we work towards this for CloudForms System Engine (short-term and long-term)?

Ok, I will rewrite our wiki page with more details in the chapter 3.2 "style".

The work is basically done. Tomorrow I would like to test the host1->host2 scenario, but the content should not change much unless I find any issues. https://fedorahosted.org/katello/wiki/GuideServerBackups

Okay, QA doesn't have resources for this; they will verify it afterwards. Changing component to doco team. Please note the process is documented here: https://fedorahosted.org/katello/wiki/GuideServerBackups

There should be an errata advisory associated with that BZ.

I ran through the documented procedure on the Katello wiki and made some corrections.
Those corrections need to be taken into the SE docs, so if you've already copied it, check the version diffs here: https://fedorahosted.org/katello/wiki/GuideServerBackups?action=diff&version=18&old_version=16

I also turned the procedure into 2 bash scripts in the katello source (in src/script/backup.sh and src/script/restore.sh). They are not polished yet but seem good enough for the community to start using instead of the manual steps.

There was one issue running the restore procedure though: my already-registered client was getting 403's in yum. I think it may be the same symptom as another CRL issue I had earlier, so I'm going to req info from Ivan to see if restoring a several-day old CRL might cause this.

Since this bug WAS in ON_QA when I started, and now it's in ASSIGNED, I'm not sure what to do with it now.

@jeff - yes, this very probably causes this issue, since the CRL is not valid anymore. One option is not to restore the CRL, but then, until the next CRL generation, certs that should be invalid will be accepted. There are two other ways this could be handled (both on the Candlepin side):
1. CP extends the validity of the CRL to a longer period than one day (if possible)
2. CP provides an API or CLI call to regenerate the CRL

@devan - would one of these changes be acceptable in CP?

FYI candlepinschema -> candlepin

There is an ongoing effort to change the db name to "candlepinschema" to keep database names consistent, but it did not make it into 1.0 unfortunately. Good catch. https://bugzilla.redhat.com/show_bug.cgi?id=805436

Devan, can you let us know if there's a way to regen the CRL immediately, or do we have to doc our restore procedure to say "Clients will not be able to get content until the CRL is regenerated automatically (max 24 hours)"?

I cannot find any way to regenerate the CRL from the API, but adding such a call would be quite possible assuming we can sort out the authentication.
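The restore problem discussed above comes down to CRL validity: a CRL taken from a backup that is older than its validity window is already expired when it is restored. The following is a minimal sketch of that check, assuming the one-day validity mentioned in the thread. The function and constant names are hypothetical and this is not Candlepin code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a CRL stays valid for one day after generation, as the thread suggests.
CRL_VALIDITY = timedelta(days=1)


def restored_crl_is_expired(generated_at, restored_at, validity=CRL_VALIDITY):
    """Return True if a CRL generated at `generated_at` is past its validity at `restored_at`."""
    return restored_at > generated_at + validity


# Hypothetical timestamps: a backup restored three days after the CRL was generated.
generated = datetime(2012, 3, 1, tzinfo=timezone.utc)
print(restored_crl_is_expired(generated, generated + timedelta(days=3)))
```

Under this assumption, any backup older than a day restores an expired CRL, which would match the 403 responses seen by the already-registered client.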
I think it would be a good idea to add it on the Candlepin side; on the Katello side we would only need a new CLI command (not important to add this to the UI). Something like katello admin crl_regen (I assume CRL regeneration can only be done globally, not per owner.) Bryan?

Devan, doesn't /crl regenerate the list?

My apologies, I think you're right, it's only a GET method, and it calls createCRL, which is just an alias for updateCRL.

My bad, I did not consider the GET method might be updating it. Guys, try GET /crl as a super admin and see if this helps.

Since this bug has been implemented by Dan, and so as not to confuse others, I created a new BZ for the CRL regeneration issue: https://bugzilla.redhat.com/show_bug.cgi?id=821644

Verified using:
* candlepin-0.7.8-1.el6cf.noarch
* candlepin-selinux-0.7.8-1.el6cf.noarch
* candlepin-tomcat6-0.7.8-1.el6cf.noarch
* katello-1.1.12-12.el6cf.noarch
* katello-all-1.1.12-12.el6cf.noarch
* katello-candlepin-cert-key-pair-1.0-1.noarch
* katello-certs-tools-1.1.8-1.el6cf.noarch
* katello-cli-1.1.8-6.el6cf.noarch
* katello-cli-common-1.1.8-6.el6cf.noarch
* katello-common-1.1.12-12.el6cf.noarch
* katello-configure-1.1.9-6.el6cf.noarch
* katello-glue-candlepin-1.1.12-12.el6cf.noarch
* katello-glue-pulp-1.1.12-12.el6cf.noarch
* katello-qpid-broker-key-pair-1.0-1.noarch
* katello-qpid-client-key-pair-1.0-1.noarch
* katello-selinux-1.1.1-1.el6cf.noarch
* pulp-1.1.12-1.el6cf.noarch
* pulp-common-1.1.12-1.el6cf.noarch
* pulp-selinux-server-1.1.12-1.el6cf.noarch

This documentation has now been dropped to translation ahead of publication. For any further issues, please open a new bug. LKB

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1543.html

Getting rid of 6.0.0 version since that doesn't exist |
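The suggestion earlier in the thread to "try GET /crl as a super admin" could be sketched roughly as follows. The thread does not say how the request is authenticated or what the server's base URL is, so the HTTP Basic authentication, the `se.example.com/candlepin` base URL, and the `admin` credentials below are all assumptions for illustration. The helper name `build_crl_request` is hypothetical.

```python
import base64
import urllib.request


def build_crl_request(base_url, user, password):
    """Build (but do not send) a GET request for the /crl resource.

    Assumption: the server accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/crl",
        headers={"Authorization": "Basic " + token},
        method="GET",
    )


# Hypothetical values; substitute the real server URL and super admin credentials.
req = build_crl_request("https://se.example.com/candlepin", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send the request. According to the comments above, the GET handler is what triggers the CRL update, so a successful response should mean the list has been regenerated.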
== Description of problem ==

This bug is intended to track the process of documenting a disaster recovery plan for CloudForms System Engine.

On Thu, 2012-02-02 at 18:13 -0500, Todd Warner wrote:
> The elements of an upgrade are...
> 1. ability to backup : restore
> 2. yum update (the bits)
> 3. the underlying OS
> 4. scheme upgrade scripts
>
> RHN Satellite has a process for three scenarios:
> 1. minor upgrade: It's an update (backup and then update the RPMs)
> 2. an upgrade that involves the OS: backup, then blow away box, install OS, then install Satellite, then do #1 or #3
> 3. upgrade that involves schema update: backup, update bits, upgrade schema, flip services back on
>
> That's it in a nutshell.

In time for GA we need...
> * a disaster recovery plan/process (this is not necessarily the same as a backup : restore process)
>
> After GA, we need to work towards a...
> * Backup and Restore process
> * An upgrade process
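The three RHN Satellite scenarios quoted above can be written down as ordered step lists, which makes it easier to see that the OS-level upgrade is a rebuild followed by one of the other two scenarios. This is only an illustrative sketch: the step wording paraphrases the email, and the names `MINOR`, `SCHEMA`, and `os_upgrade_steps` are hypothetical.

```python
# Step lists paraphrase scenarios 1 and 3 from the quoted email.
MINOR = ["back up", "update the RPMs"]
SCHEMA = ["back up", "update the bits", "upgrade the schema", "turn services back on"]


def os_upgrade_steps(follow_up):
    """Scenario 2: rebuild the machine, then continue with scenario 1 or 3."""
    if follow_up not in (MINOR, SCHEMA):
        raise ValueError("follow_up must be MINOR or SCHEMA")
    rebuild = ["back up", "wipe the machine", "install the OS", "install Satellite"]
    return rebuild + follow_up


for step in os_upgrade_steps(SCHEMA):
    print(step)
```

Every scenario starts with a backup, which is why the thread treats a working backup and restore procedure as the prerequisite for both upgrades and disaster recovery.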