Bug 1532147
Summary: katello-backup needs DBs running - not checked, doc says contrary

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Satellite | Reporter | Pavel Moravec <pmoravec> |
| Component | Satellite Maintain | Assignee | Martin Bacovsky <mbacovsk> |
| Status | CLOSED ERRATA | QA Contact | Lukáš Hellebrandt <lhellebr> |
| Severity | medium | Priority | high |
| Version | Unspecified | Target Milestone | 6.4.0 |
| Target Release | Unused | Keywords | Triaged |
| Hardware | All | OS | Linux |
| Fixed In Version | rubygem-foreman_maintain-0.2.4 | Doc Type | Enhancement |
| Last Closed | 2018-10-16 19:28:50 UTC | Type | Bug |
| CC | adujicek, apatel, bbuckingham, bkearney, cfouant, egolov, inecas, kgaikwad, lhellebr, mbacovsk, rhbgs.10.bigi_gigi | | |
Description
Pavel Moravec
2018-01-08 07:32:27 UTC
katello-backup IMHO needs to ensure that the postgresql and mongod services are running; we can't assume this before a katello-backup run in every case. I.e. removing the notice about "run katello-service stop prior a backup" from the Doc isn't sufficient: there can still be customers and scenarios where the backup tool is run with all services (or just the DB services) down.

Please note that the introduction of the "-y" option made non-interactive offline backup scripts break! The backup continued to the offline backup procedure without stopping the databases, leading to backups probably being corrupt. Some days passed until we were aware of the issue; should a restore have been necessary, the results would have been disastrous.

(In reply to Bengt Giger from comment #4)
> Please note that the introduction of the "-y" option made non-interactive offline backup scripts break! The backup continued to the offline backup procedure without stopping the databases, leading to backups probably being corrupt. Some days passed until we were aware of the issue; should a restore have been necessary, the results would have been disastrous.

Good point. Currently, our documentation says "katello-service stop" must be run before running the backup, so that the databases will be stopped. BUT:
- when stopping katello-service (or just the DBs), "foreman-rake plugin:list" fails to be collected
- when leaving the DBs running, their backup can be inconsistent

Therefore katello-backup needs:
- running DBs when calling "foreman-rake plugin:list"
- stopped DBs when taking the offline backup
- the above working regardless of the services' status at the beginning
- a relevant Doc update (is "katello-service stop" really needed?)

Christine, do you need an extra BZ for this different but related flaw (stopping the DBs before taking the offline backup)? I would vote to deal with it within this BZ to cover all scenarios wrt. "the above working regardless of the services' status at the beginning".
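The first three requirements listed above can be sketched as a small planning function that records the initial service state and orders the steps around it. This is a minimal Python sketch, not katello-backup's or foreman-maintain's actual code: the unit names `postgresql` and `mongod` are assumptions (the MongoDB unit name varies between Satellite versions), and `is_active`, `current_state` and `plan_backup` are hypothetical helpers. Putting each unit back into its initial state at the end is a design choice of the sketch.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

# Assumed unit names; the MongoDB unit in particular differs between versions.
DB_UNITS = ["postgresql", "mongod"]


def is_active(unit: str) -> bool:
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def current_state(units: Iterable[str],
                  probe: Callable[[str], bool] = is_active) -> Dict[str, bool]:
    """Record whether each unit is running before the backup starts."""
    return {unit: probe(unit) for unit in units}


def plan_backup(initial: Dict[str, bool], online: bool) -> List[str]:
    """Return ordered steps that work whatever the initial service state is."""
    stopped = [u for u, up in initial.items() if not up]
    running = [u for u, up in initial.items() if up]

    # The plugin list can only be collected while the databases are up.
    steps = [f"start {u}" for u in stopped]
    steps.append("run foreman-rake plugin:list")

    if online:
        # Online backup: dump the live databases, then stop again
        # the units that were already stopped when the tool started.
        steps.append("dump databases (pg_dump, mongodump)")
        steps += [f"stop {u}" for u in stopped]
    else:
        # Offline backup: every database must be down while its files are copied.
        steps += [f"stop {u}" for u in initial]
        steps.append("copy database files")
        # Restart only the units the user had running.
        steps += [f"start {u}" for u in running]
    return steps
```

For example, `plan_backup(current_state(DB_UNITS), online=False)` on a host where `katello-service stop` was run first would start both databases for the plugin list, stop them again for the file copy, and leave them stopped afterwards. Passing a fake `probe` to `current_state` keeps the logic testable without systemd.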
(In reply to Pavel Moravec from comment #0)
> Description of problem:
> Try running katello-backup with postgresql+mongo down; a few issues can be hit:
>
> 1) "foreman-rake plugin:list" fails to collect the plugin list and raises the error "PG::Error: could not connect to server"
>
> 2) when running with --online-backup, katello-backup fails to call the pg_dump* and mongodump commands.
>
> Since our documentation in:
>
> https://access.redhat.com/documentation/en-us/red_hat_satellite/6.2/html-single/server_administration_guide/#sect-Red_Hat_Satellite-Server_Administration_Guide-Backup_and_Disaster_Recovery-Backing_up_Satellite_Server_or_Capsule_Server
>
> suggests running the backup after running katello-service stop, we are effectively making the tool fail for *any* customer.
>
> I suggest ensuring the DB services are running at the beginning of the backup tool.
>
> Version-Release number of selected component (if applicable):
> Sat 6.2.13 / katello-common-3.0.0-31.el7sat.noarch
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. katello-service stop
> 2. katello-backup -y /tmp
> 3. katello-backup -y /tmp --online-backup
>
> Actual results:
> 2. raises the error "PG::Error: could not connect to server:" and fails to collect the plugin list (metadata.yml in the archive will miss it)
> 3. fails to dump any DB
>
> Expected results:
> 2. not to raise an error and to collect the plugin list in metadata.yml
> 3. to dump all DBs
>
> Additional info:

Hi Pavel - the docs specifically state not to stop services beforehand if you are running the backup script, but I will put a couple of measures in place to ensure that the necessary services are not down.

Created redmine issue http://projects.theforeman.org/issues/23124 from this bug

This change is available in foreman_maintain-0.2.6, so marking it as ON_QA.

FailedQA with Sat 6.4 snap 19.
The results differ based on the service status before running the backup:
# mkdir /tmp/bup
# rm -rf /tmp/bup/*
# katello-service stop
# foreman-maintain backup online /tmp/bup
# katello-service restart
# foreman-maintain backup online /tmp/bup
# diff /tmp/bup/*/metadata.yml

I checked the differences and what I see is:
# diff /tmp/bup/*/metadata.yml
18c18,28
< proxy_features: ''
---
> proxy_features:
> - ansible
> - discovery
> - dynflow
> - logs
> - openscap
> - pulp
> - puppet
> - puppetca
> - ssh
> - tftp
That means the proxy was down and the list of proxy features couldn't be queried. This data is not used during restore, so this is technically a valid backup, and the originally reported issue is IMO fixed. To make the current state clearer, we could add a note such as "Internal proxy is down, features couldn't be listed" to the metadata, but I'd suggest doing that in a separate low-priority issue. Is that an acceptable solution?
Also, I don't think we should enforce starting the proxy before the backup, as there may be reasons for the service being down (data integrity).
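The note proposed above (flagging an empty `proxy_features` instead of silently storing `''`), together with the plugin-list check from the original report, could be expressed as a small validation over the parsed metadata. A minimal Python sketch, assuming metadata.yml has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and assuming the plugin list is stored under a `plugin_list` key; only `proxy_features` is confirmed by the diff above. `metadata_warnings` is a hypothetical helper, and its proxy message mirrors the proposed note, not anything foreman-maintain is known to write.

```python
from typing import Any, Dict, List


def metadata_warnings(metadata: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for a parsed backup metadata.yml."""
    warnings = []
    # Original report: the plugin list is missing when the DBs were down.
    # "plugin_list" is an assumed key name.
    if not metadata.get("plugin_list"):
        warnings.append("Plugin list missing; were the databases down?")
    # FailedQA observation: proxy_features is '' when the internal proxy is down.
    if not metadata.get("proxy_features"):
        warnings.append("Internal proxy is down, features couldn't be listed")
    return warnings
```

Applied to the two backups compared above, the run taken after `katello-service stop` (`proxy_features: ''`) would produce the proxy warning, while the run taken after `katello-service restart` would produce none, provided its plugin list is present.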
That makes sense. Verified with Sat 6.4 snap 21. Tried online backup, offline backup, restore. Checked plugin list presence.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927