Bug 1532147

Summary:	katello-backup needs DBs running - not checked, doc says contrary
Product:	Red Hat Satellite	Reporter:	Pavel Moravec <pmoravec>
Component:	Satellite Maintain	Assignee:	Martin Bacovsky <mbacovsk>
Status:	CLOSED ERRATA	QA Contact:	Lukáš Hellebrandt <lhellebr>
Severity:	medium	Docs Contact:
Priority:	high
Version:	Unspecified	CC:	adujicek, apatel, bbuckingham, bkearney, cfouant, egolov, inecas, kgaikwad, lhellebr, mbacovsk, rhbgs.10.bigi_gigi
Target Milestone:	6.4.0	Keywords:	Triaged
Target Release:	Unused
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	rubygem-foreman_maintain-0.2.4	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-10-16 19:28:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Pavel Moravec 2018-01-08 07:32:27 UTC

Description of problem:
When katello-backup is run with postgresql and mongod down, several issues can be hit:

1) "foreman-rake plugin:list" fails to collect plugin list and raises error "PG::Error: could not connect to server"

2) when running with --online-backup, katello-backup fails to call pg_dump* and mongodump commands.


Since our documentation in:

https://access.redhat.com/documentation/en-us/red_hat_satellite/6.2/html-single/server_administration_guide/#sect-Red_Hat_Satellite-Server_Administration_Guide-Backup_and_Disaster_Recovery-Backing_up_Satellite_Server_or_Capsule_Server

suggests running the backup after running katello-service stop, we are effectively telling *every* customer to make the tool fail.


I suggest that the backup tool ensure the DB services are running when it starts.



Version-Release number of selected component (if applicable):
Sat 6.2.13 / katello-common-3.0.0-31.el7sat.noarch

How reproducible:
100%


Steps to Reproduce:
1. katello-service stop
2. katello-backup -y /tmp
3. katello-backup -y /tmp --online-backup


Actual results:
2. raises error "PG::Error: could not connect to server:" and fails to collect the plugin list (metadata.yml in the archive will lack it)
3. fails to dump any DB


Expected results:
2. not to raise an error and to collect plugin list in metadata.yml
3. to dump all DBs


Additional info:

Comment 2 Pavel Moravec 2018-01-08 07:50:59 UTC

katello-backup IMHO needs to ensure that the postgresql and mongod services are running; we can't assume this before a katello-backup run in any case.

I.e. removing the notice about "run katello-service stop prior to a backup" from the Doc isn't sufficient: there can still be customers and scenarios where the backup tool is run with all services, or just the DB services, down.

Comment 4 Bengt Giger 2018-01-15 15:23:38 UTC

Please note that the introduction of the "-y" option broke non-interactive offline backup scripts! The backup proceeded to the offline backup procedure without stopping the databases, so the resulting backups are probably corrupt. Several days passed before we became aware of the issue; had a restore been necessary, the results would have been disastrous.

Comment 5 Pavel Moravec 2018-01-16 08:02:30 UTC

(In reply to Bengt Giger from comment #4)
> Please note that the introduction of the "-y" option made non interactive
> offline backup scripts break! The backup continued to the offline backup
> procedure without stopping the databases, leading to backups probably being
> corrupt. Some days passed until we were aware of the issue, should a restore
> have been necessary, the results would have been disastrous.

Good point.

Currently, our documentation says "katello-service stop" must be run before running the backup, so the databases will be stopped.

BUT:
- when katello-service (or just the DBs) is stopped, "foreman-rake plugin:list" fails and the plugin list is not collected
- when the DBs are left running, their backup can be inconsistent


Therefore katello-backup needs:
- running DBs when calling "foreman-rake plugin:list"
- stopped DBs when taking offline backup
- the above to work regardless of the services' status at the beginning
- relevant Doc update (is "katello-service stop" really needed?)
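
The requirements above amount to controlling the DB service state around each backup phase instead of trusting whatever state the tool starts in. A minimal sketch, assuming a systemd host where the databases run as the postgresql and mongod units; the function names and the DB_SERVICES variable are hypothetical and are not part of katello-backup:

```shell
# Assumption: the DB units are named postgresql and mongod.
DB_SERVICES="postgresql mongod"

# Print the DB services that are currently not active, one per line.
inactive_db_services() {
    for svc in $DB_SERVICES; do
        systemctl is-active --quiet "$svc" || printf '%s\n' "$svc"
    done
}

# Start every DB service that is down and print the names that were started,
# so the caller can stop exactly those services again afterwards.
ensure_dbs_running() {
    for svc in $(inactive_db_services); do
        systemctl start "$svc" || return 1
        printf '%s\n' "$svc"
    done
}

# Stop every DB service that is still active (needed before an offline backup).
ensure_dbs_stopped() {
    for svc in $DB_SERVICES; do
        if systemctl is-active --quiet "$svc"; then
            systemctl stop "$svc" || return 1
        fi
    done
}
```

A wrapper could call ensure_dbs_running before "foreman-rake plugin:list" and before an online dump, call ensure_dbs_stopped before copying data for an offline backup, and finally bring the services back to the state recorded at the start.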


Christine, do you need an extra BZ for this different but related flaw (stop DBs before taking offline backup)? I would vote to deal with it within this BZ to cover all scenarios wrt. "above working regardless of services status at the beginning".

Comment 6 Christine Fouant 2018-04-04 18:11:45 UTC

(In reply to Pavel Moravec from comment #0)

Hi Pavel - the docs specifically state not to stop services beforehand if you are running the backup script, but I will put a couple of measures in place to ensure that the necessary services are not down.

Comment 7 Christine Fouant 2018-04-04 19:17:06 UTC

Created redmine issue http://projects.theforeman.org/issues/23124 from this bug

Comment 12 Kavita 2018-08-16 11:42:05 UTC

This change is available in foreman_maintain-0.2.6 so marking it as ON_QA.

Comment 13 Lukáš Hellebrandt 2018-08-29 13:33:49 UTC

FailedQA with Sat 6.4 snap 19.

The results differ based on service status before running the backup:

# mkdir /tmp/bup
# rm -rf /tmp/bup/*
# katello-service stop
# foreman-maintain backup online /tmp/bup
# katello-service restart
# foreman-maintain backup online /tmp/bup
# diff /tmp/bup/*/metadata.yml

Comment 15 Martin Bacovsky 2018-09-14 09:04:24 UTC

I checked the differences and what I see is:

# diff /tmp/bup/*/metadata.yml
18c18,28
< proxy_features: ''
---
> proxy_features:
> - ansible
> - discovery
> - dynflow
> - logs
> - openscap
> - pulp
> - puppet
> - puppetca
> - ssh
> - tftp

That means the proxy was down and the list of proxy features couldn't be queried. This data is not used in restore, so this is technically a valid backup, and the originally reported issue was IMO fixed. To make the current state clearer, we could add a note such as "Internal proxy is down features couldn't be listed" to the metadata, but I'd suggest doing that in a separate low-priority issue. Is that an acceptable solution?

Also, I don't think we should enforce a proxy start prior to the backup, as there may be reasons for the service being down (data integrity).
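
The empty "proxy_features: ''" entry shown in the diff can also be detected mechanically, which may help when checking a backup taken with the proxy down. A minimal sketch, assuming metadata.yml serializes the key exactly as in the diff above (an empty string when nothing was collected, otherwise a block list of "- item" lines); the function name has_proxy_features is hypothetical:

```shell
# Return 0 when the given metadata.yml lists at least one proxy feature,
# and 1 when the list is empty or the key is missing.
has_proxy_features() {
    awk '
        # The key line itself: remember that we are inside the list.
        /^proxy_features:/       { in_list = 1; next }
        # First "- item" line directly after the key: the list is populated.
        in_list && /^[ \t]*- /   { found = 1; exit }
        # Any other line ends the (empty) list.
        in_list                  { in_list = 0 }
        END                      { exit(found ? 0 : 1) }
    ' "$1"
}
```

It takes the path to a single metadata.yml, for example one of the two files compared in comment 13, and reports through its exit status whether that backup was taken while the proxy features could be queried.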

Comment 16 Lukáš Hellebrandt 2018-09-17 14:37:19 UTC

That makes sense.

Verified with Sat 6.4 snap 21.

Tried online backup, offline backup, restore. Checked plugin list presence.

Comment 17 Bryan Kearney 2018-10-16 19:28:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2927