Bug 1284686 - [RFE] Support use of snapshots in katello-backup to allow service to be restored quickly
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Backup & Restore
Version: 6.1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Beta
Target Release: --
Assigned To: Christine Fouant
QA Contact: Peter Ondrejka
Docs Contact: Michaela Slaninkova
Keywords: FutureFeature, Performance, Triaged, UserExperience
Duplicates: 1354337 1382002
Depends On:
Blocks: 1317008 1479962
Reported: 2015-11-23 17:12 EST by Stuart Auchterlonie
Modified: 2018-02-21 07:32 EST
CC List: 15 users

See Also:
Fixed In Version: tfm-rubygem-katello-3.4.2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-21 07:32:18 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker                ID                                      Priority  Status  Summary  Last Updated
Foreman Issue Tracker  18329                                   None      None    None     2017-01-31 16:11 EST
Foreman Issue Tracker  21198                                   None      None    None     2017-10-09 11:01 EDT
Foreman Issue Tracker  22418                                   None      None    None     2018-01-25 11:53 EST
Foreman Issue Tracker  22447                                   None      None    None     2018-01-29 11:55 EST
Github                 theforeman/foreman-packaging/pull/2137  None      None    None     2018-01-29 13:20 EST

Description Stuart Auchterlonie 2015-11-23 17:12:45 EST
Description of problem:

katello-backup is horrendously slow, and while the backup is being taken
the satellite system is offline (since the services have been shut down).

My customer is reporting that their backups are now taking over 11 hours.

In order to minimize the time the services are offline, we should support
taking LVM snapshots of the various volumes for postgresql, pulp,
mongodb (and any other required data).

The technique is as follows. Use SSM to manage the snapshots.
- Shut down services
- Create snapshots and mount them
- Restart services
- Backup snapshots (without -v and -z, see bz#1283578)
- Unmount and remove snapshots
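
The steps above could be sketched roughly as follows. This is only an illustration, not how katello-backup implements it: it uses plain LVM commands instead of SSM, it assumes `katello-service` is used to stop and start the services, and the volume group `vg_sat`, the LV names, the mount root `/mnt/snap` and the backup target `/backup` are made-up placeholders. The function only builds the ordered command list; it does not execute anything.

```python
from typing import Dict, List


def snapshot_backup_plan(volumes: Dict[str, str], snap_size: str = "2G",
                         mount_root: str = "/mnt/snap",
                         target: str = "/backup") -> List[str]:
    """Return the ordered shell commands for a snapshot-based backup.

    volumes maps a short name (e.g. "pulp") to the origin LV device path.
    All names and paths here are hypothetical placeholders.
    """
    plan = ["katello-service stop"]            # 1. shut down services
    snaps: Dict[str, str] = {}
    for name, origin in volumes.items():       # 2. create snapshots and mount them
        vg_dir = origin.rsplit("/", 1)[0]      # "/dev/vg_sat/pulp" -> "/dev/vg_sat"
        snaps[name] = f"{vg_dir}/{name}-snap"
        plan.append(f"lvcreate -n {name}-snap -L {snap_size} -s {origin}")
        plan.append(f"mkdir -p {mount_root}/{name}")
        # Note: an XFS snapshot may additionally need "-o nouuid" to mount.
        plan.append(f"mount {snaps[name]} {mount_root}/{name}")
    plan.append("katello-service start")       # 3. restart services
    for name in volumes:                       # 4. archive without -v and -z
        plan.append(f"tar -cf {target}/{name}.tar -C {mount_root}/{name} .")
    for name, snap_dev in snaps.items():       # 5. unmount and remove snapshots
        plan.append(f"umount {mount_root}/{name}")
        plan.append(f"lvremove -f {snap_dev}")
    return plan


if __name__ == "__main__":
    example = {
        "pulp": "/dev/vg_sat/pulp",
        "mongodb": "/dev/vg_sat/mongodb",
        "pgsql": "/dev/vg_sat/pgsql",
    }
    for command in snapshot_backup_plan(example):
        print(command)
```

The point of the ordering is that the services are restarted before the slow archiving step, so the outage only lasts as long as it takes to create and mount the snapshots.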


Version-Release number of selected component (if applicable):

6.1.3


Actual results:

Without using this procedure, backup speed remains unacceptable

Expected results:

Restoration of the satellite services is much faster.
There is no need to wait for the backup to complete.


Additional info:
Comment 4 Bryan Kearney 2016-07-08 16:21:02 EDT
Per 6.3 planning, moving out non-acked bugs to the backlog.
Comment 6 Christine Fouant 2017-01-31 16:11:02 EST
Created redmine issue http://projects.theforeman.org/issues/18329 from this bug
Comment 7 Christine Fouant 2017-01-31 17:31:58 EST
*** Bug 1382002 has been marked as a duplicate of this bug. ***
Comment 8 Christine Fouant 2017-02-03 15:05:48 EST
*** Bug 1354337 has been marked as a duplicate of this bug. ***
Comment 9 Bryan Kearney 2017-05-09 14:01:46 EDT
This did not make the 1.15/3.5 cut. I am pushing this out to sat-backlog.
Comment 10 pm-sat@redhat.com 2017-06-13 12:02:23 EDT
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18329 has been resolved.
Comment 13 Peter Ondrejka 2017-10-03 08:48:58 EDT
On satellite-6.3.0-19.0.beta.el7sat.noarch this fails due to https://bugzilla.redhat.com/show_bug.cgi?id=1497957

Christine, could you please give me some background of the feature, as I didn't find much information on how it's meant to be used (and tested), namely:

-- Are there any prerequisites to the procedure, and does the script check if it has what it needs?
-- Is it supposed to stop services?
-- Can it be combined with other katello-backup subcommands?
-- How to restore from backup created by this feature?

Cheers
Comment 14 Peter Ondrejka 2017-10-04 05:23:04 EDT
After working around the issue from 1497957:
~]# katello-backup --snapshot /var/tmp
Starting backup: 2017-10-04 04:44:07 -0400
Creating backup folder /var/tmp/katello-backup-20171004044409
Generating metadata ... 
Cannot create a temporary file: /var/tmp/scl3HIanE
Done.
Backing up config files... 
Done.
WARNING: This script will stop your services. Do you want to proceed(y/n)? y
Redirecting to /bin/systemctl stop foreman-tasks.service
Redirecting to /bin/systemctl stop httpd.service
Redirecting to /bin/systemctl stop pulp_celerybeat.service
Redirecting to /bin/systemctl stop pulp_streamer.service
Redirecting to /bin/systemctl stop pulp_resource_manager.service
Redirecting to /bin/systemctl stop pulp_workers.service
Redirecting to /bin/systemctl stop tomcat.service
Redirecting to /bin/systemctl stop postgresql.service
Redirecting to /bin/systemctl stop mongod.service
Creating pulp snapshot
  Volume group "rhel_sgi-uv20-01" has insufficient free space (0 extents): 512 required.
Failed 'lvcreate -npulp-snap -L2G -s /dev/mapper/rhel_sgi--uv20--01-root' with exit code 5
Cleaning up backup folder and starting any stopped services... 
/usr/share/ruby/fileutils.rb:125: warning: conflicting chdir during another chdir block
Redirecting to /bin/systemctl start mongod.service
Redirecting to /bin/systemctl start postgresql.service
Redirecting to /bin/systemctl start tomcat.service
Redirecting to /bin/systemctl start pulp_workers.service
Redirecting to /bin/systemctl start pulp_resource_manager.service
Redirecting to /bin/systemctl start pulp_streamer.service
Redirecting to /bin/systemctl start pulp_celerybeat.service
Redirecting to /bin/systemctl start httpd.service

Not sure why I get "Cannot create a temporary file: /var/tmp/scl3HIanE" and why it is needed. Obviously there is a prerequisite of having some free extents in a vg, but that should be documented at least in the script.
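
For reference, the "512 required" figure in the lvcreate error is consistent with the 2G snapshot size the script requests, assuming the volume group uses LVM's default physical extent size of 4 MiB (the actual extent size on this host is not shown in the log). A quick arithmetic check:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def extents_needed(snapshot_bytes: int, extent_bytes: int = 4 * MIB) -> int:
    """Number of physical extents needed for a snapshot, rounded up.

    The 4 MiB default is an assumption; the real value depends on how
    the volume group was created.
    """
    return -(-snapshot_bytes // extent_bytes)  # ceiling division


print(extents_needed(2 * GIB))  # 512
```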

I'd like to be able to set the lv name for cases like:

Creating pulp snapshot
  Logical Volume "pulp-snap" already exists in volume group "rhel_sgi-uv20-01"
Failed 'lvcreate -npulp-snap -L2G -s /dev/mapper/rhel_sgi--uv20--01-root' with exit code 5
Cleaning up backup folder and starting any stopped services...

It seems the snapshot LVs are not cleaned up after a failure.
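
One way to avoid leftover snapshot LVs would be to remove whatever has already been created as soon as a later step fails. Below is a minimal sketch of that pattern. It is not the katello-backup code; the `run` callback, the `suffix` parameter (standing in for a configurable LV name) and the `vg_sat` paths are hypothetical, and the fake runner in the example only records commands instead of executing them.

```python
from typing import Callable, Dict, List


def create_snapshots(origins: Dict[str, str], run: Callable[[str], None],
                     size: str = "2G", suffix: str = "-snap") -> List[str]:
    """Create one snapshot per origin LV; remove all of them if any step fails."""
    created: List[str] = []
    try:
        for name, origin in origins.items():
            snap_name = f"{name}{suffix}"
            vg_dir = origin.rsplit("/", 1)[0]
            run(f"lvcreate -n {snap_name} -L {size} -s {origin}")
            created.append(f"{vg_dir}/{snap_name}")
    except Exception:
        # Roll back in reverse order, then re-raise the original error.
        for snap in reversed(created):
            run(f"lvremove -f {snap}")
        raise
    return created


if __name__ == "__main__":
    log: List[str] = []

    def fake_run(cmd: str) -> None:
        """Record the command and simulate a failure on the second snapshot."""
        log.append(cmd)
        if cmd.startswith("lvcreate -n mongodb"):
            raise RuntimeError("insufficient free space")

    try:
        create_snapshots({"pulp": "/dev/vg_sat/pulp",
                          "mongodb": "/dev/vg_sat/mongodb"}, fake_run)
    except RuntimeError:
        pass
    print("\n".join(log))
```

In the example run the last recorded command is the `lvremove` of the pulp snapshot, so a retry would not hit the "already exists" error.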
Comment 15 Brad Buckingham 2017-10-13 10:53:29 EDT
Moving to POST since upstream redmine is merged.
Comment 16 Peter Ondrejka 2017-10-24 09:45:09 EDT
Hi Christine, could you please take a look at my questions from comment #13? I also wonder what the expected behavior of this feature is on Capsule and what the prerequisites for successful use are.

I created a documentation bug for this feature in https://bugzilla.redhat.com/show_bug.cgi?id=1505890
Comment 17 Christine Fouant 2017-10-27 10:24:27 EDT
(In reply to Peter Ondrejka from comment #13)
> On satellite-6.3.0-19.0.beta.el7sat.noarch this fails due to
> https://bugzilla.redhat.com/show_bug.cgi?id=1497957
> 
> Christine, could you please give me some background of the feature, as I
> didn't find much information on how it's meant to be used (and tested),
> namely:
> 
> -- Are there any prerequisites to the procedure, and does the script check
> if it has what it needs?
The prerequisites are that the data must reside on LVM logical volumes, and there must be enough free space in the volume group to create the snapshot. If these are not in place, the backup fails with a message giving the exit codes.

> -- Is it supposed to stop services?
It must stop services, but only momentarily. Snapshots cannot be combined with online backup. The services are down only for as long as it takes to create the snapshots.

> -- Can it be combined with other katello-backup subcommands? 
Should be fine to do, except with online-backup.

> -- How to restore from backup created by this feature?
The same way you restore any other backup: # katello-restore /path/to/backup/folder

> 
> Cheers

@Evgeni - could you tell us if there are any more prerequisites to snapshots that I'm missing here?
Comment 18 Evgeni Golov 2017-11-02 06:16:54 EDT
If you would wake me at 2 am in the morning and ask me for requirements for working snapshots, you'd get the following:

* the system uses LVM for (at least) /var/lib/pulp, /var/lib/mongodb, /var/lib/pgsql
* the above mentioned points are preferably (but not necessarily) on different LVs
* there is sufficient free space (3×snapshot_size, 2G by default, = 6G) in the relevant VGs
  * if all three points are on VG1, it has to have 3×snapshot_size (2G by default, = 6G) free
  * if spread differently, the free space has to match too ;)
* the backup target is preferably not on a snapshotted LV, as this would mean you have to fit the whole backup into the snapshot, raising the snapshot space requirement by orders of magnitude
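
The space requirement above can be worked out per volume group. A minimal sketch, assuming one snapshot per mount point and a hard-coded, hypothetical mapping from mount point to VG name (on a real system that mapping would have to be read from LVM):

```python
from collections import Counter
from typing import Dict


def required_free_gib(mount_to_vg: Dict[str, str],
                      snapshot_gib: int = 2) -> Dict[str, int]:
    """Free space (GiB) each VG needs: one snapshot per mount point it hosts."""
    counts = Counter(mount_to_vg.values())
    return {vg: n * snapshot_gib for vg, n in counts.items()}


# All three mount points on the same VG: 3 x 2G = 6G.
all_on_one = {
    "/var/lib/pulp": "vg1",
    "/var/lib/mongodb": "vg1",
    "/var/lib/pgsql": "vg1",
}
print(required_free_gib(all_on_one))  # {'vg1': 6}

# Spread over two VGs (names are placeholders): each VG needs its own share.
spread = {
    "/var/lib/pulp": "vg_pulp",
    "/var/lib/mongodb": "vg_db",
    "/var/lib/pgsql": "vg_db",
}
print(required_free_gib(spread))  # {'vg_pulp': 2, 'vg_db': 4}
```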
Comment 24 Christine Fouant 2018-01-29 11:55:51 EST
Need to add bypass of logical volume validation
Comment 25 Peter Ondrejka 2018-02-05 10:08:35 EST
Hello, checked again on snap 35,

-- snapshot backup performs as expected, the above error was due to lack of space on backup destination LV
-- snapshots seem to play well with other options (--incremental, --skip-pulp-content)
-- services are started after creating a snapshot as expected
-- restored from snapshot backups successfully 
-- checked both on server and capsule
Comment 28 errata-xmlrpc 2018-02-21 07:32:18 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336
