Bug 1284686 - [RFE] Support use of snapshots in katello-backup to allow service to be restored quickly
Status: ON_DEV
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Backup & Restore
Version: 6.1.3
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: Beta
Target Release: --
Assigned To: Christine Fouant
QA Contact: Peter Ondrejka
Keywords: FutureFeature, Performance, Triaged, UserExperience
Duplicates: 1354337 1382002
Depends On:
Blocks: 1317008 1479962
Reported: 2015-11-23 17:12 EST by Stuart Auchterlonie
Modified: 2017-10-16 14:19 EDT
Fixed In Version: tfm-rubygem-katello-3.4.2
Doc Type: Bug Fix
Type: Bug

External Trackers
Tracker ID Priority Status Summary Last Updated
Foreman Issue Tracker 18329 None None None 2017-01-31 16:11 EST
Foreman Issue Tracker 21198 None None None 2017-10-09 11:01 EDT

Description Stuart Auchterlonie 2015-11-23 17:12:45 EST
Description of problem:

katello-backup is horrendously slow, and while the backup is being taken
the Satellite system is offline (since the services have been shut down).

My customer is reporting that their backups are now taking over 11 hours.

In order to minimize the time the services are offline, we should support
taking LVM snapshots of the various volumes for postgresql, pulp,
mongodb (and any other required data).

The technique is as follows. Use SSM to manage the snapshots.
- Shut down services
- Create snapshots and mount them
- Restart services
- Backup snapshots (without -v and -z, see bz#1283578)
- Unmount and remove snapshots
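
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not what katello-backup does: it uses plain LVM commands (lvcreate, lvremove) rather than SSM, it handles a single volume, and the function name, volume path, snapshot name, size, mount point and archive path are all placeholders. DRY_RUN defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Sketch of the snapshot-based backup flow (single volume, no error handling).
# All names passed in are hypothetical placeholders.
# Assumes the origin is given as /dev/<vg>/<lv>, so the snapshot appears
# next to it as /dev/<vg>/<snapshot name>.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

snapshot_backup() {
    origin="$1"       # e.g. /dev/vg_sat/lv_pulp
    snap_name="$2"    # e.g. pulp-snap
    snap_size="$3"    # e.g. 2G, needs that much free space in the VG
    mount_point="$4"  # e.g. /mnt/pulp-snap
    archive="$5"      # e.g. /var/tmp/pulp.tar
    snap_dev="$(dirname "$origin")/$snap_name"

    # 1. Shut down services so the data on disk is consistent.
    run katello-service stop
    # 2. Create the snapshot and mount it read-only.
    #    (An XFS snapshot would also need "-o ro,nouuid".)
    run lvcreate -s -n "$snap_name" -L "$snap_size" "$origin"
    run mkdir -p "$mount_point"
    run mount -o ro "$snap_dev" "$mount_point"
    # 3. Restart services; the outage ends here.
    run katello-service start
    # 4. Back up from the snapshot, without -v and -z.
    run tar -cf "$archive" -C "$mount_point" .
    # 5. Unmount and remove the snapshot.
    run umount "$mount_point"
    run lvremove -f "$snap_dev"
}
```

For example, `snapshot_backup /dev/vg_sat/lv_pulp pulp-snap 2G /mnt/pulp-snap /var/tmp/pulp.tar` prints the eight commands in order. The point of the ordering is that the slow tar step runs after the services are already back up.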


Version-Release number of selected component (if applicable):

6.1.3


Actual results:

Without using this procedure, backup speed remains unacceptable

Expected results:

Restoration of the satellite services is much faster.
There is no need to wait for the backup to complete.


Additional info:
Comment 4 Bryan Kearney 2016-07-08 16:21:02 EDT
Per 6.3 planning, moving out non acked bugs to the backlog
Comment 6 Christine Fouant 2017-01-31 16:11:02 EST
Created redmine issue http://projects.theforeman.org/issues/18329 from this bug
Comment 7 Christine Fouant 2017-01-31 17:31:58 EST
*** Bug 1382002 has been marked as a duplicate of this bug. ***
Comment 8 Christine Fouant 2017-02-03 15:05:48 EST
*** Bug 1354337 has been marked as a duplicate of this bug. ***
Comment 9 Bryan Kearney 2017-05-09 14:01:46 EDT
This did not make the 1.15/3.5 cut. I am pushing this out to sat-backlog.
Comment 10 pm-sat@redhat.com 2017-06-13 12:02:23 EDT
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/18329 has been resolved.
Comment 13 Peter Ondrejka 2017-10-03 08:48:58 EDT
On satellite-6.3.0-19.0.beta.el7sat.noarch this fails due to https://bugzilla.redhat.com/show_bug.cgi?id=1497957

Christine, could you please give me some background on the feature? I didn't find much information on how it's meant to be used (and tested), namely:

-- Are there any prerequisites to the procedure, and does the script check if it has what it needs?
-- Is it supposed to stop services?
-- Can it be combined with other katello-backup subcommands?
-- How to restore from backup created by this feature?

Cheers
Comment 14 Peter Ondrejka 2017-10-04 05:23:04 EDT
After working around the issue from 1497957:
~]# katello-backup --snapshot /var/tmp
Starting backup: 2017-10-04 04:44:07 -0400
Creating backup folder /var/tmp/katello-backup-20171004044409
Generating metadata ... 
Cannot create a temporary file: /var/tmp/scl3HIanE
Done.
Backing up config files... 
Done.
WARNING: This script will stop your services. Do you want to proceed(y/n)? y
Redirecting to /bin/systemctl stop foreman-tasks.service
Redirecting to /bin/systemctl stop httpd.service
Redirecting to /bin/systemctl stop pulp_celerybeat.service
Redirecting to /bin/systemctl stop pulp_streamer.service
Redirecting to /bin/systemctl stop pulp_resource_manager.service
Redirecting to /bin/systemctl stop pulp_workers.service
Redirecting to /bin/systemctl stop tomcat.service
Redirecting to /bin/systemctl stop postgresql.service
Redirecting to /bin/systemctl stop mongod.service
Creating pulp snapshot
  Volume group "rhel_sgi-uv20-01" has insufficient free space (0 extents): 512 required.
Failed 'lvcreate -npulp-snap -L2G -s /dev/mapper/rhel_sgi--uv20--01-root' with exit code 5
Cleaning up backup folder and starting any stopped services... 
/usr/share/ruby/fileutils.rb:125: warning: conflicting chdir during another chdir block
Redirecting to /bin/systemctl start mongod.service
Redirecting to /bin/systemctl start postgresql.service
Redirecting to /bin/systemctl start tomcat.service
Redirecting to /bin/systemctl start pulp_workers.service
Redirecting to /bin/systemctl start pulp_resource_manager.service
Redirecting to /bin/systemctl start pulp_streamer.service
Redirecting to /bin/systemctl start pulp_celerybeat.service
Redirecting to /bin/systemctl start httpd.service

I'm not sure why I get "Cannot create a temporary file: /var/tmp/scl3HIanE" or why that file is needed. There is evidently a prerequisite of having free extents in the VG (512 in this case, for the 2G snapshot), and that should be documented, at least in the script.
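
For illustration, the numbers in the output above are consistent with a 4 MiB extent size (2 GiB / 4 MiB = 512 extents), which is the LVM default. A pre-flight check along the following lines could fail early, before any services are stopped. This is a sketch only: the function names are hypothetical and it is not taken from katello-backup. `vg_free_count` is the standard `vgs` report field for the number of free extents.

```shell
#!/bin/sh
# Hypothetical pre-flight check for snapshot space.

# Number of extents needed for a snapshot of size_mib MiB,
# rounded up to a whole extent.
extents_needed() {
    size_mib="$1"
    extent_mib="$2"
    echo $(( (size_mib + extent_mib - 1) / extent_mib ))
}

# Succeeds only if the free extent count covers the request.
has_room() {
    free="$1"
    needed="$2"
    [ "$free" -ge "$needed" ]
}

# Free extents in a volume group, as reported by LVM (needs root).
vg_free_extents() {
    vgs --noheadings -o vg_free_count "$1" | tr -d '[:space:]'
}
```

With the values from the failed run, `extents_needed 2048 4` gives 512, and `has_room 0 512` fails, which matches the "insufficient free space (0 extents): 512 required" message.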

I'd like to be able to set the lv name for cases like:

Creating pulp snapshot
  Logical Volume "pulp-snap" already exists in volume group "rhel_sgi-uv20-01"
Failed 'lvcreate -npulp-snap -L2G -s /dev/mapper/rhel_sgi--uv20--01-root' with exit code 5
Cleaning up backup folder and starting any stopped services...

It seems the snapshot LVs are not cleaned up after a failure.
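
One way the cleanup could work is to record every snapshot the script creates and remove them from an exit trap, so they go away whether the run succeeds or fails. This is a minimal sketch, not the katello-backup code: the function names are hypothetical, the snapshot name is passed in by the caller (which would also cover the request to set the LV name), and the origin is assumed to be given as /dev/<vg>/<lv>. DRY_RUN defaults to 1, so commands are only printed.

```shell
#!/bin/sh
# Hypothetical sketch: remove snapshot LVs on exit, including after a failure.

DRY_RUN="${DRY_RUN:-1}"
SNAPSHOTS=""   # space-separated device paths of snapshots created by this run

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a snapshot and remember it only if lvcreate succeeded, so that
# a pre-existing LV with the same name is never removed by the cleanup.
create_snapshot() {
    origin="$1"
    name="$2"
    size="$3"
    run lvcreate -s -n "$name" -L "$size" "$origin" || return 1
    SNAPSHOTS="$SNAPSHOTS $(dirname "$origin")/$name"
}

# Remove every snapshot recorded so far.
cleanup_snapshots() {
    for snap in $SNAPSHOTS; do
        run lvremove -f "$snap"
    done
    SNAPSHOTS=""
}

# Run the cleanup whenever the script exits.
trap cleanup_snapshots EXIT
```

A caller could pass a timestamped name, for example `create_snapshot /dev/vg_sat/lv_pulp "pulp-snap-$(date +%Y%m%d%H%M%S)" 2G`, to avoid colliding with a leftover "pulp-snap" from an earlier run.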
Comment 15 Brad Buckingham 2017-10-13 10:53:29 EDT
Moving to POST since upstream redmine is merged.
