Bug 1512605

Summary: Satellite backup procedure needs size estimations
Product: Red Hat Satellite
Reporter: Lukas Zapletal <lzap>
Component: Documentation
Assignee: Sergei Petrosian <spetrosi>
Status: CLOSED CURRENTRELEASE
QA Contact: Stephen Wadeley <swadeley>
Severity: medium
Priority: medium
Version: 6.3.0
CC: brubisch, ktordeur, mbacovsk, peter.vreman, spetrosi, sthirugn
Target Milestone: Unspecified
Keywords: PrioBumpField, PrioBumpQA
Target Release: Unused
Hardware: x86_64
OS: Linux
Last Closed: 2018-05-29 13:10:38 UTC
Type: Bug
Bug Blocks: 1122832, 1533259    

Description Lukas Zapletal 2017-11-13 15:58:18 UTC
Document URL: 
https://access.redhat.com/documentation/en-us/red_hat_satellite/6.2/html/server_administration_guide/chap-red_hat_satellite-server_administration_guide-backup_and_disaster_recovery

"""
Ensure your backup location has enough disk space to contain a copy of the following directories
"""

We need to explain this in more detail. Provide information on how to calculate the source sizes, the total size of the input data, the estimated compression ratio, and thus the estimated target size.

Comment 2 sthirugn@redhat.com 2017-11-13 16:47:07 UTC
I think this needs to be built into the katello-backup tool itself, and the backup should not start if the storage is insufficient?

Comment 3 Lukas Zapletal 2017-11-14 07:58:41 UTC
I tend to prefer documentation; backup is a procedure where you want to be sure you are doing it right. Let's have this documented first and then talk about integrating it.

Comment 6 Lukas Zapletal 2017-11-29 08:45:00 UTC
Please elaborate on this in the Satellite backup and clone documentation:

Backup size estimations

Before performing the backup, calculate the required free space for the backup target directory. To do that, determine the used space of the following directories: /var/lib/pgsql/data, /var/lib/mongodb, and /var/lib/pulp.
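
For example, the used space of these directories can be summarized with:

du -sh /var/lib/pgsql/data /var/lib/mongodb /var/lib/pulp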

WARNING: Running the "du" utility on Pulp content can take a long time; it is recommended to keep this directory on a separate volume. In that case, "df" can be used to quickly determine the used space.
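
For example, if /var/lib/pulp is a separate mount point, its used space can be read directly with:

df -h /var/lib/pulp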

The backup tool also copies some configuration files, so it is a good idea to calculate the total size of the following directories as well:

du -sch /etc /root/ssl-build /var/lib/candlepin /opt/puppetlabs /var/www/html/pub

For simplicity, the command above calculates the total size of the /etc directory, while the backup tool only backs up about a dozen subdirectories from this path. Usually /etc is small enough for this not to matter, but if needed, see the katello-backup script source for the exact list of directories.

WARNING: The public www directory only contains configuration files and certificates by default, but some users publish unrelated content there, such as custom RPMs or ISO files; these files will also end up in the backup.

Expected compression ratio

The following table shows the expected compression ratios for all data items included in the backup.

| Type                      | Directory                | Ratio   | Example results  |
| PostgreSQL database files | /var/lib/pgsql/data      | 15-20 % | 105 GB -> 20 GB  |
| MongoDB database files    | /var/lib/mongodb         | 10-15 % | 483 GB -> 53 GB  |
| Pulp RPM files            | /var/lib/pulp            | -       | (not compressed) |
| Configuration files       | /etc /root/ssl-build ... | 5-10 %  | 50 MB -> 4 MB    |

Add 20 % of extra space to the total, and that is the estimated total backup size.
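
For example, using the sample sizes from the table above and a hypothetical 1 TB of Pulp content:

PostgreSQL:      105 GB -> ~20 GB compressed
MongoDB:         483 GB -> ~53 GB compressed
Pulp content:   1000 GB (copied uncompressed, hypothetical size)
Configuration:    50 MB -> ~4 MB compressed

Subtotal:       ~1073 GB
+ 20 % extra:   ~1288 GB estimated total backup size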

NOTE: The backup tool uses gzip with a default compression level of 5.

The Pulp content (RPM files and repositories) backup can be skipped via the --skip-pulp-content option. This content type is never compressed, because RPM files are already compressed with a ratio higher than 95 %. The backup tool uses a simple method based on the GNU tar utility; when using alternative tools (e.g. rsync), make sure SELinux labels are carried over as well.
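
For example, one possible rsync invocation that preserves SELinux labels, assuming it is run as root and /backup/pulp is a hypothetical target directory:

rsync -aAX /var/lib/pulp/ /backup/pulp/

The -X option copies extended attributes (where SELinux labels are stored) and -A preserves ACLs.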

NOTE: Backing up Pulp content can take a lot of time; it is recommended to use a snapshot of the underlying storage (or LVM) so that the maintenance window can be shortened and the data can be safely copied while the system remains operational.
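
For example, a minimal LVM sketch, where the volume group vg_sat and logical volume pulp are hypothetical names:

lvcreate --size 10G --snapshot --name pulp-snap /dev/vg_sat/pulp   # create the snapshot
mount /dev/vg_sat/pulp-snap /mnt/pulp-snap                         # mount it for copying
(copy /mnt/pulp-snap to the backup target)
umount /mnt/pulp-snap
lvremove /dev/vg_sat/pulp-snap                                     # discard the snapshot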

Comment 8 Lukas Zapletal 2017-12-13 08:14:28 UTC
Thanks, I will add to that:

These numbers were calculated from an offline backup.

For an online backup, extra space equal to the total size of the PostgreSQL and MongoDB databases must be allocated, because an online backup first copies the data out of the databases and then compresses it.
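
For example, with the sample database sizes from the table above, an online backup would temporarily need roughly 105 GB + 483 GB = 588 GB of additional free space.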

The estimation does not include incremental backups. Their size depends heavily on how often new Red Hat content (RPMs) is added and on how many sync/promote/publish operations are performed.

Comment 9 Peter Vreman 2017-12-13 13:00:31 UTC
Also add a remark that for a weekly full + incremental scheme you need 2x the size of a full backup, because the old full backup is only removed after the next full backup completes successfully.
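
For example, if a single full backup is estimated at 500 GB (a hypothetical figure), at least 1 TB must be available in the backup location to hold both the old and the new full backup.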

This at least guarantees the ability to restore.

Maybe katello-backup can get an option to reduce this by deleting the old full backup before creating a new one.

Of course, this all also relies on an external backup tool that takes the data off-site. Having the backup only on the Satellite 6 server still leaves you at risk from disk corruption.