Bug 1283578 - katello-backup is slow on big pulp repositories
Summary: katello-backup is slow on big pulp repositories
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Performance
Version: 6.1.3
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Katello Bug Bin
QA Contact: Tazim Kolhar
URL: https://github.com/Katello/katello-pa...
Whiteboard:
Depends On:
Blocks: 1296845
 
Reported: 2015-11-19 10:31 UTC by Evgeni Golov
Modified: 2021-06-10 11:03 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Katello backups were slow due to compressing rpm data which is already compressed. The backup script was fixed to not do redundant compression.
Clone Of:
Environment:
Last Closed: 2016-01-21 07:42:47 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Knowledge Base (Solution) 2060393
Red Hat Product Errata RHBA-2016:0052 (SHIPPED_LIVE): Satellite 6.1.6 bug fix update, 2016-01-21 12:40:53 UTC

Description Evgeni Golov 2015-11-19 10:31:56 UTC
Description of problem:
Running Sat 6.1.3 as a VMware guest (12 vCPU, 24 GB RAM, RHEL 7.1, XFS, storage does about 300 MB/s).
/var/lib/pulp is 64 GB. Running "katello-backup /mnt/backup" takes about 66 minutes (the backup target is on the same storage as /var/lib/pulp), of which around 63 minutes are spent backing up pulp.

Version-Release number of selected component (if applicable):
Satellite 6.1.3

How reproducible:
always

Steps to Reproduce:
1. Install Sat6.1.3
2. Sync content
3. run katello-backup

Actual results:
Plain katello-backup on a fresh Sat 6.1.3 with ~64 GB of content synced, one content view, one attached system:
real    66m32.761s
user    51m49.001s
sys     7m48.724s

Expected results:
faster :-)
Copying that amount of data in 10-20 minutes seems possible.

Additional info:
I removed "-v" and "-z" from the tar call for /var/lib/pulp in katello-backup and got the following time:
real    19m59.646s
user    2m11.093s
sys     7m18.084s

Compressing RPMs is pointless, as they are already compressed (the initial backup only got the 64 GB down to 61 GB).
Running tar in verbose mode also slows things down, as tar then has to print every single file name to the terminal.
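
For illustration, a sketch of the change described above, using the tar call quoted in comment 7 as the baseline (the exact line in the merged patch may differ, and the renamed output file without the .gz suffix is my assumption):

# before: verbose output plus single-threaded gzip compression of already-compressed RPMs
tar --selinux -czvf pulp_data.tar.gz /var/lib/pulp/ /var/www/pub/
# after: plain archive, no -v and no -z
tar --selinux -cf pulp_data.tar /var/lib/pulp/ /var/www/pub/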

Comment 2 Christian Horn 2015-11-20 11:20:10 UTC
We would like to get this out to customers quickly. Due to this issue
- we are wasting CPU resources
- the backup of /var/lib/pulp/ and /var/www/pub/ is limited by the speed of a single CPU, which is fully occupied by gzip (see the sketch below)
- the backup is very slow
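
As an aside, and not part of the fix that was merged (the merged fix simply drops compression): if compression were kept, the single-CPU gzip bottleneck could in principle be avoided with a parallel compressor such as pigz, roughly like this:

tar --selinux -cf - /var/lib/pulp/ /var/www/pub/ | pigz > pulp_data.tar.gz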

Comment 3 Evgeni Golov 2015-11-30 15:07:12 UTC
This was merged upstream. Does anybody want a backported patch for the version in Sat 6.1.4?

Comment 6 Tazim Kolhar 2016-01-05 12:38:59 UTC
VERIFIED:
# rpm -qa | grep foreman
dell-pe1950-05.rhts.eng.brq.redhat.com-foreman-client-1.0-1.noarch
dell-pe1950-05.rhts.eng.brq.redhat.com-foreman-proxy-1.0-1.noarch
ruby193-rubygem-foreman_hooks-0.3.7-2.el7sat.noarch
puppet-foreman_scap_client-0.3.3-10.el7sat.noarch
foreman-vmware-1.7.2.50-1.el7sat.noarch
rubygem-hammer_cli_foreman_tasks-0.0.3.5-1.el7sat.noarch
foreman-ovirt-1.7.2.50-1.el7sat.noarch
ruby193-rubygem-foreman_gutterball-0.0.1.9-1.el7sat.noarch
foreman-1.7.2.50-1.el7sat.noarch
ruby193-rubygem-foreman_docker-1.2.0.24-1.el7sat.noarch
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_bootdisk-0.1.2.7-1.el7sat.noarch
rubygem-hammer_cli_foreman_docker-0.0.3.10-1.el7sat.noarch
foreman-debug-1.7.2.50-1.el7sat.noarch
foreman-proxy-1.7.2.8-1.el7sat.noarch
dell-pe1950-05.rhts.eng.brq.redhat.com-foreman-proxy-client-1.0-1.noarch
foreman-discovery-image-3.0.5-3.el7sat.noarch
foreman-libvirt-1.7.2.50-1.el7sat.noarch
ruby193-rubygem-foreman_openscap-0.3.2.10-1.el7sat.noarch
foreman-gce-1.7.2.50-1.el7sat.noarch
rubygem-hammer_cli_foreman-0.1.4.15-1.el7sat.noarch
ruby193-rubygem-foreman_discovery-2.0.0.23-1.el7sat.noarch
foreman-selinux-1.7.2.17-1.el7sat.noarch
foreman-postgresql-1.7.2.50-1.el7sat.noarch
foreman-compute-1.7.2.50-1.el7sat.noarch
ruby193-rubygem-foreman-redhat_access-0.2.4-1.el7sat.noarch
rubygem-hammer_cli_foreman_discovery-0.0.1.10-1.el7sat.noarch
ruby193-rubygem-foreman_bootdisk-4.0.2.14-1.el7sat.noarch

Steps:
After syncing content:

time katello-backup /tmp/katello_backup
Success!


** BACKUP Complete, contents can be found in: /tmp/katello_backup



real	11m57.641s
user	6m41.736s
sys	0m23.281s

Comment 7 Abel Lopez 2016-01-08 21:04:03 UTC
I was about to open a new BZ, but found this.
I'd like to ask why katello-backup compresses pulp data:

echo "Backing up Pulp data... "
tar --selinux -czvf pulp_data.tar.gz /var/lib/pulp/ /var/www/pub/
echo "Done."

Aren't RPMs already compressed? Binary data like that is generally not going to benefit from gzip compression.

This is in 6.1.5

Comment 8 Abel Lopez 2016-01-08 23:08:35 UTC
The backup of /var/lib/pulp should probably use rsync instead of tar. Not only does rsync utilize multiple processors better than tar, but subsequent runs will also benefit greatly when an artifact has already been copied, since there is no need to copy it again.
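
A hypothetical sketch of that suggestion (the destination path is made up for illustration; this is not what was merged, see comment 9):

rsync -aAXH --delete /var/lib/pulp /var/www/pub /mnt/backup/pulp_data/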

Comment 9 Christian Horn 2016-01-11 10:18:14 UTC
Abel, when using "rsync" we would get a complete filesystem structure replicated. That is a whole different concept; the destination filesystem would then also have to support ownership, permissions, and so on.

With the current strategy of creating a single archive file
- we can also back up to e.g. a vfat filesystem (see the sketch below)
- and we get a single, easy-to-handle file
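
A minimal illustration of the vfat point (the destination path is hypothetical): tar records ownership, permissions, and SELinux context inside the archive itself, so the destination filesystem does not have to support them:

tar --selinux -cf /mnt/backup_disk/pulp_data.tar /var/lib/pulp/ /var/www/pub/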

> Aren't rpms already compressed? Generally binary data, not going
> to benefit from gzip compression.
Correct; that is exactly the issue being addressed here in this BZ.

Comment 10 David O'Brien 2016-01-19 04:55:51 UTC
If this bug requires doc text for errata release, please provide draft
text in the doc text field in the following format:
 Cause:
 Consequence:
 Fix:
 Result:
The documentation team will review, edit, and approve the text.
If this bug does not require doc text, please set the
'requires_doc_text' flag to -.

Comment 11 Evgeni Golov 2016-01-19 07:50:52 UTC
I think we do not need any doc text here.

Comment 12 Christian Horn 2016-01-19 08:14:53 UTC
(In reply to Evgeni Golov from comment #11)
> I think we do not need any doc text here.
When I read release notes/changelogs, I am grateful to see information like "In the past, backups of the packages were also compressed. With this fix, no compression is attempted any more." There are probably guidelines for deciding whether doc text should be written or not.

Comment 13 David O'Brien 2016-01-20 04:34:37 UTC
The doc text currently reads:

"Katello backups were slow due to compressing rpm data which is already
compressed. The backup script was fixed to not do redundant backups."

Does that mean the RPM data is not backed up, or it is not compressed before it is backed up (which I thought was the actual issue)? I'd like to get this clarified before I approve the doc text. Apologies if this is apparent in the bug but I missed it.

thanks

Comment 14 Christian Horn 2016-01-20 06:59:55 UTC
> "Katello backups were slow due to compressing rpm data which is already
> compressed. The backup script was fixed to not do redundant backups."

That is wrong; I suggest "Katello backups were slow due to compressing rpm data which is already compressed. The backup script was fixed to not do redundant compression of the rpm data."

> Does that mean the RPM data is not backed up, or it is not compressed
> before it is backed up (which I thought was the actual issue)? I'd like
> to get this clarified before I approve the doc text. Apologies if this
> is apparent in the bug but I missed it.

The latter. This can be verified in the description: the directories to back up are not changed, only the compression and verbosity options are removed:

> I removed "-v" and "-z" from the tar call for /var/lib/pulp in
> katello-backup and got the following time: [..]

Comment 16 errata-xmlrpc 2016-01-21 07:42:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0052

