Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 2188504 - The "hammer export" command using single thread encryption causes a performance bottleneck.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: Unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: 6.14.0
Assignee: satellite6-bugs
QA Contact: Shweta Singh
URL:
Whiteboard:
Duplicates: 2238653 2239183 (view as bug list)
Depends On:
Blocks:
 
Reported: 2023-04-21 00:26 UTC by Brenden Wood
Modified: 2023-11-08 14:19 UTC
CC: 13 users

Fixed In Version: python-pulpcore-3.22.15-1.el8pc
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2238353 (view as bug list)
Environment:
Last Closed: 2023-11-08 14:19:16 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github pulp pulpcore issues 3869 0 None closed Exports are bottlenecked by gzip compression which cannot be disabled 2023-06-13 12:05:48 UTC
Github pulp pulpcore pull 3884 0 None Merged Use a more appropriate compression level for exports 2023-07-17 14:28:28 UTC
Github pulp pulpcore pull 4352 0 None Merged Ensure monkeypatch is properly loaded 2023-09-13 00:37:52 UTC
Github pulp pulpcore pull 4412 0 None open Ensure that non-chunked exports also use compressionlevel=1 2023-09-13 00:37:52 UTC
Red Hat Issue Tracker SAT-18409 0 None None None 2023-06-14 14:42:48 UTC
Red Hat Product Errata RHSA-2023:6818 0 None None None 2023-11-08 14:19:41 UTC

Description Brenden Wood 2023-04-21 00:26:01 UTC
Description of problem:

We identified a severe bottleneck in the way hammer exports work and found the code in upstream Pulp.

Line 406 of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/export.py

"with tarfile.open(tarfile_fp, "w|gz", fileobj=split_process.stdin)"

The tarfile Python module creates tar files using the gzip Python module.

Data compression for the gzip Python module is provided by the zlib Python module.

The zlib Python module calls the zlib library.

If defaults are used the whole way through this chain, the result is a single-threaded Pulp process compressing a tarball that contains a massive content library. This bottleneck can make large hammer exports take several days.

Modifying the line so that the tarfile.open function does NOT use compression (change "w|gz" to "w") dramatically speeds up the hammer export. In our testing it reduced the time from days to just hours. The drawback is that the file size was significantly larger, but the trade-off is worthwhile given we have tight timeframes and plentiful disk capacity.
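The effect described above can be reproduced at toy scale. The sketch below (hypothetical payload and filenames, not pulpcore code) writes the same data with the default compressed stream mode "w|gz" and with the uncompressed "w|", making the size trade-off visible:

```python
import io
import tarfile

# Toy payload standing in for a large content library (hypothetical data).
payload = b"Example RPM-like content block. " * 4096

def make_tar(mode: str) -> bytes:
    """Write `payload` into an in-memory tar using the given stream mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="content.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

compressed = make_tar("w|gz")   # default: gzip via single-threaded zlib
plain = make_tar("w|")          # the workaround: no compression stage at all

# The uncompressed archive is much larger but skips the CPU-bound gzip step.
print(len(plain), len(compressed))
```

On a real export the size gap is what the reporter traded away for the multi-day speedup.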

Can this bottleneck be addressed with multi-threaded gzip compression?

and/or

Can a hammer command line option for no compression be implemented?
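On the multi-threading question: zlib releases the GIL while compressing, and concatenated gzip members form a valid gzip stream, so a pigz-style parallel gzip is feasible even from pure Python. A minimal sketch (not Pulp code; chunk size and worker count are arbitrary):

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 256 * 1024, workers: int = 4) -> bytes:
    """Compress chunks as independent gzip members in a thread pool.
    zlib releases the GIL during compression, so the threads genuinely
    run in parallel; concatenating the members yields a valid gzip stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = pool.map(lambda c: gzip.compress(c, compresslevel=1), chunks)
    return b"".join(members)

data = b"repository metadata line\n" * 100_000
packed = parallel_gzip(data)
# gzip transparently decodes multi-member streams.
assert gzip.decompress(packed) == data
```

Per-chunk compression loses a little ratio versus one continuous stream, which is the same trade-off pigz makes.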



Version-Release number of selected component (if applicable): 6.10 onwards (Pulp 3)


How reproducible:
Run a hammer export and monitor the Pulp processes. One process will run at 100% CPU. Modify the abovementioned Python script to NOT use gzip compression, and an uncompressed tarball will instead be created much more quickly, using multiple Pulp processes.


Steps to Reproduce:
1. Run a "hammer export". Monitor the Pulp process CPU usage and time taken to complete export.
2. Change the abovementioned Python code in Pulp.
3. Run a "hammer export" again and note performance improvement.

Actual results:
A single-threaded Pulp process compresses the tarball and becomes the bottleneck.


Expected results:
Multi-threaded gzip compression that can take full advantage of the CPU and I/O of the Satellite server and not be severely bottlenecked.

Additional info:

IO wait is very low when using single-threaded compression, indicating that single-thread CPU is the issue. When not using compression (removing the bottleneck), iowait increases.

This issue is causing significant delays for a couple of customers. Corresponding support tickets will be submitted soon.

Comment 1 Daniel Alley 2023-05-25 14:25:50 UTC
The default compression level of tarfile with gzip compression is level 9, the highest and most computationally intensive level.  Possibly compression would be more viable at a lower level - based on various benchmarks, level 3 ought to be about 4x faster than level 9, with a compression ratio only 15-20% worse.
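The trade-off described above is easy to measure with zlib directly. A rough sketch (synthetic data, not a pulpcore benchmark; absolute timings depend on the machine):

```python
import time
import zlib

# Synthetic compressible data, a hypothetical stand-in for repo metadata.
data = b"".join(
    bytes(f"package-{i % 1000}-1.0.rpm\n", "ascii") for i in range(200_000)
)

for level in (1, 3, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lower levels finish faster at the cost of a somewhat larger output.
    print(f"level {level}: {len(out)} bytes in {elapsed:.3f}s")
```

The higher level never compresses worse, only slower, which is why dropping from 9 toward 1 trades disk for wall-clock time.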

Comment 2 Daniel Alley 2023-05-25 14:54:28 UTC
Brenden, don't forget to attach the customer cases.

If you have a reproducer (or a customer willing to experiment), what is the impact of adding "compresslevel=1" to that line?  e.g.

"with tarfile.open(tarfile_fp, "w|gz", compresslevel=1, fileobj=split_process.stdin)"

Comment 3 Daniel Alley 2023-05-25 14:59:25 UTC
(don't forget to restart the services, of course).

Also, how large is the uncompressed export in question, in gigabytes?

Comment 4 Brenden Wood 2023-05-30 04:20:27 UTC
Hi Daniel,

We have the ability to test this with a customer so I will try this out and report back. From memory, we were dealing with an export over a terabyte in size, but I will have to confirm that. 

Thanks

Comment 5 Daniel Alley 2023-05-30 04:33:17 UTC
Brenden, I thought I had posted this but apparently not - that patch actually will not work, because it requires code present only in Python 3.12.  Please don't ask the customer to try it just yet.

Comment 6 Robin Chan 2023-06-13 12:05:49 UTC
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

Comment 8 Daniel Alley 2023-07-27 14:34:46 UTC
Anyone tracking this may also be interested in https://bugzilla.redhat.com/show_bug.cgi?id=2226950

Comment 12 Shweta Singh 2023-08-23 10:21:53 UTC
Verified.

Version Tested: Satellite 6.14.0 Snap 12.0

Verification Steps:
1. Enable some large repos like appstream and rhel7server.
2. Update the download policy to "immediate" and sync the repos.
3. Perform complete hammer export of the library lifecycle environment.
4. Observe the time to export the complete lce.

Result:
Performance of hammer export has been improved after the fix.

Comment 13 Daniel Alley 2023-08-23 13:13:08 UTC
Shweta, out of curiosity, how much improvement did you observe?

Comment 14 Daniel Alley 2023-09-05 00:51:40 UTC
I've unfortunately needed to revise the patch as it was not being reliably loaded 100% of the time.  I don't believe there is any need to delay any scheduled releases, just push the BZ off to the next one.

Comment 19 Daniel Alley 2023-09-18 13:47:25 UTC
*** Bug 2238653 has been marked as a duplicate of this bug. ***

Comment 20 Daniel Alley 2023-09-18 18:59:14 UTC
*** Bug 2239183 has been marked as a duplicate of this bug. ***

Comment 22 Shweta Singh 2023-09-26 14:05:40 UTC
Hi Daniel! While exporting a few large repos, I encountered an "undefined method `first' for nil:NilClass" error.

I have tried the following steps:
1. Enable some large repos like appstream(Rhel8 and Rhel9) and rhel7server.
2. Update the download policy to "immediate" and sync the repos.
3. Perform complete hammer export of the library lifecycle environment.
4. Observe the time to export the complete lce.

Version Tested: Satellite 6.14.0 Snap 17.0

Observation:
During "hammer export" command, it failed with "undefined method `first' for nil:NilClass" error.

Comment 23 Daniel Alley 2023-09-26 14:26:54 UTC
That's a Ruby error; I'm not sure what would have caused it, but it's unrelated to this BZ.

Is there any other information?  Did the pulp task succeed or fail?  If it failed that might have triggered some other Katello error which we're seeing, but if it succeeded or never ran in the first place then it is likely completely unrelated.

Comment 24 Shweta Singh 2023-09-29 06:49:26 UTC
Hi Daniel, I was able to verify the fix and it is working. There is a significant improvement in the performance of the "hammer export" command. I have tested only on Python 3.11.

Verified.

Version Tested: Satellite 6.14.0 Snap 17

Verification Steps: 
1. Enable a few large repos (like appstream on rhel8 and rhel9, and rhel-7-server).
2. Update the download policy to immediate and sync all the repos.
3. Try complete export of the repos and notice the duration for the complete export.

Observation:
1. There is a significant improvement in the performance of the export.
2. It took ~23 min to complete the export, compared to the 2 hours the same export took earlier.
3. Repos which were exported: Appstream Rhel8, Appstream Rhel9 and Rhel-7-server.
4. All the code changes are present on the fixed version.

Comment 27 Ian Ballou 2023-10-30 20:51:31 UTC
A hotfix is now available for Satellite 6.11.5 on RHEL 7 and RHEL 8. Please contact Red Hat support for installation instructions.

Comment 30 errata-xmlrpc 2023-11-08 14:19:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.14 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6818

