1. Proposed title of this feature request

[RFE] Add option to make sync plans single-threaded

3. What is the nature and description of the request?

When creating or editing a sync plan, the customer would like the option to force the sync plan to sync one repository at a time.

4. Why does the customer need this? (List the business requirements here)

Despite tuning their system, the customer has been encountering OOM killer errors on their Satellite server since upgrading to version 6.10, and they do not want to add additional memory or CPUs.

5. How would the customer like to achieve this? (List the functional requirements here)

method 1:
- Content > Sync Plans
- "Create Sync Plan"
- include a checkbox for "Sync repositories one at a time"

method 2:
- Content > Sync Plans
- click on an existing sync plan
- include an editable field for "Sync repositories one at a time"

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.

- install a patch
- create a sync plan with multiple repositories, and enable the new option
- at the time of the sync, go to Content > Sync Status and observe whether repositories sync one at a time
- edit an existing sync plan that has multiple repositories, and enable the new option
- at the time of the sync, go to Content > Sync Status and observe whether repositories sync one at a time (a CLI check is sketched below as an alternative to watching the UI)

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

no

8. Does the customer have any specific time-line dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

no

9. Is the sales team involved in this request and do they have any additional input?

no

10. List any affected packages or components.

foreman, pulp

11. Would the customer be able to assist in testing this functionality if implemented?

no
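As a possible supplement to watching the UI in the test steps above: assuming the foreman-tasks hammer plugin is installed, the number of concurrently running sync tasks could also be checked from the CLI. With the proposed option enabled, this listing should never show more than one running sync at a time:

~~~
# List currently running repository sync tasks; with serialized syncing
# enabled, at most one task should appear here at any given time.
hammer task list --search 'label = Actions::Katello::Repository::Sync and state = running'
~~~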
> Despite tuning their system, the customer has been encountering OOM killer errors on their Satellite server since upgrading to version 6.10, and they do not want to add additional memory or CPUs.

6.10.6 and 6.11.0 make additional improvements to memory consumption which were not available at the time this RFE was filed. I would recommend re-evaluating based on these recent improvements.
(They are not identical improvements; 6.11.0 goes much further.)
Hi, I do not wish to detract from the validity of this request, but it is worth mentioning that OOM can also be caused by a lack of swap space. From the 6.10 Installation Guide [1]:

"A minimum of 20 GB RAM is required for Satellite Server to function. In addition, a minimum of 4 GB RAM of swap space is also recommended. Satellite running with less RAM than the minimum value might not operate correctly."

[1] https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html/installing_satellite_server_from_a_connected_network/preparing-environment-for-satellite-installation#system-requirements_satellite
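For reference, whether a system meets that swap recommendation can be checked with standard tools:

~~~
# Show configured swap devices/files, then current memory and swap usage
swapon --show
free -h
~~~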
As requirements have changed in our environment, additional repositories have needed to be enabled. In this case, the RHEL minor release versions 8.5 and 8.6 and the RHEL 9 repositories have been added. A sync plan with only the one product `Red Hat Enterprise Linux for x86_64` now includes 44 repositories. I don't see any way to further split this up using the provided UI tools. I suppose it could be done via the API and cron (a rough sketch follows below), but I don't believe this should be left to the user to "figure out". Are the changes in 6.11 expected to have an impact on the parallelization, or does this BZ need further attention?
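For illustration, the cron half of that workaround might look like the entry below; the script path and schedule are hypothetical, and the script itself would walk the repositories one by one via hammer or the API:

~~~
# Hypothetical crontab entry: run a serial-sync script every Sunday at 02:00
# instead of relying on the parallel sync plan.
0 2 * * 0 /usr/local/bin/serial-sync.sh >> /var/log/serial-sync.log 2>&1
~~~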
The improvements are to the memory consumption of each individual sync task, which aggregates when dealing with multiple syncs. There are no changes to the parallelism in 6.11, but the memory required to perform syncs is nonetheless much lower.

I would consider the RFE a WONTFIX, not because the problem (memory consumption) isn't valid, but because this is the wrong approach to solving it. Satellite should be able to sync as many repos at any given time as the default configuration allows, so the right approach is to optimize for better efficiency or adjust the default configuration, rather than to add a feature specifically to work around inefficiencies at lower levels, which is essentially what this would be.

Since 6.11 does make progress on this, if you have not upgraded yet and are still experiencing these issues, I would recommend upgrading. If you have already upgraded and are still experiencing them, another potential root cause is some other service (generally Puma or gunicorn, or both) consuming too much memory, which results in sync tasks being killed even when the sync tasks aren't really the problem. There's some work being done currently to look at that, and we could possibly recommend some additional tuning steps.
Thank you for your explanation. The support for RHEL 9 is forcing us to 6.11, so that is being planned.

What problem is the parallelization of the sync attempting to solve? While the elegant/best solution would be nice, from the customer perspective, reducing the time to maintain Satellite should be a goal for engineering. I've had to revisit the sync plans several times over the years with Sat 6. This makes twice in the past 6 months, both post 6.10 upgrade.
> I've had to revisit the sync plans several times over the years with Sat 6. This makes twice in the past 6 months, both post 6.10 upgrade.

I'm not sure which z-version you're on specifically, but for context, 6.10.0 did have an issue where syncing particularly large repos such as RHEL 7 would consume an unreasonable amount of memory (in some cases peaking at more than 5 GB for that single sync task). That got improved over time, and using the same repository as an example:

- 6.10.3 => peak at 2.3 GB
- 6.10.6 => peak at 1.6 GB
- 6.11.0 => peak at ~900 MB

So at this point 6.10 ought to be roughly on par with 6.9, and 6.11 ought to be a significant improvement over both. My suggestion if this happens again is to submit a process listing with the top memory-consuming processes, something like:

~~~
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 20
~~~

I suspect it's not actually the sync tasks that are the issue (even if the OOM killer picked them to kill).

> While the elegant/best solution would be nice, from the customer perspective, reducing the time to maintain Satellite should be a goal for engineering.

I completely agree, but this is precisely why a switch like this isn't a great solution. The default configuration ought to work reliably on minimum-requirement systems without needing special options in the UI, and I think that's the best outcome for both customers and developers.
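If gathering that by hand during an overnight run is a hassle, a minimal sketch of sampling it on an interval across the sync window (the log path and interval here are arbitrary choices; run it in the background or under tmux/screen):

~~~
#!/bin/bash
# Sample the top 20 memory-consuming processes every 5 minutes and append
# them, timestamped, to a log file for later attachment to this BZ.
LOG=/var/tmp/sync-mem.log
while true; do
    date >> "$LOG"
    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 20 >> "$LOG"
    echo "" >> "$LOG"
    sleep 300
done
~~~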
Specifically, the production Satellite is running 6.10.5.1, while non-production is being evaluated with 6.11.2. The sync plans run overnight, Sunday/Monday, and the OOM errors are detected only after the fact (by syslog monitoring). I have a daily `satellite-maintain health check -y` scheduled with the intent to recover from any service outage that may have fallen victim to the OOM killer. This problem does not occur every week; I suspect it may be triggered by the amount of change in the repos since the last sync.

I would expect it to be straightforward for someone with sufficient API experience to script a serial sync solution. That would seem to be a better use of my time than trying to gather process memory usage during the sync runs. Based on personal experience with Satellite through the years, I do not have much faith that this "problem" will be "fixed" by attempts to tune other pieces which themselves continue to change every 6-12 months. Please consider that the best solution may be to realize there is no justifiable reason to continue down this path.
I still have issues with this being ignored. https://bugzilla.redhat.com/show_bug.cgi?id=2067293#c6

44 repositories should at least be able to be split up. This is a real-world example that continues to cause grief with successful syncs on a Satellite that is well within the recommended specs (cpu/ram/swap/storage/clients).

Here is a brute-force sync script using hammer that others might find useful. Note that `hammer repository synchronize` blocks until each sync task completes (no `--async`), so the loop runs strictly one repository at a time:

~~~
#!/bin/bash
# Serially sync every repository that has an upstream URL set.
for REPO in $(hammer --output json repository list | jq '.[] | select(.Url != null) | .Id'); do
    echo "Synchronizing repository ..."
    hammer --no-headers repository info --id "$REPO" --fields Name,Id
    # Without --async, hammer waits for the sync to finish, so only
    # one repository syncs at any given time.
    hammer repository synchronize --id "$REPO"
    echo ""
done
~~~
Charles,

The issue has not been ignored and is not being ignored, but your proposed solution is not an acceptable one. We have been continuing to work on reducing the memory footprint of Satellite; see https://bugzilla.redhat.com/show_bug.cgi?id=2122872 for one such recent effort. It is still waiting to be QA'd before shipping, but if you would like, we can assist with applying the patch or with receiving a hotfix more quickly if you would find that helpful.

Memory is a shared resource between all applications on the system. Because sync tasks have a certain execution profile, the OOM killer is more likely to pick them to be killed off even when they are not consuming very much memory relative to other components of Satellite. This means that the fact that sync tasks are sometimes killed by the OOM killer is not sufficient to diagnose the problem. In order to know what is actually causing your system to run out of memory, you need to look at how much memory each process is using.

Could you please provide:
A) the version you are now currently running, and
B) a listing of the top memory-consuming processes during one such large sync series?
If RH could provide some means to gather that data automatically, I would gladly put it in place. Why does this have to be so complex? There has been NO guidance on how to manage sync plans, and IMO, there should not need to be any. It should be straightforward.

Exactly how would it be "unacceptable" if I were to disable ALL sync plans and simply use the script I provided earlier? Why does the customer experience have to be a *constant* victim of these design decisions?
> There has been NO guidance on how to manage sync plans, and IMO, there should not need to be any. It should be straightforward.

I completely, 100% agree with you. As described in comment #7, there should not be a need to manage the sync plan because it should "just work" on the default configuration. But that is precisely why I don't think it's a good UX choice to expose options to the user which serve the sole purpose of working around a deficiency elsewhere. That would be taking a workaround, not a fix, and permanently adding it to the UI. An actual fix would be to not need such a magic checkbox in the first place.

If you have SSH access to the satellite box, you can run

~~~
printf "%s\n\n" "$(top -b -n 1 -o +%MEM | head -n 22)" >> memory_log.txt
~~~

once during an idle period and then a couple of times throughout the execution of a mass-sync, and then attach memory_log.txt here.

In the future, we do actually have an ongoing project to expose metrics to Prometheus / Grafana for easier monitoring.

Additionally, could you please clarify which version you are currently running?
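On the "gather that data automatically" point, the same capture could be driven by cron across the overnight sync window. A sketch, with example times and log path (note that `%` must be escaped as `\%` inside a crontab entry):

~~~
# Hypothetical crontab entry: snapshot the top memory consumers every
# 10 minutes between 00:00 and 05:59 on Sunday and Monday nights.
*/10 0-5 * * 0,1 { date; top -b -n 1 -o +\%MEM | head -n 22; echo; } >> /var/tmp/memory_log.txt
~~~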
I just completed upgrades to 6.13.2, and I see the recent 1-line suggestion in https://bugzilla.redhat.com/show_bug.cgi?id=2122872#c51. I have not made this change on my satellites, but I no longer see anywhere near the pulp memory usage from earlier.
Good to hear! The patch(es) will make it into 6.13 in a release or two; hopefully memory usage will drop further.