1. Proposed title of this feature request

[RFE] Add option to make sync plans single-threaded

3. What is the nature and description of the request?

When creating or editing a sync plan, the customer would like the option to force the sync plan to sync one repository at a time.

4. Why does the customer need this? (List the business requirements here)

Despite tuning their system, the customer has been encountering OOM killer errors on their Satellite server since upgrading to version 6.10, and they do not want to add additional memory or CPUs.

5. How would the customer like to achieve this? (List the functional requirements here)

method 1:
- Content > Sync Plans
- "Create Sync Plan"
- include a checkbox for "Sync repositories one at a time"

method 2:
- Content > Sync Plans
- click on an existing sync plan
- include an editable field for "Sync repositories one at a time"

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.

- install a patch
- create a sync plan with multiple repositories, and enable the new option
- at the time of the sync, go to Content > Sync Status and observe whether repositories sync one at a time
- edit an existing sync plan that has multiple repositories, and enable the new option
- at the time of the sync, go to Content > Sync Status and observe whether repositories sync one at a time (a CLI check is sketched below as an alternative to watching the UI)

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

no

8. Does the customer have any specific time-line dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

no

9. Is the sales team involved in this request and do they have any additional input?

no

10. List any affected packages or components.

foreman, pulp

11. Would the customer be able to assist in testing this functionality if implemented?

no
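As a possible supplement to watching the UI in the test steps above: assuming the foreman-tasks hammer plugin is installed, the number of concurrently running sync tasks could also be checked from the CLI. With the proposed option enabled, this listing should never show more than one running sync at a time:

~~~
# List currently running repository sync tasks; with serialized syncing
# enabled, at most one task should appear here at any given time.
hammer task list --search 'label = Actions::Katello::Repository::Sync and state = running'
~~~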
> Despite tuning their system, the customer has been encountering OOM killer errors on their Satellite server since upgrading to version 6.10, and they do not want to add additional memory or CPUs.

6.10.6 and 6.11.0 make additional improvements to memory consumption which were not available at the time this RFE was filed. I would recommend re-evaluating based on these recent improvements.
(They are not identical improvements; 6.11.0 goes much further.)
Hi, I do not wish to detract from the validity of this request, but it is worth mentioning that OOM can also be caused by a lack of swap space. From the 6.10 Installation Guide [1]:

"A minimum of 20 GB RAM is required for Satellite Server to function. In addition, a minimum of 4 GB RAM of swap space is also recommended. Satellite running with less RAM than the minimum value might not operate correctly."

[1] https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html/installing_satellite_server_from_a_connected_network/preparing-environment-for-satellite-installation#system-requirements_satellite
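For reference, whether a system meets that swap recommendation can be checked with standard tools:

~~~
# Show configured swap devices/files, then current memory and swap usage
swapon --show
free -h
~~~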
As requirements have changed in our environment, additional repositories have needed to be enabled. In this case, the RHEL minor release versions 8.5 and 8.6 and the RHEL 9 repositories have been added. A sync plan with only the one product `Red Hat Enterprise Linux for x86_64` now includes 44 repositories. I don't see any way to further split this up using the provided UI tools. I suppose it could be done via the API and cron (a rough sketch follows below), but I don't believe this should be left to the user to "figure out". Are the changes in 6.11 expected to have an impact on the parallelization, or does this BZ need further attention?
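For illustration, the cron half of that workaround might look like the entry below; the script path and schedule are hypothetical, and the script itself would walk the repositories one by one via hammer or the API:

~~~
# Hypothetical crontab entry: run a serial-sync script every Sunday at 02:00
# instead of relying on the parallel sync plan.
0 2 * * 0 /usr/local/bin/serial-sync.sh >> /var/log/serial-sync.log 2>&1
~~~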
The improvements are to the memory consumption of each individual sync task, which aggregates when dealing with multiple syncs. There are no changes to the parallelism in 6.11, but the memory required to perform syncs is nonetheless much lower.

I would consider the RFE a WONTFIX, not because the problem (memory consumption) isn't valid, but because this is the wrong approach to solving it. Satellite should be able to sync as many repos at any given time as the default configuration allows, so the right approach is to optimize for better efficiency or adjust the default configuration, rather than to add a feature specifically to work around inefficiencies at lower levels, which is essentially what this would be.

Since 6.11 does make progress on this, if you have not upgraded yet and are still experiencing these issues, I would recommend upgrading. If you have already upgraded and are still experiencing them, another potential root cause is some other service (generally Puma or gunicorn, or both) consuming too much memory, which results in sync tasks being killed even when the sync tasks aren't really the problem. There's some work being done currently to look at that, and we could possibly recommend some additional tuning steps.
Thank you for your explanation. The support for RHEL 9 is forcing us to 6.11, so that is being planned.

What problem is the parallelization of the sync attempting to solve? While the elegant/best solution would be nice, from the customer perspective, reducing the time to maintain Satellite should be a goal for engineering. I've had to revisit the sync plans several times over the years with Sat 6. This makes twice in the past 6 months, both post 6.10 upgrade.
> I've had to revisit the sync plans several times over the years with Sat 6. This makes twice in the past 6 months, both post 6.10 upgrade.

I'm not sure which z-version you're on specifically, but for context, 6.10.0 did have an issue where syncing particularly large repos such as RHEL 7 would consume an unreasonable amount of memory (in some cases peaking at more than 5 GB for that single sync task). That got improved over time, and using the same repository as an example:

- 6.10.3 => peak at 2.3 GB
- 6.10.6 => peak at 1.6 GB
- 6.11.0 => peak at ~900 MB

So at this point 6.10 ought to be roughly on par with 6.9, and 6.11 ought to be a significant improvement over both. My suggestion if this happens again is to submit a process listing with the top memory-consuming processes, something like:

~~~
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 20
~~~

I suspect it's not actually the sync tasks that are the issue (even if the OOM killer picked them to kill).

> While the elegant/best solution would be nice, from the customer perspective, reducing the time to maintain Satellite should be a goal for engineering.

I completely agree, but this is precisely why a switch like this isn't a great solution. The default configuration ought to work reliably on minimum-requirement systems without needing special options in the UI, and I think that's the best outcome for both customers and developers.
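If gathering that by hand during an overnight run is a hassle, a minimal sketch of sampling it on an interval across the sync window (the log path and interval here are arbitrary choices; run it in the background or under tmux/screen):

~~~
#!/bin/bash
# Sample the top 20 memory-consuming processes every 5 minutes and append
# them, timestamped, to a log file for later attachment to this BZ.
LOG=/var/tmp/sync-mem.log
while true; do
    date >> "$LOG"
    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 20 >> "$LOG"
    echo "" >> "$LOG"
    sleep 300
done
~~~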
Specifically, the production Satellite is running 6.10.5.1, while non-production is being evaluated with 6.11.2. The sync plans run overnight, Sunday/Monday, and the OOM errors are detected only after the fact (by syslog monitoring). I have a daily `satellite-maintain health check -y` scheduled with the intent to recover from any service outage that may have fallen victim to the OOM killer. This problem does not occur every week; I suspect it may be triggered by the amount of change in the repos since the last sync.

I would expect it to be straightforward for someone with sufficient API experience to script a serial sync solution. That would seem to be a better use of my time than trying to gather process memory usage during the sync runs. Based on personal experience with Satellite through the years, I do not have much faith that this "problem" will be "fixed" by attempts to tune other pieces which themselves continue to change every 6-12 months. Please consider that the best solution may be to realize there is no justifiable reason to continue down this path.
I still have issues with this being ignored. https://bugzilla.redhat.com/show_bug.cgi?id=2067293#c6

44 repositories should at least be able to be split up. This is a real-world example that continues to cause grief with successful syncs on a Satellite that is well within the recommended specs (cpu/ram/swap/storage/clients).

Here is a brute-force sync script using hammer that others might find useful. Note that `hammer repository synchronize` blocks until each sync task completes (no `--async`), so the loop runs strictly one repository at a time:

~~~
#!/bin/bash
# Serially sync every repository that has an upstream URL set.
for REPO in $(hammer --output json repository list | jq '.[] | select(.Url != null) | .Id'); do
    echo "Synchronizing repository ..."
    hammer --no-headers repository info --id "$REPO" --fields Name,Id
    # Without --async, hammer waits for the sync to finish, so only
    # one repository syncs at any given time.
    hammer repository synchronize --id "$REPO"
    echo ""
done
~~~
Charles,

The issue has not been ignored and is not being ignored, but your proposed solution is not an acceptable one. We have been continuing to work on reducing the memory footprint of Satellite; see https://bugzilla.redhat.com/show_bug.cgi?id=2122872 for one such recent effort. It is still waiting to be QA'd before shipping, but if you would like, we can assist with applying the patch or with receiving a hotfix more quickly if you would find that helpful.

Memory is a shared resource between all applications on the system. Because sync tasks have a certain execution profile, the OOM killer is more likely to pick them to be killed off even when they are not consuming very much memory relative to other components of Satellite. This means that the fact that sync tasks are sometimes killed by the OOM killer is not sufficient to diagnose the problem. In order to know what is actually causing your system to run out of memory, you need to look at how much memory each process is using.

Could you please provide:
A) the version you are now currently running, and
B) a listing of the top memory-consuming processes during one such large sync series?
If RH could provide some means to gather that data automatically, I would gladly put it in place. Why does this have to be so complex? There has been NO guidance on how to manage sync plans, and IMO, there should not need to be any. It should be straightforward.

Exactly how would it be "unacceptable" if I were to disable ALL sync plans and simply use the script I provided earlier? Why does the customer experience have to be a *constant* victim of these design decisions?
> There has been NO guidance on how to manage sync plans, and IMO, there should not need to be any. It should be straightforward.

I completely, 100% agree with you. As described in comment #7, there should not be a need to manage the sync plan because it should "just work" on the default configuration. But that is precisely why I don't think it's a good UX choice to expose options to the user which serve the sole purpose of working around a deficiency elsewhere. That would be taking a workaround, not a fix, and permanently adding it to the UI. An actual fix would be to not need such a magic checkbox in the first place.

If you have SSH access to the satellite box, you can run

~~~
printf "%s\n\n" "$(top -b -n 1 -o +%MEM | head -n 22)" >> memory_log.txt
~~~

once during an idle period and then a couple of times throughout the execution of a mass-sync, and then attach memory_log.txt here.

In the future, we do actually have an ongoing project to expose metrics to Prometheus / Grafana for easier monitoring.

Additionally, could you please clarify which version you are currently running?
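On the "gather that data automatically" point, the same capture could be driven by cron across the overnight sync window. A sketch, with example times and log path (note that `%` must be escaped as `\%` inside a crontab entry):

~~~
# Hypothetical crontab entry: snapshot the top memory consumers every
# 10 minutes between 00:00 and 05:59 on Sunday and Monday nights.
*/10 0-5 * * 0,1 { date; top -b -n 1 -o +\%MEM | head -n 22; echo; } >> /var/tmp/memory_log.txt
~~~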
I just completed upgrades to 6.13.2, and I see the recent 1-line suggestion in https://bugzilla.redhat.com/show_bug.cgi?id=2122872#c51. I have not made this change on my satellites, but I no longer see anywhere near the pulp memory usage from earlier.
Good to hear! The patch(es) will make it into 6.13 in a release or two; hopefully memory usage will drop further.