Bug 2035224 - [RFE] Implement a Satellite installer option that provides same functionality as the deprecated '--katello-pulp-max-speed' option
Summary: [RFE] Implement a Satellite installer option that provides same functionality...
Keywords:
Status: NEW
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.10.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Satellite QE Team
URL:
Whiteboard:
Duplicates: 1791333 2067281 (view as bug list)
Depends On:
Blocks:
 
Reported: 2021-12-23 10:25 UTC by momran
Modified: 2024-04-04 16:20 UTC (History)
18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github pulp pulpcore issues 1893 0 None open As a user I can set the max speed when synchronizing a repo 2023-02-23 10:36:35 UTC
Red Hat Issue Tracker SAT-7256 0 None None None 2022-01-03 18:57:04 UTC
Red Hat Knowledge Base (Solution) 7017386 0 None None None 2023-06-06 10:34:19 UTC

Description momran 2021-12-23 10:25:44 UTC
Description of problem:

Request to implement a Satellite installer option that provides the same functionality as the '--katello-pulp-max-speed' option, which was removed in Satellite 6.10 with the introduction of Pulp 3.


Version-Release number of selected component (if applicable):

satellite-6.10.1.1-1.el7sat.noarch
satellite-installer-6.10.0.7-1.el7sat.noarch


Additional info:

This feature is indispensable in scenarios where Capsules run at sites where the cost of bandwidth is high.

Comment 15 Shekhar Raut 2022-04-13 20:37:45 UTC
*** Bug 2067281 has been marked as a duplicate of this bug. ***

Comment 16 Daniel Alley 2022-04-13 20:51:27 UTC
Ultimately Satellite (the application level) is not the correct place to be putting bandwidth limits. Even though it was supported previously, it is a strictly inferior solution compared to setting up an appropriate network traffic shaping policy [0] [1].

Satellite, even if it had complete network awareness between all of its own services, has no awareness of what any other services on the machine or machines on the network might be doing.  The only actor on the machine with that kind of global knowledge and control over networking is the operating system itself.  

[0] https://en.wikipedia.org/wiki/Traffic_shaping
[1] https://octetz.com/docs/2020/2020-09-16-tc/

All this being said, we ought to provide configuration recommendations for low-bandwidth scenarios.  The defaults are not optimized for this and I expect that the default timeout settings in particular would be very much insufficient for systems operating under strict bandwidth limitations.
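For illustration, a minimal egress-shaping policy of the kind referenced above could be sketched with tc as follows. The interface name, rate, and Capsule address are placeholders; this is standard HTB usage, not a supported Satellite configuration:

```shell
# Sketch only: shape traffic from this host to one Capsule at 10 Mbit/s.
# "eth0" and "192.0.2.10" are placeholders for the real interface and
# Capsule address; must be run as root.

# Attach an HTB root qdisc; unclassified traffic stays unshaped.
tc qdisc add dev eth0 root handle 1: htb

# One class capped at 10 Mbit/s for Capsule-bound traffic.
tc class add dev eth0 parent 1: classid 1:1 htb rate 10mbit ceil 10mbit

# Direct only traffic destined for the Capsule into that class.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 192.0.2.10/32 flowid 1:1

# Remove the policy again:
# tc qdisc del dev eth0 root
```

Because the filter matches only the Capsule's address, other traffic from the host is unaffected.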

Comment 17 pulp-infra@redhat.com 2022-05-17 10:01:38 UTC
The Pulp upstream bug status is at closed. Updating the external tracker on this bug.

Comment 18 pulp-infra@redhat.com 2022-05-17 10:26:21 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 25 Daniel Alley 2022-10-20 16:38:41 UTC
One thing I'd like to point out is that I think this is being used as a proxy issue for "excessive bandwidth usage" in some cases. That situation should have improved dramatically due to recent changes on the Red Hat CDN, which very significantly reduced the amount of repository metadata that needs to be downloaded.

I think we mostly have a consensus that the feature shouldn't return in its previous form, given how much less effective it is than OS tools such as "tc". As such, I'm going to reduce the priority/severity.

I think what we are still missing is a good set of resources (a tuning guide, and a guide to limiting bandwidth using a tool such as traffic control "tc") for users who have this use case.

Comment 26 Daniel Alley 2022-10-20 16:54:21 UTC
As part of ^^, we should communicate with these users to understand what their ultimate goals are. Do they want to limit bandwidth? Do they want to limit requests/second?  etc.

And then for those use cases provide documentation to help our users set up traffic shaping accordingly.  Here's one guide we can test out to start with: https://linux-man.org/2021/09/24/how-to-limit-ingress-bandwith-with-tc-command-in-linux/

Comment 27 Paul Dudley 2022-10-20 17:48:21 UTC
Just a few things to note;
- another primary issue here is Capsule syncs from a Satellite. Capsules are often in different regions, perhaps with less overall bandwidth than the Satellite itself, and customers in those regions would like to spread out or limit bandwidth usage for overall connection integrity. Some customers want to keep these Capsules on the 'immediate' policy so they can control when most bandwidth will be used (at sync time, rather than patch time), and I've been asked several times now, before a customer syncs, whether there is a way to limit the sync to save bandwidth while they transfer hundreds of GB of content.

- as a matter of usability of the tools we provide, Pulp 3 already has useful options for this, as Sayan mentioned. These options were added for a reason: they are helpful. Not being able to adjust them more easily than by altering each remote via the API (for a Capsule, specifically) seems to defeat the purpose of adding the options in the first place. Helpful options already exist, and ideally we should expose them for ease of use in the installer, a config file, or something similar. (Perhaps this is out of scope for the BZ, but it seems ultimately related to me.)

- while I can understand the best practice of editing the system properties as a whole, I think this perhaps overlooks scenarios like the one I posted above. If I'm about to sync 400GB, perhaps I'd like to limit only the one application responsible for all of that content, so that the system is otherwise in a better place for SCPing files, general troubleshooting, etc. at the normal speeds the network can handle. Wanting to limit syncs shouldn't mean I want to limit the overall network performance of the system in all scenarios.

Just providing (or perhaps duplicating) thoughts here. Also fwiw I don't disagree with medium prio/sev.
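As a concrete illustration of the per-remote API route mentioned above: pulpcore remotes expose a rate_limit field (requests per second per downloader) alongside download_concurrency. The hostname, credentials, and remote UUID below are placeholders, and editing remotes directly underneath Katello is exactly the kind of awkward workflow this comment is pointing at:

```shell
# Placeholder host/credentials; list the RPM remotes Pulp knows about.
curl -su admin:changeme \
    https://satellite.example.com/pulp/api/v3/remotes/rpm/rpm/

# Throttle a single remote: at most 2 concurrent downloads, each
# downloader limited to 4 requests per second. <remote-uuid> is a
# placeholder for the pulp_href id from the listing above.
curl -su admin:changeme -X PATCH \
    -H 'Content-Type: application/json' \
    -d '{"download_concurrency": 2, "rate_limit": 4}' \
    https://satellite.example.com/pulp/api/v3/remotes/rpm/rpm/<remote-uuid>/
```

Note this throttles requests per second rather than bytes per second, so it is only an approximation of a bandwidth cap.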

Comment 29 Jean Paul Gatt 2022-11-14 10:42:17 UTC
As THE paying customer for this product who opened the ticket that started this bugzilla, I feel that Daniel Alley's line of reasoning is short-sighted and lacks understanding of the underlying issue. Even a very simple tool like rsync can take a bandwidth limit as a parameter. This particular configuration is limited to the synchronization process between the Capsules and the Satellite, and as such it is a capability of the synchronization process that needs to be configurable like other parameters are, such as the number of synchronization processes to start. These Capsules end up in locations where we do have high spikes available for emergencies, but which cost hundreds to thousands per MEGABIT. Wasting our 95th percentile on a Capsule synchronization is a waste of money. This wall of resistance to implementing this feature is frankly even sillier considering that, quite a number of times, the tool used to synchronize these packages is rsync itself.

In order to implement this configuration with tc, we would need to maintain another configuration tool that monitors and enforces the limit, to ensure it is applied at all times and persists across reboots and across configuration changes by inexperienced staff. We usually use Puppet for these kinds of changes (outside of Red Hat Satellite), but we cannot do that here, since it would conflict with Red Hat Satellite. I also don't want to use a one-off configuration with Ansible, since that does not ensure the configuration stays unmodified and persists. In any case, none of this should be our concern, since this is a function that should be covered by the software itself.

And no, there is no need for the quotes around "excessive bandwidth usage". It is excessive because it cannot be controlled.

With regards to priority itself, I wonder how much lower the priority can go, since this has already been open for a year and you're just flat out refusing to implement it.

This is not a post on the Foreman forums, but a request from a paying customer of Red Hat Satellite.

Comment 31 Ewoud Kohl van Wijngaarden 2022-11-15 16:17:45 UTC
I was thinking about various solutions. First of all, what we should keep in mind is that the Satellite <-> Capsule link should be capped, but Capsule <-> Client should not. Correct me if I'm wrong, but that's the use case for Capsules: you bring content close to clients so that the network is less of a limiting factor.

When I read https://github.com/pulp/pulpcore/issues/1893 I think the concern with implementing it inside Pulp is that there are multiple workers and they can all decide to download something. Implementing a maximum bandwidth across multiple workers is a significant challenge in the existing architecture, one that I (as a non-Pulp developer) can see.

Not promising anything, but to clarify: the only concern with TC is that you can't enforce the state via Puppet? Because if that's the concern, my team (which is responsible for, among others, the installer) can see what's possible there.

So in short, 2 questions:
* Can Capsule <-> Client traffic freely use the network without limitations?
* Would using TC in a managed and supported way be an acceptable solution?

Comment 32 Jean Paul Gatt 2022-11-15 16:31:28 UTC
Yes, I agree exactly. That is the scenario: traffic from Satellite <-> Capsule is capped, but not to the client, and that is specifically our use case.

Yes, I do also understand that is the issue, because in the past we have had to be creative with the limit. It does take a lot of trial and error to get the right value, specifically for the reasons you mentioned.

I am not opposed to a solution based around tc, as long as it can be controlled via the Capsule configuration and deployment. Our concern is ensuring that whenever the Capsule is running, the limit is set, especially if the node reboots at a time when non-knowledgeable staff are handling it. The ideal scenario would be a limit that comes up with the Capsule services and goes away when the Capsule is shut down. It is safe to assume that the traffic going out of the Capsule does not need to be limited.
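One way the "limit comes up with the services, goes away on shutdown" behavior could be sketched is a small systemd unit that installs an ingress policer on start and removes it on stop. Everything here is hypothetical: the unit name, interface, Satellite address, and rate are placeholders, and this is not a supported Satellite feature:

```ini
# /etc/systemd/system/capsule-sync-throttle.service (hypothetical sketch)
[Unit]
Description=Police inbound sync traffic from the Satellite
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Police traffic arriving from the Satellite (192.0.2.1 is a placeholder)
ExecStart=/sbin/tc qdisc add dev eth0 handle ffff: ingress
ExecStart=/sbin/tc filter add dev eth0 parent ffff: protocol ip u32 \
    match ip src 192.0.2.1/32 police rate 10mbit burst 256k drop
ExecStop=/sbin/tc qdisc del dev eth0 ingress

[Install]
WantedBy=multi-user.target
```

Tying the unit to the Capsule's service lifecycle (e.g. via WantedBy/PartOf on the relevant services) would be the piece that makes the limit follow the services, which is the part a managed installer option would need to solve.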

Comment 33 pulp-infra@redhat.com 2022-11-16 14:08:32 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 34 pulp-infra@redhat.com 2022-11-16 15:08:19 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 40 Eric Helms 2023-05-01 20:17:48 UTC
*** Bug 1791333 has been marked as a duplicate of this bug. ***

Comment 43 Eric Helms 2023-08-30 14:25:30 UTC
I am aiming to close out this bugzilla with a recap of our findings and recommendations. First, attempting to do this within the Satellite code base is fraught with edge cases and inconsistencies, and would lead to a worse experience. Our belief is that this needs to be done properly through networking tooling. Our two suggestions are to use tc or a proxy. A proxy placed between Satellite and the data being synced allows traffic control at a single point that can be tuned to the desired needs without affecting Satellite itself. Note that the path between a Capsule and Satellite is much harder, as we do not support putting a proxy between a Satellite and a Capsule.
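For the proxy suggestion, one well-known option is Squid's delay pools, which cap aggregate throughput at the proxy. The addresses and rate below are placeholder values, sketching a class-1 (single aggregate bucket) pool that limits the Satellite's CDN downloads to roughly 1 MB/s:

```conf
# squid.conf fragment (hypothetical values)
# Only the Satellite (placeholder address) is subject to the pool.
acl satellite src 192.0.2.2/32

delay_pools 1
delay_class 1 1                      # pool 1, class 1: one aggregate bucket
delay_access 1 allow satellite
delay_access 1 deny all
delay_parameters 1 1048576/1048576   # restore/max bytes: ~1 MB/s aggregate
```

Satellite would then be pointed at this proxy for CDN syncs. As noted above, this only helps on the Satellite side; the Capsule-to-Satellite path does not support a proxy.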

Comment 45 Brad Buckingham 2024-01-09 20:58:59 UTC
Upon review of our valid but aging backlog, the Satellite Team has concluded that this Bugzilla does not meet the criteria for a resolution in the near term, and is planning to close it in a month. This message may be a repeat of a previous update, and the bug is again being considered for closure. If you have any concerns about this, please contact your Red Hat account team. Thank you.

Comment 47 Brad Buckingham 2024-02-07 17:39:20 UTC
Based upon feedback during auto-closure, leaving this bugzilla open a while longer so that any proposed workaround or alternative solution may be documented.

