Description of problem:
RHUIv4 does not function when the RHUA is unavailable.

Version-Release number of selected component (if applicable):
4.0beta

How reproducible:
Consistently

Steps to Reproduce:
1. Launch a RHUIv4 cluster, create entitlement certs, add packages, and subscribe a client
2. Shut down the RHUA node
3. Attempt to perform any yum/dnf action on the client

Actual results:
Any yum/dnf action fails, e.g. yum makecache. Output from the client:
```
Red Hat Enterprise Linux 8 for x86_64 - BaseOS from RHUI (RPMs)  34 B/s | 4.1 kB  02:02
Errors during downloading metadata for repository 'rhui-rhel-8-for-x86_64-baseos-rhui-rpms':
  - Curl error (28): Timeout was reached for https://cds.example.com/pulp/content/content/dist/rhel8/rhui/8/x86_64/baseos/os/repodata/repomd.xml [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
Error: Failed to download metadata for repo 'rhui-rhel-8-for-x86_64-baseos-rhui-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
```

Expected results:
The yum/dnf operation succeeds.

Additional info:
The nginx configuration on the CDS nodes attempts to reach NFS first and fall back to rhua-fetcher (really, an API call to Pulp on the RHUA node) for some content, and for other content it tries rhua-fetcher first and falls back to NFS. It is not obvious why this is so. For testing, we exported all repos (creating symlinks on shared storage) and removed the nginx config sections that attempt to talk to the RHUA, and the cluster functioned normally while the RHUA was unavailable. Unsure if this is relevant, without yet knowing why the CDS contacts the RHUA the way it does.

This is very critical to us, because as it currently functions, failure of the RHUA causes a failure of RHUI overall, making it a single point of failure. We would expect instead that RHUI continues to serve content, and that repo synchronization simply does not occur until the RHUA is functioning again.
We are able to manually fetch repomd.xml or .rpm files via curl, provided we specify the correct --cert and --key flags to match the repo definition. It always takes 30 seconds to get content, which seems to indicate a timeout while trying rhua-fetcher before falling back to NFS. Perhaps this 30s is too long for yum clients?
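For reference, this is roughly the kind of manual check we ran against a CDS node. The hostname and repo path come from the error output above; the certificate and key paths are placeholders, not the exact values from our environment:
```
# Fetch repo metadata directly from a CDS node, presenting the client
# entitlement certificate/key that match the repo definition.
# Hostname and cert/key paths below are illustrative placeholders.
time curl -v \
  --cert /etc/pki/rhui/product/content.crt \
  --key /etc/pki/rhui/content.key \
  -o repomd.xml \
  https://cds.example.com/pulp/content/content/dist/rhel8/rhui/8/x86_64/baseos/os/repodata/repomd.xml
```
The `time` prefix is what showed the consistent ~30-second wall-clock delay per request.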
Hi Liam, this is expected behavior. Unfortunately it will only be clarified in the proper documentation that will be available with RHUI 4.0 GA (very, very soon). Basically what happens is that Pulp stores content in its own format (RPMs named based on hashes) and only creates a symlink to the proper path once that path is requested. In a standard Pulp installation this is not an issue, since the Pulp server and its storage are usually on the same machine. In the RHUI4 world, Pulp is present only on the RHUA node, and the CDS nodes only use content stored on storage shared with the RHUA node. The downside is that a CDS then has to request this symlink generation whenever nginx does not find an RPM at the requested path - but the CDS needs to do that ONLY ONCE PER RPM. And since we anticipated that your scenario might happen, the "Export repo" feature was added. Bottom line - when you exported the repos, you did exactly the recommended operation. There are some ways to work around this Pulp behavior and we are preparing some - like the ability to turn on auto-exporting. BTW, exporting is a long-running task that takes a spot in the task queue - that is one of the reasons why we don't do it by default. In normal circumstances (RHUA available), the impact of on-demand symlink creation is that, once per RPM, the request takes twice the usual time (e.g. 4 milliseconds instead of 1-2 ms). There is also a "hack" to force symlink creation for everything - run yumdownloader '*' on a client (see the example below). ;) But obviously this will take up a lot of space on the client and will only create symlinks for the latest versions. If you are OK with this explanation, I will close the bug as NOT A BUG. Or we can change the bug designation to an RFE "Add auto exporting of repo" - which is already on our internal plan for the next RHUI release.
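A minimal sketch of that workaround on a client, assuming yum-utils (or dnf-utils) provides yumdownloader; it just downloads every available package once, so each download triggers the one-time symlink creation on the shared storage:
```
# Install the downloader tool (package name varies by RHEL version).
sudo yum install -y yum-utils

# Download every available package once; each request forces the
# on-demand symlink to be created for that RPM on shared storage.
# Consumes significant disk space on the client and only covers the
# latest package versions.
yumdownloader --destdir=/tmp/rhui-warmup '*'
```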
We did export all repos, and the issue persisted, because of the timeout incurred while attempting to contact the backend before falling back to the local NFS. The request does eventually succeed, but due to the timeout, yum clients produce errors. It was only when we tested amending the nginx configuration so that it does NOT try to contact the RHUA at all that we got clients functioning while the RHUA was offline. So it is not solely the symlink issue.
The CDS nginx config says, for repomd, try the RHUA first and fall back to NFS. For all other content, it says try NFS first and fall back to the RHUA. If everything tried NFS first, the RHUI install would be resilient to RHUA outages (assuming the content had been downloaded at least once or auto-exporting were enabled). What would be the impact if we made this change? Why is repomd the reverse of other content in lookup order? The impact of on-demand symlink creation is not that it takes twice the time per request, but that it makes the RHUI service entirely reliant on the RHUA. The RHUA must be available and able to respond to all incoming client requests, yet it is not a scalable component in this design. That seems to defeat the purpose of having the CDS nodes scalable and hosted behind a load balancer.
Hi, yes, you are right on both counts. You can remove the location ~ /pulp/content/.*/repomd\.xml { [..snip..] } part of nginx-ssl.conf to make the CDS independent of the RHUA, but you NEED to make sure TO EXPORT THE REPO TO THE FILESYSTEM every time you sync the repo. And yes - this is impairing the resilience of RHUI4. We are aware of this and we already have a fix planned. It will however take a bit more time, so in the meantime we will at least try to make exporting easier and deliver a CLI version of "export repo to filesystem" in the next patch (est. Feb '22).
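For anyone following along, here is a much-simplified sketch of the two lookup orders being discussed. Only the repomd location regex comes from the thread; the root path, upstream address, and named-location structure are placeholders, not the actual nginx-ssl.conf shipped with RHUI:
```
# Illustrative sketch only - directives are standard nginx, but paths
# and the upstream address are assumptions, not the shipped config.

# repomd.xml: asks the RHUA (rhua-fetcher) first. Removing this block
# makes metadata requests use the shared-storage location below instead.
location ~ /pulp/content/.*/repomd\.xml {
    proxy_pass https://rhua.example.com;        # placeholder upstream
}

# All other content: serve from the NFS-backed symlink tree first,
# and only fall back to the RHUA when the file is not found.
location /pulp/content/ {
    root /var/lib/rhui/remote_share;            # placeholder NFS mount
    try_files $uri @rhua_fetcher;
}

location @rhua_fetcher {
    proxy_pass https://rhua.example.com;        # placeholder upstream
}
```
With the repomd block removed, every request follows the NFS-first order, which is why the exported-repo test worked while the RHUA was down.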
OK, Q2 check-in! We are preparing to launch RHUIv4 on GCP; what is the current status and roadmap for these items? We would prefer that the CDS never has to fall back to the RHUA and can simply serve content off the shared storage. The last status update suggested this might work for us by modifying the CDS nginx configuration to remove the forwarding, and by ensuring all symlinks are created through a yet-unspecified operation that keeps them exported. Any guidance on how to accomplish this? Did we make the February estimate for CLI support?
Hello Liam, CLI support for exporting symlinks is already in the released 4.0.1 version (check "rhui-manager repo export --help"). But what is, I guess, more important for you is that version 4.1.0, scheduled to be released on 12th April (give us some space for delay), contains the complete solution to your problem - RHUI 4.1 will auto-export symlinks after each sync by default, and the nginx changes removing the fallback will also be in place.
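A quick sketch of how that CLI might be used on the RHUA node. Only the `repo export --help` invocation is confirmed above; the option name for selecting a repo is an assumption for illustration and should be verified against the help output:
```
# On the RHUA node: list the export options that shipped in 4.0.1.
rhui-manager repo export --help

# Export a repo's symlinks to the shared filesystem after a sync.
# NOTE: the --repo_id option name is assumed for illustration; verify
# the real option name in the --help output above.
rhui-manager repo export --repo_id rhel-8-for-x86_64-baseos-rhui-rpms
```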
Excellent news, we will look forward to that release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHUI 4.1.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1315