Description of problem:
RHUIv4 does not function when the RHUA is unavailable.

Version-Release number of selected component (if applicable):
4.0beta

How reproducible:
Consistently

Steps to Reproduce:
1. Launch a RHUIv4 cluster, create entitlement certs, add packages, and subscribe a client
2. Shut down the RHUA node
3. Attempt to perform any yum/dnf action on the client

Actual results:
Any yum/dnf action fails, e.g. yum makecache. Output from the client:
```
Red Hat Enterprise Linux 8 for x86_64 - BaseOS from RHUI (RPMs)  34 B/s | 4.1 kB  02:02
Errors during downloading metadata for repository 'rhui-rhel-8-for-x86_64-baseos-rhui-rpms':
  - Curl error (28): Timeout was reached for https://cds.example.com/pulp/content/content/dist/rhel8/rhui/8/x86_64/baseos/os/repodata/repomd.xml [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]
Error: Failed to download metadata for repo 'rhui-rhel-8-for-x86_64-baseos-rhui-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
```

Expected results:
The yum/dnf operation succeeds.

Additional info:
The nginx configuration on the CDS nodes attempts to reach NFS first and fall back to rhua-fetcher (really, an API call to Pulp on the RHUA node) for some content, and for other content it tries rhua-fetcher first and falls back to NFS. It is not obvious why this is so. For testing, we exported all repos (creating symlinks on shared storage) and removed the nginx config sections that attempt to talk to the RHUA, and the cluster functioned normally while the RHUA was unavailable. Unsure if this is relevant, without yet knowing why the CDS contacts the RHUA the way it does.

This is very critical to us, because as it currently functions, failure of the RHUA causes a failure of RHUI overall, making it a single point of failure. We would expect instead that RHUI continues to serve content, and that repo synchronization simply does not occur until the RHUA is functioning again.
We are able to manually fetch repomd.xml or .rpm files via curl, provided we specify the correct --cert and --key flags to match the repo definition. It always takes 30 seconds to get content, which seems to indicate a timeout while trying rhua-fetcher before falling back to NFS. Perhaps this 30s is too long for yum clients?
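For reference, this is roughly the kind of manual check we ran against a CDS node. The hostname and repo path come from the error output above; the certificate and key paths are placeholders, not the exact values from our environment:
```
# Fetch repo metadata directly from a CDS node, presenting the client
# entitlement certificate/key that match the repo definition.
# Hostname and cert/key paths below are illustrative placeholders.
time curl -v \
  --cert /etc/pki/rhui/product/content.crt \
  --key /etc/pki/rhui/content.key \
  -o repomd.xml \
  https://cds.example.com/pulp/content/content/dist/rhel8/rhui/8/x86_64/baseos/os/repodata/repomd.xml
```
The `time` prefix is what showed the consistent ~30-second wall-clock delay per request.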
Hi Liam, this is expected behavior. Unfortunately it will only be clarified in the proper documentation that will be available with RHUI 4.0 GA (very, very soon). Basically what happens is that Pulp stores content in its own format (RPMs named based on hashes) and only creates a symlink to the proper path once that path is requested. In a standard Pulp installation this is not an issue, since the Pulp server and its storage are usually on the same machine. In the RHUI4 world, Pulp is present only on the RHUA node, and the CDS nodes only use content stored on storage shared with the RHUA node. The downside is that a CDS then has to request this symlink generation whenever nginx does not find an RPM at the requested path - but the CDS needs to do that ONLY ONCE PER RPM. And since we anticipated that your scenario might happen, the "Export repo" feature was added. Bottom line - when you exported the repos, you did exactly the recommended operation. There are some ways to work around this Pulp behavior and we are preparing some - like the ability to turn on auto-exporting. BTW, exporting is a long-running task that takes a spot in the task queue - that is one of the reasons why we don't do it by default. In normal circumstances (RHUA available), the impact of on-demand symlink creation is that, once per RPM, the request takes twice the usual time (e.g. 4 milliseconds instead of 1-2 ms). There is also a "hack" to force symlink creation for everything - run yumdownloader '*' on a client (see the example below). ;) But obviously this will take up a lot of space on the client and will only create symlinks for the latest versions. If you are OK with this explanation, I will close the bug as NOT A BUG. Or we can change the bug designation to an RFE "Add auto exporting of repo" - which is already on our internal plan for the next RHUI release.
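A minimal sketch of that workaround on a client, assuming yum-utils (or dnf-utils) provides yumdownloader; it just downloads every available package once, so each download triggers the one-time symlink creation on the shared storage:
```
# Install the downloader tool (package name varies by RHEL version).
sudo yum install -y yum-utils

# Download every available package once; each request forces the
# on-demand symlink to be created for that RPM on shared storage.
# Consumes significant disk space on the client and only covers the
# latest package versions.
yumdownloader --destdir=/tmp/rhui-warmup '*'
```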
We did export all repos, and the issue persisted, because of the timeout incurred while attempting to contact the backend before falling back to the local NFS. The request does eventually succeed, but due to the timeout, yum clients produce errors. It was only when we tested amending the nginx configuration so that it does NOT try to contact the RHUA at all that we got clients functioning while the RHUA was offline. So it is not solely the symlink issue.
The CDS nginx config says, for repomd, try the RHUA first and fall back to NFS. For all other content, it says try NFS first and fall back to the RHUA. If everything tried NFS first, the RHUI install would be resilient to RHUA outages (assuming the content had been downloaded at least once or auto-exporting were enabled). What would be the impact if we made this change? Why is repomd the reverse of other content in lookup order? The impact of on-demand symlink creation is not that it takes twice the time per request, but that it makes the RHUI service entirely reliant on the RHUA. The RHUA must be available and able to respond to all incoming client requests, yet it is not a scalable component in this design. That seems to defeat the purpose of having the CDS nodes scalable and hosted behind a load balancer.
Hi, yes, you are right on both counts. You can remove the location ~ /pulp/content/.*/repomd\.xml { [..snip..] } part of nginx-ssl.conf to make the CDS independent of the RHUA, but you NEED to make sure TO EXPORT THE REPO TO THE FILESYSTEM every time you sync the repo. And yes - this is impairing the resilience of RHUI4. We are aware of this and we already have a fix planned. It will however take a bit more time, so in the meantime we will at least try to make exporting easier and deliver a CLI version of "export repo to filesystem" in the next patch (est. Feb '22).
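For anyone following along, here is a much-simplified sketch of the two lookup orders being discussed. Only the repomd location regex comes from the thread; the root path, upstream address, and named-location structure are placeholders, not the actual nginx-ssl.conf shipped with RHUI:
```
# Illustrative sketch only - directives are standard nginx, but paths
# and the upstream address are assumptions, not the shipped config.

# repomd.xml: asks the RHUA (rhua-fetcher) first. Removing this block
# makes metadata requests use the shared-storage location below instead.
location ~ /pulp/content/.*/repomd\.xml {
    proxy_pass https://rhua.example.com;        # placeholder upstream
}

# All other content: serve from the NFS-backed symlink tree first,
# and only fall back to the RHUA when the file is not found.
location /pulp/content/ {
    root /var/lib/rhui/remote_share;            # placeholder NFS mount
    try_files $uri @rhua_fetcher;
}

location @rhua_fetcher {
    proxy_pass https://rhua.example.com;        # placeholder upstream
}
```
With the repomd block removed, every request follows the NFS-first order, which is why the exported-repo test worked while the RHUA was down.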
OK, Q2 check-in! We are preparing to launch RHUIv4 on GCP; what is the current status and roadmap for these items? We would prefer that the CDS never has to fall back to the RHUA and can simply serve content off the shared storage. The last status update suggested this might work for us by modifying the CDS nginx configuration to remove the forwarding, and by ensuring all symlinks are created through a yet-unspecified operation that keeps them exported. Any guidance on how to accomplish this? Did we make the February estimate for CLI support?
Hello Liam, CLI support for exporting symlinks is already in the released 4.0.1 version (check "rhui-manager repo export --help"). But what is, I guess, more important for you is that version 4.1.0, scheduled to be released on 12th April (give us some space for delay), contains the complete solution to your problem - RHUI 4.1 will auto-export symlinks after each sync by default, and the nginx changes removing the fallback will also be in place.
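A quick sketch of how that CLI might be used on the RHUA node. Only the `repo export --help` invocation is confirmed above; the option name for selecting a repo is an assumption for illustration and should be verified against the help output:
```
# On the RHUA node: list the export options that shipped in 4.0.1.
rhui-manager repo export --help

# Export a repo's symlinks to the shared filesystem after a sync.
# NOTE: the --repo_id option name is assumed for illustration; verify
# the real option name in the --help output above.
rhui-manager repo export --repo_id rhel-8-for-x86_64-baseos-rhui-rpms
```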
Excellent news, we will look forward to that release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHUI 4.1.0 release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1315