Bug 1510136

Summary: RHUI 2/3 does not account for Candlepin 2 behavior of regenerating dirty certificates when updates needed
Product: Red Hat Update Infrastructure for Cloud Providers
Reporter: Craig Donnelly <cdonnell>
Component: Tools
Assignee: RHUI Bug List <rhui-bugs>
Status: CLOSED ERRATA
QA Contact: Vratislav Hutsky <vhutsky>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2.1
CC: bcourt, bkearney, cdonnell, hfukumot, hmore, kdixon, kfujii, kkohata, mkubik, mshimura, nyamashi, pcantle, rbiba, rhui-bugs, ssato, syamamot, tasander, tbhowmik, ykawada
Target Milestone: 3.0.3
Keywords: PrioBumpGSS
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-16 12:48:53 UTC Type: Bug

Description Craig Donnelly 2017-11-06 18:16:04 UTC
Description of problem:
Several cases have come up in which RHUI entitlement certs suddenly become invalid, leaving RHUI unable to sync new content from the CDN.

This appears to result from a chain of events that is normal for almost all systems but is not accounted for in RHUI.

At this point, Candlepin 2 will mark certificates dirty when they need to be updated. This could be for various reasons, including updates to product content (paths or otherwise), new engIDs, etc.

When Candlepin 2 marks this certificate dirty, this tells subscription-manager that during the next check-in with the customer portal, the system should generate and receive updated entitlement certificates with the new/corrected information.

Right now, all accounts are also being refreshed to fill in information that was missing when Candlepin 2 was rolled out to production. This means that some product data was unavailable or incorrect, and the corrected data is being pushed out to all of the accounts that are being refreshed.

With this in mind, the issue is currently more widespread than would normally be experienced, but it nonetheless needs to be accounted for in RHUI.

Essentially, my current understanding is as follows:

If your RHUI server is itself actively registered and subscribed to the customer portal via subscription-manager, and it uses the same upstream consumer/distributor that was used to download the entitlement certificate imported into `rhui-manager`, then any upstream change to the content on that entitlement cert (regardless of any action by the customer) marks the cert for regeneration. The regeneration occurs during the next subscription-manager check-in and renders the entitlement cert currently imported into `rhui-manager` obsolete. RHUI can then no longer sync CDN content into the RHUI environment.

By my understanding, this would affect RHUI 2 and 3.

One idea for addressing this issue would be for RHUI to make use of the entitlement certificates that are present in /etc/pki/entitlement, rather than expecting manual user intervention to update the certificates via `rhui-manager`.
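
As a rough illustration of that idea (not how RHUI actually works), the sketch below lists the certificate/key pairs currently present in /etc/pki/entitlement. It assumes the usual subscription-manager naming of `<serial>.pem` and `<serial>-key.pem`; the function name `find_entitlement_pairs` is hypothetical.

```python
from pathlib import Path

# Directory where subscription-manager keeps entitlement certificates.
ENTITLEMENT_DIR = Path("/etc/pki/entitlement")


def find_entitlement_pairs(directory=ENTITLEMENT_DIR):
    """Return sorted (cert, key) path pairs found in `directory`.

    Assumption: files are named <serial>.pem and <serial>-key.pem.
    Certificates without a matching key file are skipped.
    """
    directory = Path(directory)
    pairs = []
    for cert in sorted(directory.glob("*.pem")):
        if cert.name.endswith("-key.pem"):
            continue  # key files are picked up through their certificate
        key = cert.with_name(cert.stem + "-key.pem")
        if key.is_file():
            pairs.append((cert, key))
    return pairs


if __name__ == "__main__":
    for cert, key in find_entitlement_pairs():
        print(cert, key)
```

A tool along these lines could pick up regenerated certificates after each subscription-manager check-in instead of relying on a manual upload.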

Version-Release number of selected component (if applicable):
Candlepin 2 Prod
RHUI 2, 3

How reproducible:
100%, given the right conditions.

Steps to Reproduce:
1. Start with an out-of-date entitlement cert that has missing/incorrect content, then have the account refreshed.
2. RHUI can stop syncing successfully at this point if the entitlement cert was marked dirty and a sub-man check-in refreshes the data.

Actual results:
RHUI can no longer sync until a new entitlement certificate is uploaded into rhui-manager. (The certificate can be obtained from the customer portal or /etc/pki/entitlement.)

Expected results:
RHUI needs a (documented) method of utilizing a different entitlement cert that would not be touched by a subscription-manager check-in, or should make use of the entitlement certs that would be present on the system after a refresh.

Additional info:

This information is as I remember it being described and could be slightly off in various places - further information and requests for technical data should be brought to bcourt@redhat on the Candlepin side.

Comment 3 Craig Donnelly 2017-11-06 20:08:16 UTC
There is an additional recovery step required for this in the case of RHUI 3.

At this point, it appears that RHUI 3 will cache the entitlement certificates on the shared storage endpoint for use with the importers.

In our test case, the remote storage is mounted at /var/lib/rhui/remote_share on the RHUA. Inside this directory there is an 'importers' directory, which contains a directory per active repository on the RHUA. Within each of those repository directories there is a 'pki' directory which contains a copy of the CA, client cert, and key.

The client cert inside this directory does not get updated when you import a new entitlement cert into RHUI 3, and therefore to get RHUI 3 to sync any of these repositories again, you must remove the repo and re-add it inside rhui-manager. This will delete the directory for that repo and re-create it, also adding the new correct entitlement certificates.
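
To see which repositories still hold an outdated copy, something like the following sketch could help. It is only an assumption-laden illustration: it compares the PEM certificate blocks in the newly imported entitlement cert with those in every file under each `importers/<REPO>/pki` directory, because the file names inside `pki` are not specified here. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches the base64 body of each PEM certificate block.
PEM_CERT = re.compile(
    rb"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_bodies(data):
    """Return the set of certificate bodies in PEM data, whitespace removed."""
    return {b"".join(body.split()) for body in PEM_CERT.findall(data)}


def stale_repos(importers_dir, new_cert_file):
    """Return names of repo directories whose pki/ files contain none of the
    certificates found in `new_cert_file`."""
    wanted = cert_bodies(Path(new_cert_file).read_bytes())
    if not wanted:
        raise ValueError("no PEM certificate found in %s" % new_cert_file)
    stale = []
    for repo in sorted(Path(importers_dir).iterdir()):
        pki = repo / "pki"
        if not pki.is_dir():
            continue
        cached = set()
        for item in pki.iterdir():
            if item.is_file():
                cached |= cert_bodies(item.read_bytes())
        if not (wanted & cached):
            stale.append(repo.name)
    return stale
```

In the test case above, `importers_dir` would be /var/lib/rhui/remote_share/importers, and `new_cert_file` the entitlement certificate that was just imported.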

The way to get this working is to use the following steps:

1. Log in to rhui-manager on the RHUA and remove all repositories that will not sync.
2. Delete the cached user.crt via: `rm .rhui/<RHUA HOSTNAME>/user.crt`
3. Log in to rhui-manager again, refresh your list of repositories, and re-add all removed repositories.
4. Sync.

At that point, you should then be able to sync the repo on RHUI 3 again.

Unfortunately, you would need to repeat this process for ALL repositories that were enabled when the entitlement cert expired or was revoked.

Comment 4 Craig Donnelly 2017-11-07 02:11:05 UTC
One quick note/modification to comment 3 above:

Instead of deactivating all repositories, you could deactivate and re-activate only a single repository that is not working.

You could then take the certificates from that repository's .../importers/<REPO>/pki directory on the shared storage and copy them over the ones in the pki directories of all the other repositories.

I think this would be a significant shortcut to the procedure outlined above.
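
A minimal sketch of that shortcut follows, assuming the layout described in comment 3 (one directory per repository under `importers`, each with a `pki` subdirectory). The function name `propagate_pki` is hypothetical, and the sketch has not been validated against a real RHUI 3 installation; backing up the shared storage first would be prudent.

```python
import shutil
from pathlib import Path


def propagate_pki(importers_dir, good_repo):
    """Copy every file from <good_repo>/pki over the pki directory of each
    other repository under `importers_dir`. Returns the updated repo names."""
    importers_dir = Path(importers_dir)
    source = importers_dir / good_repo / "pki"
    if not source.is_dir():
        raise FileNotFoundError(source)
    updated = []
    for repo in sorted(importers_dir.iterdir()):
        target = repo / "pki"
        if repo.name == good_repo or not target.is_dir():
            continue
        for item in source.iterdir():
            if item.is_file():
                # copy2 keeps mode and timestamps; file ownership is not copied
                shutil.copy2(item, target / item.name)
        updated.append(repo.name)
    return updated
```

Here `good_repo` is the one repository that was removed and re-added in rhui-manager, so its `pki` directory already holds the new certificates.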

Comment 27 errata-xmlrpc 2018-05-16 12:48:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1569