Bug 1924844 - When simple content access is enabled, entitlement cert might not regenerate correctly after adding and removing repos from a content view due to race conditions [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.8.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 6.10.0
Assignee: satellite6-bugs
QA Contact: Danny Synk
URL:
Whiteboard:
Depends On:
Blocks: 1925546 1935300
 
Reported: 2021-02-03 17:58 UTC by Jessica Hanley
Modified: 2024-06-14 00:09 UTC (History)

Fixed In Version: candlepin-4.0.2-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1925546 (view as bug list)
Environment:
Last Closed: 2021-11-16 14:10:01 UTC
Target Upstream Version:
Embargoed:
nmoumoul: needinfo?




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5774301 0 None None None 2021-09-28 08:52:51 UTC
Red Hat Product Errata RHSA-2021:4702 0 None None None 2021-11-16 14:10:12 UTC

Description Jessica Hanley 2021-02-03 17:58:25 UTC
Description of problem:

With simple content access (SCA) enabled, the entitlement cert for the client cannot be regenerated after deleting repos from a content view.


Version-Release number of selected component (if applicable):

satellite-6.8.1-1.el7sat.noarch
candlepin-3.1.22-1.el7sat.noarch


How reproducible:

unclear


Steps to Reproduce:
1.  enable simple content access
2.  create a content view with repositories
3.  register a host to that content view
4.  run this command:  subscription-manager repos
5.  remove a repository from that content view
6.  publish and promote that content view
7.  run this command and compare the results:  subscription-manager repos

Actual results:

The removed repositories still appear in the output of "subscription-manager repos" on the host.

Expected results:

The removed repositories should not appear in the output of "subscription-manager repos" on the host.


Additional info:

Disabling the repository from Satellite entirely doesn't help.  Neither does any action taken on the host.  At this time, it can only be removed from postgres manually.

Comment 2 Hao Chang Yu 2021-02-04 01:45:37 UTC
Firstly, the new SCA entitlement certificate for the content view environment is regenerated when the first client requests it. The client that calls the "/certificates/serials" API triggers the certificate regeneration. This API can be triggered by some subscription-manager commands (refresh, repos --list, etc.), by yum, and by rhsmcertd (I think).

Secondly, when a content view is published, the "Actions::Candlepin::Environment::SetContent" step calls 2 Candlepin APIs to update the contents: "promoteContent" followed by "demoteContent".

Because of this behaviour, a race condition can occur when a large number of clients are registered to the content view environment.

The race condition happens in the following order:
1) Candlepin runs "promoteContent".
2) Candlepin adds the new contents and removes the existing SCA entitlement certificate.
3) A client runs yum or triggers rhsmcertd.
4) Candlepin finds that the old SCA entitlement certificate has been removed and starts generating a new one.
5) Candlepin runs "demoteContent".
6) Candlepin deletes the unneeded contents. There is no existing SCA entitlement certificate to remove.
7) The new SCA entitlement certificate from step (4) is generated successfully.

Because generation of the new SCA entitlement certificate starts in step (4), before "demoteContent" runs, the certificate includes the newly added contents but also still includes the contents that were to be removed.
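
The ordering above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: the function names (promote_content, start_cert_generation, demote_content, finish_cert_generation) are hypothetical, and treating a certificate as a plain snapshot of the content list is a simplification for illustration, not Candlepin's actual implementation.

```shell
#!/usr/bin/env bash
# Toy replay of the race; every name here is hypothetical and only models the ordering.

env_content="repoA repoB"   # content currently in the environment
cert=""                     # cached SCA certificate (empty = none)
pending=""                  # snapshot held by an in-flight generation

promote_content() {         # steps 1-2: add new content, drop the cached cert
  env_content="$env_content $*"
  cert=""
}

start_cert_generation() {   # steps 3-4: a client request finds no cert, generation starts
  if [ -z "$cert" ]; then
    pending="$env_content"
  fi
}

demote_content() {          # steps 5-6: remove content, drop the cached cert (none exists yet)
  local keep="" item
  for item in $env_content; do
    case " $* " in
      *" $item "*) ;;                        # demoted: leave it out
      *) keep="${keep:+$keep }$item" ;;      # still wanted: keep it
    esac
  done
  env_content="$keep"
  cert=""
}

finish_cert_generation() {  # step 7: the in-flight generation stores its stale snapshot
  cert="$pending"
  pending=""
}

promote_content repoC repoD
start_cert_generation
demote_content repoA repoB
finish_cert_generation

echo "environment: $env_content"
echo "certificate: $cert"
```

In this model the environment ends up with only repoC and repoD, while the stored certificate still lists repoA and repoB, which matches the symptom reported on the client.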

Comment 3 Hao Chang Yu 2021-02-04 02:11:07 UTC
How to reproduce:
1. Enable simple content access
2. Create a content view with some repositories
3. Publish the first version of that content view.
4. Register a host to that content view
5. On that host, run "subscription-manager repos --list". The available repos should be listed correctly.
6. Add one or more repositories to that content view
7. Remove one or more repositories from that content view
8. Open a few terminals and run the following command concurrently. The issue is easier to reproduce with more concurrency.

for i in {1..10000}; do curl -k -u "<username>:<pass>" https://<satellite fqdn>/rhsm/consumers/<consumer uuid of that host>/certificates/serials; done

9. Publish new version of that content view.


Actual results:

The removed repositories still appear in the output of "subscription-manager repos --list" on the host.

Expected results:

The removed repositories should not appear in the output of "subscription-manager repos --list" on the host.
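
Step 8 can also be driven from a single shell instead of several terminals. The following is a minimal sketch, not part of the original reproducer: poll_serials is a hypothetical helper, and SAT_USER, SAT_PASS and CONSUMER_UUID are assumed to be set by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start WORKERS background loops that each request the
# consumer's /certificates/serials endpoint REQUESTS times.
# SAT_USER and SAT_PASS are assumed to be exported by the caller.

poll_serials() {
  local url="$1" workers="$2" requests="$3" w=0
  while [ "$w" -lt "$workers" ]; do
    (
      i=0
      while [ "$i" -lt "$requests" ]; do
        # -k skips TLS verification, as in the original reproducer
        curl -s -k -u "${SAT_USER}:${SAT_PASS}" "$url" >/dev/null
        i=$((i + 1))
      done
    ) &
    w=$((w + 1))
  done
}

# Example (replace the placeholders before running):
#   poll_serials "https://satellite.example.com/rhsm/consumers/$CONSUMER_UUID/certificates/serials" 4 10000
#   ... publish the content view while the loops are still running ...
#   wait   # block until every background loop has finished
```

Running the loops in the background leaves the same shell free to publish the content view while requests are still in flight, which is the window the race depends on.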

Comment 14 Danny Synk 2021-06-14 19:54:12 UTC
Steps to test:

1. Deploy Satellite 6.10, snap 4.
2. Upload a manifest with Simple Content Access enabled.
3. Synchronize 4 repositories.
4. Create a content view containing 2 of the synced repositories.
5. Publish the first version of the content view.
6. Register a host to the content view using the global registration template.
7. Verify that only the two repositories in the content view show as available on the content host in the output of `subscription-manager repos --list`.
8. Add the other two repositories to the content view.
9. Remove the original two repositories from the content view.
10. Open 4 terminal sessions and, in each one, run `for i in {1..10000}; do curl -k -u admin:password https://satellite.example.com/rhsm/consumers/<consumer uuid of the host>/certificates/serials; done`
11. After the loops run in the previous step complete, publish new version of the content view.
12. Run `subscription-manager repos --list` on the content host.

Expected Results:

The two newly-added repositories show as available in the output of `subscription-manager repos --list`. The two repositories in the first version of the content view do not appear in the output.

Actual Results:

The two newly-added repositories show as available in the output of `subscription-manager repos --list`. The two repositories in the first version of the content view do not appear in the output.

Verified on Satellite 6.10, snap 4 (candlepin-4.0.4-1.el7sat.noarch).

Comment 15 patalber 2021-08-02 15:45:15 UTC
Good morning, Team,

One of the customers affected by the bug has moved to 6.9.4, and provided this update:

"After upgrading our entire Satellite env to 6.9.4 the issue is still present. When I change Content View and run a "subscription-manager refresh" I see the old Content View mixed in with the new. This is RHEL 6 so I don't have the "--force" option like I do in 7 and 8 which fixes it. The issue happens when there is only 1 certificate deleted, the "--force" on 7 and 8 removes more. The thing I noticed on RHEL6 is that if I run the "subscription-manager refresh" twice it appears to fix the issue. When I run it the second time it removes 3 more certificates. I have done very limited testing so by no means is this the fix.



-- When running "subscription-manger refresh" twice.  Notice the 2nd time removes more certs:
[root@host yum.repos.d]# subscription-manager refresh
1 local certificate has been deleted.
All local data refreshed
[root@host yum.repos.d]# subscription-manager refresh
1 local host has been deleted.
3 local certificates have been deleted.
All local data refreshed
[root@host yum.repos.d]#
"

Is it odd that it takes two runs of the refresh command to work around the issue?

Thank you.

--Patrick

Comment 16 Nikos Moumoulidis 2021-08-03 11:19:54 UTC
(In reply to patalber from comment #15)
> -- When running "subscription-manger refresh" twice.  Notice the 2nd time
> removes more certs:
> [root@host yum.repos.d]# subscription-manager refresh
> 1 local certificate has been deleted.
> All local data refreshed
> [root@host yum.repos.d]# subscription-manager refresh
> 1 local host has been deleted.
> 3 local certificates have been deleted.
> All local data refreshed

Hi Patrick,

This sounds exactly like this subscription-manager bug: https://bugzilla.redhat.com/show_bug.cgi?id=1960220
Could you provide the exact subscription-manager version that this happened with?

Thanks,
Nikos

Comment 17 patalber 2021-08-04 21:45:14 UTC
Hi Nikos,

The version of subscription-manager on the client exhibiting the issue is subscription-manager-1.24.45-1.el7_9.x86_64.

The version in that bug is 1.24.42-1.el7.

Please let me know if there is more that I can do.

--Patrick

Comment 18 Nikos Moumoulidis 2021-08-05 10:04:17 UTC
(In reply to patalber from comment #17)
Hi Patrick,

Based on the 'Fixed in version' field on that bug (1886772), the issue was fixed in subscription-manager-1.24.48-1.el7_9,
and all versions before that (including version subscription-manager-1.24.45-1.el7_9.x86_64) are affected by the problem.
So my suggestion would be to upgrade subscription-manager to subscription-manager-1.24.48-1.el7_9.

Thanks,
Nikos
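
The version check suggested in comment #18 can be sketched as a small shell comparison. This is an illustration only: version_lt is a hypothetical helper, and it assumes that GNU `sort -V` ordering is close enough to RPM's own comparison for version-release strings of the same shape (it is not a full RPM version compare).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 sorts strictly before $2.
# Assumption: GNU `sort -V` approximates RPM ordering for these strings.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

fixed="1.24.48-1.el7_9"       # 'Fixed in version' value quoted in comment #18
installed="1.24.45-1.el7_9"   # version reported in comment #17
# On a real host the installed value could instead come from:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' subscription-manager)

if version_lt "$installed" "$fixed"; then
  echo "subscription-manager $installed predates $fixed; upgrade recommended"
fi
```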

Comment 19 patalber 2021-08-06 16:47:46 UTC
Hi Nikos,

The customer is seeing this only on RHEL 6.x clients. RHEL7/8 are working because of the later version of subscription-manager being installed, I believe.

The latest version for RHEL6 that I see is 1.20.10-8. I can imagine that this type of fix would be unlikely to make it into a RHEL6 package, given where RHEL6 is, cycle-wise.

Is there anything else we can do?

Thanks.

--Patrick

Comment 26 Peter Vreman 2021-09-27 14:18:25 UTC
I see the same error with Satellite 6.9.6.1 and RHEL8.4 clients. I replaced a content view in a composite content view and the RHEL8.4 clients are still seeing a repo from an old content view that I removed from the composite content view.

I was always wary of enabling SCA, and this proves that it is still not working correctly.
With this major SCA bug found in Satellite 6.8, why is it still not fixed in Satellite 6.9.x?

Comment 27 Nikos Moumoulidis 2021-09-27 14:30:35 UTC
(In reply to Peter Vreman from comment #26)
> I see the same error with Satellite 6.9.6.1 and RHEL8.4 clients. I replaced
> a content view in a composite content view and the RHEL8.4 clients are
> still seeing a repo from an old content view that I removed from the
> composite content view.

Hi Peter,

Which subscription-manager version are the clients running?
Also, if you run "subscription-manager refresh" twice in a row (like mentioned in comment #15), does that work around the issue?

Thanks,
Nikos

Comment 28 Peter Vreman 2021-09-27 14:42:15 UTC
$  rpm -q subscription-manager
subscription-manager-1.28.13-3.el8_4.x86_64

Running 'subscription-manager refresh' multiple times does not help, but running 'subscription-manager refresh --force' works. Re-registering the clients with 'subscription-manager register --force' also works.

Luckily, I hit it on my testing Satellite installation. I was planning on enabling it on my Production Satellite next week, where it could have broken 100s of clients that would suddenly stop getting the latest content and security errata.

Looking at the GitHub fix, it is more than 6 months old and is still not included in 6.9.x. Sorry, but this is a real bug that breaks existing systems through SCA, which Red Hat strongly recommends because it is so easy... Yes, so easy to break a process that has worked for 7 years: refreshing certificates on the clients to access the YUM repos. Please, this is more important than some RH Cloud and Pulp3 migration stuff.

Comment 36 Sayan Das 2021-09-28 16:00:53 UTC
Hello,

I cannot seem to reproduce the issue.

My ENV:

# rpm -q satellite katello candlepin
satellite-6.9.6.1-1.el7sat.noarch
katello-3.18.1-3.el7sat.noarch
candlepin-3.1.28-1.el7sat.noarch


RHEL 7.9 Client:

# rpm -qa subscription-manager*
subscription-manager-rhsm-certificates-1.24.48-1.el7_9.x86_64
subscription-manager-1.24.48-1.el7_9.x86_64
subscription-manager-rhsm-1.24.48-1.el7_9.x86_64


RHEL 8.4 client:

# rpm -qa *subscription*
subscription-manager-rhsm-certificates-1.28.13-3.el8_4.x86_64
dnf-plugin-subscription-manager-1.28.13-3.el8_4.x86_64
subscription-manager-1.28.13-3.el8_4.x86_64
python3-subscription-manager-rhsm-1.28.13-3.el8_4.x86_64
subscription-manager-cockpit-1.28.13-3.el8_4.noarch



I tried the exact reproducer steps, but as soon as I publish the CV, promote it, and then come back to the client and execute "subscription-manager repos --list", it shows "1 local certificate has been deleted." and then lists the correct repos. At least for me, that is the scenario.

Comment 37 Nikos Moumoulidis 2021-09-29 10:59:31 UTC
I just realized that the reproduction steps in comment #14 are wrong and do not match the original steps in comment #3, so the verification was not accurate.

More specifically, step 11 is wrong; it should say "Publish the new version of the content view WHILE the loops are running" instead of

> 11. After the loops run in the previous step complete, publish new version
> of the content view.

The proper list of reproduction steps would then be:

1. Deploy Satellite 6.10, snap 4 (or any version of Satellite that has candlepin-3.1.28-1+, such as Satellite 6.9.2)
2. Upload a manifest with Simple Content Access enabled.
3. Synchronize 4 repositories.
4. Create a content view containing 2 of the synced repositories.
5. Publish the first version of the content view.
6. Register a host to the content view using the global registration template.
7. Verify that only the two repositories in the content view show as available on the content host in the output of `subscription-manager repos --list`.
8. Add the other two repositories to the content view.
9. Remove the original two repositories from the content view.
10. Open 4 terminal sessions and, in each one, run `for i in {1..10000}; do curl -k -u admin:password https://satellite.example.com/rhsm/consumers/<consumer uuid of the host>/certificates/serials; done`
11. Publish the new version of the content view WHILE the loops are running.
12. Run `subscription-manager repos --list` on the content host.

Expected Results:

The two newly-added repositories show as available in the output of `subscription-manager repos --list`. The two repositories in the first version of the content view do not appear in the output.

Comment 38 Peter Vreman 2021-09-29 11:38:54 UTC
Is there any log pattern that can be checked to see if I hit this issue? In the last 3 days I have hit the issue at least twice. The last time it was even a bit more inconsistent, in that 'subscription-manager refresh --force' on the client did not work. I had to use 'subscription-manager register --force' to get the client working again.

Comment 39 Sayan Das 2021-09-29 11:49:07 UTC
Hello Peter, 

As I had mentioned in the case, the following is the scenario we are considering.

  * rhsmcertd/yum/sub-man commands were invoked on the target client system, which was trying to check in with Satellite.
  * The CV was published at exactly the same time as the host was checking in with Satellite.

Since the automation at your end executes sub-man repos --enable=*, you should be able to see the following hits for "--enable=*" in the Apache logs:

XX.XX.XXX.XXX - - [29/Sep/2021:17:13:44 +0530] "GET /rhsm/status HTTP/1.1" 200 426 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:44 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/accessible_content HTTP/1.1" 200 3954 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:45 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/content_overrides HTTP/1.1" 200 637 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:45 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/accessible_content HTTP/1.1" 200 3954 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:45 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/content_overrides HTTP/1.1" 200 637 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:45 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2 HTTP/1.1" 200 16862 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:45 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/content_overrides HTTP/1.1" 200 637 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:46 +0530] "PUT /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2 HTTP/1.1" 200 41 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:46 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2 HTTP/1.1" 200 16862 "-" "RHSM/1.0 (cmd=subscription-manager) subscription-manager/1.28.13-3.el8_4"
XX.XX.XXX.XXX - - [29/Sep/2021:17:13:46 +0530] "GET /rhsm/consumers/2308c9de-58b0-4e8d-9a21-0965c72295f2/compliance HTTP/1.1" 200 240 "-" "RHSM/1.0 (cmd=subscription-manager


where 2308c9de-58b0-4e8d-9a21-0965c72295f2 is the consumer ID of the system, which can be found via "sub-man identity".


What needs to be done now, I guess, is to check production.log on the Satellite for the POST API calls for publish or promote, e.g.

Started POST "/katello/api/v2/content_views/10/publish?organization_id=3"

Started POST "/katello/api/v2/content_view_versions/15/promote?organization_id=3"


If those entries can be found and we can follow each request to the end of the publish and promotion, we need to collect the timestamps from there and then verify whether that host checked in to Satellite while the CV publish or promote was being processed, i.e. whether those RHSM API calls are visible in that window.


I am sure Nikos will add something else if I missed interpreting the problem here.
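
The correlation described in this comment can be sketched with two small filters. This is a rough sketch only: rhsm_hits and cv_requests are hypothetical helpers, and the log paths named in the comments are assumed default locations that are not stated in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for lining up client check-ins with publish/promote requests.
# Assumed (not stated in this bug) log locations on the Satellite server:
#   /var/log/httpd/foreman-ssl_access_ssl.log   (Apache access log)
#   /var/log/foreman/production.log             (Foreman/Katello log)

# Print "timestamp method path" for every RHSM request made by one consumer.
rhsm_hits() {
  local uuid="$1" log="$2"
  grep -F "/rhsm/consumers/$uuid" "$log" |
    sed -E 's/^[^[]*\[([^]]+)\] "([A-Z]+) ([^ ]+).*/\1 \2 \3/'
}

# Print the content view publish and promote requests found in production.log.
cv_requests() {
  grep -E 'Started POST "/katello/api/v2/(content_views/[0-9]+/publish|content_view_versions/[0-9]+/promote)' "$1"
}

# Example:
#   rhsm_hits 2308c9de-58b0-4e8d-9a21-0965c72295f2 /var/log/httpd/foreman-ssl_access_ssl.log
#   cv_requests /var/log/foreman/production.log
```

Comparing the timestamps from the two outputs by hand shows whether a check-in from that consumer falls inside a publish or promote window.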

Comment 40 Peter Vreman 2021-09-29 14:16:08 UTC
I have provided details with log snippets and the timeframes in which the issue is seen in my case, and also provided an updated sosreport and foreman-debug of the Satellite and client.

During the publishing timeframe there is no check-in from a client using the CCV that has the issue. But there are 8 other clients, using various other CCVs, that check in.
Note also that after updating the CCV with the new repo, I delete the old CCV version with the old repo.

One interesting finding is that on the 'broken' client I see that 'subscription-manager repos enable' deletes 1 local certificate at the time another CCV is published.

Comment 41 Peter Vreman 2021-09-29 14:24:28 UTC
Another thing to notice is that I do good housekeeping and also delete the replaced CV and Product. This all happened between the times of the 2 check-ins.
Log snippets are shared in the case.

Comment 42 Danny Synk 2021-09-29 14:40:10 UTC
I re-ran the verification steps per comment #37 on Satellite 6.10, snap 20 (candlepin-4.0.6-1.el7sat.noarch), and the result was the same as the result reported in comment #14.

Corrected Steps to Test:
1. Deploy Satellite 6.10, snap 20.
2. Upload a manifest with Simple Content Access enabled.
3. Synchronize 4 repositories.
4. Create a content view containing 2 of the synced repositories.
5. Publish the first version of the content view.
6. Register a host to the content view using the global registration template.
7. Verify that only the two repositories in the content view show as available on the content host in the output of `subscription-manager repos --list`.
8. Add the other two repositories to the content view.
9. Remove the original two repositories from the content view.
10. Open 4 terminal sessions and, in each one, run `for i in {1..10000}; do curl -k -u admin:password https://satellite.example.com/rhsm/consumers/<consumer uuid of the host>/certificates/serials; done`
11. While the loops started in the previous step are running, publish new version of the content view.
12. After the new content view version is published, run `subscription-manager repos --list` on the content host.

Expected Results:
The two newly-added repositories show as available in the output of `subscription-manager repos --list`. The two repositories in the first version of the content view do not appear in the output.

Actual Results:
The two newly-added repositories show as available in the output of `subscription-manager repos --list`. The two repositories in the first version of the content view do not appear in the output.

Comment 58 errata-xmlrpc 2021-11-16 14:10:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702

