Bug 2184151 - katello:clean_backend_objects false alarms on systems with >1500 clients when PUTing customer facts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts - Content
Version: 6.12.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 6.15.0
Assignee: satellite6-bugs
QA Contact: Cole Higgins
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-03 18:33 UTC by Pavel Moravec
Modified: 2024-05-08 14:30 UTC (History)
13 users (show)

Fixed In Version: rubygem-katello-4.11.0.9-1.el8sat
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2272113 (view as bug list)
Environment:
Last Closed: 2024-04-23 17:14:02 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 37283 | 0 | Normal | New | katello:clean_backend_objects false alarms on systems with >1500 clients when PUTing customer facts | 2024-03-20 16:27:55 UTC
Red Hat Issue Tracker | SAT-19335 | 0 | None | None | None | 2023-08-02 16:37:56 UTC
Red Hat Issue Tracker | SAT-24068 | 0 | None | None | None | 2024-03-24 12:36:11 UTC
Red Hat Knowledge Base (Solution) | 7005998 | 0 | None | None | None | 2023-04-03 20:38:58 UTC
Red Hat Product Errata | RHSA-2024:2010 | 0 | None | None | None | 2024-04-23 17:14:04 UTC

Description Pavel Moravec 2023-04-03 18:33:58 UTC
Description of problem:
When the katello:clean_backend_objects rake script runs on a system with >1500 candlepin consumers while a consumer updates its facts, the script can wrongly conclude that a host is missing in candlepin, for example:

Host 16080 hostname.example.com ff0a6edf-311c-4924-a7b9-b72707931c7b is partially missing subscription information.  Un-registering

The reason is that katello, within the ::Katello::Resources::Candlepin::Consumer::all_uuids call, queries candlepin in pages:

2023-03-23 07:46:34,701 [thread=https-jsse-nio-127.0.0.1-23443-exec-9] [req=849f90be-9f62-40e9-9541-341e3055916b, org=, csid=] INFO  org.candlepin.servlet.filter.logging.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/?owner=MyOrganization&include=uuid&per_page=1500&page=1
2023-03-23 07:48:50,123 [thread=https-jsse-nio-127.0.0.1-23443-exec-75] [req=a5d01c04-bf19-45e2-a42a-9b23493598c3, org=, csid=] INFO  org.candlepin.servlet.filter.logging.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/?owner=MyOrganization&include=uuid&per_page=1500&page=2
2023-03-23 07:50:21,081 [thread=https-jsse-nio-127.0.0.1-23443-exec-26] [req=40f37098-fbed-40a6-972e-f70510000ced, org=, csid=391be221-c85d-4224-953b-d7df9143650b] INFO  org.candlepin.servlet.filter.logging.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/c8a22c7e-d959-42ef-b6a2-1eecbffd2459
2023-03-23 07:50:41,851 [thread=https-jsse-nio-127.0.0.1-23443-exec-20] [req=944317ac-69f3-45c5-9467-92b78734b276, org=, csid=e68d1403-cc68-4c79-a6a3-b8bd329fe9da] INFO  org.candlepin.servlet.filter.logging.LoggingFilter - Request: verb=PUT, uri=/candlepin/consumers/8484b642-fbb1-43ff-800a-26e792c37dfc
2023-03-23 07:51:11,228 [thread=https-jsse-nio-127.0.0.1-23443-exec-63] [req=6b6a49be-eaff-4a16-9626-2ecc58bfd97e, org=, csid=] INFO  org.candlepin.servlet.filter.logging.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/?owner=MyOrganization&include=uuid&per_page=1500&page=3

The problem occurs when a client updates its consumer facts *between* the "get me the next 1.5k consumers" requests, like the PUT requests above. This shuffles the ordering of the consumers in the responses, so that some UUIDs are skipped and some appear twice in the overall all_uuids result (we can easily demonstrate this).

Those skipped consumers are then wrongly marked as "partially missing subscription information".
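
The skip/duplicate effect can be sketched with a toy model. This is only an illustration, not Candlepin's actual behaviour: the six single-letter consumer names, the page size of 3, and the assumption that an updated consumer moves to the end of the server-side ordering are all hypothetical.

```python
from collections import Counter

PER_PAGE = 3  # hypothetical page size (the rake script uses 1500)

def fetch_page(ordering, page, per_page=PER_PAGE):
    """Return one page (1-based) of the server's current ordering."""
    start = (page - 1) * per_page
    return ordering[start:start + per_page]

# Server-side ordering at the time page 1 is requested.
before_update = ["a", "b", "c", "d", "e", "f"]
# Assumption: a facts update on "b" moves it to the end of the ordering.
after_update = ["a", "c", "d", "e", "f", "b"]

# Page 1 is read before the update, page 2 after it.
collected = fetch_page(before_update, 1) + fetch_page(after_update, 2)

counts = Counter(collected)
duplicated = sorted(u for u, n in counts.items() if n > 1)
skipped = sorted(set(before_update) - set(collected))

print("collected: ", collected)
print("duplicated:", duplicated)
print("skipped:   ", skipped)
```

In this model "d" slides from page 2 into page 1 after the update and is never returned, while "b" is returned twice. The total number of returned entries still matches the number of consumers.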


Version-Release number of selected component (if applicable):
Sat 6.12.2 / any older as well


How reproducible:
very reliably


Steps to Reproduce:
1. Have a Satellite with >1.5k Content Hosts
2. invoke "foreman-rake katello:clean_backend_objects"
3. while it is running, run "subscription-manager facts --update" on a few hosts
4. check the rake script output


Actual results:
4. with some (high) probability, the rake script will wrongly detect a missing candlepin consumer


Expected results:
4. no such false alarms


Additional info:
We can easily demonstrate that "PUTing consumer facts shuffles the consumers' ordering" on a system with >10 hosts as well. Just enable the candlepin API from the command line per https://access.redhat.com/solutions/2955931 , and replace the "get me clients in 1.5k batches" requests with ".. in batches of, say, 7":

for i in $(seq 1 4); do curl -sk -H "Content-Type:application/json" -u admin:admin "https://localhost:23443/candlepin/consumers/?owner=YOUR_ORGANIZATION&include=uuid&per_page=7&page=${i}" | json_reformat; sleep 1; echo "uuid"; done | grep uuid

Normally, it will print something like:

        "uuid": "08513930-d0f7-4882-8129-77673dc721c2"
        "uuid": "cbc30fb8-9939-467d-9273-052eaf791eb4"
        "uuid": "7ba62210-3c30-4a4c-ae0a-fb31df3bc33d"
        "uuid": "f83e11cc-c0c3-479d-83ae-573fad2090bc"
        "uuid": "9c8bbc55-ac77-4e2c-853d-a2a6fc24862a"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "762a8c10-2793-489e-a53f-a3ffb2e3acb0"
uuid
        "uuid": "aac2882d-0bf6-4336-9d41-b30c94c452de"
        "uuid": "60787875-b416-4aaf-bf1e-bd365639f4a8"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8bffcba-cd62-4a0f-b86d-f02a2f27e14f"
        "uuid": "b8787ab1-58aa-4a14-b1f6-68eae436954e"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
uuid
        "uuid": "62d3e041-00c2-4adf-8135-055815954203"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "0e7893a7-fc9c-4b5b-97d7-b44c8b120071"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "663af0ad-4206-4294-a947-83fd5cfa612d"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
uuid
        "uuid": "7ba62210-3c30-4a4c-ae0a-fb31df3bc33d"
        "uuid": "b8787ab1-58aa-4a14-b1f6-68eae436954e"
        "uuid": "aac2882d-0bf6-4336-9d41-b30c94c452de"
        "uuid": "82480bc1-3737-429c-8947-d1de745e9e27"
        "uuid": "762a8c10-2793-489e-a53f-a3ffb2e3acb0"
uuid

The sequence is the same every time. Now, *while* the script is executing (play with the sleep time there), run "subscription-manager facts --update" on a client. The output will be e.g.:
        "uuid": "08513930-d0f7-4882-8129-77673dc721c2"
        "uuid": "cbc30fb8-9939-467d-9273-052eaf791eb4"
        "uuid": "7ba62210-3c30-4a4c-ae0a-fb31df3bc33d"
        "uuid": "f83e11cc-c0c3-479d-83ae-573fad2090bc"
        "uuid": "9c8bbc55-ac77-4e2c-853d-a2a6fc24862a"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "762a8c10-2793-489e-a53f-a3ffb2e3acb0"
uuid
        "uuid": "aac2882d-0bf6-4336-9d41-b30c94c452de"
        "uuid": "60787875-b416-4aaf-bf1e-bd365639f4a8"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8bffcba-cd62-4a0f-b86d-f02a2f27e14f"
        "uuid": "b8787ab1-58aa-4a14-b1f6-68eae436954e"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
uuid
        "uuid": "f83e11cc-c0c3-479d-83ae-573fad2090bc"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "0e7893a7-fc9c-4b5b-97d7-b44c8b120071"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "663af0ad-4206-4294-a947-83fd5cfa612d"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
uuid
        "uuid": "7ba62210-3c30-4a4c-ae0a-fb31df3bc33d"
        "uuid": "b8787ab1-58aa-4a14-b1f6-68eae436954e"
        "uuid": "aac2882d-0bf6-4336-9d41-b30c94c452de"
        "uuid": "82480bc1-3737-429c-8947-d1de745e9e27"
        "uuid": "762a8c10-2793-489e-a53f-a3ffb2e3acb0"
uuid

Note that 62d3e041-.. is replaced by a second f83e11cc-.., so the rake script will not get UUID 62d3e041 at all during this run.
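
The comparison of the two runs can also be done mechanically. The following is a minimal sketch, assuming the grep output of each run has been saved as text; the helper names parse_uuids and missing_uuids are made up for this illustration. For brevity only the third page of each run is pasted in, since the other pages are identical.

```python
import re

# Matches lines shaped like the grep output above.
UUID_LINE = re.compile(r'"uuid":\s*"([0-9a-f-]{36})"')

def parse_uuids(text):
    """Extract all UUIDs from saved grep output."""
    return UUID_LINE.findall(text)

def missing_uuids(baseline_text, rerun_text):
    """UUIDs seen in the baseline run but absent from the rerun."""
    return set(parse_uuids(baseline_text)) - set(parse_uuids(rerun_text))

# Third page of the undisturbed run.
baseline_page3 = '''
        "uuid": "62d3e041-00c2-4adf-8135-055815954203"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "0e7893a7-fc9c-4b5b-97d7-b44c8b120071"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "663af0ad-4206-4294-a947-83fd5cfa612d"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
'''

# Third page of the run with a concurrent facts update.
rerun_page3 = '''
        "uuid": "f83e11cc-c0c3-479d-83ae-573fad2090bc"
        "uuid": "f72296a5-0b56-48b3-8712-f2ff4813e778"
        "uuid": "0e7893a7-fc9c-4b5b-97d7-b44c8b120071"
        "uuid": "8f7d4e4e-6b7f-4498-9019-30dad9b8a62a"
        "uuid": "663af0ad-4206-4294-a947-83fd5cfa612d"
        "uuid": "adb8a530-03a9-4c90-95b6-845b742adeb4"
        "uuid": "f8677b63-d1b3-4495-a86e-00d415995bfd"
'''

missing = missing_uuids(baseline_page3, rerun_page3)
print(missing)
```

Feeding the complete output of both runs into missing_uuids works the same way and avoids comparing the pages by eye.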

Comment 8 Pavel Moravec 2024-03-27 09:01:26 UTC
Hi Michael and Jeremy,
just a double-check for confirmation: the code fix in ':sort_by => "uuid"' will also help when virt-who is updating its mapping between the paginated "get me consumers" requests, am I right?

(we had a very unlucky customer where the 6.12->6.13 upgrade ran the automatic clean_backend_objects script with COMMIT=true, and at exactly that "bad" time, virt-who sent its mapping; this ended up with 600 Hosts unregistered during the upgrade)


Also, when considering the impact of the sorting: shouldn't it rather be by the `created` timestamp (ascending order)? What if a new Host is registered during the clean_backend_objects execution? Its new uuid can break the linearity/ordering of the "get me consumers per pages" responses.

Comment 9 Jeremy Lenz 2024-03-27 13:13:46 UTC
> shouldn't it rather be by the `created` timestamp (ascending order)? What if a new Host is registered during the clean_backend_objects execution? Its new uuid can break the linearity/ordering of the "get me consumers per pages" responses.

Hey Pavel

I thought about this as well. If we still have problems we can change it again to sort by created, but the reason we didn't do this from the start is that the "all_uuids" method we altered is currently restricted to only the "uuid" field and does not include the "created" field in the response.

This change should help when more consumers are added during the rake task, no matter if they're added by virt-who mapping or another method.
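
The effect of a fixed sort key under the two scenarios discussed here can be sketched with a toy model. This is not katello or Candlepin code: paged_fetch, the single-letter uuids, the page size of 2, and the "fetch until an empty page" loop are all assumptions made for illustration.

```python
def paged_fetch(snapshot_for, per_page):
    """Fetch pages until an empty one is returned.

    snapshot_for(page) gives the set of consumers that exist when that
    page is requested; sorting stands in for a server-side sort by uuid.
    """
    collected, page = [], 1
    while True:
        ordered = sorted(snapshot_for(page))
        chunk = ordered[(page - 1) * per_page:page * per_page]
        if not chunk:
            return collected
        collected.extend(chunk)
        page += 1

initial = {"b", "d", "f", "h"}

# Facts updates do not change uuids, so every page sees the same order.
stable = paged_fetch(lambda page: initial, per_page=2)

# A new consumer "a" is registered after page 1 has been fetched.
with_insert = paged_fetch(
    lambda page: initial if page == 1 else initial | {"a"},
    per_page=2,
)

print(stable)
print(with_insert)
```

In this model the newly registered consumer shifts later entries by one position, so "d" shows up twice, but every consumer that existed from the start is still returned at least once.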

Comment 10 myoder 2024-03-27 16:07:40 UTC
Hi Pavel/Jeremy,

The issue I discovered was that candlepin was returning duplicate ids instead of a unique list.  It happened with an organization with 6,065 host ids.  Candlepin was returning 6,065 host uuids, but they were not all unique; its output contained duplicate hosts.  So candlepin effectively had roughly 5,000 unique ids, while katello had 6,065 unique ids.  That means the counts differed, so hosts got removed.

I don't think there is any linearity/ordering issue.  The issue was just about a unique list of hosts ids.  So if one is added, and is out of order, shouldn't matter (unless katello doesn't have that id in its list).  I could have missed something, but that was my understanding of the code.
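
The mismatch described above can be illustrated with a small sketch. The host names and both lists are made-up data, and this is not the actual katello comparison logic.

```python
# Hypothetical ids known to katello.
katello_ids = {"h1", "h2", "h3", "h4"}

# Hypothetical paginated response: right length, but "h2" is returned
# twice and "h3" is not returned at all.
candlepin_response = ["h1", "h2", "h2", "h4"]

same_length = len(candlepin_response) == len(katello_ids)
unique_ids = set(candlepin_response)
missing = katello_ids - unique_ids

print("same length:", same_length)
print("unique ids: ", len(unique_ids))
print("missing:    ", sorted(missing))
```

The raw response has as many entries as katello has hosts, yet once duplicates are collapsed one host appears to be missing even though it was never removed from the backend.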

Kind regards,

Comment 15 errata-xmlrpc 2024-04-23 17:14:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010

