Bug 1972501 - After promoting the content view, Candlepin failed to mark the entitlement certificates as dirty
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Candlepin
Version: 6.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 6.11.0
Assignee: satellite6-bugs
QA Contact: Imaan
URL:
Whiteboard:
Duplicates: 2026504
Depends On: 1973257 2016418
Blocks:
Reported: 2021-06-16 04:52 UTC by Hao Chang Yu
Modified: 2022-07-09 06:53 UTC (History)
CC List: 12 users

Fixed In Version: candlepin-3.1.28-2, candlepin-4.1.8-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1973257 2016418 (view as bug list)
Environment:
Last Closed: 2022-07-05 14:29:32 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 6956231 0 None None None 2022-07-09 06:53:37 UTC
Red Hat Product Errata RHSA-2022:5498 0 None None None 2022-07-05 14:29:45 UTC

Internal Links: 2059660

Description Hao Chang Yu 2021-06-16 04:52:11 UTC
Description of problem:
There are two issues here.
- When many entitlements are attached to the environment and many new content_ids are added, Candlepin takes a very long time (45+ minutes) to run the "RegenEnvEntitlementCertsJob".
- "RegenEnvEntitlementCertsJob" might fail if one or more of the affected entitlements are revoked while it is running. For example, a guest migration triggers entitlement revocation.

Steps to Reproduce:
1. Create a new content view. Attach 1 repo to it and publish the 1.0 version.
2. Register 1000 hosts to the content view. Attach 10 or more subscriptions to the hosts. You can create multiple custom products to attach, but one subscription must be the virtual subscription (guest of <hypervisor> pool).
3. Attach 3 or more new repositories from the same product (for example, rhel7 server, optional, extras, satellite tools) to the content view and publish the 2.0 version.
4. Tail /var/log/candlepin/candlepin.log. You should see the following:

[thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO  org.candlepin.async.JobManager - Starting job "Regenerate Environment Entitlement Certificates" using class: org.candlepin.async.tasks.RegenEnvEntitlementCertsJob
[thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO  org.candlepin.controller.EntitlementCertificateGenerator - Regenerating relevant certificates in environment: xxxxxxxxxxxxxxxxxxxxx

5. Wait about 2 to 3 minutes, then use the virt-who fake report to simulate the VM migrations.
6. Run the following curl command to simulate the client check-in.

curl -v -k -u <admin>:<pass> https://satellite.example.com/rhsm/consumers/<subscription uuid of the migrated VM>/certificates/serials

7. You should see the following entitlement revocation and auto healing in candlepin.log:

[req=xxxxxxxx, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/<uuid>/certificates/serials
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Batch revoking 1 entitlements
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting batch delete of pools
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting batch delete of entitlements
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting delete flush
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - All deletes flushed successfully
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Recomputing status for 1 consumers.
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - All statuses recomputed.
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.Entitler - Attempting to heal host machine with UUID "<uuid>" for guest with UUID "<uuid>"
[req=xxxxxxxx, org=redhat, csid=] INFO  org.candlepin.policy.js.autobind.AutobindRules - Rules did not select a pool for products: [] and consumer installed products: []
<snip>

8. If you don't want to do steps 6 and 7, I think you can also remove the affected entitlements manually from the VMs via the Satellite web UI.
9. After a while, the RegenEnvEntitlementCertsJob will fail with the following error:

[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO  org.candlepin.controller.EntitlementCertificateGenerator - Found 1000 certificates to regenerate.
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.hibernate.internal.ExceptionMapperStandardImpl - HHH000346: Error during managed flush [Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.candlepin.model.Entitlement#XXXXXXXXXXXXXXXXXXXXXXXXX]]  <================
...
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.candlepin.pinsetter.tasks.KingpinJob - Job: org.candlepin.pinsetter.tasks.RegenEnvEntitlementCertsJob encountered a problem.
...
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=2748452  <=========== 45 minutes


Actual results:
Failed to mark entitlement as dirty. Clients are unable to see and enable new repositories.
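
The HHH000346 flush error above is what Hibernate typically reports when a row backing a managed entity is deleted by another transaction before the flush. As an illustration only, and not a description of the fix shipped in the errata, one way to tolerate this is to mark entitlements dirty by ID in small batches and skip any row that no longer exists. The sketch below models that idea with an in-memory map standing in for the entitlement table. The class name, the method name, and the map-based "table" are all hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the map stands in for the entitlement table (id -> dirty flag).
// None of these names come from Candlepin.
class DirtyMarkerSketch {

    // Marks the given entitlement IDs dirty, batch by batch, and returns how many
    // rows were actually updated. IDs whose rows have disappeared are skipped.
    static int markDirtyInBatches(Map<String, Boolean> table, List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int marked = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            // Each batch stands for one short transaction in a real job.
            List<String> batch = ids.subList(start, Math.min(start + batchSize, ids.size()));
            for (String id : batch) {
                // Map.replace only updates an existing key, so an entitlement that
                // was revoked in the meantime is ignored instead of failing the job.
                if (table.replace(id, Boolean.TRUE) != null) {
                    marked++;
                }
            }
        }
        return marked;
    }
}
```

In a database-backed job the inner loop would correspond to a bulk update keyed on entitlement ID, which naturally affects only the rows that still exist.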


Expected results:
No error. Clients can see and enable new repositories.

Additional info:
In my opinion, the slowness is caused by the following:
- A large number of entitlements are attached to the environment. For example, each host has 10+ entitlements attached.
- Multiple new contents/repositories are added to the content view.

# src/main/java/org/candlepin/controller/EntitlementCertificateGenerator.java
    public void regenerateCertificatesOf(String environmentId, Collection<String> contentIds,
        boolean lazy) {

        log.info("Regenerating relevant certificates in environment: {}", environmentId);

        Set<Entitlement> entsToRegen = new HashSet<>();

        // <== every entitlement in the environment is loaded and visited here
        entLoop: for (Entitlement entitlement : this.entitlementCurator.listByEnvironment(environmentId)) {
            // Impl note:
            // Since the entitlements came from the DB, we should be safe to traverse the graph as
            // necessary without any sanity checks (so long as our model's restrictions aren't
            // broken).

            // <== each entitlement repeats this loop once per content ID, which does not seem efficient
            for (String contentId : contentIds) {
                if (entitlement.getPool().getProduct().hasContent(contentId)) {
                    entsToRegen.add(entitlement);
                    continue entLoop;
                }
                Collection<Product> providedProducts = entitlement.getPool().getProduct()
                    .getProvidedProducts();
                for (Product provided : providedProducts) {
                    if (provided.hasContent(contentId)) {
                        entsToRegen.add(entitlement);
                        continue entLoop;
                    }
                }
            }
        }
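
To make the concern about the nested loop concrete, here is a minimal sketch of one possible way to cut the repeated work. It puts the changed content IDs into a set and caches the verdict per product, so entitlements that share a product are evaluated only once. This is an illustration under assumptions, not the change that went into Candlepin. SketchProduct, SketchEntitlement and RegenFilterSketch are hypothetical stand-ins for the real model classes, and like the excerpt above it only looks at the pool's product and its directly provided products.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a product: an ID, its content IDs, and provided products.
class SketchProduct {
    final String id;
    final Set<String> contentIds = new HashSet<>();
    final List<SketchProduct> providedProducts = new ArrayList<>();

    SketchProduct(String id, String... contentIds) {
        this.id = id;
        this.contentIds.addAll(Arrays.asList(contentIds));
    }
}

// Hypothetical stand-in for an entitlement: an ID and the product of its pool.
class SketchEntitlement {
    final String id;
    final SketchProduct product;

    SketchEntitlement(String id, SketchProduct product) {
        this.id = id;
        this.product = product;
    }
}

class RegenFilterSketch {

    // True if the product, or any product it directly provides, carries a changed content ID.
    static boolean affected(SketchProduct product, Set<String> changed) {
        if (!Collections.disjoint(product.contentIds, changed)) {
            return true;
        }
        for (SketchProduct provided : product.providedProducts) {
            if (!Collections.disjoint(provided.contentIds, changed)) {
                return true;
            }
        }
        return false;
    }

    // Returns the IDs of the entitlements whose certificates would need regenerating.
    static Set<String> entitlementsToRegenerate(Collection<SketchEntitlement> entitlements,
        Collection<String> contentIds) {

        Set<String> changed = new HashSet<>(contentIds);
        Map<String, Boolean> verdictByProduct = new HashMap<>();
        Set<String> result = new HashSet<>();

        for (SketchEntitlement ent : entitlements) {
            // The product is inspected only the first time it is seen.
            boolean hit = verdictByProduct.computeIfAbsent(ent.product.id,
                key -> affected(ent.product, changed));
            if (hit) {
                result.add(ent.id);
            }
        }
        return result;
    }
}
```

With 1000 hosts attached to the same handful of products, the per-product cache means the content check runs a handful of times instead of once per entitlement and content ID.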

Comment 3 Nikos Moumoulidis 2021-12-15 08:44:13 UTC
*** Bug 2026504 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2022-07-05 14:29:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498

