Bug 1972501

Summary: After promoting the content view, Candlepin failed to mark the entitlement certificates as dirty
Product: Red Hat Satellite
Reporter: Hao Chang Yu <hyu>
Component: Candlepin
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: Imaan <ikaur>
Severity: medium
Priority: medium
Version: 6.9.0
CC: ahumbe, jhutar, juwatts, lvrtelov, nmoumoul, pcreech, pmendezh, redakkan, satellite6-bugs, saydas, tasander, wpoteat
Target Milestone: 6.11.0
Keywords: Performance, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: candlepin-3.1.28-2, candlepin-4.1.8-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1973257 2016418
Last Closed: 2022-07-05 14:29:32 UTC
Type: Bug
Bug Depends On: 1973257, 2016418    
Bug Blocks:    

Description Hao Chang Yu 2021-06-16 04:52:11 UTC
Description of problem:
There are two issues here.
- When many entitlements are attached to the environment and many new content_ids are added, Candlepin takes a very long time (45+ minutes) to run the "RegenEnvEntitlementCertsJob".
- "RegenEnvEntitlementCertsJob" might fail if one or more of the affected entitlements are revoked while it is running. For example, a guest migration triggers entitlement revocation.

Steps to Reproduce:
1. Create a new content view. Attach 1 repo to it and publish the 1.0 version.
2. Register 1000 hosts to the content view. Attach 10 or more subscriptions to the hosts. You can create multiple custom products to attach, but one subscription must be a virtual subscription (a "guest of <hypervisor>" pool).
3. Attach 3 or more new repositories from the same product (for example, rhel7 server, optional, extras, satellite tools) to the content view and publish the 2.0 version.
4. Tail /var/log/candlepin/candlepin.log. You should see the following:

[thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO  org.candlepin.async.JobManager - Starting job "Regenerate Environment Entitlement Certificates" using class: org.candlepin.async.tasks.RegenEnvEntitlementCertsJob
[thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO  org.candlepin.controller.EntitlementCertificateGenerator - Regenerating relevant certificates in environment: xxxxxxxxxxxxxxxxxxxxx

5. Wait about 2 to 3 minutes, then use a virt-who fake report to simulate VM migrations.
6. Then run the following curl command to simulate a client check-in.

curl -v -k -u <admin>:<pass> https://satellite.example.com/rhsm/consumers/<subscription uuid of the migrated VM>/certificates/serials

7. You should see the following entitlement revocation and auto healing in candlepin.log:

[req=xxxxxxxx, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/<uuid>/certificates/serials
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Batch revoking 1 entitlements
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting batch delete of pools
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting batch delete of entitlements
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Starting delete flush
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - All deletes flushed successfully
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - Recomputing status for 1 consumers.
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.CandlepinPoolManager - All statuses recomputed.
[req=xxxxxxxx, org=my_org, csid=] INFO  org.candlepin.controller.Entitler - Attempting to heal host machine with UUID "<uuid>" for guest with UUID "<uuid>"
[req=xxxxxxxx, org=redhat, csid=] INFO  org.candlepin.policy.js.autobind.AutobindRules - Rules did not select a pool for products: [] and consumer installed products: []
<snip>

8. If you don't want to do steps 6 and 7, I think you can also remove the affected entitlements manually from the VMs via the Satellite web UI.
9. After a while, the RegenEnvEntitlementCertsJob will fail with the following error:

[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO  org.candlepin.controller.EntitlementCertificateGenerator - Found 1000 certificates to regenerate.
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.hibernate.internal.ExceptionMapperStandardImpl - HHH000346: Error during managed flush [Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.candlepin.model.Entitlement#XXXXXXXXXXXXXXXXXXXXXXXXX]]  <================
...
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.candlepin.pinsetter.tasks.KingpinJob - Job: org.candlepin.pinsetter.tasks.RegenEnvEntitlementCertsJob encountered a problem.
...
[thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=2748452  <=========== 45 minutes


Actual results:
The job fails to mark the entitlement certificates as dirty. Clients are unable to see and enable the new repositories.


Expected results:
No error. Clients can see and enable new repositories.
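
The HHH000346 message in step 9 appears to be Hibernate reporting that the flush tried to update an entitlement row that another transaction (the revocation) had already deleted. As a rough sketch only, and not Candlepin's actual fix, the job could mark certificates dirty by entitlement ID and treat an ID that no longer exists as "nothing to do" instead of an error. Everything below is hypothetical: the class DirtyMarkingSketch, its methods, and the in-memory map that stands in for the entitlement table.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; none of these names exist in Candlepin.
final class DirtyMarkingSketch {
    // Stand-in for the entitlement table: entitlement ID -> "dirty" flag.
    private final Map<String, Boolean> dirtyById = new ConcurrentHashMap<>();

    void addEntitlement(String id) {
        dirtyById.put(id, Boolean.FALSE);
    }

    // Simulates a revocation done by another request while the job runs.
    void revoke(String id) {
        dirtyById.remove(id);
    }

    boolean isDirty(String id) {
        return Boolean.TRUE.equals(dirtyById.get(id));
    }

    /**
     * Marks every entitlement that still exists as dirty and returns how
     * many were updated. IDs that were revoked in the meantime are skipped
     * rather than failing the whole run.
     */
    int markDirty(Collection<String> entitlementIds) {
        int updated = 0;
        for (String id : entitlementIds) {
            // computeIfPresent leaves the map untouched for a missing key
            // and returns null in that case.
            if (dirtyById.computeIfPresent(id, (key, old) -> Boolean.TRUE) != null) {
                updated++;
            }
        }
        return updated;
    }
}
```

With three entitlements where one is revoked before markDirty runs, the other two are still marked dirty and the call reports two updates instead of throwing.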

Additional info:
In my opinion, the slowness is caused by the following:
- A large number of entitlements are attached to the environment. For example, each host has 10+ entitlements attached.
- Multiple new contents/repositories are added to the content view.

# src/main/java/org/candlepin/controller/EntitlementCertificateGenerator.java
    public void regenerateCertificatesOf(String environmentId, Collection<String> contentIds,
        boolean lazy) {

        log.info("Regenerating relevant certificates in environment: {}", environmentId);

        Set<Entitlement> entsToRegen = new HashSet<>();

        entLoop: for (Entitlement entitlement : this.entitlementCurator.listByEnvironment(environmentId)) { // <=======
            // Impl note:
            // Since the entitlements came from the DB, we should be safe to traverse the graph as
            // necessary without any sanity checks (so long as our model's restrictions aren't
            // broken).

            for (String contentId : contentIds) {   // <======== Every entitlement loops over all content IDs here, which doesn't seem efficient
                if (entitlement.getPool().getProduct().hasContent(contentId)) {
                    entsToRegen.add(entitlement);
                    continue entLoop;
                }
                Collection<Product> providedProducts = entitlement.getPool().getProduct()
                    .getProvidedProducts();
                for (Product provided : providedProducts) {
                    if (provided.hasContent(contentId)) {
                        entsToRegen.add(entitlement);
                        continue entLoop;
                    }
                }
            }
        }
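
To illustrate the point about the nested loops, here is a minimal sketch of one possible way to reduce the repeated work. It assumes that many entitlements share the same pool product, which is the case in the reproducer (1000 hosts with the same subscriptions). The changed content IDs go into a set, and the "is this product affected" answer is computed once per product and reused. The Product and Entitlement classes below are simplified, hypothetical stand-ins for Candlepin's model classes, and this is not the change that was shipped in the fix.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the nested types are simplified stand-ins.
final class RegenSelectionSketch {
    static final class Product {
        final String id;
        final Set<String> contentIds;
        final List<Product> providedProducts;

        Product(String id, Set<String> contentIds, List<Product> providedProducts) {
            this.id = id;
            this.contentIds = contentIds;
            this.providedProducts = providedProducts;
        }
    }

    static final class Entitlement {
        final String id;
        final Product poolProduct;

        Entitlement(String id, Product poolProduct) {
            this.id = id;
            this.poolProduct = poolProduct;
        }
    }

    // True if the product, or any product it provides, has a changed content ID.
    static boolean isAffected(Product product, Set<String> changed) {
        if (!Collections.disjoint(product.contentIds, changed)) {
            return true;
        }
        for (Product provided : product.providedProducts) {
            if (!Collections.disjoint(provided.contentIds, changed)) {
                return true;
            }
        }
        return false;
    }

    // Returns the IDs of entitlements whose certificates need regenerating.
    static Set<String> selectEntitlementsToRegen(Collection<Entitlement> entitlements,
        Collection<String> contentIds) {

        Set<String> changed = new HashSet<>(contentIds);
        // Product ID -> cached result, so each product is inspected only once.
        Map<String, Boolean> affectedByProduct = new HashMap<>();
        Set<String> toRegen = new LinkedHashSet<>();

        for (Entitlement entitlement : entitlements) {
            Product product = entitlement.poolProduct;
            boolean affected = affectedByProduct.computeIfAbsent(
                product.id, key -> isAffected(product, changed));
            if (affected) {
                toRegen.add(entitlement.id);
            }
        }
        return toRegen;
    }
}
```

The per-entitlement cost becomes a single map lookup once its product has been seen, so the work scales with the number of distinct products rather than with entitlements multiplied by content IDs.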

Comment 3 Nikos Moumoulidis 2021-12-15 08:44:13 UTC
*** Bug 2026504 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2022-07-05 14:29:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498