Description of problem:

There are 2 issues here.

- When many entitlements are attached to the environment and many new content_ids are added, Candlepin takes a very long time (45+ minutes) to run the "RegenEnvEntitlementCertsJob".
- "RegenEnvEntitlementCertsJob" might fail if 1 or more of the affected entitlements are revoked while it is running. For example, a guest migration triggers an entitlement revocation.

Steps to Reproduce:

1. Create a new content view. Attach 1 repo to it and publish the 1.0 version.

2. Register 1000 hosts to the content view. Attach 10 or more subscriptions to the hosts. You can create multiple custom products to attach, but 1 subscription must be the virtual subscription (guest of <hypervisor> pool).

3. Attach 3 or more new repositories (same product, for example rhel7 server, optional, extras, satellite tools etc.) to the content view and publish the 2.0 version.

4. Tail /var/log/candlepin/candlepin.log. You should see the following:

    [thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO org.candlepin.async.JobManager - Starting job "Regenerate Environment Entitlement Certificates" using class: org.candlepin.async.tasks.RegenEnvEntitlementCertsJob
    [thread=Thread-502 (ActiveMQ-client-global-threads)] [job=xxxxx, job_key=RegenEnvEntitlementCertsJob, org=redhat, csid=] INFO org.candlepin.controller.EntitlementCertificateGenerator - Regenerating relevant certificates in environment: xxxxxxxxxxxxxxxxxxxxx

5. Wait for about 2 to 3 minutes, then use the virt-who fake report to simulate the VM migrations.

6. Run the following curl command to simulate the client check-in:

    curl -v -k -u <admin>:<pass> https://satellite.example.com/rhsm/consumers/<subscription uuid of the migrated VM>/certificates/serials

7. You should see the following entitlement revocation and auto healing in candlepin.log:

    [req=xxxxxxxx, org=, csid=] INFO org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/<uuid>/certificates/serials
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Batch revoking 1 entitlements
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Starting batch delete of pools
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Starting batch delete of entitlements
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Starting delete flush
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - All deletes flushed successfully
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Recomputing status for 1 consumers.
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.CandlepinPoolManager - All statuses recomputed.
    [req=xxxxxxxx, org=my_org, csid=] INFO org.candlepin.controller.Entitler - Attempting to heal host machine with UUID "<uuid>" for guest with UUID "<uuid>"
    [req=xxxxxxxx, org=redhat, csid=] INFO org.candlepin.policy.js.autobind.AutobindRules - Rules did not select a pool for products: [] and consumer installed products: []
    <snip>

8. If you don't want to do steps 6 and 7, I think you can also remove the affected entitlements manually from the VMs via the Satellite web UI.

9. After a while, the RegenEnvEntitlementCertsJob will fail with the following error:

    [thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO org.candlepin.controller.EntitlementCertificateGenerator - Found 1000 certificates to regenerate.
    [thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.hibernate.internal.ExceptionMapperStandardImpl - HHH000346: Error during managed flush [Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect) : [org.candlepin.model.Entitlement#XXXXXXXXXXXXXXXXXXXXXXXXX]] <================
    ...
    [thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] ERROR org.candlepin.pinsetter.tasks.KingpinJob - Job: org.candlepin.pinsetter.tasks.RegenEnvEntitlementCertsJob encountered a problem.
    ...
    [thread=QuartzScheduler_Worker-1] [job=regen_entitlement_cert_of_envXXXX-XXXX-XXXX-XXXX-XXXXXXXX, org=, csid=] INFO org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=2748452 <=========== 45 minutes

Actual results:
Failed to mark entitlement as dirty. Clients are unable to see and enable new repositories.

Expected results:
No error. Clients can see and enable new repositories.

Additional info:

In my opinion, the slowness is caused by the following:

- A large number of entitlements are attached to the environment. For example, each host has 10+ entitlements attached.
- Multiple new contents/repositories are added to the content view.

# src/main/java/org/candlepin/controller/EntitlementCertificateGenerator.java

    public void regenerateCertificatesOf(String environmentId, Collection<String> contentIds, boolean lazy) {
        log.info("Regenerating relevant certificates in environment: {}", environmentId);

        Set<Entitlement> entsToRegen = new HashSet<>();

        entLoop: for (Entitlement entitlement : this.entitlementCurator.listByEnvironment(environmentId)) {   <=======
            // Impl note:
            // Since the entitlements came from the DB, we should be safe to traverse the graph as
            // necessary without any sanity checks (so long as our model's restrictions aren't
            // broken).

            for (String contentId : contentIds) {   <======== Looping over every content ID for each entitlement here does not seem efficient
                if (entitlement.getPool().getProduct().hasContent(contentId)) {
                    entsToRegen.add(entitlement);
                    continue entLoop;
                }

                Collection<Product> providedProducts = entitlement.getPool().getProduct()
                    .getProvidedProducts();

                for (Product provided : providedProducts) {
                    if (provided.hasContent(contentId)) {
                        entsToRegen.add(entitlement);
                        continue entLoop;
                    }
                }
            }
        }
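
To illustrate the inefficiency noted above, here is a minimal sketch of one way the selection could avoid re-checking every content ID for every entitlement. It is not the fix shipped for this bug and does not use Candlepin's real classes: Product, Entitlement, reachableContentIds and selectForRegen are hypothetical, simplified stand-ins. It assumes that many entitlements share the same pool product (as with 1000 hosts attached to a handful of subscriptions), so the "is this product affected?" answer can be computed once per product and reused.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT Candlepin's model classes.
class RegenFilterSketch {

    static final class Product {
        final String id;
        final Set<String> contentIds;
        final List<Product> providedProducts;

        Product(String id, Set<String> contentIds, List<Product> providedProducts) {
            this.id = id;
            this.contentIds = contentIds;
            this.providedProducts = providedProducts;
        }
    }

    static final class Entitlement {
        final String id;
        final Product product; // stands in for entitlement.getPool().getProduct()

        Entitlement(String id, Product product) {
            this.id = id;
            this.product = product;
        }
    }

    // Content IDs of the product itself plus those of its provided products.
    static Set<String> reachableContentIds(Product product) {
        Set<String> ids = new HashSet<>(product.contentIds);
        for (Product provided : product.providedProducts) {
            ids.addAll(provided.contentIds);
        }
        return ids;
    }

    // Returns the IDs of entitlements whose product carries any changed content ID.
    static Set<String> selectForRegen(Collection<Entitlement> entitlements,
                                      Collection<String> contentIds) {
        Set<String> changed = new HashSet<>(contentIds);
        Map<String, Boolean> affectedByProduct = new HashMap<>(); // one check per product
        Set<String> toRegen = new LinkedHashSet<>();

        for (Entitlement ent : entitlements) {
            Product product = ent.product;
            boolean affected = affectedByProduct.computeIfAbsent(product.id,
                id -> !Collections.disjoint(reachableContentIds(product), changed));
            if (affected) {
                toRegen.add(ent.id);
            }
        }
        return toRegen;
    }

    static Set<String> setOf(String... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    // Small made-up data set: e1 and e4 share a product, e2 matches via a provided product.
    static Set<String> demo() {
        Product rhel = new Product("rhel", setOf("c1"), Collections.<Product>emptyList());
        Product layered = new Product("layered", setOf("c3"), Collections.<Product>emptyList());
        Product custom = new Product("custom", setOf("c9"), Arrays.asList(layered));
        Product other = new Product("other", setOf("c7"), Collections.<Product>emptyList());
        List<Entitlement> ents = Arrays.asList(
            new Entitlement("e1", rhel), new Entitlement("e2", custom),
            new Entitlement("e3", other), new Entitlement("e4", rhel));
        return selectForRegen(ents, Arrays.asList("c1", "c3"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [e1, e2, e4]
    }
}
```

With this shape, the per-entitlement cost is a single map lookup, and the content comparison runs once per distinct product instead of once per entitlement and content ID. Whether that helps in practice depends on where the real job spends its time (for example lazy loading of pools and products), which this sketch does not model.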
*** Bug 2026504 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5498