Bug 1247242
Summary: | Intermittent "not enough free subscriptions available for the product" | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Jesus M. Rodriguez <jesusr> | ||||||||||||
Component: | Candlepin | Assignee: | Chris "Ceiu" Rog <crog> | ||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Lukas Pramuk <lpramuk> | ||||||||||||
Severity: | high | Docs Contact: | |||||||||||||
Priority: | unspecified | ||||||||||||||
Version: | 6.1.0 | CC: | bbuckingham, bkearney, crog, cwelton, dgoodwin, egolov, lpramuk, mmccune | ||||||||||||
Target Milestone: | Unspecified | Keywords: | Triaged | ||||||||||||
Target Release: | Unused | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2015-09-15 07:22:23 UTC | Type: | Bug | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Attachments: |
|
Created attachment 1056663 [details]
subscriptions page in satellite
Created attachment 1056664 [details]
log files from first occurrence of bug
Created attachment 1056666 [details]
logs from latest run (2015-07-23)
Created attachment 1056818 [details]
2 requests coming in at same time
In this threads.log we can see there are two requests:
req=6921f5d4-8117-439c-9135-1518792a3758
req=a537641a-7ecc-4fb2-b80b-73b1ed9913f7
We see 6921f5d4 come in at 15:57:50,336. While a537641a comes in 2 seconds later 15:57:52,033. The first request 6921f5d4 locks pool ff8080814ebbf12e014ebc0f2ad501eb at 15:57:53,985 then request a537641a also locks the same pool at 15:57:53,989.
So we have 2 requests both locking the same pool. The second request gets the lock and is granted the entitlement at 15:57:55,768.
Request 6921f5d4, after blocking, then fails at 5:57:55,874 when it tries to consume the same pool which has been exhausted at this point. That's 106 ms after the other request was granted the entitlement.
The addOrUpdateEntitlement of CandlepinPoolManager locks the pool. It blocks then proceeds. I believe that once a thread has consumed the pool, this thread will fail preEntitlement check. I think the CALLER of this method should handle the EntitlementRefusedException and try to pick the NEXT entitlement available in the list. This is under a AUTOBIND scenario with an ActivationKey. private Entitlement addOrUpdateEntitlement(Consumer consumer, Pool pool, Entitlement entitlement, Integer quantity, boolean generateUeberCert, CallerType caller) throws EntitlementRefusedException { // Because there are several paths to this one place where entitlements // are granted, we cannot be positive the caller obtained a lock on the // pool // when it was read. As such we're going to reload it with a lock // before starting this process. log.info("Locking pool: {}", pool.getId()); pool = poolCurator.lockAndLoad(pool); if (quantity > 0) { log.info("Running pre-entitlement rules."); // XXX preEntitlement is run twice for new entitlement creation ValidationResult result = enforcer.preEntitlement(consumer, pool, quantity, caller); if (!result.isSuccessful()) { log.warn("Entitlement not granted: {}", result.getErrors().toString()); throw new EntitlementRefusedException(result); } } commit 8ec5a2e9f9fc36a6edea99884a377afde22370c4 Author: Chris Rog <crog> Date: Wed Jul 29 15:03:37 2015 -0400 1247242: CPM will attempt to retry entitlements under certain conditions - When the CandlepinPoolManager triggers an EntitlementRefusedException while entitling by products due to entitlements not being available, it will retry the operation. This works around a synchronization issue that can occur when a pool is selected for consumption, but is locked and consumed by another request/thread before it can be consumed. Chris' fix has been backported to the 0.9.49-HOTFIX branch and built as: candlepin-0.9.49.6-1 This is a hotfix branch for Satellite 6.1. tagged in 9.49.6 If RHCI indicates this scenario is no longer effecting them, move to VERIFIED. REPRODUCED. candlepin-0.9.49.3-1.el7.noarch candlepin-selinux-0.9.49.3-1.el7.noarch candlepin-tomcat-0.9.49.3-1.el7.noarch 1. Create a manifest containing specific subscription by multiple amounts of 1 (errors wouldn't be intermittent then) # hammer subscription list --organization-id 1 ----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|--------- NAME | CONTRACT | ACCOUNT | SUPPORT | QUANTITY | CONSUMED | END DATE | ID | PRODUCT | QUANTITY | ATTACHED ----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|--------- Employee SKU | 10169793 | 477931 | Self-Support | 1 | 0 | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf906df0bd7 | Employee SKU | 1 | 0 Employee SKU | 10169793 | 477931 | Self-Support | 1 | 0 | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf904670aa3 | Employee SKU | 1 | 0 Employee SKU | 10169793 | 477931 | Self-Support | 1 | 0 | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf901fe096f | Employee SKU | 1 | 0 Employee SKU | 10169793 | 477931 | Self-Support | 1 | 0 | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf909210d0b | Employee SKU | 1 | 0 Employee SKU | 10169793 | 477931 | Self-Support | 1 | 0 | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf90b5c0e3f | Employee SKU | 1 | 0 ----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|--------- 2. Sync rh repos, create and publish CVs, create related AK 3. Register 2 content host using this activationkey @CHOSTs (simultaneously): # subscription-manager register --org=Default_Organization --activationkey=ak_rhel7 4. Check registration results, logs... @GUEST1: # subscription-manager list +-------------------------------------------+ Installed Product Status +-------------------------------------------+ Product Name: Red Hat Enterprise Linux Server Product ID: 69 Version: 7.1 Arch: x86_64 Status: Subscribed Status Details: Starts: 04/24/2013 Ends: 01/01/2022 @GUEST2: # subscription-manager list +-------------------------------------------+ Installed Product Status +-------------------------------------------+ Product Name: Red Hat Enterprise Linux Server Product ID: 69 Version: 7.1 Arch: x86_64 Status: Unknown Status Details: Starts: Ends: @SAT: # tail -f /var/log/candlepin/candlepin.log ... 2015-09-14 05:17:34,546 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] INFO org.candlepin.controller.CandlepinPoolManager - Running pre-entitlement rules. 2015-09-14 05:17:34,570 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] WARN org.candlepin.controller.CandlepinPoolManager - Entitlement not granted: [rulefailed.no.entitlements.available] 2015-09-14 05:17:34,584 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] INFO org.candlepin.common.filter.LoggingFilter - Response: status=403, content-type="application/json", time=27491 ... @GUEST2: #tail -f /var/log/rhsm/rhsm.log ... 2015-09-14 09:17:05,867 [DEBUG] subscription-manager @connection.py:420 - Loaded CA certificates from /etc/rhsm/ca/: candlepin-stage.pem, redhat-uep.pem, katello-default-ca.pem, katello-server-ca.pem 2015-09-14 09:17:05,868 [DEBUG] subscription-manager @connection.py:469 - Making request: POST /rhsm/consumers?owner=Default_Organization&activation_keys=ak_rhel7 2015-09-14 09:17:34,843 [DEBUG] subscription-manager @connection.py:492 - Response: status=500 2015-09-14 09:17:34,844 [ERROR] subscription-manager @managercli.py:1069 - There are not enough free subscriptions available for the product '69' Traceback (most recent call last): File "/usr/share/rhsm/subscription_manager/managercli.py", line 1064, in _do_command content_tags=self.installed_mgr.tags) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 765, in registerConsumer return self.conn.request_post(url, params) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 576, in request_post return self._request("POST", method, params) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 501, in _request self.validateResponse(result, request_type, handler) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 543, in validateResponse raise RestlibException(response['status'], error_msg) RestlibException: There are not enough free subscriptions available for the product '69' ... I mean by GUEST = CHOST (content host) VERIFIED. @Sat6.1.2-SNAP1-Compose3 candlepin-0.9.49.6-1.el7.noarch candlepin-selinux-0.9.49.6-1.el7.noarch candlepin-tomcat-0.9.49.6-1.el7.noarch Verified by manual reproducer described in comment #13 >>> both machines are registered, no errors in the logs Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2015:1786 |
Created attachment 1056662 [details] activation key screen in satellite Description of problem: There are not enough free subscriptions available for the product '69' - seems to happen intermittently if your submgr app has multiple smaller subs added, in this case 10, 4, 1, 1, 1 for a total of 17 subs....looks like sometimes a group of '1' is used and 2nd host to provision fails. Version-Release number of selected component (if applicable): Satellite 6.1 candlepin-scl-quartz-2.1.5-6.el7.noarch candlepin-tomcat-0.9.49.3-1.el7.noarch candlepin-guice-3.0-2_redhat_1.el7.noarch candlepin-scl-runtime-1-5.el7.noarch candlepin-scl-1-5.el7.noarch candlepin-selinux-0.9.49.3-1.el7.noarch candlepin-common-1.0.22-1.el7.noarch candlepin-0.9.49.3-1.el7.noarch candlepin-scl-rhino-1.7R3-3.el7.noarch katello-default-ca-1.0-1.noarch katello-debug-2.2.0.11-1.el7sat.noarch katello-2.2.0.11-1.el7sat.noarch katello-certs-tools-2.2.1-1.el7sat.noarch katello-installer-base-2.3.12-1.el7sat.noarch pulp-katello-0.5-1.el7sat.noarch ruby193-rubygem-katello-2.2.0.51-1.el7sat.noarch katello-service-2.2.0.11-1.el7sat.noarch katello-installer-2.3.12-1.el7sat.noarch katello-server-ca-1.0-1.noarch rubygem-hammer_cli_katello-0.0.7.15-1.el7sat.noarch katello-common-2.2.0.11-1.el7sat.noarch How reproducible: Intermittently during a deployment of RHEV with 2 hypervisors from within RHCI. Steps to Reproduce: 1. In RHCI, Select RHEV and Cloudforms for deployment 2. Give deployment 2 hosts for hypervisors, 1 for engine. 3. Fails to deploy second hypervisor host with above error. Expected results: Additional info: