Bug 1247242

Summary: Intermittent "not enough free subscriptions available for the product"
Product: Red Hat Satellite Reporter: Jesus M. Rodriguez <jesusr>
Component: CandlepinAssignee: Chris "Ceiu" Rog <crog>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1.0CC: bbuckingham, bkearney, crog, cwelton, dgoodwin, egolov, lpramuk, mmccune
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-09-15 07:22:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
activation key screen in satellite
none
subscriptions page in satellite
none
log files from first occurrence of bug
none
logs from latest run (2015-07-23)
none
2 requests coming in at same time none

Description Jesus M. Rodriguez 2015-07-27 15:44:48 UTC
Created attachment 1056662 [details]
activation key screen in satellite

Description of problem:
There are not enough free subscriptions available for the product '69' - seems to happen intermittently if your submgr app has multiple smaller subs added, in this case 10, 4, 1, 1, 1 for a total of 17 subs....looks like sometimes a group of '1' is used and 2nd host to provision fails.

Version-Release number of selected component (if applicable):
Satellite 6.1

candlepin-scl-quartz-2.1.5-6.el7.noarch
candlepin-tomcat-0.9.49.3-1.el7.noarch
candlepin-guice-3.0-2_redhat_1.el7.noarch
candlepin-scl-runtime-1-5.el7.noarch
candlepin-scl-1-5.el7.noarch
candlepin-selinux-0.9.49.3-1.el7.noarch
candlepin-common-1.0.22-1.el7.noarch
candlepin-0.9.49.3-1.el7.noarch
candlepin-scl-rhino-1.7R3-3.el7.noarch

katello-default-ca-1.0-1.noarch
katello-debug-2.2.0.11-1.el7sat.noarch
katello-2.2.0.11-1.el7sat.noarch
katello-certs-tools-2.2.1-1.el7sat.noarch
katello-installer-base-2.3.12-1.el7sat.noarch
pulp-katello-0.5-1.el7sat.noarch
ruby193-rubygem-katello-2.2.0.51-1.el7sat.noarch
katello-service-2.2.0.11-1.el7sat.noarch
katello-installer-2.3.12-1.el7sat.noarch
katello-server-ca-1.0-1.noarch
rubygem-hammer_cli_katello-0.0.7.15-1.el7sat.noarch
katello-common-2.2.0.11-1.el7sat.noarch


How reproducible:
Intermittently during a deployment of RHEV with 2 hypervisors from within RHCI.

Steps to Reproduce:
1. In RHCI, Select RHEV and Cloudforms for deployment
2. Give deployment 2 hosts for hypervisors, 1 for engine.
3. Fails to deploy second hypervisor host with above error.


Expected results:


Additional info:

Comment 1 Jesus M. Rodriguez 2015-07-27 15:45:26 UTC
Created attachment 1056663 [details]
subscriptions page in satellite

Comment 2 Jesus M. Rodriguez 2015-07-27 15:45:56 UTC
Created attachment 1056664 [details]
log files from first occurrence of bug

Comment 4 Jesus M. Rodriguez 2015-07-27 15:51:02 UTC
Created attachment 1056666 [details]
logs from latest run (2015-07-23)

Comment 5 Jesus M. Rodriguez 2015-07-27 23:39:02 UTC
Created attachment 1056818 [details]
2 requests coming in at same time

In this threads.log we can see there are two requests:

req=6921f5d4-8117-439c-9135-1518792a3758
req=a537641a-7ecc-4fb2-b80b-73b1ed9913f7

We see 6921f5d4 come in at 15:57:50,336. While a537641a comes in 2 seconds later 15:57:52,033. The first request 6921f5d4 locks pool ff8080814ebbf12e014ebc0f2ad501eb at 15:57:53,985 then request a537641a also locks the same pool at 15:57:53,989.

So we have 2 requests both locking the same pool. The second request gets the lock and is granted the entitlement at 15:57:55,768.

Request 6921f5d4, after blocking, then fails at 5:57:55,874 when it tries to consume the same pool which has been exhausted at this point. That's 106 ms after the other request was granted the entitlement.

Comment 6 Jesus M. Rodriguez 2015-07-27 23:42:20 UTC
The addOrUpdateEntitlement of CandlepinPoolManager locks the pool. It blocks then proceeds. I believe that once a thread has consumed the pool, this thread will fail preEntitlement check. I think the CALLER of this method should handle the EntitlementRefusedException and try to pick the NEXT entitlement available in the list. This is under a AUTOBIND scenario with an ActivationKey.

    private Entitlement addOrUpdateEntitlement(Consumer consumer, Pool pool,
        Entitlement entitlement, Integer quantity, boolean generateUeberCert,
        CallerType caller)
        throws EntitlementRefusedException {
        // Because there are several paths to this one place where entitlements
        // are granted, we cannot be positive the caller obtained a lock on the
        // pool
        // when it was read. As such we're going to reload it with a lock
        // before starting this process.
        log.info("Locking pool: {}", pool.getId());
        pool = poolCurator.lockAndLoad(pool);

        if (quantity > 0) {
            log.info("Running pre-entitlement rules.");
            // XXX preEntitlement is run twice for new entitlement creation
            ValidationResult result = enforcer.preEntitlement(consumer, pool, quantity, caller);

            if (!result.isSuccessful()) {
                log.warn("Entitlement not granted: {}", result.getErrors().toString());
                throw new EntitlementRefusedException(result);
            }
        }

Comment 8 Chris "Ceiu" Rog 2015-07-30 16:27:46 UTC
commit 8ec5a2e9f9fc36a6edea99884a377afde22370c4
Author: Chris Rog <crog>
Date:   Wed Jul 29 15:03:37 2015 -0400

    1247242: CPM will attempt to retry entitlements under certain conditions
    
    - When the CandlepinPoolManager triggers an EntitlementRefusedException while
      entitling by products due to entitlements not being available, it will
      retry the operation. This works around a synchronization issue that can
      occur when a pool is selected for consumption, but is locked and consumed
      by another request/thread before it can be consumed.

Comment 9 Devan Goodwin 2015-07-30 16:34:07 UTC
Chris' fix has been backported to the 0.9.49-HOTFIX branch and built as: candlepin-0.9.49.6-1

This is a hotfix branch for Satellite 6.1.

Comment 10 Mike McCune 2015-08-26 16:49:29 UTC
tagged in 9.49.6

Comment 12 Mike McCune 2015-09-14 05:00:09 UTC
If RHCI indicates this scenario is no longer effecting them, move to VERIFIED.

Comment 13 Lukas Pramuk 2015-09-14 10:15:32 UTC
REPRODUCED.
candlepin-0.9.49.3-1.el7.noarch
candlepin-selinux-0.9.49.3-1.el7.noarch
candlepin-tomcat-0.9.49.3-1.el7.noarch

1. Create a manifest containing specific subscription by multiple amounts of 1 (errors wouldn't be intermittent then)
# hammer subscription list --organization-id 1
----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|---------
NAME            | CONTRACT | ACCOUNT | SUPPORT      | QUANTITY  | CONSUMED | END DATE   | ID                               | PRODUCT         | QUANTITY  | ATTACHED
----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|---------
Employee SKU    | 10169793 | 477931  | Self-Support | 1         | 0        | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf906df0bd7 | Employee SKU    | 1         | 0       
Employee SKU    | 10169793 | 477931  | Self-Support | 1         | 0        | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf904670aa3 | Employee SKU    | 1         | 0       
Employee SKU    | 10169793 | 477931  | Self-Support | 1         | 0        | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf901fe096f | Employee SKU    | 1         | 0       
Employee SKU    | 10169793 | 477931  | Self-Support | 1         | 0        | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf909210d0b | Employee SKU    | 1         | 0       
Employee SKU    | 10169793 | 477931  | Self-Support | 1         | 0        | 2022-01-01 | 8a90c6bc4fb24dbf014fbcf90b5c0e3f | Employee SKU    | 1         | 0       
----------------|----------|---------|--------------|-----------|----------|------------|----------------------------------|-----------------|-----------|---------

2. Sync rh repos, create and publish CVs, create related AK 

3. Register 2 content host using this activationkey
@CHOSTs (simultaneously):
# subscription-manager register --org=Default_Organization --activationkey=ak_rhel7

4. Check registration results, logs...
@GUEST1:
# subscription-manager list

+-------------------------------------------+
    Installed Product Status
+-------------------------------------------+
Product Name:   Red Hat Enterprise Linux Server
Product ID:     69
Version:        7.1
Arch:           x86_64
Status:         Subscribed
Status Details: 
Starts:         04/24/2013
Ends:           01/01/2022

@GUEST2:
# subscription-manager list

+-------------------------------------------+
    Installed Product Status
+-------------------------------------------+
Product Name:   Red Hat Enterprise Linux Server
Product ID:     69
Version:        7.1
Arch:           x86_64
Status:         Unknown
Status Details: 
Starts:         
Ends:           


@SAT:
# tail -f /var/log/candlepin/candlepin.log
...
2015-09-14 05:17:34,546 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] INFO  org.candlepin.controller.CandlepinPoolManager - Running pre-entitlement rules.
2015-09-14 05:17:34,570 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] WARN  org.candlepin.controller.CandlepinPoolManager - Entitlement not granted: [rulefailed.no.entitlements.available]
2015-09-14 05:17:34,584 [req=dc46c9d9-df7c-47a8-95c7-931faa7c326c, org=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=403, content-type="application/json", time=27491
...

@GUEST2:
#tail -f /var/log/rhsm/rhsm.log
...
2015-09-14 09:17:05,867 [DEBUG] subscription-manager @connection.py:420 - Loaded CA certificates from /etc/rhsm/ca/: candlepin-stage.pem, redhat-uep.pem, katello-default-ca.pem, katello-server-ca.pem
2015-09-14 09:17:05,868 [DEBUG] subscription-manager @connection.py:469 - Making request: POST /rhsm/consumers?owner=Default_Organization&activation_keys=ak_rhel7
2015-09-14 09:17:34,843 [DEBUG] subscription-manager @connection.py:492 - Response: status=500
2015-09-14 09:17:34,844 [ERROR] subscription-manager @managercli.py:1069 - There are not enough free subscriptions available for the product '69'
Traceback (most recent call last):
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 1064, in _do_command
    content_tags=self.installed_mgr.tags)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 765, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 576, in request_post
    return self._request("POST", method, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 501, in _request
    self.validateResponse(result, request_type, handler)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 543, in validateResponse
    raise RestlibException(response['status'], error_msg)
RestlibException: There are not enough free subscriptions available for the product '69'
...

Comment 14 Lukas Pramuk 2015-09-14 10:18:03 UTC
I mean by GUEST = CHOST (content host)

Comment 15 Lukas Pramuk 2015-09-14 10:43:44 UTC
VERIFIED.
@Sat6.1.2-SNAP1-Compose3
candlepin-0.9.49.6-1.el7.noarch
candlepin-selinux-0.9.49.6-1.el7.noarch
candlepin-tomcat-0.9.49.6-1.el7.noarch

Verified by manual reproducer described in comment #13

>>> both machines are registered, no errors in the logs

Comment 17 errata-xmlrpc 2015-09-15 07:22:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1786