Bug 1264862 - System registration timeout when concurrent content provisioning are handled
System registration timeout when concurrent content provisioning are handled
Status: CLOSED DUPLICATE of bug 1201428
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Subscription Management (Show other bugs)
Unspecified Unspecified
unspecified Severity high (vote)
: GA
: --
Assigned To: Justin Sherrill
Katello QA List
: Triaged
Depends On: 1201428
Blocks: 1122832
  Show dependency treegraph
Reported: 2015-09-21 07:33 EDT by Peter Vreman
Modified: 2016-08-29 03:41 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-08-29 03:41:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
task export - referenced task is efcf50d5-dd48-4d83-a154-8a0c357bfe2d (13.13 MB, application/x-gzip)
2015-09-21 07:33 EDT, Peter Vreman
no flags Details

  None (edit)
Description Peter Vreman 2015-09-21 07:33:21 EDT
Created attachment 1075474 [details]
task export - referenced task is efcf50d5-dd48-4d83-a154-8a0c357bfe2d

Description of problem:
The registration timeout on the client is 60 seconds:

2015-09-21 09:54:06,061 [DEBUG]  @connection.py:460 - Response status: 204
2015-09-21 09:54:06,064 [INFO]  @managerlib.py:775 - Successfully un-registered.
2015-09-21 09:54:06,067 [DEBUG]  @identity.py:130 - Loading consumer info from identity certificates.
2015-09-21 09:54:06,071 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/packages/packages.json
2015-09-21 09:54:06,072 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/facts/facts.json
2015-09-21 09:54:06,073 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/installed_products.json
2015-09-21 09:54:06,074 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/entitlement_status.json
2015-09-21 09:54:06,076 [INFO]  @managercli.py:1019 - --force specified, unregistered old consumer: 1b12f89c-597e-4e85-8eeb-8b6c24a8a5bd
2015-09-21 09:54:06,090 [DEBUG]  @hwprobe.py:518 - cpu info: {'cpu.cpu(s)': 1, 'cpu.core(s)_per_socket': 1, 'cpu.thread(s)_per_core': 1, 'cpu.topology_source': 'kernel /sys cpu sibl
ing lists', 'cpu.cpu_socket(s)': 1}
2015-09-21 09:54:06,483 [DEBUG]  @connection.py:420 - Loading CA PEM certificates from: /etc/rhsm/ca/
2015-09-21 09:54:06,484 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-default-ca.pem'
2015-09-21 09:54:06,485 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-server-ca.pem'
2015-09-21 09:54:06,485 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/redhat-uep.pem'
2015-09-21 09:54:06,486 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/candlepin-stage.pem'
2015-09-21 09:54:06,487 [DEBUG]  @connection.py:441 - Making request: POST /rhsm/consumers?owner=Hilti&activation_keys=hg-crash%3A%3AInfra%3A%3ACapsule61
2015-09-21 09:55:06,584 [ERROR]  @managercli.py:151 - Error during registration:
2015-09-21 09:55:06,586 [ERROR]  @managercli.py:152 -
Traceback (most recent call last):
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 1054, in _do_command
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 722, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 544, in request_post
    return self._request("POST", method, params)
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 455, in _request
    response = conn.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.6/socket.py", line 433, in readline
    data = recv(1)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 228, in read
    return self._read_bio(size)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio
    return m2.ssl_read(self.ssl, size, self._timeout)

Attached is the tasks export
The "Create System" task efcf50d5-dd48-4d83-a154-8a0c357bfe2d in Satellite duration was 76 seconds to complete. This is larger than the timeout of 60seconds. Therefor the client gave a timeout.
Closely looking at the "Create System" tasks showed that sum of the duration the individual Actions was only 8 seconds. That means that 68 seconds were lost in the scheduling part.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Sync 50+ RedHat repositories concurrently
2. Publish 10 CVs with each 10 Repos concurrently
3. Make sure the system has atleast 50+ tasks in running
4. Register a new system

Actual results:
Timeout to register client
Alternative if no timeout happened then the response time is very slow

Expected results:
Registration successful and quick (<10 seconds)

Additional info:
We are now running with SSD backend, but the Content provisioning performance is still slow and consuming many hours. Below is throughput of the pulp backend, and in SAR there is also no high iowait percentage visible.

[crash] root@li-lc-1578:/var/lib/pulp# dd of=tmpfile1 if=/dev/zero bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.25679 s, 417 MB/s
Comment 2 Justin Sherrill 2016-03-18 09:43:06 EDT
It does seem that dynflow/foreman-tasks may have been overloaded as:

* there was a 20 second gap between the two run steps
* and a 30 second gap between the run phase and the finalize phase

This is largely related to https://bugzilla.redhat.com/show_bug.cgi?id=1201428
Comment 5 Peter Vreman 2016-08-29 03:41:28 EDT
Haven't hit this issue in a while. The repo sync and CV publishing has improved a lot so.

Marking is as duplicate of the concurrent dynflow issue to improve handling of concurrency in dynflow in general

*** This bug has been marked as a duplicate of bug 1201428 ***

Note You need to log in before you can comment on or make changes to this bug.