Bug 1264862 - System registration timeout when concurrent content provisioning are handled
Summary: System registration timeout when concurrent content provisioning are handled
Keywords:
Status: CLOSED DUPLICATE of bug 1201428
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Subscription Management
Version: 6.1.2
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: Unspecified
Assignee: Justin Sherrill
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On: 1201428
Blocks: 1122832
TreeView+ depends on / blocked
 
Reported: 2015-09-21 11:33 UTC by Peter Vreman
Modified: 2019-04-01 20:27 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-29 07:41:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
task export - referenced task is efcf50d5-dd48-4d83-a154-8a0c357bfe2d (13.13 MB, application/x-gzip)
2015-09-21 11:33 UTC, Peter Vreman
no flags Details

Description Peter Vreman 2015-09-21 11:33:21 UTC
Created attachment 1075474 [details]
task export - referenced task is efcf50d5-dd48-4d83-a154-8a0c357bfe2d

Description of problem:
The registration timeout on the client is 60 seconds:

2015-09-21 09:54:06,061 [DEBUG]  @connection.py:460 - Response status: 204
2015-09-21 09:54:06,064 [INFO]  @managerlib.py:775 - Successfully un-registered.
2015-09-21 09:54:06,067 [DEBUG]  @identity.py:130 - Loading consumer info from identity certificates.
2015-09-21 09:54:06,071 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/packages/packages.json
2015-09-21 09:54:06,072 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/facts/facts.json
2015-09-21 09:54:06,073 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/installed_products.json
2015-09-21 09:54:06,074 [INFO]  @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/entitlement_status.json
2015-09-21 09:54:06,076 [INFO]  @managercli.py:1019 - --force specified, unregistered old consumer: 1b12f89c-597e-4e85-8eeb-8b6c24a8a5bd
2015-09-21 09:54:06,090 [DEBUG]  @hwprobe.py:518 - cpu info: {'cpu.cpu(s)': 1, 'cpu.core(s)_per_socket': 1, 'cpu.thread(s)_per_core': 1, 'cpu.topology_source': 'kernel /sys cpu sibl
ing lists', 'cpu.cpu_socket(s)': 1}
2015-09-21 09:54:06,483 [DEBUG]  @connection.py:420 - Loading CA PEM certificates from: /etc/rhsm/ca/
2015-09-21 09:54:06,484 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-default-ca.pem'
2015-09-21 09:54:06,485 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-server-ca.pem'
2015-09-21 09:54:06,485 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/redhat-uep.pem'
2015-09-21 09:54:06,486 [DEBUG]  @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/candlepin-stage.pem'
2015-09-21 09:54:06,487 [DEBUG]  @connection.py:441 - Making request: POST /rhsm/consumers?owner=Hilti&activation_keys=hg-crash%3A%3AInfra%3A%3ACapsule61
2015-09-21 09:55:06,584 [ERROR]  @managercli.py:151 - Error during registration:
2015-09-21 09:55:06,586 [ERROR]  @managercli.py:152 -
Traceback (most recent call last):
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 1054, in _do_command
    installed_products=self.installed_mgr.format_for_server())
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 722, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 544, in request_post
    return self._request("POST", method, params)
  File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 455, in _request
    response = conn.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.6/socket.py", line 433, in readline
    data = recv(1)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 228, in read
    return self._read_bio(size)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio
    return m2.ssl_read(self.ssl, size, self._timeout)
SSLTimeoutError


Attached is the tasks export
The "Create System" task efcf50d5-dd48-4d83-a154-8a0c357bfe2d in Satellite duration was 76 seconds to complete. This is larger than the timeout of 60seconds. Therefor the client gave a timeout.
Closely looking at the "Create System" tasks showed that sum of the duration the individual Actions was only 8 seconds. That means that 68 seconds were lost in the scheduling part.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Sync 50+ RedHat repositories concurrently
2. Publish 10 CVs with each 10 Repos concurrently
3. Make sure the system has atleast 50+ tasks in running
4. Register a new system
5.

Actual results:
Timeout to register client
Alternative if no timeout happened then the response time is very slow


Expected results:
Registration successful and quick (<10 seconds)

Additional info:
We are now running with SSD backend, but the Content provisioning performance is still slow and consuming many hours. Below is throughput of the pulp backend, and in SAR there is also no high iowait percentage visible.

[crash] root@li-lc-1578:/var/lib/pulp# dd of=tmpfile1 if=/dev/zero bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.25679 s, 417 MB/s

Comment 2 Justin Sherrill 2016-03-18 13:43:06 UTC
It does seem that dynflow/foreman-tasks may have been overloaded as:

* there was a 20 second gap between the two run steps
* and a 30 second gap between the run phase and the finalize phase
 

This is largely related to https://bugzilla.redhat.com/show_bug.cgi?id=1201428

Comment 5 Peter Vreman 2016-08-29 07:41:28 UTC
Haven't hit this issue in a while. The repo sync and CV publishing has improved a lot so.

Marking is as duplicate of the concurrent dynflow issue to improve handling of concurrency in dynflow in general

*** This bug has been marked as a duplicate of bug 1201428 ***


Note You need to log in before you can comment on or make changes to this bug.