Created attachment 1075474 [details] task export - referenced task is efcf50d5-dd48-4d83-a154-8a0c357bfe2d Description of problem: The registration timeout on the client is 60 seconds: 2015-09-21 09:54:06,061 [DEBUG] @connection.py:460 - Response status: 204 2015-09-21 09:54:06,064 [INFO] @managerlib.py:775 - Successfully un-registered. 2015-09-21 09:54:06,067 [DEBUG] @identity.py:130 - Loading consumer info from identity certificates. 2015-09-21 09:54:06,071 [INFO] @cache.py:116 - Deleting cache: /var/lib/rhsm/packages/packages.json 2015-09-21 09:54:06,072 [INFO] @cache.py:116 - Deleting cache: /var/lib/rhsm/facts/facts.json 2015-09-21 09:54:06,073 [INFO] @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/installed_products.json 2015-09-21 09:54:06,074 [INFO] @cache.py:116 - Deleting cache: /var/lib/rhsm/cache/entitlement_status.json 2015-09-21 09:54:06,076 [INFO] @managercli.py:1019 - --force specified, unregistered old consumer: 1b12f89c-597e-4e85-8eeb-8b6c24a8a5bd 2015-09-21 09:54:06,090 [DEBUG] @hwprobe.py:518 - cpu info: {'cpu.cpu(s)': 1, 'cpu.core(s)_per_socket': 1, 'cpu.thread(s)_per_core': 1, 'cpu.topology_source': 'kernel /sys cpu sibl ing lists', 'cpu.cpu_socket(s)': 1} 2015-09-21 09:54:06,483 [DEBUG] @connection.py:420 - Loading CA PEM certificates from: /etc/rhsm/ca/ 2015-09-21 09:54:06,484 [DEBUG] @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-default-ca.pem' 2015-09-21 09:54:06,485 [DEBUG] @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/katello-server-ca.pem' 2015-09-21 09:54:06,485 [DEBUG] @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/redhat-uep.pem' 2015-09-21 09:54:06,486 [DEBUG] @connection.py:402 - Loading CA certificate: '/etc/rhsm/ca/candlepin-stage.pem' 2015-09-21 09:54:06,487 [DEBUG] @connection.py:441 - Making request: POST /rhsm/consumers?owner=Hilti&activation_keys=hg-crash%3A%3AInfra%3A%3ACapsule61 2015-09-21 09:55:06,584 [ERROR] @managercli.py:151 - Error during registration: 2015-09-21 09:55:06,586 [ERROR] @managercli.py:152 - Traceback (most recent call last): File "/usr/share/rhsm/subscription_manager/managercli.py", line 1054, in _do_command installed_products=self.installed_mgr.format_for_server()) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 722, in registerConsumer return self.conn.request_post(url, params) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 544, in request_post return self._request("POST", method, params) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 455, in _request response = conn.getresponse() File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse response.begin() File "/usr/lib64/python2.6/httplib.py", line 391, in begin version, status, reason = self._read_status() File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status line = self.fp.readline() File "/usr/lib64/python2.6/socket.py", line 433, in readline data = recv(1) File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 228, in read return self._read_bio(size) File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio return m2.ssl_read(self.ssl, size, self._timeout) SSLTimeoutError Attached is the tasks export The "Create System" task efcf50d5-dd48-4d83-a154-8a0c357bfe2d in Satellite duration was 76 seconds to complete. This is larger than the timeout of 60seconds. Therefor the client gave a timeout. Closely looking at the "Create System" tasks showed that sum of the duration the individual Actions was only 8 seconds. That means that 68 seconds were lost in the scheduling part. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. Sync 50+ RedHat repositories concurrently 2. Publish 10 CVs with each 10 Repos concurrently 3. Make sure the system has atleast 50+ tasks in running 4. Register a new system 5. Actual results: Timeout to register client Alternative if no timeout happened then the response time is very slow Expected results: Registration successful and quick (<10 seconds) Additional info: We are now running with SSD backend, but the Content provisioning performance is still slow and consuming many hours. Below is throughput of the pulp backend, and in SAR there is also no high iowait percentage visible. [crash] root@li-lc-1578:/var/lib/pulp# dd of=tmpfile1 if=/dev/zero bs=1M count=500 500+0 records in 500+0 records out 524288000 bytes (524 MB) copied, 1.25679 s, 417 MB/s
It does seem that dynflow/foreman-tasks may have been overloaded as: * there was a 20 second gap between the two run steps * and a 30 second gap between the run phase and the finalize phase This is largely related to https://bugzilla.redhat.com/show_bug.cgi?id=1201428
Haven't hit this issue in a while. The repo sync and CV publishing has improved a lot so. Marking is as duplicate of the concurrent dynflow issue to improve handling of concurrency in dynflow in general *** This bug has been marked as a duplicate of bug 1201428 ***