Bug 1490019
| Summary: | concurrent registrations of 32 hosts causes 1 of them fail with: Unable to verify server's identity: ('The read operation timed out',) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Jan Hutař <jhutar> | ||||
| Component: | Registration | Assignee: | satellite6-bugs <satellite6-bugs> | ||||
| Status: | CLOSED ERRATA | QA Contact: | jcallaha | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.2.11 | CC: | bbuckingham, bkearney, cduryee, inecas, jbhatia, jsherril, ktordeur, lzap, mmccune, psuriset, rajgupta, sbadhwar, sisadoun, tarek.kiwan, tbhowmik, zhunting | ||||
| Target Milestone: | Unspecified | Keywords: | Performance, Regression, Triaged | ||||
| Target Release: | Unused | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-05-21 20:21:35 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1514508, 1553861, 1553862, 1553869, 1553871 | ||||||
| Bug Blocks: | 1122832 | ||||||
| Attachments: |
|
||||||
Example of one of the failed clients:
b10-h01-r620container129.example.com - 172.18.0.129
From /var/log/rhsm/rhsm.log:
2017-09-09 08:32:59,368 [INFO] rhsmd:346:MainThread @rhsmd:263 - rhsmd started
2017-09-09 08:33:00,149 [INFO] subscription-manager:340:MainThread @managercli.py:518 - X-Correlation-ID: b7a518b3c9eb4dc7aab21b6a8190fc70
2017-09-09 08:33:00,149 [INFO] subscription-manager:340:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:00,215 [INFO] subscription-manager:340:MainThread @managerlib.py:889 - Cleaned local data
2017-09-09 08:33:00,238 [ERROR] rhsmd:346:MainThread @cache.py:128 - Unable to read cache: /var/lib/rhsm/cache/rhsm_icon.json
2017-09-09 08:33:00,239 [ERROR] rhsmd:346:MainThread @cache.py:129 - [Errno 2] No such file or directory: '/var/lib/rhsm/cache/rhsm_icon.json'
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/subscription_manager/cache.py", line 123, in _read_cache
f = open(self.CACHE_FILE)
IOError: [Errno 2] No such file or directory: '/var/lib/rhsm/cache/rhsm_icon.json'
2017-09-09 08:33:10,501 [INFO] rhsmd:502:MainThread @rhsmd:263 - rhsmd started
2017-09-09 08:33:10,649 [INFO] subscription-manager:496:MainThread @managercli.py:518 - X-Correlation-ID: 6b91ae91eb154d92b0b76dd93642909a
2017-09-09 08:33:10,650 [INFO] subscription-manager:496:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,232 [INFO] subscription-manager:578:MainThread @managercli.py:518 - X-Correlation-ID: 28825ea09ac84f84b64263aa7899309c
2017-09-09 08:33:16,233 [INFO] subscription-manager:578:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,234 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ insecure=False
2017-09-09 08:33:16,235 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=none
2017-09-09 08:33:16,236 [INFO] subscription-manager:578:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,236 [INFO] subscription-manager:578:MainThread @managercli.py:382 - Consumer Identity name=None uuid=None
2017-09-09 08:33:16,239 [INFO] subscription-manager:578:MainThread @managercli.py:382 - Consumer Identity name=None uuid=None
2017-09-09 08:33:16,241 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=basic username=admin
2017-09-09 08:33:16,454 [INFO] subscription-manager:578:MainThread @dmiinfo.py:73 - Using dmidecode dump file: /dev/mem
2017-09-09 08:33:16,475 [WARNING] subscription-manager:578:MainThread @dmiinfo.py:127 - Error reading system DMI information: Failed to open memory buffer (/dev/mem): No such file or directory
No SMBIOS nor DMI entry point found, sorry.
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
2017-09-09 08:33:16,692 [INFO] subscription-manager:578:MainThread @custom.py:85 - Loading custom facts from: /etc/rhsm/facts/katello.facts
2017-09-09 08:33:23,130 [INFO] subscription-manager:578:MainThread @connection.py:552 - Response: status=200, request="GET /rhsm/"
2017-09-09 08:33:24,955 [INFO] subscription-manager:578:MainThread @connection.py:552 - Response: status=200, request="GET /rhsm/owners/Default_Organization/environments?name=Library"
2017-09-09 08:36:25,069 [ERROR] subscription-manager:578:MainThread @managercli.py:177 - Error during registration: ('The read operation timed out',)
2017-09-09 08:36:25,070 [ERROR] subscription-manager:578:MainThread @managercli.py:178 - ('The read operation timed out',)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/subscription_manager/managercli.py", line 1194, in _do_command
content_tags=self.installed_mgr.tags)
File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 920, in registerConsumer
return self.conn.request_post(url, params)
File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 649, in request_post
return self._request("POST", method, params, headers=headers)
File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 672, in _request
info=info, headers=headers)
File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 539, in _request
response = conn.getresponse()
File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "/usr/lib64/python2.7/socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
File "/usr/lib64/python2.7/ssl.py", line 759, in recv
return self.read(buflen)
File "/usr/lib64/python2.7/ssl.py", line 653, in read
v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)
So to me it looks like client issued a POST call at 08:33:24, and after 3 minutes (default SSL timeout), at 08:36:25 it thrown that timeout exception.
Looking into Satellite logs: # grep -C 1 172\.18\.0\.129 /var/log/httpd/foreman-ssl_access_ssl.log 172.17.30.124 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)" 172.18.0.129 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)" 172.18.1.149 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)" -- 172.17.30.124 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/owners/Default_Organization/environments?name=Library HTTP/1.1" 200 73 "-" "RHSM/1.0 (cmd=subscription-manager)" 172.18.0.129 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/owners/Default_Organization/environments?name=Library HTTP/1.1" 200 73 "-" "RHSM/1.0 (cmd=subscription-manager)" 172.17.49.11 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/status HTTP/1.1" 200 264 "-" "RHSM/1.0 (cmd=subscription-manager)" -- 172.17.0.2 - admin [09/Sep/2017:04:39:26 -0400] "GET /pulp/api/v2/tasks/d96adf20-5eed-464d-afed-f5146bce6ef4/ HTTP/1.1" 200 384 "-" "Ruby" 172.18.0.129 - - [09/Sep/2017:04:33:24 -0400] "POST /rhsm/environments/1/consumers HTTP/1.1" 200 10516 "-" "RHSM/1.0 (cmd=subscription-manager)" 172.17.0.2 - admin [09/Sep/2017:04:39:26 -0400] "POST /pulp/api/v2/consumers/d369dff7-9c9a-4dd5-a607-0e874cafc4b6/actions/content/regenerate_applicability/ HTTP/1.1" 202 172 "-" "Ruby" IMO it looks like first two GETs client did were processed normally, but POST request was obtained at (when moving time to UTC) 08:33:24. Judging by timestamps around it, it was responded 6 minutes later, at 08:39:26, so 3 minutes after client gave up. And its error code was 200, which is good, it was just slow. Based on https://bugzilla.redhat.com/show_bug.cgi?id=1490138#c5, I believe this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1483677 And to clarify, this is likely a regression due to https://bugzilla.redhat.com/show_bug.cgi?id=1434040 which moved where the package profile uploading was happening. This will be resolved with the inclusion of 1514508 Verified in Satellite 6.2.15 I performed 1000 content hosts registrations with 50 concurrent at all times. On both RHEL 6 and RHEL 7 all 1000 hosts were registered in around 20 minutes. At no point did any host encounter the error identified in this bug report. I am attaching the registration logs for the 1000 hosts registered to the RHEL 7 Satellite 6.2.15 server. Created attachment 1425724 [details]
registration logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1673 |
Description of problem: When I'm running concurrent registrations of 32 hosts, it causes that 1 of them fails with: "Unable to verify server's identity: ('The read operation timed out',)". IMO this is a regression, as I remember that earlier, with 6.2, we were able to register 50 clients concurrently without any tunings done on Satellite or client side (I can be wrong, but have strong feeling about that). Version-Release number of selected component (if applicable): satellite-6.2.11-2.0.el7sat.noarch How reproducible: 2 of 4 attempts (rate is bigger when registering more hosts concurrently) Steps to Reproduce: 1. Register 32 clients to the satellite in parallel with: # subscription-manager register --org Default_Organization \ --environment Library --username <user> --password <pass> --force Actual results: 1 registration fails Expected results: all registrations should pass