Bug 1490019 - concurrent registrations of 32 hosts causes 1 of them fail with: Unable to verify server's identity: ('The read operation timed out',)
Summary: concurrent registrations of 32 hosts causes 1 of them fail with: Unable to ve...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Registration
Version: 6.2.11
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: jcallaha
URL:
Whiteboard:
Depends On: 1514508 1553861 1553862 1553869 1553871
Blocks: 1122832
TreeView+ depends on / blocked
 
Reported: 2017-09-09 08:35 UTC by Jan Hutař
Modified: 2021-09-09 12:35 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-21 20:21:35 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
registration logs (1.13 MB, text/plain)
2018-04-23 19:44 UTC, jcallaha
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1673 0 None None None 2018-05-21 20:21:43 UTC

Description Jan Hutař 2017-09-09 08:35:42 UTC
Description of problem:
When I'm running concurrent registrations of 32 hosts, it causes that 1 of them fails with: "Unable to verify server's identity: ('The read operation timed out',)". IMO this is a regression, as I remember that earlier, with 6.2, we were able to register 50 clients concurrently without any tunings done on Satellite or client side (I can be wrong, but have strong feeling about that).


Version-Release number of selected component (if applicable):
satellite-6.2.11-2.0.el7sat.noarch


How reproducible:
2 of 4 attempts
(rate is bigger when registering more hosts concurrently)


Steps to Reproduce:
1. Register 32 clients to the satellite in parallel with:
   # subscription-manager register --org Default_Organization \
         --environment Library --username <user> --password <pass> --force


Actual results:
1 registration fails


Expected results:
all registrations should pass

Comment 1 Jan Hutař 2017-09-09 08:42:31 UTC
Example of one of the failed clients:

b10-h01-r620container129.example.com - 172.18.0.129

From /var/log/rhsm/rhsm.log:

2017-09-09 08:32:59,368 [INFO] rhsmd:346:MainThread @rhsmd:263 - rhsmd started
2017-09-09 08:33:00,149 [INFO] subscription-manager:340:MainThread @managercli.py:518 - X-Correlation-ID: b7a518b3c9eb4dc7aab21b6a8190fc70
2017-09-09 08:33:00,149 [INFO] subscription-manager:340:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:00,215 [INFO] subscription-manager:340:MainThread @managerlib.py:889 - Cleaned local data
2017-09-09 08:33:00,238 [ERROR] rhsmd:346:MainThread @cache.py:128 - Unable to read cache: /var/lib/rhsm/cache/rhsm_icon.json
2017-09-09 08:33:00,239 [ERROR] rhsmd:346:MainThread @cache.py:129 - [Errno 2] No such file or directory: '/var/lib/rhsm/cache/rhsm_icon.json'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/subscription_manager/cache.py", line 123, in _read_cache
    f = open(self.CACHE_FILE)
IOError: [Errno 2] No such file or directory: '/var/lib/rhsm/cache/rhsm_icon.json'
2017-09-09 08:33:10,501 [INFO] rhsmd:502:MainThread @rhsmd:263 - rhsmd started
2017-09-09 08:33:10,649 [INFO] subscription-manager:496:MainThread @managercli.py:518 - X-Correlation-ID: 6b91ae91eb154d92b0b76dd93642909a
2017-09-09 08:33:10,650 [INFO] subscription-manager:496:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,232 [INFO] subscription-manager:578:MainThread @managercli.py:518 - X-Correlation-ID: 28825ea09ac84f84b64263aa7899309c
2017-09-09 08:33:16,233 [INFO] subscription-manager:578:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,234 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=identity_cert ca_dir=/etc/rhsm/ca/ insecure=False
2017-09-09 08:33:16,235 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=none
2017-09-09 08:33:16,236 [INFO] subscription-manager:578:MainThread @managercli.py:407 - Client Versions: {'python-rhsm': '1.19.9-1.el7', 'subscription-manager': '1.19.21-1.el7'}
2017-09-09 08:33:16,236 [INFO] subscription-manager:578:MainThread @managercli.py:382 - Consumer Identity name=None uuid=None
2017-09-09 08:33:16,239 [INFO] subscription-manager:578:MainThread @managercli.py:382 - Consumer Identity name=None uuid=None
2017-09-09 08:33:16,241 [INFO] subscription-manager:578:MainThread @connection.py:822 - Connection built: host=c08-h21-r630-vm2.rdu.openstack.engineering.redhat.com port=443 handler=/rhsm auth=basic username=admin
2017-09-09 08:33:16,454 [INFO] subscription-manager:578:MainThread @dmiinfo.py:73 - Using dmidecode dump file: /dev/mem
2017-09-09 08:33:16,475 [WARNING] subscription-manager:578:MainThread @dmiinfo.py:127 - Error reading system DMI information: Failed to open memory buffer (/dev/mem): No such file or directory
No SMBIOS nor DMI entry point found, sorry.
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)
Permission denied to memory file/device (/dev/mem)

2017-09-09 08:33:16,692 [INFO] subscription-manager:578:MainThread @custom.py:85 - Loading custom facts from: /etc/rhsm/facts/katello.facts
2017-09-09 08:33:23,130 [INFO] subscription-manager:578:MainThread @connection.py:552 - Response: status=200, request="GET /rhsm/"
2017-09-09 08:33:24,955 [INFO] subscription-manager:578:MainThread @connection.py:552 - Response: status=200, request="GET /rhsm/owners/Default_Organization/environments?name=Library"
2017-09-09 08:36:25,069 [ERROR] subscription-manager:578:MainThread @managercli.py:177 - Error during registration: ('The read operation timed out',)
2017-09-09 08:36:25,070 [ERROR] subscription-manager:578:MainThread @managercli.py:178 - ('The read operation timed out',)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/subscription_manager/managercli.py", line 1194, in _do_command
    content_tags=self.installed_mgr.tags)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 920, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 649, in request_post
    return self._request("POST", method, params, headers=headers)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 672, in _request
    info=info, headers=headers)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 539, in _request
    response = conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib64/python2.7/ssl.py", line 759, in recv
    return self.read(buflen)
  File "/usr/lib64/python2.7/ssl.py", line 653, in read
    v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)

So to me it looks like client issued a POST call at 08:33:24, and after 3 minutes (default SSL timeout), at 08:36:25 it thrown that timeout exception.

Comment 2 Jan Hutař 2017-09-09 08:48:18 UTC
Looking into Satellite logs:

# grep -C 1 172\.18\.0\.129 /var/log/httpd/foreman-ssl_access_ssl.log
172.17.30.124 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)"
172.18.0.129 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)"
172.18.1.149 - - [09/Sep/2017:04:33:17 -0400] "GET /rhsm/ HTTP/1.1" 200 1746 "-" "RHSM/1.0 (cmd=subscription-manager)"
--
172.17.30.124 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/owners/Default_Organization/environments?name=Library HTTP/1.1" 200 73 "-" "RHSM/1.0 (cmd=subscription-manager)"
172.18.0.129 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/owners/Default_Organization/environments?name=Library HTTP/1.1" 200 73 "-" "RHSM/1.0 (cmd=subscription-manager)"
172.17.49.11 - - [09/Sep/2017:04:33:23 -0400] "GET /rhsm/status HTTP/1.1" 200 264 "-" "RHSM/1.0 (cmd=subscription-manager)"
--
172.17.0.2 - admin [09/Sep/2017:04:39:26 -0400] "GET /pulp/api/v2/tasks/d96adf20-5eed-464d-afed-f5146bce6ef4/ HTTP/1.1" 200 384 "-" "Ruby"
172.18.0.129 - - [09/Sep/2017:04:33:24 -0400] "POST /rhsm/environments/1/consumers HTTP/1.1" 200 10516 "-" "RHSM/1.0 (cmd=subscription-manager)"
172.17.0.2 - admin [09/Sep/2017:04:39:26 -0400] "POST /pulp/api/v2/consumers/d369dff7-9c9a-4dd5-a607-0e874cafc4b6/actions/content/regenerate_applicability/ HTTP/1.1" 202 172 "-" "Ruby"

IMO it looks like first two GETs client did were processed normally, but POST request was obtained at (when moving time to UTC) 08:33:24. Judging by timestamps around it, it was responded 6 minutes later, at 08:39:26, so 3 minutes after client gave up. And its error code was 200, which is good, it was just slow.

Comment 4 Ivan Necas 2017-09-11 07:16:55 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1490138#c5, I believe this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1483677

Comment 5 Justin Sherrill 2017-09-11 12:51:21 UTC
And to clarify, this is likely a regression due to https://bugzilla.redhat.com/show_bug.cgi?id=1434040 which moved where the package profile uploading was happening.

Comment 19 Mike McCune 2018-03-07 18:27:15 UTC
This will be resolved with the inclusion of 1514508

Comment 24 jcallaha 2018-04-23 19:43:19 UTC
Verified in Satellite 6.2.15

I performed 1000 content hosts registrations with 50 concurrent at all times. On both RHEL 6 and RHEL 7 all 1000 hosts were registered in around 20 minutes. At no point did any host encounter the error identified in this bug report.

I am attaching the registration logs for the 1000 hosts registered to the RHEL 7 Satellite 6.2.15 server.

Comment 25 jcallaha 2018-04-23 19:44:01 UTC
Created attachment 1425724 [details]
registration logs

Comment 28 errata-xmlrpc 2018-05-21 20:21:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1673


Note You need to log in before you can comment on or make changes to this bug.