Bug 1480071 - Concurrent registrations seems to have frequent RHSM timeouts
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Performance
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-10 05:32 UTC by sbadhwar
Modified: 2019-04-01 20:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-02 12:16:24 UTC
Target Upstream Version:
Embargoed:



Description sbadhwar 2017-08-10 05:32:03 UTC
Description of problem:
In the Satellite 6.2.x releases, we could easily complete nearly 75 concurrent content host registrations. With the Satellite 6.3 Snap releases, this number has come down to only about 40 concurrent registrations.

The error that appears frequently during the registrations is "Unable to establish server identity: Timed Out"

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Ivan Necas 2017-08-10 15:21:17 UTC
Could you provide as much debug information as possible to pinpoint what the issues are? Without any data we can't do any analysis of where the issues actually are.

Comment 2 sbadhwar 2017-08-21 11:29:54 UTC
(In reply to Ivan Necas from comment #1)
> Could you provide as much debug information as possible to pinpoint what the
> issues are? Without any data we can't do any analysis of where the issues
> actually are.

Hello Ivan,

While doing concurrent registrations of content hosts (75 hosts registering concurrently to Satellite 6.3), we see a number of hosts failing to register to Satellite. The error displayed is:

Unable to verify server's identity: timed out

On checking the RHSM log of a failed host, the following trace is present:

2017-08-18 09:16:52,477 [INFO] subscription-manager:380:MainThread @hwprobe.py:916 - collected virt facts: virt.is_guest=True, virt.host_type=lxc, docker, virt.uuid=Not Set
2017-08-18 09:16:52,478 [INFO] subscription-manager:380:MainThread @facts.py:139 - Loading custom facts from: /etc/rhsm/facts/katello.facts
2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread @managercli.py:174 - Error during registration: timed out
2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread @managercli.py:175 - timed out
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/subscription_manager/managercli.py", line 1136, in _do_command
    content_tags=self.installed_mgr.tags)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 928, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 697, in request_post
    return self._request("POST", method, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 591, in _request
    response = conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 228, in read
    return self._read_bio(size)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio
    return m2.ssl_read(self.ssl, size, self._timeout)
SSLTimeoutError: timed out
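
For triage, it can help to measure how long the client waited before giving up. The sketch below is an illustration rather than an official tool: it parses rhsm.log-style lines and reports the gap between the last entry before the failure and the first "timed out" error. The function names (`parse_line`, `seconds_before_timeout`) are hypothetical, and the sample lines are copied from the trace above.

```python
import re
from datetime import datetime

# Matches the timestamp and level at the start of an rhsm.log line,
# e.g. "2017-08-18 09:16:52,477 [INFO] ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) \[(\w+)\]")


def parse_line(line):
    """Return (timestamp, level) for a log line, or None for traceback lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
    return stamp, match.group(3)


def seconds_before_timeout(lines):
    """Seconds between the entry preceding the first 'timed out' error and that error."""
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, level = parsed
        if level == "ERROR" and "timed out" in line:
            if previous is None:
                return None
            return (stamp - previous).total_seconds()
        previous = stamp
    return None


# Two lines copied from the trace in this comment.
SAMPLE = [
    "2017-08-18 09:16:52,478 [INFO] subscription-manager:380:MainThread "
    "@facts.py:139 - Loading custom facts from: /etc/rhsm/facts/katello.facts",
    "2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread "
    "@managercli.py:174 - Error during registration: timed out",
]

print(seconds_before_timeout(SAMPLE))
```

For the two sample lines the gap is roughly 420 seconds, i.e. the client sat for about seven minutes between loading facts and reporting the timeout.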


As a partial workaround, I tried increasing the RHSM timeout. This does raise the number of hosts that register successfully, but it does not help much.
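
The comment does not say which setting was changed. A minimal sketch, assuming it was the `server_timeout` option in the `[server]` section of `/etc/rhsm/rhsm.conf`: the snippet below edits an in-memory copy of such a file with Python's `configparser`. The hostname and the 300-second value are placeholders, not values from this report, and `with_server_timeout` is a hypothetical helper.

```python
import configparser

# Hypothetical excerpt of an rhsm.conf-style file; the hostname is a placeholder.
SAMPLE_CONF = """\
[server]
hostname = satellite.example.com
port = 443
"""


def with_server_timeout(conf_text, seconds):
    """Return a parser for conf_text with [server] server_timeout set to `seconds`."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    # Assumption: server_timeout is the HTTP timeout, in seconds.
    parser.set("server", "server_timeout", str(seconds))
    return parser


updated = with_server_timeout(SAMPLE_CONF, 300)
print(updated.get("server", "server_timeout"))
```

Raising the client timeout only hides slow responses from the server; it does not address whatever is making the registrations slow in the first place.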

In Satellite 6.2, we were easily able to achieve a concurrent registration count of 75 hosts but the same is giving problems with Satellite 6.3.

Please let me know if you require logs from any specific component.

Regards,
Saurabh Badhwar

Comment 3 Ivan Necas 2017-08-23 08:34:42 UTC
Michael: is this something we're aware of that should be addressed by a later cp version? Is there another bz we could close this as a dupe of?

Comment 4 Barnaby Court 2017-08-28 18:55:28 UTC
It is likely not a complete fix (I have not analyzed the Katello interactions), but Candlepin 2.1.3-1 includes code that significantly reduces lock contention during bind. That contention often occurs during concurrent registration with activation keys that attach to a single pool.

Comment 9 Mike McCune 2017-08-29 16:08:09 UTC
We are going to include Candlepin 2.1 in 6.3, which has some registration performance fixes that we hope will help with this situation.

Comment 10 Mike McCune 2017-08-29 16:08:44 UTC
Flagged as a regression, as this degraded from 6.2.

Comment 13 Ivan Necas 2017-11-02 12:16:24 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1480071#c7, the issue seems to be no longer reproducible. We have also pulled in a new candlepin version, which should have a positive influence on performance. I'm closing this for now; feel free to re-open it if the issue reappears, with more details on what the issue is.

