Bug 1480071
| Summary: | Concurrent registrations seems to have frequent RHSM timeouts | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | sbadhwar |
| Component: | Performance | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.3.0 | CC: | alosadag, bbuckingham, bcourt, inecas, mmccune, mstead, psuriset, sbadhwar |
| Target Milestone: | Unspecified | Keywords: | Regression, Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-02 12:16:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
sbadhwar
2017-08-10 05:32:03 UTC
Could you provide as much debug information as possible to pinpoint what the issues are? Without any data we can't do any analysis of where the issues actually are.

(In reply to Ivan Necas from comment #1)
> Could you provide as much debug information as possible to pinpoint what the
> issues are? Without any data we can't do any analysis of where the issues
> actually are.

Hello Ivan,

While doing concurrent registrations of content hosts (75 hosts registering concurrently to Satellite 6.3), we see a number of hosts failing to register to Satellite. The error displayed is:

Unable to verify server's identity: timed out

On checking the RHSM log of a failed host, the following trace is present:

2017-08-18 09:16:52,477 [INFO] subscription-manager:380:MainThread @hwprobe.py:916 - collected virt facts: virt.is_guest=True, virt.host_type=lxc, docker, virt.uuid=Not Set
2017-08-18 09:16:52,478 [INFO] subscription-manager:380:MainThread @facts.py:139 - Loading custom facts from: /etc/rhsm/facts/katello.facts
2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread @managercli.py:174 - Error during registration: timed out
2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread @managercli.py:175 - timed out
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/subscription_manager/managercli.py", line 1136, in _do_command
    content_tags=self.installed_mgr.tags)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 928, in registerConsumer
    return self.conn.request_post(url, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 697, in request_post
    return self._request("POST", method, params)
  File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 591, in _request
    response = conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 228, in read
    return self._read_bio(size)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio
    return m2.ssl_read(self.ssl, size, self._timeout)
SSLTimeoutError: timed out

As a mitigation, I tried increasing the RHSM timeout. This does raise the number of hosts that register successfully, but it does not help much.

In Satellite 6.2, we could easily achieve 75 concurrent registrations, but the same load causes problems with Satellite 6.3.

Please let me know if you require logs from any specific component.

Regards,
Saurabh Badhwar

Michael: is this something we're aware of that should be addressed with a later cp version? Is there another bz we could close this as a dupe of?

It is likely not a complete fix (as I have not analyzed across katello interactions), but Candlepin 2.1.3-1 includes code to significantly reduce lock contention during bind, which often happens during concurrent registration using activation keys that attach to a single pool.

We are going to be including Candlepin 2.1 in 6.3, which has some registration performance fixes that we hope will help with this situation.

Flagged as a regression, as this degraded from 6.2.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1480071#c7, it seems the issue is no longer reproducible, and we pulled in a new candlepin version, which should also have a positive influence on performance. I'm closing it for now: feel free to re-open, with more details on what the issue is, if it reappears.
|
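When triaging a report like this one, it can help to measure how long the client waited before giving up. The following is a minimal sketch, not part of subscription-manager, that reads the timestamps at the start of rhsm.log records and reports the gap between the first `[ERROR]` record and the record just before it. The helper names (`parse_timestamp`, `seconds_before_first_error`) are hypothetical, and the timestamp layout is assumed from the log excerpt quoted in this bug.

```python
from datetime import datetime

# Timestamp layout assumed from the excerpt in this bug,
# e.g. "2017-08-18 09:16:52,478 [INFO] ...".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = len("2017-08-18 09:16:52,478")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def seconds_before_first_error(lines):
    """Seconds between the first [ERROR] record and the record before it."""
    previous = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # traceback lines carry no timestamp
        if "[ERROR]" in line and previous is not None:
            return (stamp - previous).total_seconds()
        previous = stamp
    return None


# Two records copied from the excerpt quoted in this bug.
sample = [
    "2017-08-18 09:16:52,478 [INFO] subscription-manager:380:MainThread "
    "@facts.py:139 - Loading custom facts from: /etc/rhsm/facts/katello.facts",
    "2017-08-18 09:23:52,851 [ERROR] subscription-manager:380:MainThread "
    "@managercli.py:174 - Error during registration: timed out",
]
print(seconds_before_first_error(sample))
```

For the two records above, the gap is roughly 420 seconds, so this host waited about seven minutes between loading its facts and reporting the timeout. Running the same check across all failed hosts would show whether the waits cluster around one value.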