Bug 1103824
| Summary: | Cronjob /etc/cron.daily/rhsmd fails sometimes with python traceback | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Robert Scheck <redhat-bugzilla> | |
| Component: | subscription-manager | Assignee: | Devan Goodwin <dgoodwin> | |
| Status: | CLOSED ERRATA | QA Contact: | John Sefler <jsefler> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 6.5 | CC: | ahumbe, alikins, bkearney, brian, brian.murrell, crispin.bivans, dgoodwin, hmiles, jamorgan, jsvarova, pgervase, robert.scheck, skallesh, xdmoon | |
| Target Milestone: | rc | Keywords: | Reopened, ZStream | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Previously, unexpected problems in the server could lead to replies to subscription-manager that were not Javascript Object Notation (JSON)-formatted. This in turn caused subscription-manager to log traceback errors. As a consequence, subscription-manager appeared to be working incorrectly even though the error was only in the server. With this update, subscription-manager has been fixed to handle network errors more discretely: network errors and non-JSON responses from the server are now filtered and logged gracefully.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1167463 1167464 1170768 1171385 (view as bug list) | Environment: | ||
| Last Closed: | 2015-07-22 06:51:52 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1125249, 1167463, 1167464, 1170768, 1170769, 1171385 | |||
Reopened related case #01092677 at Red Hat Customer Portal. Another one from tonight also with python-rhsm-1.9.7-1.el6_5.x86_64 from
another box:
--- snipp ---
Date: Tue, 3 Jun 2014 03:13:20 +0200 (CEST)
From: Anacron <root.net>
To: root.net
Subject: Anacron job 'cron.daily' on puffy.example.net
Message-Id: <20140603011320.59749301C42.net>
/etc/cron.daily/rhsmd:
Traceback (most recent call last):
File "/usr/libexec/rhsmd", line 263, in <module>
main()
File "/usr/libexec/rhsmd", line 224, in main
status = check_status(force_signal)
File "/usr/libexec/rhsmd", line 86, in check_status
sorter = require(CERT_SORTER)
File "/usr/share/rhsm/subscription_manager/injection.py", line 98, in require
return FEATURES.require(feature, *args, **kwargs)
File "/usr/share/rhsm/subscription_manager/injection.py", line 72, in require
self.providers[feature] = provider()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 321, in __init__
super(CertSorter, self).__init__()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 66, in __init__
self.load()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 116, in load
self._parse_server_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 135, in _parse_server_status
status = self.get_compliance_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 333, in get_compliance_status
return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 216, in load_status
self._sync_with_server(uep, uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 201, in _sync_with_server
self.server_status = uep.getCompliance(uuid)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 817, in getCompliance
return self.conn.request_get(method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 541, in request_get
return self._request("GET", method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 448, in _request
conn.request(request_type, handler, body=body, headers=headers)
File "/usr/lib64/python2.6/httplib.py", line 917, in request
if v.args[0] != 32 or not self.auto_open:
IndexError: tuple index out of range
--- snapp ---
Looks like variations of drop or malformed server responses, presumably originating with server errors from subscription.rhn.redhat.com. Various folks are working on sorting out the causes of those errors, but of course the client shouldn't be throwing stacktraces in those cases, especially when invoked from cron. Given that IndexError bubblinh up from httplib example, it likely needs a top level sys.excepthook to catch and log those cases. I'm seeing a very similar problem with subscription-manager {register|attach}. From /var/log/rhsm/rhsm.log:
2014-09-16 10:57:22,265 [DEBUG] subscription-manager @connection.py:446 - Making request: GET https://subscription.rhn.redhat.com:443/subscription/users/xxxxxxx%40example.com/owners
2014-09-16 10:58:22,534 [ERROR] subscription-manager @managercli.py:154 - Error during registration: tuple index out of range
2014-09-16 10:58:22,535 [ERROR] subscription-manager @managercli.py:155 - tuple index out of range
Traceback (most recent call last):
File "/usr/share/rhsm/subscription_manager/managercli.py", line 1035, in _do_command
owner_key = self._determine_owner_key(admin_cp)
File "/usr/share/rhsm/subscription_manager/managercli.py", line 1178, in _determine_owner_key
owners = cp.getOwnerList(self.username)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 911, in getOwnerList
return self.conn.request_get(method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 550, in request_get
return self._request("GET", method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 453, in _request
conn.request(request_type, handler, body=body, headers=headers)
File "/usr/lib64/python2.6/httplib.py", line 917, in request
if v.args[0] != 32 or not self.auto_open:
IndexError: tuple index out of range
We have been seeing python tracebacks when an RHEL6 server is using rhsmd via Zscaler www+https content filtering. Avoiding the content filter resolves the issue for us. Moving bugs to 6.7 Moving bugs to 6.7 Moving bugs to 6.7 This issue has been unfortunately still not been resolved. The recent outage
or downtime of the Red Hat portal caused the following message for all of our
customers (subscription-manager-1.12.14-7.el6.x86_64 in use):
--- snipp ---
Date: Sun, 9 Nov 2014 03:21:06 +0100 (CET)
From: Anacron <root.net>
To: root.net
Subject: Anacron job 'cron.daily' on tux.example.net
Message-Id: <20141109022106.C4DC16801C7.net>
/etc/cron.daily/rhsmd:
Traceback (most recent call last):
File "/usr/libexec/rhsmd", line 265, in <module>
main()
File "/usr/libexec/rhsmd", line 226, in main
status = check_status(force_signal)
File "/usr/libexec/rhsmd", line 88, in check_status
sorter = require(CERT_SORTER)
File "/usr/share/rhsm/subscription_manager/injection.py", line 102, in require
return FEATURES.require(feature, *args, **kwargs)
File "/usr/share/rhsm/subscription_manager/injection.py", line 76, in require
self.providers[feature] = provider()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 317, in __init__
super(CertSorter, self).__init__()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 65, in __init__
self.load()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 111, in load
self._parse_server_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 130, in _parse_server_status
status = self.get_compliance_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 325, in get_compliance_status
return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 181, in load_status
self._sync_with_server(uep, uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 276, in _sync_with_server
self.server_status = uep.getCompliance(uuid)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 869, in getCompliance
return self.conn.request_get(method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 550, in request_get
return self._request("GET", method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 478, in _request
self.validateResponse(result, request_type, handler)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 526, in validateResponse
handler=handler)
rhsm.connection.RemoteServerException: Server error attempting a GET to /subscription/consumers/<removed>/compliance returned status 502
--- snapp ---
Escalated case #01115159 on the Red Hat customer portal to move this forward Having this bug marked as a duplicate causes mixed feelings to me... This bug was reported first before the 1113741 bug, and the duplicate bug is hidden behind an "I'm not authorized enough as a customer to see this highly visible and intrusive bug" message. Every customer now has a webservice monitoring tool that is annoying. At least make it open if the newer bug somehow got better technical info and priority before this one. Yes, if you are going to close a bug as a duplicate, please don't make it a duplicate of a bug that nobody can see. How are we to follow progress and/or contribute to the resolution? (In reply to Brian J. Murrell from comment #17) > Yes, if you are going to close a bug as a duplicate, please don't make it a > duplicate of a bug that nobody can see. How are we to follow progress > and/or contribute to the resolution? I posted to the other bug about the past three comments. We'll work on this. Apologies trying to clean up the bugs related to this, picked one that looked like it had the most info and was where the resolution was ongoing, didn't realize it was private. Will re-open this for time being. For an update, we have two teams tackling this from different angles. The underlying cause has proven extremely hard to track down, but we're close to rolling out a change server side which should make the issue disappear for everyone without any client updates needed. @Devan: Thanks for the reopen here and more importantly the update. Please do keep us posted as I am sure all of the real activity is going on in the private case. (In reply to Devan Goodwin from comment #19) > For an update, we have two teams tackling this from different angles. The > underlying cause has proven extremely hard to track down, but we're close to > rolling out a change server side which should make the issue disappear for > everyone without any client updates needed. Extremely hard to track down is somewhat relative. Install a RHEL 6, register to RHSM Hosted using subscription-manager and schedule a RHSM Hosted outage/ downtime/maintainance for 24 hours - and you will have that reproducible. But as this isn't realistic a staging environment should be fine as well. In the end the trackbacks are IMHO caused due to "outage websites" that are set up by the Red Hat infrastructure while RHSM Hosted is down. We're beginning the process to push a zstream fix out for latest RHEL 5, 6 and 7 to handle this issue more gracefully. Server side investigation into why intermittent 502s are happening is also on-going but nothing successful thus far. Fixed to handle intermittent 502s went into subscription-manager.git in: commit f26b4f7c6c98f090efba6063e97434e23a83fbbc Author: Adrian Likins <alikins> Date: Thu Nov 6 15:16:35 2014 -0500 This first appears in build: subscription-manager-1.13.9-1 Testing this will require a mock server (http://www.mock-server.com/) as we can't reliably reproduce the bad gateway issue that is popping up. Steps to verify: 1. Run a mock server docker container on port 8444. Make sure the port is open if you will be having subscription-manager connect from another machine. $ docker run -d -p 8444:8090 jamesdbloom/mockserver 2. Create a file outlining the expectations we have: $ cat mock.json { "httpRequest": { "method": "GET", "path": ".*/compliance" }, "httpResponse": { "statusCode": 502, "body": "<!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN>\n<html><head>\n<title>502 Proxy Error</title>\n</head><body><h1>Proxy Error</h1>\n<p>The proxy server received an invalid\r\nresponse from an upstream server.<br />\r\nThe proxy server could not handle the request <em><a href=/subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance>GET /subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance</a></em>.<p>\nReason: <strong>Error reading from remote server</strong></p></p>\n<hr>\n<address>Apache Server at subscription.rhn.redhat.com Port 443</address>\n</body></html>" } } 3. Push the expectations to the mock server: $ curl -k https://localhost:8444/expectation -X PUT -H "Content-Type: application/json" --data @mock.json 4. Make sure it works manually: $ curl -k https://localhost:8444/consumers/kjashdkjashd/compliance <!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN> <html><head> <title>502 Proxy Error</title> </head><body><h1>Proxy Error</h1> <p>The proxy server received an invalid response from an upstream server.<br /> The proxy server could not handle the request <em><a href=/subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance>GET /subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance</a></em>.<p> Reason: <strong>Error reading from remote server</strong></p></p> <hr> <address>Apache Server at subscription.rhn.redhat.com Port 443</address> </body></html>% $ curl -k https://localhost:8444/somethingelse $ Now we're ready to have subscription-manager talk to this. 5. Register a subscription-manager client normally to any server/environment you wish. 6. After successful registration, edit /etc/rhsm/rhsm.conf: hostname = mock-server-hostname/ip port = 8444 insecure = 1 # we need to disable ssl cert verification for this to work At this point, manually running rhsmd should result in: $ /usr/bin/python /usr/libexec/rhsmd -s Traceback (most recent call last): File "/usr/libexec/rhsmd", line 287, in <module> main() File "/usr/libexec/rhsmd", line 248, in main status = check_status(force_signal) File "/usr/libexec/rhsmd", line 88, in check_status sorter = require(CERT_SORTER) File "/usr/share/rhsm/subscription_manager/injection.py", line 102, in require return FEATURES.require(feature, *args, **kwargs) File "/usr/share/rhsm/subscription_manager/injection.py", line 76, in require self.providers[feature] = provider() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 317, in __init__ super(CertSorter, self).__init__() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 65, in __init__ self.load() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 111, in load self._parse_server_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 130, in _parse_server_status status = self.get_compliance_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 325, in get_compliance_status return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 181, in load_status self._sync_with_server(uep, uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 276, in _sync_with_server self.server_status = uep.getCompliance(uuid) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 891, in getCompliance return self.conn.request_get(method) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 572, in request_get return self._request("GET", method) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 500, in _request self.validateResponse(result, request_type, handler) File "/usr/lib64/python2.7/site-packages/rhsm/connection.py", line 548, in validateResponse handler=handler) rhsm.connection.RemoteServerException: Server error attempting a GET to /candlepin/consumers/60a01a9f-9820-4443-bc79-fe4d26479623/compliance returned status 404 Upgrading to newest subscription-manager: $ /usr/bin/python /usr/libexec/rhsmd -s $ echo $? 0 NOTE: This does result in a probably misleading "This system is missing one or more subscriptions. Please run subscription-manager for more information." in syslog. Needs to be dealt with. My mistake, syslog entry was correct, and is using the last cached status from server. This behavior seems correct. Verified!!
[root@dhcp35-93 ~]# subscription-manager version
server type: Red Hat Subscription Management
subscription management server: Unknown
subscription management rules: Unknown
subscription-manager: 1.13.10-1.el6
python-rhsm: 1.13.8-1.el6
[root@dhcp35-93 ~]# /usr/bin/python /usr/libexec/rhsmd -s
[root@dhcp35-93 ~]# echo $?
0
[root@dhcp35-93 ~]# tail -f /var/log/messages
Dec 3 11:17:50 dhcp35-93 mcelog: failed to prefill DIMM database from DMI data
Dec 3 11:17:52 dhcp35-93 abrtd: Init complete, entering main loop
Dec 3 11:17:54 dhcp35-93 kernel: Bridge firewalling registered
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:13 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:24:07 dhcp35-93 rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
Dec 3 11:25:42 dhcp35-93 rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
[root@dhcp35-93 ~]# tail -f /var/log/rhsm/rhsm.log
2014-12-03 11:25:41,925 [INFO] rhsmd @rhsmd:302 - rhsmd started
2014-12-03 11:25:41,925 [INFO] rhsmd @rhsmd:333 - logging subscription status to syslog
2014-12-03 11:25:41,957 [DEBUG] rhsmd @identity.py:131 - Loading consumer info from identity certificates.
2014-12-03 11:25:41,964 [DEBUG] rhsmd @profile.py:97 - Loading current RPM profile.
2014-12-03 11:25:42,034 [INFO] rhsmd @connection.py:679 - Using certificate authentication: key = /etc/pki/consumer/key.pem, cert = /etc/pki/consumer/cert.pem, ca = /etc/rhsm/ca/, insecure = True
2014-12-03 11:25:42,034 [INFO] rhsmd @connection.py:690 - Connection Built: host: localhost, port: 8444, handler: /subscription
2014-12-03 11:25:42,035 [INFO] rhsmd @cache.py:138 - Checking current system info against cache: /var/lib/rhsm/cache/installed_products.json
2014-12-03 11:25:42,036 [INFO] rhsmd @cache.py:155 - No changes.
2014-12-03 11:25:42,036 [DEBUG] rhsmd @certdirectory.py:216 - Installed product IDs: ['69', '85']
2014-12-03 11:25:42,037 [DEBUG] rhsmd @connection.py:466 - Making request: GET /subscription/consumers/ff024b48-60e9-4b97-b4c3-0a77b85fa311/compliance
2014-12-03 11:25:42,048 [ERROR] rhsmd @cache.py:204 - [Errno 111] Connection refused
2014-12-03 11:25:42,048 [WARNING] rhsmd @cache.py:210 - Unable to reach server, using cached status.
2014-12-03 11:25:42,049 [DEBUG] rhsmd @cert_sorter.py:193 - valid entitled products: []
2014-12-03 11:25:42,049 [DEBUG] rhsmd @cert_sorter.py:194 - expired entitled products: []
2014-12-03 11:25:42,049 [DEBUG] rhsmd @cert_sorter.py:195 - partially entitled products: []
2014-12-03 11:25:42,050 [DEBUG] rhsmd @cert_sorter.py:196 - unentitled products: [u'69', u'85']
2014-12-03 11:25:42,050 [DEBUG] rhsmd @cert_sorter.py:197 - future products: []
2014-12-03 11:25:42,050 [DEBUG] rhsmd @cert_sorter.py:198 - partial stacks: []
2014-12-03 11:25:42,050 [DEBUG] rhsmd @cert_sorter.py:199 - entitlements valid until: None
2014-12-03 11:25:42,057 [INFO] rhsmd @rhsmd:295 - rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
For the index error case:
[root@dhcp35-93 ~]# /etc/cron.daily/rhsmd
[root@dhcp35-93 ~]#
[root@dhcp35-93 ~]# tail -f /var/log/rhsm/rhsm.log
2014-12-03 12:11:01,260 [INFO] rhsmd @rhsmd:302 - rhsmd started
2014-12-03 12:11:01,261 [INFO] rhsmd @rhsmd:333 - logging subscription status to syslog
2014-12-03 12:11:01,292 [DEBUG] rhsmd @identity.py:131 - Loading consumer info from identity certificates.
2014-12-03 12:11:01,297 [DEBUG] rhsmd @profile.py:97 - Loading current RPM profile.
2014-12-03 12:11:01,366 [INFO] rhsmd @connection.py:679 - Using certificate authentication: key = /etc/pki/consumer/key.pem, cert = /etc/pki/consumer/cert.pem, ca = /etc/rhsm/ca/, insecure = True
2014-12-03 12:11:01,367 [INFO] rhsmd @connection.py:690 - Connection Built: host: localhost, port: 8444, handler: /subscription
2014-12-03 12:11:01,367 [INFO] rhsmd @cache.py:138 - Checking current system info against cache: /var/lib/rhsm/cache/installed_products.json
2014-12-03 12:11:01,368 [INFO] rhsmd @cache.py:155 - No changes.
2014-12-03 12:11:01,369 [DEBUG] rhsmd @certdirectory.py:216 - Installed product IDs: ['69', '85']
2014-12-03 12:11:01,369 [DEBUG] rhsmd @connection.py:466 - Making request: GET /subscription/consumers/ff024b48-60e9-4b97-b4c3-0a77b85fa311/compliance
2014-12-03 12:11:01,371 [ERROR] rhsmd @rhsmd:60 - Unhandled rhsmd exception caught by the logging excepthook: Traceback (most recent call last):
File "/usr/libexec/rhsmd", line 379, in <module>
main()
File "/usr/libexec/rhsmd", line 334, in main
status = check_status(force_signal)
File "/usr/libexec/rhsmd", line 120, in check_status
sorter = require(CERT_SORTER)
File "/usr/share/rhsm/subscription_manager/injection.py", line 102, in require
return FEATURES.require(feature, *args, **kwargs)
File "/usr/share/rhsm/subscription_manager/injection.py", line 76, in require
self.providers[feature] = provider()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 317, in __init__
super(CertSorter, self).__init__()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 65, in __init__
self.load()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 111, in load
self._parse_server_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 130, in _parse_server_status
status = self.get_compliance_status()
File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 325, in get_compliance_status
return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 181, in load_status
self._sync_with_server(uep, uuid)
File "/usr/share/rhsm/subscription_manager/cache.py", line 266, in _sync_with_server
self.server_status = uep.getCompliance(uuid)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 889, in getCompliance
return self.conn.request_get(method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 570, in request_get
return self._request("GET", method)
File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 473, in _request
conn.request(request_type, handler, body=body, headers=headers)
File "/usr/lib64/python2.6/httplib.py", line 915, in request
raise IndexError()
IndexError
[root@dhcp35-93 ~]# tail -f /var/log/messages
Dec 3 11:17:50 dhcp35-93 mcelog: failed to prefill DIMM database from DMI data
Dec 3 11:17:52 dhcp35-93 abrtd: Init complete, entering main loop
Dec 3 11:17:54 dhcp35-93 kernel: Bridge firewalling registered
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:10 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:23:13 dhcp35-93 dbus: [system] Reloaded configuration
Dec 3 11:24:07 dhcp35-93 rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
Dec 3 11:25:42 dhcp35-93 rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
Dec 3 11:30:23 dhcp35-93 rhsmd: This system is missing one or more subscriptions. Please run subscription-manager for more information.
John requests moving this back to MODIFIED as there is no 6.7 errata yet. Doing so and cloning for z-stream. As indicated in the comments above, there are two fixes needed to address the issues presented in this bug report. The first is server-side and will be handled by bug 1113741 to fix the source of the offending 502 Proxy responses. The second is client-side which requires fixes to subscription-manager to more gracefully handle 502 Proxy responses that are received from the server. We'll use this bug to assert that subscription-manager version 1.14 targeted for release on rhel6.7 includes a client-side fix. First, let's begin with subscription-manager-1.12.14-7.el6 which was released on rhel6.6 to demonstrate/create/fake a reproducible scenario resulting in rhsmd tracebacks from a cron job which flood root with unnecessary cron e-mails as reported in comment 0. I will employ netcat to force a fake "502 Proxy Response" from the entitlement server like the one reported in comment 0. 0. Register the system using subscription-manager version released on RHEL6.6... [root@jsefler-os6 ~]# subscription-manager version server type: This system is currently not registered. subscription management server: 0.9.26.8-1 subscription management rules: 5.12 subscription-manager: 1.12.14-7.el6 python-rhsm: 1.12.5-2.el6 [root@jsefler-os6 ~]# [root@jsefler-os6 ~]# subscription-manager register --serverurl=subscription.rhn.stage.redhat.com:443/subscription Username: qa Password: The system has been registered with ID: e9f2c7ae-0e11-4131-9235-599476faa175 [root@jsefler-os6 ~]# Now that the system is registered, I will use a different system, jsefler-7.usersys.redhat.com, to create a very crude server using netcat that only replies with a 502 response. Once this is running, we'll come back to the RHEL6.6 system and change the rhsm configuration to communicate with the crude netcat server. Here we go, step... 1. Create a self-signed certificate for netcat to use. [root@jsefler-7 ~]# hostname jsefler-7.usersys.redhat.com [root@jsefler-7 ~]# mkdir crude [root@jsefler-7 ~]# cd crude/ [root@jsefler-7 crude]# openssl genrsa -out nc.key 2048 Generating RSA private key, 2048 bit long modulus ...........................+++ ...........................+++ e is 65537 (0x10001) [root@jsefler-7 crude]# openssl req -new -key nc.key -out nc.csr -subj "/C=US/O=QA/CN=localhost" [root@jsefler-7 crude]# openssl x509 -req -days 3650 -in nc.csr -signkey nc.key -out nc.crt Signature ok subject=/C=US/O=QA/CN=localhost Getting Private key [root@jsefler-7 crude]# ls nc.crt nc.csr nc.key [root@jsefler-7 crude]# Now we have a self-signed certificate and a private key for that cert. The CSR file is extraneous at this point. 2. Create a response file named 502.response with exactly these contents... [root@jsefler-7 crude]# cat 502.response HTTP/1.1 502 Bad Gateway Content-Length: 590 Connection: keep-alive <!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN> <html><head> <title>502 Proxy Error</title> </head><body><h1>Proxy Error</h1> <p>The proxy server received an invalid response from an upstream server.<br /> The proxy server could not handle the request <em><a href=/subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance>GET /subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance</a></em>.<p> Reason: <strong>Error reading from remote server</strong></p></p> <hr> <address>Apache Server at subscription.rhn.redhat.com Port 443</address> </body></html> [root@jsefler-7 crude]# 3. Start netcat configured to respond to any request with the desired reply. The -l argument takes the port number to listen on (8445 in this case). The -k option keeps netcat listening forever (ctrl-C to stop). The ssl options allow netcat to respond to requests over SSL. The -e option is a command that netcat will execute on every request. In this case, it will cat the file 502.response file and send it back to subscription-manager. [root@jsefler-7 crude]# nc -l 8445 -k --ssl-cert nc.crt --ssl-key nc.key -e '/bin/cat 502.response' 4. To verify that netcat is working, open a second terminal shell since the one above is running in the foreground and run... [root@jsefler-7 ~]# cd crude/ [root@jsefler-7 crude]# curl -k -v https://localhost:8445/ * About to connect() to localhost port 8445 (#0) * Trying ::1... * Connected to localhost (::1) port 8445 (#0) * Initializing NSS with certpath: sql:/etc/pki/nssdb * skipping SSL peer certificate verification * SSL connection using TLS_RSA_WITH_AES_128_CBC_SHA * Server certificate: * subject: CN=localhost,O=QA,C=US * start date: Mar 17 20:02:39 2015 GMT * expire date: Mar 14 20:02:39 2025 GMT * common name: localhost * issuer: CN=localhost,O=QA,C=US > GET / HTTP/1.1 > User-Agent: curl/7.29.0 > Host: localhost:8445 > Accept: */* > < HTTP/1.1 502 Bad Gateway < Content-Length: 590 < Connection: keep-alive < <!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN> <html><head> <title>502 Proxy Error</title> </head><body><h1>Proxy Error</h1> <p>The proxy server received an invalid response from an upstream server.<br /> The proxy server could not handle the request <em><a href=/subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance>GET /subscription/consumers/9decaece-a2eb-42e6-8d2c-0fd541a38499/compliance</a></em>.<p> Reason: <strong>Error reading from remote server</strong></p></p> <hr> <address>Apache Server at subscription.rhn.redhat.com Port 443</address> </body></html> * Connection #0 to host localhost left intact [root@jsefler-7 crude]# Its working! 5. Now let's get back to our RHEL6.6 system and reconfigure rhsm to communicate to the crude server... [root@jsefler-os6 ~]# subscription-manager config --server.hostname=jsefler-7.usersys.redhat.com --server.port=8445 --server.insecure=1 [root@jsefler-os6 ~]# 6. Now let's clear the crontab, delete all root mail, and then create a new cron job that runs the offending daily cron job on a per minute basis so we don't have to wait a day. crontab -e will put you into a vi editor; insert the code "* * * * * /etc/cron.daily/rhsmd" shown from the crontab -l listing... [root@jsefler-os6 ~]# crontab -r [root@jsefler-os6 ~]# echo "d*" | mail > /dev/null [root@jsefler-os6 ~]# mail No mail for root [root@jsefler-os6 ~]# crontab -e no crontab for root - using an empty one crontab: installing new crontab [root@jsefler-os6 ~]# crontab -l * * * * * /etc/cron.daily/rhsmd [root@jsefler-os6 ~]# sleep 60 You have new mail in /var/spool/mail/root [root@jsefler-os6 ~]# mail Heirloom Mail version 12.4 7/29/08. Type ? for help. "/var/spool/mail/root": 2 messages 2 new >N 1 Cron Daemon Tue Mar 17 16:40 55/2901 "Cron <root@jsefler-os6> /etc/cron.daily/rhsmd" N 2 ABRT Daemon Tue Mar 17 16:40 74/4263 "[abrt] a crash has been detected again" & 1 Message 1: From root Tue Mar 17 16:40:02 2015 Return-Path: <root> X-Original-To: root Delivered-To: root From: root (Cron Daemon) To: root Subject: Cron <root@jsefler-os6> /etc/cron.daily/rhsmd Content-Type: text/plain; charset=UTF-8 Auto-Submitted: auto-generated X-Cron-Env: <LANG=en_US.UTF-8> X-Cron-Env: <SHELL=/bin/sh> X-Cron-Env: <HOME=/root> X-Cron-Env: <PATH=/usr/bin:/bin> X-Cron-Env: <LOGNAME=root> X-Cron-Env: <USER=root> Date: Tue, 17 Mar 2015 16:40:02 -0400 (EDT) Status: R Traceback (most recent call last): File "/usr/libexec/rhsmd", line 265, in <module> main() File "/usr/libexec/rhsmd", line 226, in main status = check_status(force_signal) File "/usr/libexec/rhsmd", line 88, in check_status sorter = require(CERT_SORTER) File "/usr/share/rhsm/subscription_manager/injection.py", line 102, in require return FEATURES.require(feature, *args, **kwargs) File "/usr/share/rhsm/subscription_manager/injection.py", line 76, in require self.providers[feature] = provider() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 317, in __init__ super(CertSorter, self).__init__() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 65, in __init__ self.load() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 111, in load self._parse_server_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 130, in _parse_server_status status = self.get_compliance_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 325, in get_compliance_status return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 181, in load_status self._sync_with_server(uep, uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 276, in _sync_with_server self.server_status = uep.getCompliance(uuid) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 869, in getCompliance return self.conn.request_get(method) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 550, in request_get return self._request("GET", method) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 478, in _request self.validateResponse(result, request_type, handler) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 526, in validateResponse handler=handler) rhsm.connection.RemoteServerException: Server error attempting a GET to /subscription/consumers/e9f2c7ae-0e11-4131-9235-599476faa175/compliance returned status 502 New mail has arrived. Loaded 2 new messages N 3 Cron Daemon Tue Mar 17 16:41 55/2901 "Cron <root@jsefler-os6> /etc/cron.daily/rhsmd" N 4 ABRT Daemon Tue Mar 17 16:41 74/4263 "[abrt] a crash has been detected again" & d* New mail has arrived. Loaded 1 new message N 5 Cron Daemon Tue Mar 17 16:42 55/2901 "Cron <root@jsefler-os6> /etc/cron.daily/rhsmd" & q Held 1 message in /var/spool/mail/root [root@jsefler-os6 ~]# crontab -r [root@jsefler-os6 ~]# echo "d*" | mail > /dev/null [root@jsefler-os6 ~]# CONFIRMED: We have now demonstrated that when the entitlement server responds with a 502 Proxy Error, tracebacks encountered while executing rhsmd cron jobs will flood root's mail box on RHEL6.6 with subscription-manager-1.12.14-7.el6 installed. 7. Now that we have demonstrated/created/faked a server that responds with 502 proxy errors and floods root's mbox with mail, let's update the system to the latest subscription-manager packages targeted for release on RHEL6.7 (I am quietly using a pre-released repo to get the latest subscription-manager-1.14)... [root@jsefler-os6 ~]# yum -y --quiet update subscription-manager* --enablerepo=latest-RHEL6* Server error attempting a GET to /subscription/ returned status 502 [root@jsefler-os6 ~]# subscription-manager version server type: Red Hat Subscription Management subscription management server: Unknown subscription management rules: Unknown subscription-manager: 1.14.1-1.el6 python-rhsm: 1.14.1-1.el6 [root@jsefler-os6 ~]# 8. Now, let's verify the flood of rhsmd cron mail stops... [root@jsefler-os6 ~]# mail No mail for root [root@jsefler-os6 ~]# crontab -e no crontab for root - using an empty one crontab: installing new crontab [root@jsefler-os6 ~]# crontab -l * * * * * /etc/cron.daily/rhsmd [root@jsefler-os6 ~]# sleep 60 [root@jsefler-os6 ~]# mail No mail for root [root@jsefler-os6 ~]# VERIFIED: The offending tracebacks encountered by the rhsmd cron jobs caused by 502 Proxy responses no longer flood root's mail box. [root@jsefler-os6 ~]# tail -f /var/log/rhsm/rhsm.log 2015-03-17 16:59:02,013 [INFO] rhsmd:15826 @rhsmd:276 - rhsmd started 2015-03-17 16:59:02,019 [INFO] rhsmd:15826 @rhsmd:307 - logging subscription status to syslog 2015-03-17 16:59:02,118 [DEBUG] rhsmd:15826 @identity.py:131 - Loading consumer info from identity certificates. 2015-03-17 16:59:02,137 [INFO] rhsmd:15826 @connection.py:684 - Using certificate authentication: key = /etc/pki/consumer/key.pem, cert = /etc/pki/consumer/cert.pem, ca = /etc/rhsm/ca/, insecure = True 2015-03-17 16:59:02,137 [INFO] rhsmd:15826 @connection.py:695 - Connection Built: host: jsefler-7.usersys.redhat.com, port: 8445, handler: /subscription 2015-03-17 16:59:02,138 [INFO] rhsmd:15826 @cache.py:138 - Checking current system info against cache: /var/lib/rhsm/cache/installed_products.json 2015-03-17 16:59:02,140 [INFO] rhsmd:15826 @cache.py:140 - System data has changed, updating server. 2015-03-17 16:59:02,143 [DEBUG] rhsmd:15826 @connection.py:471 - Making request: PUT /subscription/consumers/e9f2c7ae-0e11-4131-9235-599476faa175 2015-03-17 16:59:02,244 [DEBUG] rhsmd:15826 @connection.py:494 - Response: status=502 2015-03-17 16:59:02,244 [ERROR] rhsmd:15826 @connection.py:523 - Response: 502 2015-03-17 16:59:02,245 [ERROR] rhsmd:15826 @connection.py:524 - JSON parsing error: No JSON object could be decoded 2015-03-17 16:59:02,245 [ERROR] rhsmd:15826 @cache.py:150 - Error updating system data on the server 2015-03-17 16:59:02,245 [ERROR] rhsmd:15826 @cache.py:151 - Server error attempting a PUT to /subscription/consumers/e9f2c7ae-0e11-4131-9235-599476faa175 returned status 502 Traceback (most recent call last): File "/usr/share/rhsm/subscription_manager/cache.py", line 142, in update_check self._sync_with_server(uep, consumer_uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 493, in _sync_with_server content_tags=self.tags) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 826, in updateConsumer ret = self.conn.request_put(method, params) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 584, in request_put return self._request("PUT", method, params) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 503, in _request self.validateResponse(result, request_type, handler) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 551, in validateResponse handler=handler) RemoteServerException: Server error attempting a PUT to /subscription/consumers/e9f2c7ae-0e11-4131-9235-599476faa175 returned status 502 2015-03-17 16:59:02,247 [ERROR] rhsmd:15826 @rhsmd:60 - Unhandled rhsmd exception caught by the logging excepthook: Traceback (most recent call last): File "/usr/libexec/rhsmd", line 353, in <module> main() File "/usr/libexec/rhsmd", line 308, in main status = check_status(force_signal) File "/usr/libexec/rhsmd", line 121, in check_status sorter = require(CERT_SORTER) File "/usr/share/rhsm/subscription_manager/injection.py", line 103, in require return FEATURES.require(feature, *args, **kwargs) File "/usr/share/rhsm/subscription_manager/injection.py", line 77, in require self.providers[feature] = provider() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 315, in __init__ self.update_product_manager() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 333, in update_product_manager consumer_identity.uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 152, in update_check raise Exception(_("Error updating system data on the server, see /var/log/rhsm/rhsm.log " Exception: Error updating system data on the server, see /var/log/rhsm/rhsm.log for more details. MOREOVER: rhsm.log shows that the 502 response was actually caught while trying to decode an expected JSON response. VERIFIED: subscription-manager-1.14.1-1.el6 is a good fix for this bug. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1345.html |
Description of problem: --- snipp --- Date: Mon, 2 Jun 2014 03:33:11 +0200 (CEST) From: Anacron <root.net> To: root.net Subject: Anacron job 'cron.daily' on tux.example.net Message-Id: <20140602013311.1962D3FD2D.net> /etc/cron.daily/rhsmd: Traceback (most recent call last): File "/usr/libexec/rhsmd", line 263, in <module> main() File "/usr/libexec/rhsmd", line 224, in main status = check_status(force_signal) File "/usr/libexec/rhsmd", line 86, in check_status sorter = require(CERT_SORTER) File "/usr/share/rhsm/subscription_manager/injection.py", line 98, in require return FEATURES.require(feature, *args, **kwargs) File "/usr/share/rhsm/subscription_manager/injection.py", line 72, in require self.providers[feature] = provider() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 321, in __init__ super(CertSorter, self).__init__() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 66, in __init__ self.load() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 116, in load self._parse_server_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 135, in _parse_server_status status = self.get_compliance_status() File "/usr/share/rhsm/subscription_manager/cert_sorter.py", line 333, in get_compliance_status return status_cache.load_status(self.cp_provider.get_consumer_auth_cp(), self.identity.uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 216, in load_status self._sync_with_server(uep, uuid) File "/usr/share/rhsm/subscription_manager/cache.py", line 201, in _sync_with_server self.server_status = uep.getCompliance(uuid) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 817, in getCompliance return self.conn.request_get(method) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 541, in request_get return self._request("GET", method) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 469, in _request self.validateResponse(result, request_type, handler) File "/usr/lib64/python2.6/site-packages/rhsm/connection.py", line 517, in validateResponse handler=handler) rhsm.connection.RemoteServerException: Server error attempting a GET to /subscription/consumers/<removed>/compliance returned status 502 --- snapp --- Version-Release number of selected component (if applicable): python-rhsm-1.9.7-1.el6_5.x86_64 How reproducible: Everytime when the Red Hat hosted subscription manager server is down? Actual results: Cronjob /etc/cron.daily/rhsmd fails sometimes with python traceback Expected results: No cronjob failures - or if, silently only. Additional info: I guess this is similar like bug #1098644 or the result of the fix of that bug. Unfortunately that still is flooding us with unnecessary cron e-mails.