Bug 1287925
Summary: | /bin/sh /etc/cron.daily/rhsmd does not stop. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Yoshinori Takahashi <hkim> |
Component: | subscription-manager | Assignee: | Chris Snyder <csnyder> |
Status: | CLOSED ERRATA | QA Contact: | John Sefler <jsefler> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.7 | CC: | alikins, bcourt, csnyder, hkim, jgalipea, redakkan, tmraz, vrjain |
Target Milestone: | rc | Keywords: | Reopened, Triaged |
Target Release: | 6.9 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-03-21 10:54:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1269194, 1355878 |
Comment 12
Adrian Likins
2016-01-07 16:06:39 UTC
That implies that there is still a blocking/locking bug, but it may be in rhsmcertd-worker rather than rhsmd.

Yoshinori, apologies that this bug did not get attention recently. We have a potential fix, but we cannot be sure that it would solve the customer's issue. It would not be a good customer experience to wait until then to find out whether the fix is reliable. Also, unfortunately, the attached logs do not give us enough information to be sure of what the issue is.

0. Is this still an issue?
1. Did we try killing the rhsmcertd-worker process, as Adrian suggested in comment 12?
2. Is this a recurring issue? That is, does this happen every time rhsmcertd starts, or was it a one-time occurrence?
3. If the answer to all the above questions is yes, please provide an strace so we can find out why that process hangs every time. You could use the command:

    strace -p `ps -C rhsmcertd -o pid=` -o rhsmcertd_trace.txt

I believe this to be fixed in subscription-manager versions greater than or equal to 1.17.15-1 and python-rhsm versions greater than or equal to 1.17.4-1. If you are still running into this bug using these versions (or newer) of subscription-manager and python-rhsm, please reopen this bug. Thank you.

Subscription-manager 1.17.X and python-rhsm 1.17.X are being released with RHEL 7.3. Several fixes have gone into the 1.17.X versions of subscription-manager and python-rhsm. Our next EL6 release will be rebased from the previous release (1.17.X), so the fix for this bug should be included when we build subscription-manager 1.18.X for RHEL 6.9.

Here are links to the bugs mentioned in comment 31:
- https://bugzilla.redhat.com/show_bug.cgi?id=1351370 - A fix to ensure rhsmd exits when an exception occurs during a call to a method exposed over dbus.
- https://bugzilla.redhat.com/show_bug.cgi?id=1346417

Other possibly related upstream issues / PRs:
- https://github.com/candlepin/subscription-manager/issues/1006 - See comment 5
- https://github.com/candlepin/python-rhsm/pull/170 - A fix from awood allowing the socket timeout to be set in rhsm.conf (and elsewhere in the codebase). The PR above also added a default socket timeout of 180.

Hopefully this is helpful to QA! Cheers.

Since this bug doesn't have a direct reproducer, we believe that it was resolved by the fixes for the following issues:
1) Bug 1351370 - [ERROR] subscription-manager:31276 @dbus_interface.py:60 - org.freedesktop.DBus.Python.OSError: Traceback
2) Bug 1346417 - [RFE] Allow users to set socket timeout.

1) Demonstrating that the "OSError" no longer happens on RHEL 6.9 with the build:

    python-rhsm-1.18.6-1.el6.x86_64
    python-rhsm-certificates-1.18.6-1.el6.x86_64
    subscription-manager-firstboot-1.18.6-1.el6.x86_64
    subscription-manager-migration-data-2.0.32-1.el6.noarch
    subscription-manager-debuginfo-1.18.6-1.el6.x86_64
    subscription-manager-1.18.6-1.el6.x86_64
    subscription-manager-plugin-container-1.18.6-1.el6.x86_64
    subscription-manager-migration-1.18.6-1.el6.x86_64
    subscription-manager-gui-1.18.6-1.el6.x86_64

    # cp -R /etc/pki/product-default/ /tmp/
    # ls -R /tmp/product-default/
    /tmp/product-default/:
    69.pem
    [root@dhcp35-181 tmp]# subscription-manager config --rhsm.productcertdir=/tmp/product-default/
    [root@dhcp35-181 tmp]# subscription-manager clean
    All local data removed

rhsm.log:

    2017-01-06 07:14:02,817 [INFO] subscription-manager:5292:MainThread @managercli.py:389 - Client Versions: {'python-rhsm': '1.18.6-1.el6', 'subscription-manager': '1.18.6-1.el6'}
    2017-01-06 07:14:02,820 [INFO] subscription-manager:5292:MainThread @managerlib.py:879 - Cleaned local data
    2017-01-06 07:14:02,996 [INFO] rhsmd:5294:MainThread @rhsmd:261 - rhsmd started
    2017-01-06 07:14:03,729 [INFO] subscription-manager-gui:32419:CertMonitorThread @connection.py:758 - Connection built: host=F21-candlepin.usersys.redhat.com port=8443 handler=/candlepin auth=identity_cert ca_dir=/etc/rhsm/ca/ insecure=False
    2017-01-06 07:14:03,857 [INFO] rhsmd:5296:MainThread @rhsmd:261 - rhsmd started

    # ps -aux | grep "rhsmd"
    Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
    root 5584 0.0 0.0 103332 816 pts/1 R+ 07:36 0:00 grep rhsmd

^^ Verified three things. No OSError occurred. rhsmd started, as shown by the "rhsmd started" entries in rhsm.log. No rhsmd process was left hanging, because ps shows only the grep itself.

2) Verifying with an existing non-responsive entitlement server setup (refer to https://bugzilla.redhat.com/show_bug.cgi?id=1346417#c11 to set up a non-responsive server):

    [root@auto-services ncat_listener]# systemctl is-active ncat_listener.service
    active

^^ Making sure that the service is active on the server.

Retesting with the version:

    # subscription-manager version
    server type: This system is currently not registered.
    subscription management server: 0.9.51.20-1
    subscription management rules: 5.15.1
    subscription-manager: 1.18.6-1.el6
    python-rhsm: 1.18.6-1.el6

Next, copy the server cert and try to register the client against the non-responsive server. The expected result is a timeout within the configured period (server_timeout=20):

    # scp root.redhat.com:/root/ncat_listener/ncat_listener.pem /etc/rhsm/ca/
    root.redhat.com's password:
    ncat_listener.pem 100% 1935 1.9KB/s 00:00
    # chmod 0644 /etc/rhsm/ca/ncat_listener.pem
    # subscription-manager config --server.hostname=auto-services.usersys.redhat.com --server.port=8884
    # subscription-manager config --server.server_timeout=20
    # time subscription-manager register --username=foo --password=bar
    Registering to: auto-services.usersys.redhat.com:8884/subscription
    Unable to verify server's identity:

    real 0m21.397s
    user 0m0.224s
    sys 0m0.055s

After 21.397s of real time, roughly the configured 20-second server_timeout, the subscription-manager command timed out against the non-responsive server (auto-services.usersys.redhat.com:8884).

Conclusion: Bugs 1351370 and 1346417 have been verified, confirming that the "rhsmd" service no longer hangs after these errors. Tested on:

    subscription management server: 0.9.51.20-1
    subscription management rules: 5.15.1
    subscription-manager: 1.18.6-1.el6
    python-rhsm: 1.18.6-1.el6

Marking as Verified!!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0698.html |
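Question 3 in the thread asks for an strace of the hung process. The following is a rough Python sketch of that step, for illustration only; it is not a tool shipped with subscription-manager. It also covers a case the one-liner does not handle. `ps -C NAME -o pid=` prints one PID per line, so if more than one process matched, the unquoted backticks would expand to `strace -p 123 456 ...`. The second PID would then become a separate word that strace does not read as a PID. strace accepts repeated `-p` options, so the helper emits one per PID. The helper names are hypothetical. The optional `-f` (follow forks) flag is an addition and was not part of the original request.

```python
import shlex
import subprocess
from typing import List


def parse_ps_pids(ps_output: str) -> List[int]:
    """Turn `ps -o pid=` output (right-aligned, one PID per line) into ints."""
    return [int(token) for token in ps_output.split()]


def build_strace_argv(pids: List[int], outfile: str,
                      follow_forks: bool = False) -> List[str]:
    """Build one strace command line that attaches to every given PID."""
    if not pids:
        raise ValueError("no matching process; is the daemon running?")
    argv = ["strace"]
    if follow_forks:
        argv.append("-f")  # optional: also trace children forked later
    for pid in pids:
        argv += ["-p", str(pid)]  # one -p per PID
    argv += ["-o", outfile]
    return argv


if __name__ == "__main__":
    try:
        # ps exits non-zero when nothing matches, so no check=True here.
        result = subprocess.run(["ps", "-C", "rhsmcertd", "-o", "pid="],
                                capture_output=True, text=True)
    except FileNotFoundError:
        raise SystemExit("ps is not available on this system")
    pids = parse_ps_pids(result.stdout)
    if pids:
        # Print the command instead of running it; attaching needs root.
        print(shlex.join(build_strace_argv(pids, "rhsmcertd_trace.txt")))
    else:
        print("rhsmcertd is not running")
```

The sketch only prints the command. Running it is left to an administrator with root access and strace installed.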
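The thread puts the fixed floor at subscription-manager 1.17.15-1 and python-rhsm 1.17.4-1. Below is a minimal sketch for checking a reported version-release string against that floor. It assumes plain dotted numeric versions and compares only the leading integer of the release. It is therefore a simplification, not rpm's full `rpmvercmp` ordering. `parse_evr` and `includes_fix` are hypothetical helper names.

```python
import re
from typing import Tuple

# Minimum versions that the thread says include the fix.
MIN_FIXED = {
    "subscription-manager": "1.17.15-1",
    "python-rhsm": "1.17.4-1",
}


def parse_evr(version_release: str) -> Tuple[Tuple[int, ...], int]:
    """Split '1.18.6-1.el6' into ((1, 18, 6), 1).

    Simplified: assumes numeric dotted versions and keeps only the
    leading integer of the release ('1.el6' -> 1).
    """
    version, _, release = version_release.partition("-")
    numeric_version = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    release_num = int(match.group()) if match else 0
    return numeric_version, release_num


def includes_fix(package: str, installed: str) -> bool:
    """True if the installed version-release is >= the minimum fixed one."""
    return parse_evr(installed) >= parse_evr(MIN_FIXED[package])


if __name__ == "__main__":
    # Versions reported in the RHEL 6.9 verification.
    for pkg, ver in [("subscription-manager", "1.18.6-1.el6"),
                     ("python-rhsm", "1.18.6-1.el6")]:
        print(pkg, ver, includes_fix(pkg, ver))
```

Tuple comparison does the ordering work here. For real packages, rpm's own comparison should be preferred because it also handles epochs and alphanumeric segments.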
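The thread describes bug 1351370 as a fix to ensure rhsmd exits when an exception occurs during a call to a method exposed over dbus. The sketch below illustrates that general pattern using only the standard library. It does not use dbus-python or GLib, and `MiniLoop` and `exit_on_exception` are hypothetical names, not rhsmd's actual code. The loop imitates a binding that turns a handler's exception into an error reply and keeps serving, which is how a process can stay alive with nothing to do. The decorator instead logs the failure, stops the loop, and re-raises so the caller still sees the error.

```python
import functools
import logging
import queue
from typing import Callable, List

log = logging.getLogger("rhsmd-sketch")


class MiniLoop:
    """Stand-in for a GLib-style main loop: dispatches queued calls until quit()."""

    def __init__(self) -> None:
        self.calls: "queue.Queue[Callable[[], object]]" = queue.Queue()
        self.errors: List[BaseException] = []
        self.running = False

    def quit(self) -> None:
        self.running = False

    def run(self, max_idle_polls: int = 50) -> str:
        """Return "quit" if quit() was called, or "idle" after max_idle_polls
        empty polls (a bounded stand-in for a daemon that never exits)."""
        self.running = True
        idle = 0
        while self.running:
            try:
                call = self.calls.get(timeout=0.01)
            except queue.Empty:
                idle += 1
                if idle >= max_idle_polls:
                    return "idle"
                continue
            idle = 0
            try:
                call()
            except Exception as exc:
                # Like a D-Bus binding: report the error to the caller and
                # keep serving, which is what leaves the process alive.
                self.errors.append(exc)
        return "quit"


def exit_on_exception(loop: MiniLoop):
    """Decorator: if an exported method raises, log it and stop the loop."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            try:
                return method(*args, **kwargs)
            except Exception:
                log.exception("%s failed; shutting down", method.__name__)
                loop.quit()
                raise  # the caller still receives the original error
        return wrapper
    return decorator


if __name__ == "__main__":
    loop = MiniLoop()

    @exit_on_exception(loop)
    def check_status():
        raise OSError("simulated failure inside an exported method")

    loop.calls.put(check_status)
    print(loop.run())  # prints: quit
```

Without the decorator, the same failing call leaves `run()` polling until its idle bound. In this sketch that bound stands in for a process launched from cron that never returns.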
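Bug 1346417 and python-rhsm PR 170 make the socket timeout configurable, with a default of 180 seconds. The QA run sets it to 20 with `subscription-manager config --server.server_timeout=20`. The sketch below reads that value from rhsm.conf-style text. It then shows a client giving up on a listener that accepts TCP connections but never replies, standing in for the ncat listener. It assumes the option lives under `[server]`, as the `--server.` prefix suggests. The helper names are hypothetical and are not python-rhsm's API. The demo also waits on a plain TCP read, whereas the real client's timeout surfaced as "Unable to verify server's identity".

```python
import configparser
import socket
import time

DEFAULT_SERVER_TIMEOUT = 180.0  # default mentioned for python-rhsm PR 170


def read_server_timeout(conf_text: str) -> float:
    """Return [server] server_timeout from rhsm.conf-style text, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getfloat("server", "server_timeout",
                           fallback=DEFAULT_SERVER_TIMEOUT)


def start_silent_listener() -> socket.socket:
    """Listen on localhost but never accept or send anything."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    return server


def wait_for_reply(port: int, timeout: float) -> float:
    """Connect and wait for a reply; return seconds elapsed before timing out."""
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as conn:
        try:
            conn.recv(1)  # blocks until data arrives or the timeout fires
        except socket.timeout:
            return time.monotonic() - start
    raise RuntimeError("server unexpectedly replied or closed the connection")


if __name__ == "__main__":
    configured = read_server_timeout("[server]\nserver_timeout = 20\n")
    print("configured timeout:", configured)
    listener = start_silent_listener()
    try:
        # A short timeout keeps the demo quick; rhsm would use `configured`.
        elapsed = wait_for_reply(listener.getsockname()[1], timeout=0.5)
        print(f"gave up after {elapsed:.2f}s")
    finally:
        listener.close()
```

The kernel completes the TCP handshake for a listening socket even before `accept()`, so the connect succeeds and only the read times out. This mirrors the "server reachable but silent" case that the QA setup exercises.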