Bug 700958

Summary: Message did not receive a reply (timeout by message bus)
Product: Red Hat Enterprise Linux 5 Reporter: John Sefler <jsefler>
Component: subscription-managerAssignee: Bryan Kearney <bkearney>
Status: CLOSED ERRATA QA Contact: John Sefler <jsefler>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.7CC: bkearney, spandey
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-21 08:46:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 675214    

Description John Sefler 2011-04-29 22:11:22 UTC
Description of problem:
I don't know how to reproduce this problem, but I am intermittently getting an error to stdout with "Message did not receive a reply (timeout by message bus)" when executing subscription-manager commands (e.g. unregister) and then the following traceback is found in the rhsm.log...

2011-04-29 17:59:29,451 [INFO]  @connection.py:307 - Using certificate authentication: key = /etc/pki/consumer/key.pem, cert = /etc/pki/consumer/cert.pem, ca = /etc/rhsm/ca/, insecure = False
2011-04-29 17:59:29,451 [INFO]  @connection.py:311 - Connection Established: host: jsefler-f14-5candlepin.usersys.redhat.com, port: 8443, handler: /candlepin
2011-04-29 17:59:29,453 [INFO]  @connection.py:159 - loading ca pem certificates from: /etc/rhsm/ca/
2011-04-29 17:59:29,453 [INFO]  @connection.py:146 - loading ca certificate '/etc/rhsm/ca/redhat-uep.pem'
2011-04-29 17:59:29,454 [INFO]  @connection.py:146 - loading ca certificate '/etc/rhsm/ca/fakamai-cp1.pem'
2011-04-29 17:59:29,454 [INFO]  @connection.py:146 - loading ca certificate '/etc/rhsm/ca/jsefler-f14-5candlepin.pem'
2011-04-29 17:59:29,454 [INFO]  @connection.py:146 - loading ca certificate '/etc/rhsm/ca/candlepin-stage.pem'
2011-04-29 17:59:29,455 [INFO]  @connection.py:161 - work in insecure mode ?:False
2011-04-29 17:59:29,636 [INFO]  @connection.py:189 - status code: 204
2011-04-29 17:59:29,640 [INFO]  @managerlib.py:644 - Successfully un-registered.
2011-04-29 17:59:29,642 [INFO]  @connection.py:307 - Using certificate authentication: key = /etc/pki/consumer/key.pem, cert = /etc/pki/consumer/cert.pem, ca = /etc/rhsm/ca/, insecure = False
2011-04-29 17:59:29,642 [INFO]  @connection.py:311 - Connection Established: host: jsefler-f14-5candlepin.usersys.redhat.com, port: 8443, handler: /candlepin
2011-04-29 17:59:29,658 [INFO]  @repolib.py:88 - repos updated: 0
2011-04-29 17:59:29,749 [DEBUG]  @certlib.py:639 - Sorting product and entitlement cert status for: 2011-04-29 17:59:29.749008
2011-04-29 17:59:29,757 [DEBUG]  @certlib.py:658 - Installed product IDs: []
2011-04-29 17:59:29,757 [DEBUG]  @certlib.py:649 - valid entitled products: []
2011-04-29 17:59:29,757 [DEBUG]  @certlib.py:650 - expired entitled products: []
2011-04-29 17:59:29,758 [INFO]  @factlib.py:58 - facts updated: 0
2011-04-29 17:59:29,759 [ERROR]  @certlib.py:156 - [Errno 2] No such file or directory: '/etc/pki/consumer/key.pem'
2011-04-29 17:59:29,995 [ERROR]  @managercli.py:48 - exception caught in subscription-manager
2011-04-29 17:59:29,997 [ERROR]  @managercli.py:49 - Message did not receive a reply (timeout by message bus)
Traceback (most recent call last):
  File "/usr/sbin/subscription-manager", line 78, in ?
    sys.exit(abs(main() or 0))
  File "/usr/sbin/subscription-manager", line 69, in main
    return managercli.CLI().main()
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 854, in main
    cmd.main()
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 192, in main
    self._do_command()
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 475, in _do_command
    self._request_validity_check()
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 121, in _request_validity_check
    validity_iface.check_status()
  File "/usr/lib64/python2.4/site-packages/dbus/proxies.py", line 25, in __call__
    ret = self._proxy_method (*args, **keywords)
  File "/usr/lib64/python2.4/site-packages/dbus/proxies.py", line 102, in __call__
    reply_message = self._connection.send_with_reply_and_block(message, timeout)
  File "dbus_bindings.pyx", line 455, in dbus_bindings.Connection.send_with_reply_and_block
DBusException: Message did not receive a reply (timeout by message bus)

[root@jsefler-onprem-5server ~]# tail -f /var/log/rhsm/rhsm.log
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 475, in _do_command
    self._request_validity_check()
  File "/usr/share/rhsm/subscription_manager/managercli.py", line 121, in _request_validity_check
    validity_iface.check_status()
  File "/usr/lib64/python2.4/site-packages/dbus/proxies.py", line 25, in __call__
    ret = self._proxy_method (*args, **keywords)
  File "/usr/lib64/python2.4/site-packages/dbus/proxies.py", line 102, in __call__
    reply_message = self._connection.send_with_reply_and_block(message, timeout)
  File "dbus_bindings.pyx", line 455, in dbus_bindings.Connection.send_with_reply_and_block
DBusException: Message did not receive a reply (timeout by message bus)



Version-Release number of selected component (if applicable):
[root@jsefler-onprem-5server product]# rpm -qa | grep subscription-manager
subscription-manager-firstboot-0.95.5.8-1.git.6.42d8706.el5
subscription-manager-gnome-0.95.5.8-1.git.6.42d8706.el5
subscription-manager-0.95.5.8-1.git.6.42d8706.el5
[root@jsefler-onprem-5server product]# rpm -q python-rhsm
python-rhsm-0.95.5.3-1.git.2.01e0da5.el5


How reproducible:
occasional

Steps to Reproduce:
1. I don't know
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 James Bowes 2011-05-10 16:43:13 UTC
fixed in master e9430af8df75eabe01a05d8d7288e4a8a44b6f4b
and rhel 5 4e42e767708a4bf6c694a819b2b709e2b39ba17c

I'm not able to prevent the timeout from occuring (the daemon just isn't replying in time). but that's not too big a deal, so from the cli we can simply ignore the error.

to test, add a time.sleep(some_value_greater_than_30) to
check_status in rhsm_d.py, and run the daemon from source with -d -k

Comment 2 John Sefler 2011-05-26 21:35:28 UTC
Verifying Version...
[root@jsefler-rhel57-beta12-i386 tmp]# rpm -qa | grep subscription-manager
subscription-manager-0.95.5.19-1.git.1.8278e3d.el5
subscription-manager-firstboot-0.95.5.19-1.git.1.8278e3d.el5
subscription-manager-gnome-0.95.5.19-1.git.1.8278e3d.el5

As suggested in comment 1 for testing...

[root@jsefler-rhel57-beta12-i386 ~]# cp /usr/libexec/rhsmd /usr/libexec/rhsmd.orig
[root@jsefler-rhel57-beta12-i386 ~]# vi /usr/libexec/rhsmd  (and added the suggested sleep inside def check_status():
[root@jsefler-rhel57-beta12-i386 ~]# diff /usr/libexec/rhsmd /usr/libexec/rhsmd.orig
18d17
< import time
64,66d62
<     print "sleeping for 50..."
<     time.sleep(50)
<     print "awake"
[root@jsefler-rhel57-beta12-i386 ~]# 

[root@jsefler-rhel57-beta12-i386 ~]# /usr/libexec/rhsmd -k -d
sleeping for 50...
awake
System has one or more certificates that are not valid
['83', '101', '69', '90']
[]
Validity status changed, fire signal
signal fired! code is 1
sleeping for 50...
awake
System has one or more certificates that are not valid
['83', '101', '69', '90']
[]


In a separate terminal, I ran the following after starting rhsmd which in turn spawned the call to check_status and produced the output from rhsmd above

[root@jsefler-rhel57-beta12-i386 ~]# subscription-manager register --username=stage_test_38
Password: 
f2bd9cba-7c10-4872-9ee3-07e4f5686851 jsefler-rhel57-beta12-i386.usersys.redhat.com
[root@jsefler-rhel57-beta12-i386 ~]# subscription-manager unregister
System has been un-registered.
[root@jsefler-rhel57-beta12-i386 ~]# 

[root@jsefler-rhel57-beta12-i386 ~]# grep -i Traceback /var/log/rhsm/rhsm.log 
[root@jsefler-rhel57-beta12-i386 ~]# 


With the absence of any stdout "Message did not receive a reply (timeout by message bus)" as well as an absence of a traceback from rhsm.log during a register or unregister, I hope this is enough to declare this defect as VERIFIED.

Comment 3 errata-xmlrpc 2011-07-21 08:46:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-1078.html

Comment 4 errata-xmlrpc 2011-07-21 12:30:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-1078.html