condor-low-latency-1.0-10.el5
condor-job-hooks-common-1.0-5.el5

Exception from carod when qpidd is restarted...

Exception in thread Thread-363:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 418, in handle_reply_fetch
    send_AMQP_msg(broker_connection, saved_work.AMQP_msg, msg_props)
  File "/usr/sbin/carod", line 264, in send_AMQP_msg
    connection.message_transfer(destination=reply_to['exchange'], message=Message(msg_properties, delivery_props, data))
  File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(inst, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 143, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 152, in do_invoke
    raise SessionDetached()
SessionDetached

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 239, in lease_monitor
    item.unlock(False)
  File "/usr/sbin/carod", line 56, in unlock
    self.__access_lock__.release()
  File "/usr/lib64/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
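
For reference, the second traceback is what a reentrant lock reports when a thread that does not hold it calls release(). The sketch below reproduces that failure mode and shows one possible guard. It is an illustration only: `WorkItem` and `release_from_other_thread` are hypothetical names, not carod's actual code, and on current Python versions the error surfaces as RuntimeError rather than the AssertionError raised by Python 2.4.

```python
import threading


def release_from_other_thread(lock):
    """Return the exception raised when a non-owner thread releases `lock`."""
    errors = []

    def worker():
        try:
            lock.release()
        except (RuntimeError, AssertionError) as exc:
            errors.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return errors[0] if errors else None


class WorkItem:
    """Hypothetical stand-in for a lockable work item (assumption)."""

    def __init__(self):
        self._lock = threading.RLock()
        self._owner = None  # ident of the thread holding the lock
        self._depth = 0     # reentrant acquisition count

    def lock(self, blocking=True):
        acquired = self._lock.acquire(blocking)
        if acquired:
            self._owner = threading.get_ident()
            self._depth += 1
        return acquired

    def unlock(self):
        """Release the lock only if the calling thread holds it."""
        if self._owner != threading.get_ident():
            return False  # not ours: skip release() instead of raising
        self._depth -= 1
        if self._depth == 0:
            self._owner = None
        self._lock.release()
        return True
```

With a guard of this kind, a monitor thread that calls unlock() on an item it never locked gets False back instead of an exception that ends the thread.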
Fixed in: condor-low-latency-1.0-18
Tested on RHEL 5.4 (condor-7.4.0-0.5) and RHEL 4.8 (condor-7.4.0-0.4), i386 and x86_64, with condor-low-latency-1.0-19, and it works --> VERIFIED
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Carod is no longer crashing when broker is restarted (488998)
Release note updated.

Diffed Contents:
@@ -1 +1,8 @@
-Carod is no longer crashing when broker is restarted (488998)
+Grid bug fix
+
+C: MRG Messaging Broker restarted
+C: Carod would experience and exception and crash
+F:
+R: Carod no longer crashes.
+
+NEED FURTHER INFORMATION FOR RELNOTE.
C: MRG Messaging Broker restarted while low-latency is running on a grid execute node
C: The low-latency daemon (carod) would stop processing jobs and crash
F: Fixed the daemon to check for disconnections and to attempt to reconnect
R: The daemon no longer crashes and will resume processing jobs once the broker is running again
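
The fix described above (check for disconnections, then attempt to reconnect) could be sketched roughly as follows. This is not the actual carod patch: `send_with_reconnect`, `connect`, and `send` are hypothetical names, and `SessionDetached` is redefined locally as a stand-in for the exception in the traceback so that the example does not depend on the qpid library.

```python
import time


class SessionDetached(Exception):
    """Local stand-in for the exception seen in the traceback (assumption)."""


def send_with_reconnect(connect, send, message,
                        retries=5, delay=1.0, sleep=time.sleep):
    """Send `message`, reconnecting whenever the session is detached.

    `connect()` must return a session-like object and `send(session, message)`
    must transmit on it; both are hypothetical callables supplied by the
    caller. A real daemon would also have to handle `connect()` itself
    failing while the broker is still down.
    """
    session = connect()
    for attempt in range(retries + 1):
        try:
            return send(session, message)
        except SessionDetached:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(delay)         # wait before trying the broker again
            session = connect()  # replace the detached session
```

The retry count and delay are arbitrary defaults; a long-running daemon would more likely retry indefinitely with a capped back-off so that it resumes work whenever the broker returns.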
Release note updated.

Diffed Contents:
@@ -1,8 +1,8 @@
 Grid bug fix

-C: MRG Messaging Broker restarted
-C: Carod would experience and exception and crash
-F:
-R: Carod no longer crashes.
+C: MRG Messaging Broker restarted while low-latency is running on a grid execute node
+C: The low-latency daemon (carod) would stop processing jobs and crash
+F: Fixed the daemon to check for disconnections and to attempt to reconnect
+R: The daemon no longer crashes and will resume processing jobs once the broker is running again

-NEED FURTHER INFORMATION FOR RELNOTE.
+If the MRG Messaging Broker was restarted while low-latency was running on a grid execute node, the low-latency daemon (carod) would stop processing jobs and crash. The daemon now checks for disconnections and attempts to reconnect. This prevents the daemon from crashing and will resume processing jobs once the broker is running again.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html