Bug 488998 - carod cannot handle broker restart
Summary: carod cannot handle broker restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 1.2
Target Release: ---
Assignee: Robert Rati
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On: 522467
Blocks: 527551
Reported: 2009-03-06 17:11 UTC by Matthew Farrellee
Modified: 2018-10-27 14:57 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Grid bug fix

C: MRG Messaging Broker restarted while low-latency is running on a grid execute node
C: The low-latency daemon (carod) would stop processing jobs and crash
F: Fixed the daemon to check for disconnections and to attempt to reconnect
R: The daemon no longer crashes and will resume processing jobs once the broker is running again

If the MRG Messaging Broker was restarted while low-latency was running on a grid execute node, the low-latency daemon (carod) would stop processing jobs and crash. The daemon now checks for disconnections and attempts to reconnect. This prevents the daemon from crashing and will resume processing jobs once the broker is running again.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:16:06 UTC
Target Upstream Version:


Links
System ID: Red Hat Product Errata RHEA-2009:1633
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Messaging and Grid Version 1.2
Last Updated: 2009-12-03 09:15:33 UTC

Description Matthew Farrellee 2009-03-06 17:11:19 UTC
condor-low-latency-1.0-10.el5
condor-job-hooks-common-1.0-5.el5

Exception from carod when qpidd is restarted...

Exception in thread Thread-363:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 418, in handle_reply_fetch
    send_AMQP_msg(broker_connection, saved_work.AMQP_msg, msg_props)
  File "/usr/sbin/carod", line 264, in send_AMQP_msg
    connection.message_transfer(destination=reply_to['exchange'], message=Message(msg_properties, delivery_props, data))
  File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda>
    method = lambda self, *args, **kwargs: self.invoke(inst, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 143, in invoke
    return self.do_invoke(type, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qpid/session.py", line 152, in do_invoke
    raise SessionDetached()
SessionDetached
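
The trace above shows the send raising inside a worker thread with nothing catching it, so the thread dies. A minimal sketch of guarding the send, written in current Python rather than the 2.4 of the trace. Everything here is an assumption for illustration: `SessionDetached` is a local stand-in for the exception named in the traceback (not the qpid class), and `FakeSession` and `try_send` are hypothetical names, not carod code.

```python
class SessionDetached(Exception):
    """Local stand-in for the exception named in the traceback."""


class FakeSession:
    """Hypothetical session object; only mimics the one call seen in the trace."""

    def __init__(self, attached=True):
        self.attached = attached
        self.sent = []

    def message_transfer(self, destination, message):
        # A detached session refuses to send, as in the traceback.
        if not self.attached:
            raise SessionDetached()
        self.sent.append((destination, message))


def try_send(session, destination, message):
    """Return True if the message was sent, False if the session was detached."""
    try:
        session.message_transfer(destination=destination, message=message)
    except SessionDetached:
        # Report the failure to the caller instead of killing the thread.
        return False
    return True
```

A caller that gets False back can keep the saved work and retry once a new session exists, instead of losing the thread.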

Comment 1 Matthew Farrellee 2009-03-06 19:00:33 UTC
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/carod", line 239, in lease_monitor
    item.unlock(False)
  File "/usr/sbin/carod", line 56, in unlock
    self.__access_lock__.release()
  File "/usr/lib64/python2.4/threading.py", line 113, in release
    assert self.__owner is me, "release() of un-acquire()d lock"
AssertionError: release() of un-acquire()d lock
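
The assertion above comes from releasing a re-entrant lock that the calling thread does not hold. One common way to rule that out is to pair every acquire and release with a `with` block. This is only a sketch of that pattern, not the change made in carod; `WorkItem`, `expire` and `unpaired_release_fails` are hypothetical names. Note that current Python raises RuntimeError for the unpaired release where the 2.4 trace shows AssertionError.

```python
import threading


class WorkItem:
    """Hypothetical stand-in for the object whose unlock() appears in the trace."""

    def __init__(self):
        self.access_lock = threading.RLock()
        self.expired = False

    def expire(self):
        # 'with' acquires the lock and releases it exactly once on exit,
        # so release() never runs without a matching acquire().
        with self.access_lock:
            self.expired = True


def unpaired_release_fails():
    """Return True if releasing a never-acquired RLock raises an error."""
    lock = threading.RLock()
    try:
        lock.release()
    except RuntimeError:  # Python 3; the Python 2.4 trace shows AssertionError
        return True
    return False
```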

Comment 2 Robert Rati 2009-08-17 19:45:54 UTC
Fixed in:
condor-low-latency-1.0-18

Comment 3 Martin Kudlej 2009-09-24 11:35:07 UTC
Tested on RHEL 5.4 (condor-7.4.0-0.5) and RHEL 4.8 (condor-7.4.0-0.4), i386 and x86_64, with condor-low-latency-1.0-19. It works --> VERIFIED

Comment 4 Irina Boverman 2009-10-28 17:05:26 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Carod is no longer crashing when broker is restarted (488998)

Comment 5 Lana Brindley 2009-11-05 02:06:43 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-Carod is no longer crashing when broker is restarted (488998)
+Grid bug fix
+
+C: MRG Messaging Broker restarted
+C: Carod would experience an exception and crash
+F: 
+R: Carod no longer crashes.
+
+NEED FURTHER INFORMATION FOR RELNOTE.

Comment 6 Robert Rati 2009-11-24 13:48:41 UTC
C: MRG Messaging Broker restarted while low-latency is running on a grid execute node
C: The low-latency daemon (carod) would stop processing jobs and crash
F: Fixed the daemon to check for disconnections and to attempt to reconnect
R: The daemon no longer crashes and will resume processing jobs once the broker is running again
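
The "check for disconnections and attempt to reconnect" behaviour described above can be sketched as a retry loop. This is an illustration under stated assumptions, not the code shipped in condor-low-latency: `reconnect` and its parameters are hypothetical, `connect` is any caller-supplied callable that returns a session or raises the built-in ConnectionError, and the fixed delay is an arbitrary choice.

```python
import time


def reconnect(connect, retry_delay=1.0, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds; return its result, or None if we give up.

    connect      -- hypothetical callable that returns a session or raises ConnectionError
    retry_delay  -- seconds to wait between attempts (assumed value)
    max_attempts -- None means keep trying until the broker is back
    sleep        -- injectable so the loop can be exercised without waiting
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return connect()
        except ConnectionError:
            # Broker still down: wait, then try again.
            sleep(retry_delay)
    return None
```

With `max_attempts=None` the loop only returns once the broker accepts a connection, which matches the "resume processing jobs once the broker is running again" result; a daemon that also needs to shut down cleanly would want a stop condition.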

Comment 7 Lana Brindley 2009-11-26 20:29:21 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,8 +1,8 @@
 Grid bug fix
 
-C: MRG Messaging Broker restarted
-C: Carod would experience an exception and crash
-F: 
-R: Carod no longer crashes.
+C: MRG Messaging Broker restarted while low-latency is running on a grid execute node
+C: The low-latency daemon (carod) would stop processing jobs and crash
+F: Fixed the daemon to check for disconnections and to attempt to reconnect
+R: The daemon no longer crashes and will resume processing jobs once the broker is running again 
 
-NEED FURTHER INFORMATION FOR RELNOTE.
+If the MRG Messaging Broker was restarted while low-latency was running on a grid execute node, the low-latency daemon (carod) would stop processing jobs and crash. The daemon now checks for disconnections and attempts to reconnect. This prevents the daemon from crashing and will resume processing jobs once the broker is running again.

Comment 8 errata-xmlrpc 2009-12-03 09:16:06 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

