Bug 673541 - carod ends with exception
Summary: carod ends with exception
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-low-latency
Version: Development
Hardware: i386
OS: Linux
unspecified
unspecified
Target Milestone: 1.3.2
: ---
Assignee: Robert Rati
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On: 673991
Blocks:
 
Reported: 2011-01-28 16:23 UTC by Martin Kudlej
Modified: 2011-02-15 12:10 UTC

Fixed In Version: condor-low-latency-1.1-2
Doc Type: Bug Fix
Doc Text:
C: Integer overflow fixes in python
C: The carod daemon won't start and exits with an exception on RHEL4 32-bit
F: Pass a large value as a long instead of an int
R: The carod daemon will start and run as expected
Clone Of:
: 673991 (view as bug list)
Environment:
Last Closed: 2011-02-15 12:10:37 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0217 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging and Grid bug fix and enhancement update 2011-02-15 12:10:15 UTC

Description Martin Kudlej 2011-01-28 16:23:10 UTC
Description of problem:
I configured low latency on RHEL 5.6 and 4.9, on both i386 and x86_64. carod ends with an exception only on RHEL 4.9 i386.
Exception:

$ carod -l
/usr/sbin/carod:894: FutureWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  session.message_flow(data.amqp_config['work_queue_name'], 0, 0xFFFFFFFF)
/usr/sbin/carod:895: FutureWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  session.message_flow(data.amqp_config['work_queue_name'], 1, 0xFFFFFFFF)
01/28 13:29:39 INFO: Starting up
Traceback (most recent call last):
  File "/usr/sbin/carod", line 1085, in ?
    sys.exit(main())
  File "/usr/sbin/carod", line 1014, in main
    connect_to_broker(share_data, broker)
  File "/usr/sbin/carod", line 894, in connect_to_broker
    session.message_flow(data.amqp_config['work_queue_name'], 0, 0xFFFFFFFF)
  File "/usr/lib/python2.3/site-packages/qpid/generator.py", line 25, in <lambda> 
    method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs)
  File "/usr/lib/python2.3/site-packages/qpid/session.py", line 138, in invoke
    return self.do_invoke(op, args, kwargs)
  File "/usr/lib/python2.3/site-packages/qpid/session.py", line 171, in do_invoke 
    self.send(cmd)
  File "/usr/lib/python2.3/site-packages/qpid/session.py", line 204, in send
    self.sender.send(cmd)
  File "/usr/lib/python2.3/site-packages/qpid/session.py", line 255, in send
    ch.connection.write_op(cmd)
  File "/usr/lib/python2.3/site-packages/qpid/connection.py", line 195, in write_op
    self.op_enc.write(op)
  File "/usr/lib/python2.3/site-packages/qpid/framing.py", line 214, in write
    enc = self.encode_command(op)
  File "/usr/lib/python2.3/site-packages/qpid/framing.py", line 239, in encode_command
    sc.write_fields(cmd)
  File "/usr/lib/python2.3/site-packages/qpid/codec010.py", line 360, in write_fields
    enc(value)
  File "/usr/lib/python2.3/site-packages/qpid/codec010.py", line 131, in write_uint32
    raise CodecException("Cannot encode %d as uint32" % n)
qpid.codec010.CodecException: Cannot encode -1 as uint32

Version-Release number of selected component (if applicable):
python-wallabyclient-3.9-2.el4
condor-job-hooks-1.4-6.el4
qpid-java-client-0.7.946106-14.el4
qpid-cpp-server-ssl-0.7.946106-27.el4
qpid-cpp-server-xml-0.7.946106-27.el4
qpid-cpp-client-devel-docs-0.7.946106-27.el4
condor-debuginfo-7.4.5-0.7.el4
python-condorutils-1.4-6.el4
python-qpid-0.7.946106-15.el4
condor-wallaby-client-3.9-2.el4
condor-low-latency-1.1-0.2.el4
qpid-cpp-client-0.7.946106-27.el4
qpid-cpp-server-0.7.946106-27.el4
qpid-cpp-client-ssl-0.7.946106-27.el4
qpid-java-common-0.7.946106-14.el4
qpid-java-example-0.7.946106-14.el4
qpid-cpp-server-devel-0.7.946106-27.el4
qpid-tools-0.7.946106-12.el4
qpid-tests-0.7.946106-1.el4
qpid-cpp-mrg-debuginfo-0.7.946106-27.el4
condor-7.4.5-0.7.el4
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-56
qpid-cpp-client-devel-0.7.946106-27.el4
qpid-cpp-server-store-0.7.946106-27.el4

How reproducible:
100%

Steps to Reproduce:
1. set up low latency by remote configuration
2. restart condor
3. watch MasterLog
  
Actual results:
carod ends with an unhandled exception.

Expected results:
carod does not end with an exception, communicates properly with the broker, and does its job.

Comment 1 Robert Rati 2011-01-28 20:46:54 UTC
There are two interesting parts to this error.  The 1st is:
/usr/sbin/carod:894: FutureWarning: hex/oct constants > sys.maxint will return
positive values in Python 2.4 and up
  session.message_flow(data.amqp_config['work_queue_name'], 0, 0xFFFFFFFF)
/usr/sbin/carod:895: FutureWarning: hex/oct constants > sys.maxint will return
positive values in Python 2.4 and up
  session.message_flow(data.amqp_config['work_queue_name'], 1, 0xFFFFFFFF)

These warnings are a part of python2.3 according to the thread:
http://mail.python.org/pipermail/python-dev/2002-November/030583.html

The thread also suggests solutions for several different cases.

The exception comes from 0xFFFFFFFF being larger than sys.maxint on 32-bit systems, which causes python2.3 to treat the value as negative (-1, as shown in the traceback).  There are a number of overflow fixes that have gone into RHEL4.9 (486329, 486330, 455008) that could have caused this to pop up.

0xFFFFFFFF is a special value in AMQP; in carod it sets an unlimited message size for received messages.  The fix is to pass the value with a trailing L (0xFFFFFFFFL), which makes it a long instead of an int.
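
The overflow described above can be sketched roughly as follows. This is an illustration only, written for a current Python where integers no longer overflow, so the 32-bit behaviour is simulated with ctypes. The names as_int32 and write_uint32 are hypothetical stand-ins chosen for this sketch; they are not the qpid codec or the carod code.

```python
import ctypes
import struct

UINT32_MAX = 0xFFFFFFFF


def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer.

    Simulates how a 32-bit python2.3 int literal 0xFFFFFFFF ends up negative.
    """
    return ctypes.c_int32(value & UINT32_MAX).value


def write_uint32(n):
    """Hypothetical stand-in for a uint32 encoder (not the qpid implementation).

    Packs n as a big-endian unsigned 32-bit field and rejects values
    outside the uint32 range.
    """
    if n < 0 or n > UINT32_MAX:
        raise ValueError("Cannot encode %d as uint32" % n)
    return struct.pack("!L", n)


# What the encoder effectively received on 32-bit python2.3: a wrapped int.
wrapped = as_int32(UINT32_MAX)

# What it receives once the value is kept as a full-width (long) integer.
encoded = write_uint32(UINT32_MAX)

try:
    write_uint32(wrapped)
except ValueError as err:
    failure = str(err)
```

Under these assumptions the wrapped value is rejected by the range check, while the full-width value encodes as four 0xFF bytes, which mirrors why keeping the constant as a long avoids the exception.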

Comment 2 Robert Rati 2011-01-28 20:50:17 UTC
Fixed on master.

Comment 3 Robert Rati 2011-01-28 20:59:36 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Integer overflow fixes in python
C: The carod daemon won't start and exits with an exception on RHEL4 32-bit
F: Pass a large value as a long instead of an int
R: The carod daemon will start and run as expected

Comment 6 Martin Kudlej 2011-01-31 12:07:15 UTC
Sorry, it was a bug in our low latency test client. --> CLOSE as NOTABUG

Comment 7 Martin Kudlej 2011-01-31 12:09:11 UTC
Sorry, wrong bug :( It's working now. --> ASSIGNED --> VERIFIED

Comment 8 errata-xmlrpc 2011-02-15 12:10:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html

