Bug 470320 - RHEL4 daemons not getting published
Summary: RHEL4 daemons not getting published
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1
Assignee: Matthew Farrellee
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-06 17:40 UTC by Matthew Farrellee
Modified: 2009-02-04 16:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 16:03:52 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2009:0036
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Grid 1.1 Release
Last Updated: 2009-02-04 16:03:49 UTC

Description Matthew Farrellee 2008-11-06 17:40:26 UTC
condor-7.1.4-0.3.el4
qmf-0.3.709187-4.el4
qpidc-0.3.709187-4.el4

qpid-tool on north-15 shows all RHEL5 daemons (e.g. Masters) but none from the RHEL4 machines.

The plugins on the RHEL4 machines are loading successfully.

Comment 1 Will Benton 2008-11-18 15:00:23 UTC
It appears that this is a qpid problem.  The call to failover.reset at the end of ConnectionImpl::open() never returns. Because this executes in a separate thread (for the grid qmf plugin), it does not cause the condor_master to hang, but the master is never published.

To reproduce, run "condor_master -t -f" on north-04, which should try to connect to a broker on north-15.

Comment 2 Will Benton 2008-11-18 15:02:21 UTC
Mick observed a crash in a similar place in an example program; I am reproducing his email below.

---

I found a little something interesting about willb's hang, which I want
to record here for posterity.

It doesn't look like a race -- and in my case I do not see a hang -- but
I do see Interesting Behavior in the same place Will is seeing a hang.

I reproduced by running a simple client (declare_queues) on RHEL4, and
talking to a broker on RHEL5.  The FailoverListener ctor exits early
because this is true:

    session.exchangeQuery(arg::name=AMQ_FAILOVER).getNotFound()

That looks reasonable, since the RHEL5 broker is non-clustered -- but I
bet that's where Will is seeing it hang rather than return early as
expected.

That's all I've got so far....
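
The early exit described above can be sketched with stand-in types. This is only an illustration of the control flow: SessionSketch, QueryResultSketch, ListenerSketch and the "failover-exchange" name are hypothetical placeholders, not the qpid client classes or the real value of AMQ_FAILOVER.

```cpp
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only; NOT the qpid client classes.
struct QueryResultSketch {
    bool notFound;
    bool getNotFound() const { return notFound; }
};

struct SessionSketch {
    std::set<std::string> exchanges;  // exchanges the pretend broker knows about

    QueryResultSketch exchangeQuery(const std::string& name) const {
        QueryResultSketch r;
        r.notFound = (exchanges.count(name) == 0);
        return r;
    }
};

class ListenerSketch {
public:
    explicit ListenerSketch(const SessionSketch& session) : subscribed(false) {
        // Placeholder exchange name; stands in for AMQ_FAILOVER.
        // A non-clustered broker has no such exchange, so return early.
        if (session.exchangeQuery("failover-exchange").getNotFound()) return;
        subscribed = true;  // a real listener would subscribe for updates here
    }
    bool isSubscribed() const { return subscribed; }

private:
    bool subscribed;
};
```

Against a non-clustered broker the constructor is expected to take the early-return path, which is the path where the RHEL4 hang was suspected.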

Comment 3 Matthew Farrellee 2008-11-26 21:46:23 UTC
r720973 | tross | 2008-11-26 14:48:44 -0600 (Wed, 26 Nov 2008) | 7 lines

Bug fixes for QMF:
  ManagementAgentImpl - don't send messages if broker is not connected.
  ManagementBroker - agents could be assigned the same agentBank
                   - don't send console-attached for attached agents
                   - handle multiple qmf messages in an AMQP body
  schema.py - Don't use the FieldTable copy-constructor, use .clear()

------------------------------------------------------------------------
r720972 | tross | 2008-11-26 14:43:14 -0600 (Wed, 26 Nov 2008) | 12 lines

Added a copy constructor and assignment operator to FieldTable.
This was done to solve a library problem with the RHEL4 distribution.

The compiler generated the assignment operator in an application using
the C++ qpid client libraries.  This generated function (referenced by
a weak symbol) appeared to be causing problems in the heart of the
library (handling of the ConnectionStartBody) with regard to the
handling of field tables.

The failure mechanism is not fully understood, but this seemingly
innocuous change solves the problem.
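
As a rough illustration of the kind of change r720972 describes, the sketch below declares the copy constructor and assignment operator in the class and defines them outside the class body, as they would be in a library .cpp file. TableSketch and its members are hypothetical; this is not the actual FieldTable code. The assumption is that a user-provided, non-inline definition lives in the library, whereas an implicitly generated one is emitted inline in each application that needs it.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a field-table-like class; NOT qpid's FieldTable.
class TableSketch {
public:
    TableSketch();
    TableSketch(const TableSketch& other);             // declared here...
    TableSketch& operator=(const TableSketch& other);

    void set(const std::string& key, const std::string& value);
    std::string get(const std::string& key) const;
    std::size_t size() const;

private:
    std::map<std::string, std::string> values;
};

// ...and defined out of line (in a real library these would sit in the .cpp
// file), so the compiler does not generate them inside the application.
TableSketch::TableSketch() {}

TableSketch::TableSketch(const TableSketch& other) : values(other.values) {}

TableSketch& TableSketch::operator=(const TableSketch& other) {
    if (this != &other) values = other.values;  // guard against self-assignment
    return *this;
}

void TableSketch::set(const std::string& key, const std::string& value) {
    values[key] = value;
}

std::string TableSketch::get(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = values.find(key);
    return it == values.end() ? std::string() : it->second;
}

std::size_t TableSketch::size() const { return values.size(); }
```

Copies made through either operation are independent of the original, which is the behaviour the compiler-generated versions would also have; only where the definitions live changes.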

Comment 4 Matthew Farrellee 2008-11-26 22:13:00 UTC
condor 7.2.0-0.6 will require qpidc and qmf >= 720973.

Comment 6 Matthew Farrellee 2008-12-02 21:09:18 UTC
This was not resolved in 720973. It turns out that ft.clear() did not solve the problem, which is somehow related to weak symbols and/or function definitions in header files.

The known workarounds are to define FieldTable::clear in a .cpp file, or to make qmf-gen generate a separate block, e.g. {}, around each use of ft.
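
A minimal sketch of the second workaround, assuming the generated code builds several tables in sequence: each use of ft gets its own block and its own local object, so nothing has to be reset between uses. TableSketch, buildSchemaSketch and the keys are hypothetical and do not reflect real qmf-gen output.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a field table; NOT qpid's FieldTable.
typedef std::map<std::string, std::string> TableSketch;

std::vector<TableSketch> buildSchemaSketch() {
    std::vector<TableSketch> out;

    {   // first use of ft: a fresh object scoped to this block
        TableSketch ft;
        ft["name"] = "first";
        out.push_back(ft);
    }   // ft is destroyed here, so no clear() call is needed

    {   // second use of ft: another fresh object, nothing carried over
        TableSketch ft;
        ft["name"] = "second";
        ft["unit"] = "bytes";
        out.push_back(ft);
    }

    return out;
}
```

The trade-off is a construction and destruction per block instead of one reused object, which is usually negligible for schema setup code.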

Comment 7 Matthew Farrellee 2008-12-08 19:14:33 UTC
This appears to be resolved in 7.2.0-0.8 with the addition of -I/usr/local/qpid-boost for RHEL4 builds.

Comment 9 errata-xmlrpc 2009-02-04 16:03:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html

