Bug 470320 - RHEL4 daemons not getting published
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Hardware: All Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.1
Target Release: ---
Assigned To: Matthew Farrellee
QA Contact: Kim van der Riet
Reported: 2008-11-06 12:40 EST by Matthew Farrellee
Modified: 2009-02-04 11:03 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-02-04 11:03:52 EST

Description Matthew Farrellee 2008-11-06 12:40:26 EST

qpid-tool on north-15 shows all RHEL5 daemons, e.g. Masters, and none from RHEL4 machines.

The plugins on the RHEL4 machines are loading successfully.
Comment 1 Will Benton 2008-11-18 10:00:23 EST
It appears that this is a qpid problem.  The call to failover.reset at the end of ConnectionImpl::open() never returns; because this executes in a separate thread (for the grid qmf plugin), it does not cause the condor_master to hang, but the master will not be published in any case.

To reproduce, run "condor_master -t -f" on north-04, which should try to connect to a broker on north-15.
Comment 2 Will Benton 2008-11-18 10:02:21 EST
Mick observed a crash in a similar place in an example program; I am reproducing his email below.


I found a little something interesting about willb's hang, which I want
to record here for posterity.

It doesn't look like a race -- and in my case I do not see a hang -- but
I do see Interesting Behavior in the same place will is seeing a hang.

I reproduced by running a simple client (declare_queues) on RHEL4, and
talking to a broker on RHEL5.  The FailoverListener ctor exits early
because this is true:


That looks reasonable, since the RHEL5 broker is non-clustered -- but I
bet that's where Will is seeing it hang rather than return early as

That's all I've got so far....
Comment 3 Matthew Farrellee 2008-11-26 16:46:23 EST
r720973 | tross | 2008-11-26 14:48:44 -0600 (Wed, 26 Nov 2008) | 7 lines

Bug fixes for QMF:
  ManagementAgentImpl - don't send messages if broker is not connected.
  ManagementBroker - agents could be assigned the same agentBank
                   - don't send console-attached for attached agents
                   - handle multiple qmf messages in an AMQP body
  schema.py - Don't use the FieldTable copy-constructor, use .clear()

r720972 | tross | 2008-11-26 14:43:14 -0600 (Wed, 26 Nov 2008) | 12 lines

Added a copy constructor and assignment operator to FieldTable.
This was done to solve a library problem with the RHEL4 distribution.

The compiler generated the assignment operator in an application using
the C++ qpid client libraries.  This generated function (referenced by
a weak symbol) appeared to be causing problems in the heart of the
library (handling of the ConnectionStartBody) with regard to the
handling of field tables.

The failure mechanism is not fully understood, but this seemingly
innocuous change solves the problem.
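
For readers unfamiliar with the pattern this commit describes, here is a minimal sketch of a class whose copy constructor and assignment operator are declared in the header and defined out of line. `Table` and `copyAndAssign` are hypothetical stand-ins and not the actual qpid FieldTable code. When these members are left implicit, the compiler generates them inline in every translation unit that needs them, typically as weak symbols; defining them in the library's own source file gives a single definition that the library owns. Whether that is the mechanism behind this bug is not established in the commit message.

```cpp
// Sketch only: Table is a hypothetical stand-in, not qpid's FieldTable.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// --- what would live in the library header ---
class Table {
public:
    Table();
    Table(const Table& other);             // declared, so not compiler-generated
    Table& operator=(const Table& other);  // declared, so not compiler-generated
    void set(const std::string& key, const std::string& value);
    std::string get(const std::string& key) const;
    std::size_t size() const;
private:
    std::map<std::string, std::string> values;
};

// --- what would live in the library source file ---
Table::Table() {}

Table::Table(const Table& other) : values(other.values) {}

Table& Table::operator=(const Table& other) {
    if (this != &other) {
        values = other.values;
    }
    return *this;
}

void Table::set(const std::string& key, const std::string& value) {
    values[key] = value;
}

std::string Table::get(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = values.find(key);
    return it == values.end() ? std::string() : it->second;
}

std::size_t Table::size() const { return values.size(); }

// --- what an application might do: exercises both copy operations ---
Table copyAndAssign(const Table& src) {
    Table a(src);  // copy constructor
    Table b;
    b = a;         // assignment operator
    assert(b.size() == src.size());
    return b;
}
```

In a real build the two marked sections would be separate files, with only the declarations visible to applications.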
Comment 4 Matthew Farrellee 2008-11-26 17:13:00 EST
condor 7.2.0-0.6 will require qpidc and qmf >= r720973
Comment 6 Matthew Farrellee 2008-12-02 16:09:18 EST
This was not resolved in r720973. It turns out that ft.clear() did not solve the problem, which is somehow related to weak symbols and/or function definitions in header files.

The known workaround is to define FieldTable::clear in a .cpp file, or to make qmf-gen generate separate blocks, e.g. {}, around ft's use.
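
The second workaround can be pictured roughly as follows. This is a sketch under assumptions: `Args` is a std::map standing in for FieldTable, and `buildWithClear` and `buildWithBlocks` are hypothetical names, since the real qmf-gen output is not shown in this report. The first function reuses one table and resets it with clear(); the second gives each table its own {} block, so every table is freshly constructed and clear() is never called.

```cpp
// Sketch only: Args is a stand-in for FieldTable; function names are hypothetical.
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Args;

// Pattern A: one table reused across steps and reset with clear().
std::vector<Args> buildWithClear() {
    std::vector<Args> out;
    Args ft;
    ft["name"] = "id";
    ft["type"] = "uint32";
    out.push_back(ft);

    ft.clear();  // reset before reuse
    ft["name"] = "state";
    out.push_back(ft);
    return out;
}

// Pattern B: each table lives in its own block, so no clear() is needed.
std::vector<Args> buildWithBlocks() {
    std::vector<Args> out;
    {
        Args ft;
        ft["name"] = "id";
        ft["type"] = "uint32";
        out.push_back(ft);
    }
    {
        Args ft;  // fresh, empty table
        ft["name"] = "state";
        out.push_back(ft);
    }
    assert(out.size() == 2);
    return out;
}
```

Both functions return the same two tables; the only difference is whether the generated code depends on clear() at all.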
Comment 7 Matthew Farrellee 2008-12-08 14:14:33 EST
This appears to be resolved in 7.2.0-0.8 with the addition of -I/usr/local/qpid-boost for RHEL4 builds.
Comment 9 errata-xmlrpc 2009-02-04 11:03:52 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

