Bug 470320
Summary: | RHEL4 daemons not getting published | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Matthew Farrellee <matt> |
Component: | grid | Assignee: | Matthew Farrellee <matt> |
Status: | CLOSED ERRATA | QA Contact: | Kim van der Riet <kim.vdriet> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 1.0 | CC: | tross |
Target Milestone: | 1.1 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2009-02-04 16:03:52 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Matthew Farrellee
2008-11-06 17:40:26 UTC
It appears that this is a qpid problem. The call to failover.reset at the end of ConnectionImpl::open() never returns. Because this call executes in a separate thread (for the grid QMF plugin), it does not cause condor_master to hang, but the master is never published either.

To reproduce, run "condor_master -t -f" on north-04, which should try to connect to a broker on north-15.

Mick observed a crash in a similar place in an example program; I am reproducing his email below.

---

I found a little something interesting about willb's hang, which I want to record here for posterity. It doesn't look like a race -- and in my case I do not see a hang -- but I do see Interesting Behavior in the same place Will is seeing a hang.

I reproduced it by running a simple client (declare_queues) on RHEL4 talking to a broker on RHEL5. The FailoverListener ctor exits early because this is true:

    session.exchangeQuery(arg::name=AMQ_FAILOVER).getNotFound()

That looks reasonable, since the RHEL5 broker is non-clustered -- but I bet that's where Will is seeing it hang rather than return early as expected.

That's all I've got so far...

r720973 | tross | 2008-11-26 14:48:44 -0600 (Wed, 26 Nov 2008) | 7 lines

Bug fixes for QMF:
  ManagementAgentImpl - don't send messages if broker is not connected.
  ManagementBroker    - agents could be assigned the same agentBank
                      - don't send console-attached for attached agents
                      - handle multiple qmf messages in an AMQP body
  schema.py           - Don't use the FieldTable copy-constructor, use .clear()
------------------------------------------------------------------------
r720972 | tross | 2008-11-26 14:43:14 -0600 (Wed, 26 Nov 2008) | 12 lines

Added a copy constructor and assignment operator to FieldTable.
This was done to solve a library problem with the RHEL4 distribution.
The compiler generated the assignment operator in an application using the C++ qpid client libraries.
This generated function (referenced by a weak symbol) appeared to be causing problems in the heart of the library (handling of the ConnectionStartBody) with regard to the handling of field tables.
The failure mechanism is not fully understood, but this seemingly innocuous change solves the problem.

condor 7.2.0-0.6 will require qpidc and qmf >= 720973.

This was not resolved in 720973: it turns out that ft.clear() did not solve the problem, which is somehow related to weak symbols and/or function definitions in header files. The known workarounds are to define FieldTable::clear in a .cpp file, or to make qmf-gen generate separate blocks, e.g. {}, around each use of ft.

This appears to be resolved in 7.2.0-0.8 with the addition of -I/usr/local/qpid-boost for RHEL4 builds.

An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html
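The FieldTable discussion above involves two fixes. The first is a compiler-generated assignment operator that was emitted as a weak symbol, later replaced by explicitly declared copy operations. The second is the workaround of defining FieldTable members in a .cpp file or giving each ft its own block. Both coding patterns can be illustrated with a minimal C++ sketch.

`FieldTableSketch` and `encodeTwoTables` are hypothetical names. This is not the qpid FieldTable API, and the sketch does not reproduce the RHEL4 failure itself. It only shows two patterns. First, copy operations are declared in the class and defined out of line, so they are not implicitly generated inline functions. Second, generated code gives each table its own `{}` block, so no table is reused, cleared, or reassigned. In a real library the out-of-line definitions would live in a .cpp file, as the workaround comment suggests. They share one file here only to keep the sketch self-contained.

```cpp
// Hypothetical stand-in for a field-table type; NOT the qpid FieldTable API.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class FieldTableSketch {
public:
    FieldTableSketch() {}
    // Declared here, defined below outside the class body (in a real
    // library: in a .cpp file rather than a header).
    FieldTableSketch(const FieldTableSketch& other);
    FieldTableSketch& operator=(const FieldTableSketch& other);
    void clear();

    void setString(const std::string& key, const std::string& value) {
        values_[key] = value;
    }
    std::size_t size() const { return values_.size(); }
    bool has(const std::string& key) const { return values_.count(key) != 0; }

private:
    std::map<std::string, std::string> values_;
};

// Out-of-class, non-inline definitions: emitted once, in this translation
// unit, instead of an implicitly generated inline copy (which GCC on ELF
// typically emits as a weak symbol) in every translation unit that copies
// or assigns a table.
FieldTableSketch::FieldTableSketch(const FieldTableSketch& other)
    : values_(other.values_) {}

FieldTableSketch& FieldTableSketch::operator=(const FieldTableSketch& other) {
    if (this != &other) {
        values_ = other.values_;
    }
    return *this;
}

void FieldTableSketch::clear() { values_.clear(); }

// Hypothetical shape of generated code: each use of `ft` gets its own block,
// so every table is freshly constructed and destroyed, and no clear() or
// assignment on a reused table is needed.
std::size_t encodeTwoTables() {
    std::size_t total = 0;
    {
        FieldTableSketch ft;
        assert(ft.size() == 0);  // fresh table, no clear() required
        ft.setString("name", "broker");
        total += ft.size();
    }
    {
        FieldTableSketch ft;
        assert(ft.size() == 0);  // a new object, unrelated to the first ft
        ft.setString("type", "queue");
        ft.setString("durable", "true");
        total += ft.size();
    }
    return total;
}
```

Calling `encodeTwoTables()` returns 3 (one entry from the first block, two from the second). Copying a table, e.g. `FieldTableSketch b(a);`, goes through the single out-of-line copy constructor rather than a per-caller inline copy.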