Bug 625229 - higher than expected CPU load from condor_configd
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-client
Version: beta
Hardware: All Linux
Priority: high  Severity: medium
Target Milestone: 1.3
Assigned To: Robert Rati
QA Contact: Lubos Trilety
Reported: 2010-08-18 18:36 EDT by Pete MacKinnon
Modified: 2010-11-08 10:43 EST
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2010-11-08 10:43:45 EST


Attachments
agent_ruby.rb (9.98 KB, text/plain)
2010-10-19 05:39 EDT, Lubos Trilety

Description Pete MacKinnon 2010-08-18 18:36:04 EDT
Testing mostly on mrg31, which hosts the broker for the MRG grid pool (plugins, cumin, etc.), it has been observed that the local condor_configd consumes more CPU than expected. In fact, its CPU usage closely mirrors whatever qpidd is doing at the time, to within approximately 5%. Sometimes the broker jumps to 30% CPU and the configd follows suit at, say, 26%.
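A quick way to eyeball the correlation described above is to sample the %CPU of both processes side by side. This is an illustrative helper, not from the report; the process names qpidd and condor_configd are taken from the text:

```shell
# Print the summed %CPU of a process selected by command name;
# prints 0.0 if no such process is running.
cpu_of() {
  ps -o %cpu= -C "$1" 2>/dev/null | awk '{s+=$1} END {printf "%.1f\n", s+0}'
}

echo "qpidd:          $(cpu_of qpidd)"
echo "condor_configd: $(cpu_of condor_configd)"
```

Running this in a loop (e.g. under `watch`) makes the mirroring easy to see.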

This is believed to be because the QMF Session object in the configd receives events for various unsolicited QMF activities (e.g., agent re-connects) across the entire broker space.

A fix has been proposed that restricts the configd's event intake to only the wallaby agent, as follows:

      self.session = Session(self.console,
                             manageConnections=False,
                             rcvObjects=True,
                             rcvHeartbeats=False,
                             rcvEvents=True,
                             userBindings=True)  # only deliver explicitly bound traffic
      self.session.bindAgent("com.redhat.grid.config", "Store")  # wallaby agent only
      self.session.addEventFilter(package='com.redhat.grid.config', event='NodeUpdatedNotice')
Comment 1 Matthew Farrellee 2010-08-18 21:12:09 EDT
The CPU load was observed on all systems running a configd.
Comment 2 Robert Rati 2010-08-20 16:07:02 EDT
Added the bindAgent call.

Fixed in:
condor-wallaby-3.5-1
Comment 3 Lubos Trilety 2010-10-14 12:10:20 EDT
I ran a test with agents reconnecting to the broker, and the configd's CPU load
still tracks the broker load.

Tested with (version):
condor-wallaby-client-3.6-6.el5
wallaby-utils-0.9.18-2.el5
ruby-wallaby-0.9.18-2.el5
condor-wallaby-base-db-1.4-5.el5
python-wallabyclient-3.6-6.el5
condor-wallaby-tools-3.6-6.el5
wallaby-0.9.18-2.el5
Comment 7 Ken Giusti 2010-10-18 13:32:57 EDT
Hi Lubos,

I'm having difficulty trying to reproduce the problem using my setup.  Can you provide more detail on exactly how you cause the CPU spike to occur?

thanks,

-K
Comment 8 Lubos Trilety 2010-10-19 05:39:26 EDT
Created attachment 454311 [details]
agent_ruby.rb

My reproduction scenario is an easy one: I just set auth=no in qpidd.conf and run enough instances of a slightly modified agent_ruby.rb using this script:

NUM_OF_AGENTS=21
I=0
while true; do
  # Keep NUM_OF_AGENTS short-lived agents churning at all times.
  if [ "$(ps -eo comm | grep -i agent_ruby.rb | wc -l)" -lt "$NUM_OF_AGENTS" ]; then
    I=$((I+1))
    # Start an agent, then kill it after 0-1 seconds to force constant
    # agent registration/deregistration traffic on the broker.
    ( ./agent_ruby.rb "$I" > /dev/null 2>&1 & sleep $((RANDOM%2)); kill $! ) &
  else
    sleep 1
  fi
done

The agent_ruby.rb script used can be found in the attachment.
Comment 9 Ken Giusti 2010-10-20 14:01:59 EDT
I can reproduce the problem with Lubos' attachment & script.

The test creates and destroys agents rapidly.  This causes the following message pattern to be generated to all consoles each time an agent is created or destroyed:

newPackage: org.apache.qpid.qmf399
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987
delAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987

These are qmf-related messages.  

The newPackage/newClass messages are being generated because each test agent instantiates uniquely-named packages and classes.  This forces a newPackage/newClass event on every console.  The current QMF implementation uses V1-style schema messages, so there is no way (yet) to filter out messages of these types.  This is a known QMF behavior that is being addressed in V2.
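The trace above is produced by console callbacks that fire once per V1 notification. As a minimal standalone sketch: the callback names (newPackage, newClass, newAgent, delAgent) match the old Python qmf.console Console interface, but this class is illustrative only and is not wired to a broker:

```python
class LoggingConsole:
    """Logs the QMF V1 notifications seen in the trace above.

    In real use an object like this would be passed as the console
    argument when constructing a qmf.console.Session; here it only
    demonstrates which callback produces each line of the trace.
    """

    def newPackage(self, name):
        # Fired for every uniquely-named package a test agent registers.
        print("newPackage: %s" % name)

    def newClass(self, kind, classKey):
        # Fired for every schema class within a new package.
        print("newClass: %s %s" % (kind, classKey))

    def newAgent(self, agent):
        # Fired when an agent appears on the broker.
        print("newAgent: %s" % agent)

    def delAgent(self, agent):
        # Fired when an agent disappears; with the churn script this
        # and newAgent fire continuously on every attached console.
        print("delAgent: %s" % agent)
```

Because every console attached to the broker receives these callbacks, rapid agent churn translates directly into CPU load on each console process, which is consistent with the configd behavior reported here.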

The newAgent/delAgent updates appear to be a QMF bug, in that V1 style agents that are managed by the broker are not being filtered by the bindAgent() call.  I will open a BZ against this.
Comment 10 Matthew Farrellee 2010-10-26 13:12:07 EDT
New issue is captured as bug 645015.
Comment 11 Lubos Trilety 2010-11-01 10:21:03 EDT
I was not able to reproduce the bug on old version:
qpid-cpp-client-0.7.946106-11.el5
qpid-cpp-server-devel-0.7.946106-11.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-11.el5
qpid-java-common-0.7.946106-7.el5
qpid-cpp-client-devel-docs-0.7.946106-11.el5
qpid-cpp-client-devel-0.7.946106-11.el5
python-qpid-0.7.946106-11.el5
qpid-cpp-server-store-0.7.946106-11.el5
qpid-cpp-server-xml-0.7.946106-11.el5
qpid-cpp-client-ssl-0.7.946106-11.el5
qpid-cpp-server-cluster-0.7.946106-11.el5
qpid-java-client-0.7.946106-7.el5
qpid-cpp-server-ssl-0.7.946106-11.el5
qpid-tools-0.7.946106-8.el5
condor-wallaby-client-3.4-1.el5

I tried starting/stopping condor continuously with condor-qmf installed and configured, but that does not load the broker enough.
I also tried starting/stopping multiple sesame processes (without any modification); this test created some load on the broker but none on condor_configd.

Could you please provide me with reproduction scenario?

Thanks,
Lubos
Comment 12 Lubos Trilety 2010-11-02 11:02:34 EDT
Successfully reproduced with multiple qmf agents starting/stopping.

Reproduced on:
condor-wallaby-client-3.4-1.el5
qpid-cpp-client-ssl-0.7.946106-12.el5
qpid-cpp-server-ssl-0.7.946106-12.el5
qpid-cpp-server-store-0.7.946106-12.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-12.el5
qpid-java-common-0.7.946106-7.el5
qpid-cpp-server-xml-0.7.946106-12.el5
qpid-cpp-client-devel-docs-0.7.946106-12.el5
qpid-cpp-server-cluster-0.7.946106-12.el5
qpid-cpp-server-devel-0.7.946106-12.el5
qpid-cpp-client-devel-0.7.946106-12.el5
python-qpid-0.7.946106-12.el5
qpid-java-client-0.7.946106-7.el5
qpid-tools-0.7.946106-8.el5
qpid-cpp-client-0.7.946106-12.el5


Tested with (version):
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-tools-0.7.946106-11.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-client-rdma-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-java-client-0.7.946106-11.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-java-common-0.7.946106-11.el5
qpid-java-example-0.7.946106-11.el5
qpid-tests-0.7.946106-1.el5
qpid-cpp-server-devel-0.7.946106-17.el5
rh-qpid-cpp-tests-0.7.946106-17.el5
python-qpid-0.7.946106-14.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-server-rdma-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
condor-wallaby-client-3.6-6

Tested on:
RHEL5 x86_64,i386                 - passed
RHEL4 x86_64,i386 (only configd)  - passed

>>> VERIFIED
