Bug 625229 - higher than expected CPU load from condor_configd
Summary: higher than expected CPU load from condor_configd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-client
Version: beta
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assignee: Robert Rati
QA Contact: Lubos Trilety
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-18 22:36 UTC by Pete MacKinnon
Modified: 2010-11-08 15:43 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-08 15:43:45 UTC
Target Upstream Version:
Embargoed:


Attachments
agent_ruby.rb (9.98 KB, text/plain)
2010-10-19 09:39 UTC, Lubos Trilety

Description Pete MacKinnon 2010-08-18 22:36:04 UTC
While testing, mostly on mrg31 (which hosts the broker for the MRG grid pool plugins, cumin, etc.), it was observed that the local condor_configd consumes more CPU than expected. In fact, its CPU usage closely mirrors whatever qpidd is doing at the time, to within approximately 5%. Sometimes the broker jumps to 30% and the configd follows suit at, say, 26%.
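
For anyone who wants to check the correlation numerically rather than by eye in top, a minimal sketch follows. It assumes a Linux /proc filesystem. The helper names (parse_stat_ticks, cpu_percent, read_ticks, sample_cpu) are made up for illustration and are not part of condor or qpid. It derives CPU% over an interval from the utime and stime fields of /proc/<pid>/stat.

```python
import os
import time


def parse_stat_ticks(stat_line):
    """Return (comm, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces,
    # so split around the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    rest = stat_line[close_idx + 1:].split()
    # rest[0] is field 3 (state), so field 14 (utime) and field 15 (stime)
    # are rest[11] and rest[12].
    return comm, int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """CPU usage as a percentage of one core over interval_s seconds."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)


def read_ticks(pid):
    """Read the cumulative tick count of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_stat_ticks(f.read())[1]


def sample_cpu(pid, interval_s=1.0):
    """Measure the CPU% of one process over a single interval."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second, commonly 100
    before = read_ticks(pid)
    time.sleep(interval_s)
    return cpu_percent(before, read_ticks(pid), interval_s, hz)


# Hypothetical sample line, not captured from a real system.
SAMPLE = "4242 (condor_configd) S 1 4242 4242 0 -1 4202496 100 0 0 0 250 50 0 0 20 0 1 0 100 0 0"
```

Calling sample_cpu() on the qpidd and condor_configd PIDs over the same interval gives two figures that can be compared directly.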

This is believed to be because the QMF Session object in the configd receives events for various unsolicited QMF activities (e.g., agent reconnects) across the entire broker space.

A fix has been proposed that restricts the configd's event intake to events from the wallaby agent only, as follows:

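      # userBindings=True: receive only what is explicitly bound below; heartbeats are turned off.
      self.session = Session(self.console, manageConnections=False, rcvObjects=True,
                             rcvHeartbeats=False, rcvEvents=True, userBindings=True)
      # Bind to the wallaby agent only, and accept only its NodeUpdatedNotice events.
      self.session.bindAgent("com.redhat.grid.config", "Store")
      self.session.addEventFilter(package='com.redhat.grid.config', event='NodeUpdatedNotice')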

Comment 1 Matthew Farrellee 2010-08-19 01:12:09 UTC
The CPU load was observed on all systems running a configd.

Comment 2 Robert Rati 2010-08-20 20:07:02 UTC
Added the bindAgent call.

Fixed in:
condor-wallaby-3.5-1

Comment 3 Lubos Trilety 2010-10-14 16:10:20 UTC
I ran a test with agents reconnecting to the broker, and configd still shows
CPU load that tracks the broker load.

Tested with (version):
condor-wallaby-client-3.6-6.el5
wallaby-utils-0.9.18-2.el5
ruby-wallaby-0.9.18-2.el5
condor-wallaby-base-db-1.4-5.el5
python-wallabyclient-3.6-6.el5
condor-wallaby-tools-3.6-6.el5
wallaby-0.9.18-2.el5

Comment 7 Ken Giusti 2010-10-18 17:32:57 UTC
Hi Lubos,

I'm having difficulty trying to reproduce the problem using my setup.  Can you provide more detail on exactly how you cause the CPU spike to occur?

thanks,

-K

Comment 8 Lubos Trilety 2010-10-19 09:39:26 UTC
Created attachment 454311
agent_ruby.rb

My reproduction scenario is an easy one: I set auth=no in qpidd.conf and run a sufficient number of slightly modified agent_ruby.rb instances using this script:

NUM_OF_AGENTS=21
I=0;
while true; do
  if [ "$(ps -eo comm | grep -i agent_ruby.rb | wc -l)" -lt "$NUM_OF_AGENTS" ]; then
    I=$(($I+1));
    ( ./agent_ruby.rb $I > /dev/null 2>&1 & sleep $((${RANDOM}%2)); kill $! ) &
  else
    sleep 1;
  fi;
done

The agent_ruby.rb script I used is attached.

Comment 9 Ken Giusti 2010-10-20 18:01:59 UTC
I can reproduce the problem with Lubos' attachment & script.

The test creates and destroys agents rapidly.  This causes the following message pattern to be sent to all consoles each time an agent is created and destroyed:

newPackage: org.apache.qpid.qmf399
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987
delAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987

These are QMF-related messages.

The newPackage/newClass messages are being generated because each test agent instantiates uniquely-named packages and classes.  This forces a newPackage/newClass event on every console.  The current QMF implementation uses V1 style schema messages, so there is no way (yet) to filter messages of these types.  This is a known QMF behavior that is being addressed in V2.

The newAgent/delAgent updates appear to be a QMF bug, in that V1 style agents that are managed by the broker are not being filtered by the bindAgent() call.  I will open a BZ against this.
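
To put numbers on the pattern above, here is a small sketch that tallies the callback types in that trace. The function name tally_callbacks is hypothetical, and the code only parses the printed text; it does not touch QMF itself.

```python
from collections import Counter

# The console trace from this comment, one callback per line.
SAMPLE_TRACE = """\
newPackage: org.apache.qpid.qmf399
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newClass: 1 org.apache.qpid.qmf399:child(4c09a917-7402-0000-5050-5050d0d2d2d2)
newClass: 1 org.apache.qpid.qmf399:parent(68ce7740-acf0-e8ee-c460-a5ac54da7f74)
newClass: 2 org.apache.qpid.qmf399:test_event(2686c8c0-f552-f78f-48a4-a0a0a0a0a0a0)
newAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987
delAgent: Agent(v1) at bank 1.251 (agent_test_label399)
objectProps: org.apache.qpid.broker:agent[0-21-1-0-1993] 0-21-1-0-1987
"""


def tally_callbacks(trace):
    """Count trace lines by callback name (the text before the first ':')."""
    counts = Counter()
    for line in trace.splitlines():
        kind, sep, _rest = line.partition(":")
        kind = kind.strip()
        if sep and kind:  # skip blank lines and lines without a 'name:' prefix
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    for kind, n in sorted(tally_callbacks(SAMPLE_TRACE).items()):
        print("%-12s %d" % (kind, n))
```

For this trace the tally comes to 11 callbacks per agent create/destroy cycle, 7 of which are the newPackage/newClass schema messages.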

Comment 10 Matthew Farrellee 2010-10-26 17:12:07 UTC
New issue is captured as bug 645015.

Comment 11 Lubos Trilety 2010-11-01 14:21:03 UTC
I was not able to reproduce the bug on the old version:
qpid-cpp-client-0.7.946106-11.el5
qpid-cpp-server-devel-0.7.946106-11.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-11.el5
qpid-java-common-0.7.946106-7.el5
qpid-cpp-client-devel-docs-0.7.946106-11.el5
qpid-cpp-client-devel-0.7.946106-11.el5
python-qpid-0.7.946106-11.el5
qpid-cpp-server-store-0.7.946106-11.el5
qpid-cpp-server-xml-0.7.946106-11.el5
qpid-cpp-client-ssl-0.7.946106-11.el5
qpid-cpp-server-cluster-0.7.946106-11.el5
qpid-java-client-0.7.946106-7.el5
qpid-cpp-server-ssl-0.7.946106-11.el5
qpid-tools-0.7.946106-8.el5
condor-wallaby-client-3.4-1.el5

I tried starting and stopping condor continuously with condor-qmf installed and configured, but that does not load the broker enough.
I also tried starting and stopping multiple sesame processes (without any modification); this test created some load on the broker but none on condor_configd.

Could you please provide me with a reproduction scenario?

Thanks,
Lubos

Comment 12 Lubos Trilety 2010-11-02 15:02:34 UTC
Successfully reproduced by starting and stopping multiple QMF agents.

Reproduced on:
condor-wallaby-client-3.4-1.el5
qpid-cpp-client-ssl-0.7.946106-12.el5
qpid-cpp-server-ssl-0.7.946106-12.el5
qpid-cpp-server-store-0.7.946106-12.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-12.el5
qpid-java-common-0.7.946106-7.el5
qpid-cpp-server-xml-0.7.946106-12.el5
qpid-cpp-client-devel-docs-0.7.946106-12.el5
qpid-cpp-server-cluster-0.7.946106-12.el5
qpid-cpp-server-devel-0.7.946106-12.el5
qpid-cpp-client-devel-0.7.946106-12.el5
python-qpid-0.7.946106-12.el5
qpid-java-client-0.7.946106-7.el5
qpid-tools-0.7.946106-8.el5
qpid-cpp-client-0.7.946106-12.el5


Tested with (version):
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-tools-0.7.946106-11.el5
qpid-cpp-mrg-debuginfo-0.7.946106-16.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-client-rdma-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-java-client-0.7.946106-11.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-java-common-0.7.946106-11.el5
qpid-java-example-0.7.946106-11.el5
qpid-tests-0.7.946106-1.el5
qpid-cpp-server-devel-0.7.946106-17.el5
rh-qpid-cpp-tests-0.7.946106-17.el5
python-qpid-0.7.946106-14.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-server-rdma-0.7.946106-17.el5
ruby-qpid-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
condor-wallaby-client-3.6-6

Tested on:
RHEL5 x86_64,i386                 - passed
RHEL4 x86_64,i386 (only configd)  - passed

>>> VERIFIED

