Bug 591535 - wallaby becomes disk i/o bound during scale testing.
Summary: wallaby becomes disk i/o bound during scale testing.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: wallaby
Version: beta
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assignee: Will Benton
QA Contact: Tomas Rusnak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-12 14:18 UTC by Ken Giusti
Modified: 2011-02-02 13:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-02 13:23:50 UTC
Target Upstream Version:
Embargoed:


Attachments
scripts used to setup wallaby-agent and configd scale tests (3.46 KB, application/x-gzip)
2010-05-12 17:27 UTC, Ken Giusti
scripts used to setup wallaby-agent and configd scale tests for 1.3 wallaby (1.23 KB, application/x-gzip)
2010-11-22 15:30 UTC, Tomas Rusnak

Description Ken Giusti 2010-05-12 14:18:52 UTC
Description of problem:

With one wallaby agent serving 1000 configd processes (running on a different physical machine), the machine hosting wallaby showed very high disk utilization (98% as reported by iostat) when condor_configure_pool was used to update the default group. The condor_configure_pool command eventually timed out.

We then reran the test with the config, snapshot, and log files (the -d, -s, and -l wallaby-agent options) moved to a fast SSD. With the files on the SSD, the condor_configure_pool command completed without timing out.



How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Will suggests moving just the config file to the SSD and retesting:

(10:10:55 AM) willb: kgiusti:  what if you just put the -d dir on the ssd?
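
To confirm that the wallaby host is disk bound while condor_configure_pool runs, the busy percentage can also be computed directly from /proc/diskstats. The following is a minimal sketch, not part of the attached scripts; it assumes a Linux host, and the function names and the device name are hypothetical. It takes the cumulative "milliseconds spent doing I/O" counter twice and divides the difference by the sampling interval, which is roughly how iostat derives its utilization figure.

```python
import time


def parse_io_ticks(diskstats_text, device):
    """Return the cumulative 'ms spent doing I/O' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields are: major, minor, device name, then the per-device counters.
        # Index 12 is the total number of milliseconds spent doing I/O.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise ValueError("device not found: %s" % device)


def utilization_percent(ticks_before_ms, ticks_after_ms, interval_s):
    """Percentage of the interval during which the device was busy."""
    busy_ms = ticks_after_ms - ticks_before_ms
    return 100.0 * busy_ms / (interval_s * 1000.0)


def sample(device, interval_s=5.0, path="/proc/diskstats"):
    """Read the counter twice, interval_s apart, and return percent busy."""
    with open(path) as stats:
        before = parse_io_ticks(stats.read(), device)
    time.sleep(interval_s)
    with open(path) as stats:
        after = parse_io_ticks(stats.read(), device)
    return utilization_percent(before, after, interval_s)
```

Calling sample("sda") on the wallaby host during activation (with "sda" replaced by the device that holds config.db) should return a value close to 100 if the disk is the bottleneck.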

Comment 1 Ken Giusti 2010-05-12 14:21:47 UTC
Example - command used to cause wallaby <-> configd io "storm":

[kgiusti@localhost qmf-scale]$ PYTHONPATH="/usr/lib/python2.4/site-packages" condor_configure_pool --default-group -a -f Master

Apply these changes [Y/n] ? 
The following parameters need to be set for this configuration to be valid.
QMF_BROKER_HOST
CONDOR_HOST
Set these parameters now ? [y/N] y
QMF_BROKER_HOST: localhost
CONDOR_HOST: localhost
Configuration applied

Save this configuration [y/N] ?  

Activate the changes [y/N] ? y
Configuration activated

Comment 2 Ken Giusti 2010-05-12 17:27:07 UTC
Created attachment 413495
scripts used to setup wallaby-agent and configd scale tests

see readme file included in tar

Comment 3 Ken Giusti 2010-05-12 17:50:02 UTC
0) get scripts from attached tar - see the included README.txt

1) Run wallaby-setup.sh, using a local qpidd and /tmp filesystem for config.db:

./wallaby-setup.sh --dir /tmp/wallaby

ps -fwwww

ruby /usr/bin/wallaby-agent -H localhost -p 5672 -d /tmp/wallaby/config.db -s /tmp/wallaby/snapshot.db -l /tmp/wallaby/logfile.txt


2) create 1000 configd's on a different host:

CONDOR_CONFIGD_CMD="/root/kgiusti/configuration-tools/condor_configd" ./qmf-scale/configd-setup.sh --broker pman08 --port 5672 --count 1000 --name pman04

3) time condor_configure_pool --default-group -a -f Master
....
Apply these changes [Y/n] ? Y
The following parameters need to be set for this configuration to be valid.
QMF_BROKER_HOST
CONDOR_HOST
Set these parameters now ? [y/N] y
QMF_BROKER_HOST: localhost
CONDOR_HOST: localhost
Configuration applied

Save this configuration [y/N] ? n

Activate the changes [y/N] ? Y

Traceback (most recent call last):
  File "/usr/sbin/condor_configure_pool", line 736, in ?
    sys.exit(main())
  File "/usr/sbin/condor_configure_pool", line 729, in main
    activate_configuration(config_store)
  File "/usr/sbin/condor_configure_pool", line 351, in activate_configuration
    result = store.activateConfiguration()
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 285, in <lambda>
    return lambda *args, **kwargs : self._invoke(name, args, kwargs)
  File "/usr/lib/python2.4/site-packages/qmf/console.py", line 419, in _invoke
    raise RuntimeError("Timed out waiting for method to respond")
RuntimeError: Timed out waiting for method to respond

real	1m29.660s
user	0m0.090s
sys	0m0.013s

Now - rerun the test, but put wallaby config db on the fast ssd drive:

./wallaby-setup.sh --cdb /ssd/tmp/wallaby/myconfig.db


ps -wf

ruby /usr/bin/wallaby-agent -H localhost -p 5672 -d /ssd/tmp/wallaby/myconfig.db -s /tmp/snapshot.db -l /tmp/logfile.txt

<snip>


time condor_configure_pool --default-group -a -f Master

Apply these changes [Y/n] ? Y
The following parameters need to be set for this configuration to be valid.
QMF_BROKER_HOST
CONDOR_HOST
Set these parameters now ? [y/N] Y
QMF_BROKER_HOST: localhost
CONDOR_HOST: localhost
Configuration applied

Save this configuration [y/N] ? 

Activate the changes [y/N] ? Y
Configuration activated

real	0m53.190s
user	0m0.089s
sys	0m0.018s
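
For comparing the two runs above, a small sketch that converts the "real" line of the time output into seconds. The helper name is hypothetical and the regular expression assumes the bash-style NmS.SSSs format shown in this comment.

```python
import re


def real_seconds(time_output):
    """Return the wall-clock ('real') time in seconds from bash `time` output."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the two runs in this comment.
slow_disk = real_seconds("real\t1m29.660s\nuser\t0m0.090s\nsys\t0m0.013s")
ssd = real_seconds("real\t0m53.190s\nuser\t0m0.089s\nsys\t0m0.018s")
print("config.db on /tmp: %.2fs, config.db on ssd: %.2fs" % (slow_disk, ssd))
```

Note that the first figure is the time until the client gave up with the RuntimeError, not the time the activation would have needed to complete, so the real gap between the two setups is at least this large.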

Comment 4 Will Benton 2010-05-27 15:00:18 UTC
I believe commit 3db09f0a (included in wallaby-0.3.5-1 and beyond) fixes this problem.

Comment 5 Tomas Rusnak 2010-10-26 15:19:44 UTC
The condor_configd from condor-wallaby-client beyond 0.3.5-1 doesn't support command-line parameters such as -hostname, -log-file, etc. I tried condor-wallaby-client-2.5-0.1 to reproduce it.

Do you have any other idea how to run 1000 configd processes on the same machine and give each one different parameters?

Comment 6 Matthew Farrellee 2010-10-26 15:32:09 UTC
It should,

$ rpm -qf $(which condor_configd)
condor-wallaby-client-3.6-6.el5
$ grep -- --hostname $(which condor_configd)
         if option in ('-h', '--hostname'):
$ grep -- --log $(which condor_configd)
         if option in ('-l', '--logfile'):
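
Given those two options, one way to give each of the 1000 instances its own parameters is to generate one command line per instance. This is only a sketch under assumptions: it uses just the --hostname and --logfile options found by the grep above and assumes each takes a value, the naming scheme and function name are made up for illustration, and any broker options the real setup script passes are left out because they are not shown here.

```python
def configd_commands(count, base_name, log_dir, configd="condor_configd"):
    """Build one argument list per configd instance, each with a unique name."""
    commands = []
    for index in range(1, count + 1):
        # Hypothetical naming scheme: pman04-0001, pman04-0002, ...
        name = "%s-%04d" % (base_name, index)
        commands.append([
            configd,
            "--hostname", name,
            "--logfile", "%s/%s.log" % (log_dir, name),
        ])
    return commands
```

Each returned list could then be handed to subprocess.Popen to start the instance; the sketch stops at building the command lines so that it can be checked without launching anything.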

Comment 7 Tomas Rusnak 2010-10-26 16:08:09 UTC
Sorry, I meant condor-wallaby-client versions before 0.3.5-1.

Comment 8 Matthew Farrellee 2010-10-26 16:47:31 UTC
The issue should be with wallaby. The version of condor-wallaby-client should only matter if there is an incompatibility between condor-wallaby-client 3.6-6 and wallaby pre 0.3.5-1.

Comment 9 Will Benton 2010-10-26 17:16:46 UTC
There almost surely is an incompatibility there; wallaby < 0.3.5 is ancient (it even predates the "com.redhat.grid" namespace IIRC, and there have been nontrivial API changes since).  Rob would know which version of the configd was first to support the features that Tomas needs for testing; hopefully there is such a version that is compatible with such an old wallaby.

Comment 10 Tomas Rusnak 2010-11-22 15:30:03 UTC
Created attachment 462057
scripts used to setup wallaby-agent and configd scale tests for 1.3 wallaby

Comment 11 Tomas Rusnak 2010-11-22 15:52:53 UTC
Retested with the current wallaby:

wallaby-0.9.18-2.el5

I have no SSD, so a fast drive was simulated with tmpfs:

tmpfs on /tmp type tmpfs (rw,size=512m)

[root@localhost ~]# ls -la /tmp
total 836
drwxrwxrwt  2 root    root      100 Nov 22 10:43 .
drwxr-xr-x 24 root    root     4096 Nov 18 05:25 ..
-rw-r--r--  1 wallaby condor 221184 Nov 22 10:43 config.db
-rw-r--r--  1 wallaby condor   6781 Nov 22 10:41 logfile.txt
-rw-r--r--  1 wallaby condor 601088 Nov 22 10:43 snapshot.db
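
To double-check that the wallaby files really sit on tmpfs before timing the run, the mount output can be parsed as in the sketch below. The helper name is hypothetical, and it assumes the "<device> on <mount point> type <fstype> (<options>)" line format shown above, with no spaces in the mount point.

```python
def filesystem_type(mount_output, mount_point):
    """Return the filesystem type mounted at mount_point, or None if absent."""
    fs_type = None
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <device> on <mount point> type <fstype> (<options>)
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            if parts[2] == mount_point:
                # Keep scanning: a later mount on the same point shadows earlier ones.
                fs_type = parts[4]
    return fs_type
```

Feeding it the output of mount and asking for "/tmp" should return "tmpfs" on the test machine described here.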

# time condor_configure_pool --default-group -a -f Master,NodeAccess

Apply these changes [Y/n] ? y
The following parameters need to be set for this configuration to be valid.
ALLOW_READ
ALLOW_WRITE
CONDOR_HOST
Set these parameters now ? [y/N] y
ALLOW_READ: *       
ALLOW_WRITE: *
CONDOR_HOST: localhost
Configuration applied

Create a named snapshot of this configuration [y/N] ? 

Activate the changes [y/N] ? y
Activating configuration.  This may take a while, please be patient
Configuration activated
Configuration saved

real	0m37.102s
user	0m0.312s
sys	0m0.029s
[root@hp-sl2x160zg6-02 ~]# 

No such regression found. Fixed in the stable MRG 1.3 release on RHEL5.

>>> VERIFIED

