593062 – The default timeout associated with a session fails during scale testing.

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	python-qpid
Sub Component:
Version:	beta
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	1.3
Target Release:	---
Assignee:	Ken Giusti
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	587410

Reported:	2010-05-17 18:16 UTC by Ken Giusti
Modified:	2011-07-15 19:32 UTC
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-07-15 19:32:16 UTC
Target Upstream Version:
Embargoed:

Description Ken Giusti 2010-05-17 18:16:08 UTC

Description of problem:

During QMF scale testing (see https://bugzilla.redhat.com/show_bug.cgi?id=587410), at approximately 2000 condor_configd consoles we start seeing the following failure:


 Traceback (most recent call last):
   File "/root/kgiusti/configuration-tools/condor_configd", line 370, in ?
     sys.exit(main())
   File "/root/kgiusti/configuration-tools/condor_configd", line 357, in main
     service.check_config_ver()
   File "/root/kgiusti/configuration-tools/condor_configd", line 241, in check_config_ver
     self.node_obj.checkin()
   File "/usr/lib/python2.4/site-packages/qmf/console.py", line 305, in <lambda>
     return lambda *args, **kwargs : self._invoke(name, args, kwargs)
   File "/usr/lib/python2.4/site-packages/qmf/console.py", line 428, in _invoke
     seq = self._sendMethodRequest(name, args, kwargs, sync, timeout)
   File "/usr/lib/python2.4/site-packages/qmf/console.py", line 409, in _sendMethodRequest
     self._broker._send(smsg, exchange)
   File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2163, in _send
     self.amqpSession.message_transfer(destination=dest, message=msg)
   File "/usr/lib/python2.4/site-packages/qpid/generator.py", line 25, in <lambda>
     method = lambda self, *args, **kwargs: self.invoke(op, args, kwargs)
   File "/usr/lib/python2.4/site-packages/qpid/session.py", line 138, in invoke
     return self.do_invoke(op, args, kwargs)
   File "/usr/lib/python2.4/site-packages/qpid/session.py", line 183, in do_invoke
     self.sync(self.timeout)
   File "/usr/lib/python2.4/site-packages/qpid/session.py", line 101, in sync
     raise Timeout()
 qpid.exceptions.Timeout


This occurs during the simultaneous startup of many condor_configd processes, all communicating with one wallaby-agent via one qpidd broker. The startup load drives the broker to 100% CPU while the condor_configd processes initialize and download their configuration from the wallaby-agent.

The timeout that expires is 10 seconds, which is the default assigned by the Session class __init__ method.

As a test, we bumped the default timeout from 10 to 120 seconds.  This eliminated the timeout failure when we run the scale tests.
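
To illustrate why a larger value makes the failure go away, here is a minimal, self-contained sketch of the timeout mechanism. It is not the python-qpid source: SketchSession, complete() and invoke_against_slow_broker() are hypothetical names invented for this example, and the broker is simulated with a timer. Only the 10 second default and the shape of sync() raising Timeout are taken from the report and traceback above.

```python
import threading


class Timeout(Exception):
    """Stand-in for qpid.exceptions.Timeout (hypothetical local copy)."""


class SketchSession:
    """Hypothetical model of a session with a default sync timeout."""

    DEFAULT_TIMEOUT = 10  # seconds; the default value reported in this bug

    def __init__(self, timeout=DEFAULT_TIMEOUT):
        self.timeout = timeout
        self._completed = threading.Event()

    def complete(self):
        # Called by the simulated broker once the command has been processed.
        self._completed.set()

    def sync(self, timeout=None):
        # Block until the broker confirms completion, or give up.
        if timeout is None:
            timeout = self.timeout
        if not self._completed.wait(timeout):
            raise Timeout()


def invoke_against_slow_broker(session, broker_delay):
    """Return True if the session syncs before its timeout expires.

    broker_delay (seconds) models a broker that is busy, for example
    running at 100% CPU, and answers late.
    """
    timer = threading.Timer(broker_delay, session.complete)
    timer.start()
    try:
        session.sync()
        return True
    except Timeout:
        return False
    finally:
        timer.cancel()  # no effect if the timer has already fired
        timer.join()
```

With scaled-down numbers, invoke_against_slow_broker(SketchSession(timeout=0.05), 0.3) should time out, while the same call with SketchSession(timeout=2.0) should succeed. The point is only that the caller's timeout has to exceed the broker's worst-case response time under load; the sketch says nothing about what the right default is.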

We need to tune the default for this timeout so that the scale targets for BZ 587410 can be met.




Comment 1 Ken Giusti 2010-05-20 00:43:43 UTC

I have submitted upstream what I believe will fix this issue:

http://svn.apache.org/viewvc?view=revision&revision=945716

Leaving the bug open until I can run the scale tests against this fix.

Comment 2 Ken Giusti 2010-05-25 22:10:40 UTC

Ran the condor_configd scale tests (configuration-tools 2.7-0.4) against python-qmf 0.7.946106-3 - no timeout failures encountered.

Fix verified.
