Bug 472320 - Hung Primary Collector Causes Serious Delays
Status: CLOSED WONTFIX
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: grid-maint-list
QA Contact: MRG Quality Engineering
Depends On:
Blocks:
Reported: 2008-11-19 21:06 EST by Matthew Farrellee
Modified: 2016-05-24 12:48 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-24 12:48:06 EDT


Attachments: None
Description Matthew Farrellee 2008-11-19 21:06:18 EST
Description of problem:

Extreme pauses were observed at startup for condor_submit, condor_q and condor_status (likely all tools). The initial theory was that the primary collector in an HA setup was not running, so the tools had to wait out a timeout before trying the secondary collector. That turned out to be only partially true: the primary collector was running but hung, and as soon as it was killed the CLI tools became responsive again. The current theory is that the hung primary collector still had port 9618 open and was queuing incoming connections without accepting or rejecting them.

Thanks to jross for noticing this.


Version-Release number of selected component (if applicable):

7.2.0-0.2


How reproducible:

Unsure, likely 100%


Steps to Reproduce:
1. Hang the primary Collector in an HA setup, or possibly run nc -l 9618 on the primary host instead
2. Use command-line tools
3. Observe delays
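
The hung-collector condition in step 1 can be approximated without HTCondor at all. The following is a minimal Python sketch, not HTCondor code: start_hung_listener and probe are hypothetical names, the request bytes are made up, and an ephemeral loopback port stands in for 9618. It assumes a typical TCP stack, where the kernel completes the handshake for a listening socket even if the process never calls accept(), so the client connects successfully and then stalls waiting for a reply.

```python
import socket
import time


def start_hung_listener(host="127.0.0.1", port=0, backlog=5):
    """Bind and listen but never accept(), mimicking a hung daemon.

    port=0 asks the OS for a free ephemeral port (assumption: we do not
    want to collide with a real collector on 9618).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


def probe(host, port, timeout):
    """Connect, send a dummy request, wait for one byte of reply.

    Returns (outcome, elapsed_seconds), where outcome is one of
    "replied", "closed", "timed out" or "refused".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"query\n")  # hypothetical request, not the real protocol
            data = conn.recv(1)       # a hung peer never replies; this waits out the timeout
            outcome = "replied" if data else "closed"
    except socket.timeout:
        outcome = "timed out"
    except OSError:
        outcome = "refused"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    srv = start_hung_listener()
    port = srv.getsockname()[1]
    print("hung listener:", probe("127.0.0.1", port, timeout=2.0))
    srv.close()  # comparable to killing the hung process
    print("after close:  ", probe("127.0.0.1", port, timeout=2.0))
```

With the listener open, the probe should wait for roughly the full timeout; once the listener is closed, the connection should fail quickly, which matches the observation that the tools recovered as soon as the hung collector was killed.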


Expected results:

Faster response from tools


Additional info:

Possible mitigations: the timeout for connecting to a Collector could be set very low for the command-line tools, or a means of querying collectors in parallel could be implemented. Other ideas are welcome.
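
As a rough illustration of the two ideas above (a short per-collector timeout and parallel access), here is a Python sketch. It is not how the HTCondor tools are implemented: query_first and simulated_query are hypothetical names, and simulated_query merely stands in for a real collector query by making a collector called "primary" hang until the timeout expires.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_first(collectors, query, timeout):
    """Run query(collector, timeout) on every collector concurrently.

    Returns (collector, result) for the first query that succeeds and
    raises RuntimeError if every query fails.
    """
    if not collectors:
        raise ValueError("at least one collector is required")
    pool = ThreadPoolExecutor(max_workers=len(collectors))
    futures = {pool.submit(query, c, timeout): c for c in collectors}
    try:
        for fut in as_completed(futures):
            try:
                return futures[fut], fut.result()
            except Exception:
                continue  # this collector failed or timed out; wait for the next
        raise RuntimeError("no collector answered")
    finally:
        # Do not block on stragglers; they end when their own timeout fires.
        pool.shutdown(wait=False)


def simulated_query(collector, timeout):
    """Stand-in for a real query: 'primary' hangs, any other name answers."""
    if collector == "primary":
        time.sleep(timeout)
        raise TimeoutError(f"{collector} did not answer within {timeout}s")
    return f"ads from {collector}"


if __name__ == "__main__":
    print(query_first(["primary", "secondary"], simulated_query, timeout=1.0))
```

In this sketch the caller gets the secondary's answer without first waiting out the primary's timeout; the cost is that every collector is contacted on every invocation.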

One unexplored and potentially more significant issue is what happens to Negotiator->Collector communication: is the effect purely a delay in the negotiation cycle? Updates from daemons should not be a problem, since they are sent via UDP to all collectors at once.
