Bug 634234 - Stable agent id for Startd
Summary: Stable agent id for Startd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-qmf
Version: 1.2
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assignee: Pete MacKinnon
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-09-15 15:30 UTC by Matthew Farrellee
Modified: 2010-10-21 18:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-21 18:45:05 UTC
Target Upstream Version:
Embargoed:



Description Matthew Farrellee 2010-09-15 15:30:53 UTC
Description of problem:

Without a stable agent id, Cumin cannot cleanly garbage collect stale agents in all cases. Slots can appear duplicated.


Version-Release number of selected component (if applicable):

7.4.4-0.13


How reproducible:

100%


Steps to Reproduce:
1. run console, view slot screen
2. service cumin stop
3. service condor restart
4. service cumin start


Actual results:

Duplicate entries in UI


Expected results:

No duplicate entries in the UI; each slot appears exactly once.


Additional info:

ManagementAgent::setName's third parameter (optional) is a stable name.

src/management $ grep setName *
MgmtCollectorPlugin.cpp:		agent->setName("com.redhat.grid","collector",collName.c_str());
MgmtMasterPlugin.cpp:		agent->setName("com.redhat.grid","master", default_name);
MgmtNegotiatorPlugin.cpp:		agent->setName("com.redhat.grid","negotiator", mmName.c_str());
MgmtScheddPlugin.cpp:	agent->setName("com.redhat.grid","scheduler", schedd_name.c_str());
MgmtStartdPlugin.cpp:		agent->setName("com.redhat.grid","slot");

We set it in all QMF plugins, except the Startd.

We should set it in the Startd, using the extern char * Name from startd_main.cpp.
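To illustrate why the third argument matters, here is a minimal sketch, not the plugin code. MockAgent, its counter-based fallback, and startAgent are hypothetical stand-ins; the only thing taken from this report is the (vendor, product, optional instance) shape of setName. The assumption is that, without an instance, the agent gets a fresh identifier on every start, so a console sees a "new" agent after each restart.

```cpp
#include <string>

// Hypothetical stand-in for the agent naming behavior; NOT the real
// ManagementAgent class. It only mirrors the argument shape of setName.
class MockAgent {
public:
    // vendor, product, and an optional instance (the stable name).
    void setName(const std::string& vendor,
                 const std::string& product,
                 const std::string& instance = "") {
        // Assumption: with no instance, a per-start unique value is used.
        // A counter stands in for that value here (e.g. a UUID).
        const std::string inst =
            instance.empty() ? "generated-" + std::to_string(++starts_) : instance;
        name_ = vendor + ":" + product + ":" + inst;
    }

    const std::string& name() const { return name_; }

private:
    static inline int starts_ = 0;  // number of generated instances so far (C++17)
    std::string name_;
};

// Simulates one daemon start and returns the agent name a console would see.
std::string startAgent(const std::string& instance = "") {
    MockAgent agent;
    agent.setName("com.redhat.grid", "slot", instance);
    return agent.name();
}
```

Calling startAgent("petey@petey") twice yields the same name both times, so a console can match the restarted agent to the old one. Calling startAgent() twice yields two different names, which is the situation that leaves stale entries behind.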

Comment 1 Pete MacKinnon 2010-09-15 16:06:32 UTC
FH sha 849ae64

no name specified (default)...

slot = 0-0-1-com.redhat.grid:slot:pmackinn@localhost.localdomain
slot = 0-0-1-com.redhat.grid:slot:pmackinn@localhost.localdomain


STARTD_NAME = whiteford...

slot = 0-0-1-com.redhat.grid:slot:whiteford@whiteford
slot = 0-0-1-com.redhat.grid:slot:whiteford@whiteford

condor_startd -t -f -name petey...
slot = 0-0-1-com.redhat.grid:slot:petey@petey
slot = 0-0-1-com.redhat.grid:slot:petey@petey

Comment 2 Jan Sarenik 2010-10-12 09:43:25 UTC
Reproduced on RHEL5 x86_64 with packages:

  condor-qmf-7.4.4-0.13.el5
  condor-7.4.4-0.13.el5
  cumin-0.1.4369-1.el5
  ...and their dependencies...

Verified with
  condor-7.4.4-0.16.el5
  condor-qmf-7.4.4-0.16.el5

Comment 3 Jan Sarenik 2010-10-12 11:24:28 UTC
Verified also with
  condor-7.4.4-0.16.el4
  condor-qmf-7.4.4-0.16.el4

On RHEL4 (both i386 and x86_64).

Comment 4 Jan Sarenik 2010-10-12 11:30:58 UTC
Finally, verified with condor-qmf-7.4.4-0.16.el5.i386.rpm
on a RHEL5 i386 box.

