Bug 621465

Summary: redundant pool entries
Product: Red Hat Enterprise MRG
Reporter: Jan Sarenik <jsarenik>
Component: cumin
Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED ERRATA
QA Contact: Jan Sarenik <jsarenik>
Severity: medium
Docs Contact:
Priority: medium
Version: Development
CC: pmackinn
Target Milestone: 1.3   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
# MRG packages are all from -devel repository
yum -y install cumin qpid-cpp-server qpid-tools qpid-cpp-client-devel sesame condor-qmf
cumin-database install <<< "yes"
cat <<EOF > ~condor/config/001-qmf.conf
QUEUE_ALL_USERS_TRUSTED=True
STARTD.PLUGINS = $(LIB)/plugins/MgmtStartdPlugin-plugin.so
SCHEDD.PLUGINS = $(LIB)/plugins/MgmtScheddPlugin-plugin.so
COLLECTOR.PLUGINS = $(LIB)/plugins/MgmtCollectorPlugin-plugin.so
NEGOTIATOR.PLUGINS = $(LIB)/plugins/MgmtNegotiatorPlugin-plugin.so
MASTER.PLUGINS = $(LIB)/plugins/MgmtMasterPlugin-plugin.so
QMF_BROKER_HOST = 127.0.0.1
EOF
cumin-admin add-user cumin cumin
service qpidd start
service sesame start
service condor start
service cumin start
firefox "http://localhost:45672"
Last Closed: 2010-10-20 11:30:26 UTC
Type: ---

Description Jan Sarenik 2010-08-05 07:05:46 UTC
When condor is restarted, another pool with the same name appears
in [ Administrator -> Grid ]. Clicking on any of them and then on
"Create submission" ends with a yellow box saying:

   * Create submission 'description': Failed (Agent
     'com.redhat.grid:scheduler:localhost.localdomain' is unknown)

The redundant copy of the pool does not disappear even after
hours (tried ~12 hours so far).

cumin-0.1.4185-1.el5
condor-qmf-7.4.4-0.7.el5

Actual results: Each restart of condor adds another pool
with the same name in Cumin, which breaks job submission
through the web UI.

Expected results: I would like cumin to continue with the
data it had before the restart of condor. An indication of being
offline (while condor is not running on the remote host) would help
as well. Then, when condor is restarted, the pool should simply
come back online and continue with the same graphs it produced
before, with gaps (or a special line) for the offline period. So I
expect no redundant entries for the same condor pool.
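
The expected behaviour above could be sketched roughly as follows. This is
only an illustration, not Cumin's actual code: the names PoolEntry,
PoolRegistry, agent_connected and agent_lost are hypothetical, and the
sketch assumes that a pool can be identified by its name alone, independent
of the QMF agent that currently reports it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PoolEntry:
    """One row in the Grid list, keyed by pool name (hypothetical model)."""
    name: str
    agent_id: Optional[str] = None  # current agent; None while offline
    online: bool = False
    # (timestamp, value) samples; a value of None marks an offline gap
    samples: List[Tuple[float, Optional[float]]] = field(default_factory=list)


class PoolRegistry:
    """Keeps a single entry per pool name across condor restarts."""

    def __init__(self) -> None:
        self._pools: Dict[str, PoolEntry] = {}

    def agent_connected(self, name: str, agent_id: str) -> PoolEntry:
        # Reuse the existing entry instead of adding a duplicate row.
        entry = self._pools.setdefault(name, PoolEntry(name))
        entry.agent_id = agent_id
        entry.online = True
        return entry

    def agent_lost(self, name: str, timestamp: float) -> None:
        entry = self._pools.get(name)
        if entry is None:
            return
        entry.agent_id = None
        entry.online = False
        # Keep the history and record a gap rather than dropping the pool.
        entry.samples.append((timestamp, None))

    def record(self, name: str, timestamp: float, value: float) -> None:
        self._pools[name].samples.append((timestamp, value))

    def pools(self) -> List[PoolEntry]:
        return list(self._pools.values())
```

With such a model, a restart becomes agent_lost followed by
agent_connected with a new agent id: the list still holds one entry for
the pool, and its earlier samples are kept with a gap marker in between.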

Comment 1 Jan Sarenik 2010-08-05 07:26:44 UTC
Actually, I found that there are at least two distinct problems.

One is the redundant entry in Cumin webUI.

The other concerns QMF classes/IDs: after restarting qpidd (nothing
more) I can submit jobs via Cumin again (though multiple records in
the Grid list are still present).

Comment 2 Jan Sarenik 2010-08-05 08:25:02 UTC
Restarting qpidd is not enough: although one job may be submitted
successfully right after the restart, subsequent jobs fail as
described earlier.

Comment 3 Jan Sarenik 2010-09-03 10:20:42 UTC
Verified on cumin-0.1.4219-1.el5