When condor is restarted, another pool with the same name appears
in [ Administrator -> Grid ]. Clicking on any of these entries and
then on "Create submission" ends with a yellow box saying:
* Create submission 'description': Failed (Agent
'com.redhat.grid:scheduler:localhost.localdomain' is unknown)
The redundant copy of the pool does not disappear even after
hours (tried ~12 hours so far).
Actual results: Each time condor is restarted, another pool
with the same name appears in Cumin, which breaks job
submission from the web UI.
Expected results: I would like Cumin to continue with the
data it had before the restart of condor. An indication of being
offline (while condor is not running on the remote host) would
help as well. Then, when condor is restarted, the pool should just
come back online and continue with the same graphs it produced
before, with gaps (or a special line) for the offline period. So I
expect no redundant entries for the same condor pool.
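
To illustrate the expected behavior, here is a minimal sketch of a registry that keys pools by name, so a restarted agent is attached to the existing record instead of creating a new one. This is not Cumin's code: PoolRegistry, agent_seen, agent_lost and the record fields are hypothetical names made up for this example, and the agent IDs are placeholders.

```python
# Hypothetical sketch only; none of these names come from Cumin.
class PoolRegistry:
    def __init__(self):
        # One record per pool name, regardless of how many times
        # the agent behind it has been restarted.
        self.pools = {}

    def agent_seen(self, pool_name, agent_id):
        """Called when an agent for pool_name (re)appears."""
        record = self.pools.get(pool_name)
        if record is None:
            record = {"name": pool_name, "agent_id": agent_id,
                      "online": True, "samples": []}
            self.pools[pool_name] = record
        else:
            # Same pool, new agent: reuse the record and its history.
            record["agent_id"] = agent_id
            record["online"] = True
        return record

    def agent_lost(self, pool_name):
        """Called when the agent for pool_name goes away."""
        record = self.pools.get(pool_name)
        if record is not None:
            record["online"] = False
            # None marks a gap in the graph data for the offline period.
            record["samples"].append(None)


registry = PoolRegistry()
registry.agent_seen("pool", "agent-1")["samples"].append(10)
registry.agent_lost("pool")            # condor stopped
registry.agent_seen("pool", "agent-2")  # condor restarted
```

After the restart the registry still holds a single entry for the pool, with its earlier sample and a gap marker preserved, and it points at the new agent.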
Actually, I found that there are at least two distinct problems.
One is the redundant entry in the Cumin web UI.
The other concerns QMF classes/IDs: after restarting qpidd (nothing
more) I can already submit jobs via Cumin (though multiple records
are still present in the Grid list).
Restarting qpidd is not enough: although one job may be submitted
successfully right after the restart, the other jobs fail as described.
Verified on cumin-0.1.4219-1.el5