When condor is restarted, another pool with the same name appears in [ Administrator -> Grid ]. Clicking either of them and then "Create submission" ends with a yellow box saying:

* Create submission 'description': Failed (Agent 'com.redhat.grid:scheduler:localhost.localdomain' is unknown)

The redundant copy of the pool does not disappear even after hours (about 12 hours so far).

cumin-0.1.4185-1.el5
condor-qmf-7.4.4-0.7.el5

Actual results:
Every time condor is restarted, another pool with the same name appears in Cumin, and job submission from the web UI is broken.

Expected results:
Cumin should continue with the data it had before condor was restarted. An indication that the pool is offline (while condor is not running on the remote host) would also help. When condor is restarted, the pool should simply come back online and continue the same graphs it produced before, with gaps (or a special line) for the offline period. In short, there should be no redundant entries for the same condor pool.
Actually, there are at least two distinct problems. One is the redundant entry in the Cumin web UI. The other concerns QMF classes/IDs: after restarting qpidd (and nothing else), I can submit jobs via Cumin again, although the duplicate records in the Grid list are still present.
Restarting qpidd is not enough after all. Although one job may be submitted successfully right after the restart, subsequent jobs fail as described above.
Verified on cumin-0.1.4219-1.el5