Description of problem:
When a deltacloud job contains wrong credentials (invalid username or password), the job is not (immediately) moved to the hold state, which is what happens when the specified image id is wrong.

GridmanagerLog (D_FULLDEBUG) contains lines like:
07/11/12 20:42:39 [25679] GAHP[25683] -> '3' 'Instance_Fetch_Failure: 401 Unauthorized'
07/11/12 20:42:39 [25679] BaseResource::DoBatchStatus for http://dc.example.com:3002/api.
07/11/12 20:42:39 [25679] Error attempting a Deltacloud batch status query: Instance_Fetch_Failure: 401 Unauthorized
07/11/12 20:42:39 [25679] BaseResource::DoBatchStatus: An error occurred trying to finish a bulk poll of http://dc.example.com:3002/api

and then, periodically, lines like:
07/11/12 20:47:40 [25679] GAHP[25683] -> '4' 'Instance_Fetch_Failure: 401 Unauthorized'
07/11/12 20:47:40 [25679] resource http://dc.example.com:3002/api is still down

Version-Release number of selected component (if applicable):
condor-deltacloud-gahp-7.6.5-0.16
condor-7.6.5-0.16
Moreover, when such a job is removed, the gridmanager still continues trying to fetch information and the job stays in the X state.
The same behavior occurs for jobs when the deltacloud server can't be reached (also regarding comment #1: these jobs are difficult to kill).
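To illustrate the distinction between the two failure cases reported so far (a 401 Unauthorized reply versus an unreachable server), here is a minimal Python sketch. It is not gridmanager code: the function name classify_gahp_error, the "hold"/"retry" labels, and the choice to treat 401 and 403 as permanent are all assumptions made for illustration. It only shows how an error string of the form seen in the log could be sorted into "needs user action" versus "may recover on its own".

```python
import re

# Hypothetical helper, not part of Condor. It inspects the error text that
# the GAHP reported (as it appears in GridmanagerLog) and suggests an action.
HTTP_STATUS = re.compile(r":\s*(\d{3})\b")


def classify_gahp_error(message: str) -> str:
    """Return 'hold' for errors that retrying cannot fix, else 'retry'."""
    match = HTTP_STATUS.search(message)
    if match is None:
        # No HTTP status in the message, e.g. the server was unreachable.
        return "retry"
    status = int(match.group(1))
    # Assumption: 401 and 403 indicate bad credentials, which only the
    # job owner can correct, so polling again will not help.
    if status in (401, 403):
        return "hold"
    return "retry"
```

With the error string from the log above, classify_gahp_error("Instance_Fetch_Failure: 401 Unauthorized") yields "hold", while a message that carries no HTTP status yields "retry".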
It is expected that a job will stay in the X state if any resources have been allocated via deltacloud. The gridmanager wants to make sure it has cleaned up the remote side.
MRG-Grid is in maintenance and only customer escalations will be considered. This issue can be reopened if a customer escalation associated with it occurs.