| Summary: | ec2_gahp: transient error leads to hold jobs and leaked AMI instances | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Luigi Toscano <ltoscano> | ||||||
| Component: | condor | Assignee: | grid-maint-list <grid-maint-list> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | MRG Quality Engineering <mrgqe-bugs> | ||||||
| Severity: | low | Docs Contact: | |||||||
| Priority: | low | ||||||||
| Version: | 2.1 | CC: | matt, tstclair | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-05-26 19:12:58 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
Description
Luigi Toscano
2011-12-05 18:28:42 UTC
Created attachment 541040
Example of errors and job held
Created attachment 541041
Gridmanager.<user> log for a released job
This is the Gridmanager log for two jobs that were put on hold because of an error and released (condor_release) afterwards.
EC2's Query API passes back appropriate HTTP error codes, e.g. 401 for AuthFailure and 500 for InternalError. Those codes can be used to determine whether an error is fatal (e.g. the client presenting invalid credentials) or non-fatal (e.g. an internal server error, where the request should simply be retried). Fatal errors should result in the job being held; non-fatal errors should not. The fatal/non-fatal determination can be made either in the ec2-gahp or in the gridmanager; the preferred location is TBD. ec2-gahp error reporting is currently ad hoc.

As for the Hold->Idle->Running sequence, the correct approach is to take the instance down during the *->Hold transition.

*** Bug 783713 has been marked as a duplicate of this bug. ***

MRG-G is in maintenance only, and only customer escalations will be addressed from this point forward. This issue can be re-opened if a customer escalation associated with this issue occurs.
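The fatal/non-fatal split and the "take the instance down on *->Hold" rule discussed above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the ec2-gahp or gridmanager code. `ErrorKind`, `KNOWN_CODES`, `classify`, `Job` and `on_ec2_error` are hypothetical names. The only error codes taken from the report are AuthFailure (401) and InternalError (500). The rule that any other 5xx is retryable and everything else is fatal is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ErrorKind(Enum):
    FATAL = "fatal"          # retrying cannot help: hold the job
    TRANSIENT = "transient"  # server-side hiccup: keep the job and retry


# Hypothetical lookup table. Only the two codes named in the report are
# listed; a real table would need the full EC2 error-code list.
KNOWN_CODES = {
    "AuthFailure": ErrorKind.FATAL,
    "InternalError": ErrorKind.TRANSIENT,
}


def classify(http_status: int, error_code: Optional[str] = None) -> ErrorKind:
    """Decide whether an EC2 Query API error should put the job on hold."""
    if error_code in KNOWN_CODES:
        return KNOWN_CODES[error_code]
    # Assumption: unknown 5xx responses are server-side and worth retrying;
    # anything else (4xx, unexpected statuses) is treated as fatal.
    if 500 <= http_status <= 599:
        return ErrorKind.TRANSIENT
    return ErrorKind.FATAL


@dataclass
class Job:
    # Hypothetical stand-in for the gridmanager's per-job state.
    status: str = "Running"
    instance_id: Optional[str] = None
    hold_reason: str = ""


def on_ec2_error(job: Job, http_status: int, error_code: Optional[str],
                 terminate: Callable[[str], None]) -> ErrorKind:
    """Hold the job only on fatal errors, tearing its instance down first."""
    kind = classify(http_status, error_code)
    if kind is ErrorKind.FATAL:
        # Terminate the instance on the *->Hold transition so that a later
        # condor_release does not start a second instance and leak the first.
        if job.instance_id is not None:
            terminate(job.instance_id)
            job.instance_id = None
        job.status = "Held"
        job.hold_reason = f"EC2 error {http_status} {error_code or ''}".strip()
    # TRANSIENT: leave the job untouched; the caller is expected to retry.
    return kind
```

In this sketch, a 500 InternalError leaves a running job and its instance alone, while a 401 AuthFailure terminates the instance and then holds the job. If `terminate` itself raises (for example because the credentials really are invalid), `instance_id` is left set. A real implementation would still need some way to clean up that instance later.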