Bug 927381
| Summary: | [RFE][glance]: Graceful recovery from Image service outages | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Lon Hohberger <lhh> |
| Component: | openstack-glance | Assignee: | Flavio Percoco <fpercoco> |
| Status: | CLOSED UPSTREAM | QA Contact: | tkammer |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 2.1 | CC: | afazekas, apevec, eglynn, fpercoco, scohen, yeylon |
| Target Milestone: | --- | Keywords: | FutureFeature, Triaged |
| Target Release: | 7.0 (Kilo) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | https://blueprints.launchpad.net/glance/+spec/restartable-image-download | | |
| Whiteboard: | upstream_milestone_juno-rc1 upstream_status_implemented upstream_definition_discussion | | |
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-10 23:15:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 919497, 1102481 | | |
| Bug Blocks: | | | |
Description
Lon Hohberger
2013-03-25 19:38:28 UTC
One of the problems when discussing a manual restart was that newly installed python modules could be loaded by the old service. In that case things could break and behavior could be ill defined.

This would also be a problem if we do a partial shutdown of the old service's TCP listener and then immediately start the new process while the old is cleanly shutting down, as mentioned here:
https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10

I think the best option would be to add functionality to glance such that transfers are check-pointed and can be restarted.

(In reply to comment #3)
> I think the best option would be to add functionality to glance such that
> transfers are check-pointed and can be restarted.

Agreed, this would also cover any other temporary Image service outages, not just graceful restarts on upgrades.

(In reply to comment #4)
> any other temporary Image service outages

For example, HA failover?

(In reply to comment #3)
> One of the problems when discussing a manual restart was that newly
> installed python modules could be loaded by the old service. In that case
> things could break and behavior could be ill defined.
>
> This would also be a problem if we do a partial shutdown of the old service's
> TCP listener and then immediately start the new process while the old is
> cleanly shutting down, as mentioned here:
> https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10
>
> I think the best option would be to add functionality to glance such that
> transfers are check-pointed and can be restarted.

It sounds like a very unlikely situation.
The old process is expected to have all modules imported and will not read the new modules unless:
- The code contains a conditional import which was not reached before
- The code contains an explicit module reload

Even if this happens, it can only cause issues when the new module is incompatible with the previous one.
It is very likely the issue has the same impact as not having a real graceful reload, for example it returns a 500 error.

Do we really have a real chance for a more serious issue, for example:
- The old process causes issues to the new one?
- We leave behind a live, resource-consuming old process on every upgrade?

BTW: The glance-control command provides the reload action.

(In reply to comment #6)
> (In reply to comment #3)
> > One of the problems when discussing a manual restart was that newly
> > installed python modules could be loaded by the old service. In that case
> > things could break and behavior could be ill defined.
> >
> > This would also be a problem if we do a partial shutdown of the old service's
> > TCP listener and then immediately start the new process while the old is
> > cleanly shutting down, as mentioned here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10
> >
> > I think the best option would be to add functionality to glance such that
> > transfers are check-pointed and can be restarted.
>
> It sounds like a very unlikely situation.
> The old process is expected to have all modules imported and will not read the
> new modules unless:
> - The code contains a conditional import which was not reached before
> - The code contains an explicit module reload
>
> Even if this happens, it can only cause issues when the new module is
> incompatible with the previous one.
>
> It is very likely the issue has the same impact as not having a real graceful
> reload, for example it returns a 500 error.
>
> Do we really have a real chance for a more serious issue, for example:
> - The old process causes issues to the new one?
> - We leave behind a live, resource-consuming old process on every upgrade?
>
> BTW: The glance-control command provides the reload action.

Doesn't really matter since providing continuation of stopped tasks is a more robust solution which would cover a lot more cases.
John, is there a blueprint around this?
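The module-loading argument in comment #6 can be illustrated with a small, self-contained Python sketch. It is not Glance code: the module name `demo_upgrade_mod` and the function `demo_module_caching` are made up for illustration, and the "upgrade" is simulated by rewriting a file in a temporary directory. It only shows that a running process keeps the already-imported module until something explicitly reloads it.

```python
import importlib
import os
import sys
import tempfile


def demo_module_caching():
    """Show what a running process sees when a module file changes on disk."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of the demo
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module standing in for an upgraded dependency.
        path = os.path.join(tmp, "demo_upgrade_mod.py")
        with open(path, "w") as f:
            f.write("VERSION = 'old'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("demo_upgrade_mod")
            before = mod.VERSION

            # Simulate a package upgrade replacing the file under the process.
            with open(path, "w") as f:
                f.write("VERSION = 'new'\n")

            # A repeated import is served from sys.modules; the file is
            # not read again.
            again = importlib.import_module("demo_upgrade_mod")
            after_upgrade = again.VERSION

            # Only an explicit reload picks up the new code.
            importlib.invalidate_caches()
            importlib.reload(mod)
            after_reload = mod.VERSION
            return before, after_upgrade, after_reload
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_upgrade_mod", None)
            sys.dont_write_bytecode = old_flag


if __name__ == "__main__":
    print(demo_module_caching())
```

The second value stays at the old version because the import is answered from `sys.modules`. The first case from comment #6 (a conditional import not reached before the upgrade) differs only in that the first import itself happens after the file has changed.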
(In reply to Ayal Baron from comment #7)
> (In reply to comment #6)
> > (In reply to comment #3)
> > > One of the problems when discussing a manual restart was that newly
> > > installed python modules could be loaded by the old service. In that case
> > > things could break and behavior could be ill defined.
> > >
> > > This would also be a problem if we do a partial shutdown of the old service's
> > > TCP listener and then immediately start the new process while the old is
> > > cleanly shutting down, as mentioned here:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10
> > >
> > > I think the best option would be to add functionality to glance such that
> > > transfers are check-pointed and can be restarted.
> >
> > It sounds like a very unlikely situation.
> > The old process is expected to have all modules imported and will not read the
> > new modules unless:
> > - The code contains a conditional import which was not reached before
> > - The code contains an explicit module reload
> >
> > Even if this happens, it can only cause issues when the new module is
> > incompatible with the previous one.
> >
> > It is very likely the issue has the same impact as not having a real graceful
> > reload, for example it returns a 500 error.
> >
> > Do we really have a real chance for a more serious issue, for example:
> > - The old process causes issues to the new one?
> > - We leave behind a live, resource-consuming old process on every upgrade?
> >
> > BTW: The glance-control command provides the reload action.
>
> Doesn't really matter since providing continuation of stopped tasks is a
> more robust solution which would cover a lot more cases.
> John, is there a blueprint around this?

There are two.
The most clearly applicable is:
https://blueprints.launchpad.net/glance/+spec/restartable-image-download

However, this one applies as well:
https://blueprints.launchpad.net/glance/+spec/image-transfer-service

It should also be noted that in the current implementation of glance you can send a SIGHUP to the parent API processes, which will kill off the service but allow any child processes (the data movers) to continue working.

There hasn't been any progress upstream on this specific area. However, the upload / download workflow is being enhanced. This may change the direction of this specific feature.

I'd wait for that work to be completed - it's related to async workers too - before going forward with this feature.

(In reply to Flavio Percoco from comment #9)
> There hasn't been any progress upstream on this specific area. However, the
> upload / download workflow is being enhanced. This may change the direction
> of this specific feature.
>
> I'd wait for that work to be completed - it's related to async workers too -
> before going forward with this feature.

Flavio, any update on the work upstream?

Glance now has support for re-startable image downloads. However, the client library support is yet to be added.

I'm closing this bug as UPSTREAM since it's not worth keeping it here waiting for the client library to have support for it. Feel free to re-open if you feel otherwise.
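For readers unfamiliar with the idea of check-pointed, restartable transfers requested in this bug, here is a minimal Python sketch. It is an assumption-laden illustration and not how Glance or its client implements restartable downloads: `FlakySource`, `resumable_download`, and `download_with_retry` are hypothetical names, the "image store" is an in-memory byte string, and the outage is simulated with a single `ConnectionError`.

```python
import io


class FlakySource:
    """Hypothetical image store that fails once part-way through a transfer."""

    def __init__(self, data, fail_at):
        self.data = data
        self.fail_at = fail_at
        self.failed = False

    def read_range(self, offset, size):
        # Simulate one temporary outage when the transfer reaches fail_at.
        if not self.failed and offset >= self.fail_at:
            self.failed = True
            raise ConnectionError("simulated Image service outage")
        return self.data[offset:offset + size]


def resumable_download(source, sink, checkpoint, chunk_size=4):
    """Copy from source to sink, starting at checkpoint['offset']."""
    sink.seek(checkpoint["offset"])
    sink.truncate()  # discard anything written past the checkpoint
    while True:
        chunk = source.read_range(checkpoint["offset"], chunk_size)
        if not chunk:
            return
        sink.write(chunk)
        # Record progress only after the chunk is safely written.
        checkpoint["offset"] += len(chunk)


def download_with_retry(source, sink, max_attempts=3):
    """Retry the transfer, resuming from the last checkpoint each time."""
    checkpoint = {"offset": 0}
    for _ in range(max_attempts):
        try:
            resumable_download(source, sink, checkpoint)
            return checkpoint["offset"]
        except ConnectionError:
            continue  # the next attempt resumes from checkpoint['offset']
    raise RuntimeError("transfer did not complete")


if __name__ == "__main__":
    image = b"0123456789abcdef"
    out = io.BytesIO()
    copied = download_with_retry(FlakySource(image, fail_at=8), out)
    print(copied, out.getvalue() == image)
```

The point of the sketch is that the retry does not start from byte zero: the checkpoint survives the failure, so only the remaining data is transferred. The same shape covers restarts on upgrade and the HA failover case raised earlier in the thread, provided the checkpoint is stored somewhere that outlives the failing component.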