Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 927381

Summary: [RFE][glance]: Graceful recovery from Image service outages
Product: Red Hat OpenStack
Reporter: Lon Hohberger <lhh>
Component: openstack-glance
Assignee: Flavio Percoco <fpercoco>
Status: CLOSED UPSTREAM
QA Contact: tkammer
Severity: medium
Docs Contact:
Priority: low
Version: 2.1
CC: afazekas, apevec, eglynn, fpercoco, scohen, yeylon
Target Milestone: ---
Keywords: FutureFeature, Triaged
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/glance/+spec/restartable-image-download
Whiteboard: upstream_milestone_juno-rc1 upstream_status_implemented upstream_definition_discussion
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-10 23:15:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 919497, 1102481
Bug Blocks:

Description Lon Hohberger 2013-03-25 19:38:28 UTC
Description of problem:

Glance is the image service for Red Hat OpenStack. Downloading an image can sometimes take a very long time. Glance may be restarted while such a download is in progress, either because a new package is installed (which triggers a restart, per the Fedora/RHEL packaging guidelines) or because the administrator restarts it for some other reason. In that case, long-running image downloads may be lost and have to be started over, which disrupts users' workflow.

This request is to support a graceful restart of the glance services in order to reduce or eliminate the need to restart these long-running downloads.

Comment 3 John Bresnahan 2013-03-25 21:08:54 UTC
One of the problems raised when discussing a manual restart was that newly installed Python modules could be loaded by the old service. In that case things could break and behavior would be ill-defined.

This would also be a problem if we partially shut down the old service's TCP listener and then immediately start the new process while the old one is still shutting down cleanly, as mentioned here: https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10

I think the best option would be to add functionality to glance such that transfers are check-pointed and can be restarted.
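The check-pointing idea can be illustrated with a minimal sketch. This is hypothetical and is not Glance code. `read_chunk`, `load_offset`, `save_offset`, `checkpointed_copy` and the JSON checkpoint format are all invented for illustration. `read_chunk(offset, size)` stands in for whatever source can serve the image from an arbitrary byte offset. The copy records its offset only after the data is on disk, so a restarted process resumes where the previous one stopped instead of starting over.

```python
import json
import os
from typing import Callable

# Hypothetical source: (offset, size) -> bytes, returning b"" at end of image.
ReadChunk = Callable[[int, int], bytes]


def load_offset(ckpt_path: str) -> int:
    """Return the last checkpointed byte offset, or 0 for a fresh transfer."""
    try:
        with open(ckpt_path) as f:
            return int(json.load(f)["offset"])
    except FileNotFoundError:
        return 0


def save_offset(ckpt_path: str, offset: int) -> None:
    """Persist the offset via write-then-rename so a crash never leaves a torn file."""
    tmp_path = ckpt_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, ckpt_path)


def checkpointed_copy(read_chunk: ReadChunk, dst_path: str, ckpt_path: str,
                      chunk_size: int = 1 << 20) -> int:
    """Copy from read_chunk into dst_path, resuming from the last checkpoint."""
    # Only trust the checkpoint if the partial destination file still exists.
    offset = load_offset(ckpt_path) if os.path.exists(dst_path) else 0
    with open(dst_path, "r+b" if offset else "wb") as dst:
        dst.seek(offset)
        dst.truncate()  # discard bytes written after the last checkpoint
        while True:
            chunk = read_chunk(offset, chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # data is durable before the checkpoint claims it
            offset += len(chunk)
            save_offset(ckpt_path, offset)
    if os.path.exists(ckpt_path):
        os.remove(ckpt_path)  # transfer complete; a later run starts fresh
    return offset
```

If the source fails partway through, calling `checkpointed_copy` again with the same paths continues from the last recorded offset. Fsyncing every chunk is the conservative choice here. A real implementation would likely checkpoint less often and accept re-fetching a little data after a crash.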

Comment 4 Alan Pevec 2013-03-26 18:53:10 UTC
(In reply to comment #3)
> I think the best option would be to add functionality to glance such that
> transfers are check-pointed and can be restarted.

Agreed, this would also cover any other temporary Image service outages, not just graceful restarts on upgrades.

Comment 5 Alan Pevec 2013-03-26 18:56:53 UTC
(In reply to comment #4)
> any other temporary Image service outages

For example, HA failover?

Comment 6 Attila Fazekas 2013-05-08 21:06:48 UTC
(In reply to comment #3)
> One of the problems when discussing a manual restart was that newly
> installed python modules could be loaded by the old service.  In that case
> things could break and behavior could be ill defined.
> 
> This would also be a problem if we do a partial shutdown of the old services
> TCP listener and then immediately start the new process while the old is
> cleanly shutting down, as mentioned here:
> https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10
> 
> I think the best option would be to add functionality to glance such that
> transfers are check-pointed and can be restarted.

It sounds like a very unlikely situation.
The old process is expected to have already imported all of its modules, and it will not read the new modules unless:
- The code contains a conditional import which was not reached before
- The code contains an explicit module reload

Even if this happens, it can only cause issues when the new module is incompatible with the previous one.

Most likely the issue would have the same impact as not having a real graceful reload at all; for example, a request fails with a 500 error.

Is there a real chance of a more serious issue, for example:
- Does the old process cause issues for the new one?
- Do we leave behind a live, resource-consuming old process on every upgrade?

BTW: The glance-control command provides the reload action.
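A small, self-contained Python experiment can demonstrate the two cases listed in this comment. It is hypothetical: `eager_mod`, `lazy_mod` and `demo_stale_modules` are invented names standing in for files that a package upgrade rewrites. The experiment shows three things. A module that is already imported keeps its old code. A conditional import reached for the first time after the upgrade loads the new file. An explicit `importlib.reload` also picks up the new file.

```python
import importlib
import pathlib
import sys
import tempfile


def demo_stale_modules():
    """Report which code paths in a running process see an upgraded module."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no .pyc, so a rewritten file is always re-read
    workdir = pathlib.Path(tempfile.mkdtemp())
    sys.path.insert(0, str(workdir))
    eager_file = workdir / "eager_mod.py"  # hypothetical module names
    lazy_file = workdir / "lazy_mod.py"
    try:
        # Before the upgrade: both files exist, only eager_mod is imported.
        eager_file.write_text("VERSION = 'old'\n")
        lazy_file.write_text("VERSION = 'old'\n")
        eager = importlib.import_module("eager_mod")

        # A simulated package upgrade rewrites both files on disk.
        eager_file.write_text("VERSION = 'new'\n")
        lazy_file.write_text("VERSION = 'new'\n")

        return {
            # Importing again returns the cached module object from sys.modules.
            "already_imported": importlib.import_module("eager_mod").VERSION,
            # An import reached for the first time reads the upgraded file.
            "first_import_after_upgrade": importlib.import_module("lazy_mod").VERSION,
            # An explicit reload re-executes the module from the new source.
            "explicit_reload": importlib.reload(eager).VERSION,
        }
    finally:
        sys.path.remove(str(workdir))
        sys.modules.pop("eager_mod", None)
        sys.modules.pop("lazy_mod", None)
        sys.dont_write_bytecode = saved_flag
```

The result mixes versions in one process: `already_imported` is `'old'`, while the other two entries are `'new'`. That mix is exactly the situation in which an incompatibility between old and new modules could surface.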

Comment 7 Ayal Baron 2013-05-14 09:08:19 UTC
(In reply to comment #6)
> (In reply to comment #3)
> > One of the problems when discussing a manual restart was that newly
> > installed python modules could be loaded by the old service.  In that case
> > things could break and behavior could be ill defined.
> > 
> > This would also be a problem if we do a partial shutdown of the old services
> > TCP listener and then immediately start the new process while the old is
> > cleanly shutting down, as mentioned here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=919497#c10
> > 
> > I think the best option would be to add functionality to glance such that
> > transfers are check-pointed and can be restarted.
> 
> It sounds like very unlikely situation.
> The old process expected to have all modules imported and will not read the
> new modules unless:
> - The code contains a conditional import which was not reached before
> - The code contains an explicit module reload
> 
> Even if this happens, it only can cause issues when the new module is
> incompatible with the previews one.
> 
> It is very likely the issue has the same impact as not having real graceful
> reload, for example it returns with an 500 error.
> 
> Do we really have a real chance for a more serious issue, for example:
> - The old process causes issues to the new one ? 
> - We leave behind a live resource consuming old process on every upgrade ?
> 
> BTW: The glance-control command provides the reload action.

It doesn't really matter, since resuming interrupted tasks is a more robust solution that would cover many more cases.
John, is there a blueprint around this?

Comment 8 John Bresnahan 2013-05-21 06:17:53 UTC
(In reply to Ayal Baron from comment #7)
> Doesn't really matter since providing continuation of stopped tasks is a
> more robust solution which would cover a lot more cases.
> John, is there a blueprint around this?

There are two.  The most clearly applicable is: 
https://blueprints.launchpad.net/glance/+spec/restartable-image-download

However, this one applies as well:
https://blueprints.launchpad.net/glance/+spec/image-transfer-service

It should also be noted that, in the current implementation of Glance, you can send a SIGHUP to the parent API processes. This kills off the service but allows any child processes (the data movers) to continue working.
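The behaviour described here follows a common "graceful drain" pattern. On SIGHUP the service stops taking new work, but transfers that are already running are allowed to finish. The sketch below illustrates only that pattern. It does not reproduce Glance's actual process model: it runs in a single process and uses threads in place of forked child workers, and `DrainingWorker` and its methods are hypothetical. It only works on POSIX systems, because it relies on SIGHUP.

```python
import signal
import threading
import time


class DrainingWorker:
    """Hypothetical worker: SIGHUP stops new transfers, in-flight ones finish."""

    def __init__(self):
        self.accepting = True
        self.completed = []
        self._inflight = []
        self._lock = threading.Lock()

    def install(self):
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Keep the handler trivial: flip a flag, never block or take locks.
        self.accepting = False

    def submit(self, name, duration):
        """Start a transfer in the background; refuse it once draining."""
        with self._lock:
            if not self.accepting:
                return False
            t = threading.Thread(target=self._transfer, args=(name, duration))
            self._inflight.append(t)
            t.start()
            return True

    def _transfer(self, name, duration):
        time.sleep(duration)  # stand-in for copying image data
        with self._lock:
            self.completed.append(name)

    def drain(self):
        """Block until every in-flight transfer has finished."""
        for t in list(self._inflight):
            t.join()
```

For example, a worker that receives `signal.raise_signal(signal.SIGHUP)` while one transfer is running rejects further `submit` calls, and `drain()` then waits for the running transfer to complete rather than cutting it off.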

Comment 9 Flavio Percoco 2013-12-24 15:34:35 UTC
There hasn't been any progress upstream in this specific area. However, the upload/download workflow is being enhanced, and that work may change the direction of this feature.

I'd wait for that work, which is also related to async workers, to be completed before going forward with this feature.

Comment 10 Ayal Baron 2014-01-21 10:22:09 UTC
(In reply to Flavio Percoco from comment #9)
> There hasn't been any progress upstream on this specific area. However, the
> upload / download workflow is being enhanced. This may change the direction
> of this specific feature.
> 
> I'd wait for that work to be completed - it's related to async workers too -
> before going forward with this feature.

Flavio, any update on the work upstream?

Comment 14 Flavio Percoco 2015-02-10 23:15:47 UTC
Glance now supports restartable image downloads. However, support in the client library has yet to be added. I'm closing this bug as UPSTREAM, since it's not worth keeping it open here while we wait for client library support.

Feel free to re-open if you feel otherwise.
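The missing client-side piece could look roughly like the sketch below, which uses only the Python standard library. It relies on several assumptions. The function names are hypothetical. The URL in the usage note is only an example of the Glance v2 image-data path `/v2/images/{image_id}/file`. Authentication uses a Keystone token passed in `X-Auth-Token`. The sketch sends the standard HTTP `Range` header; which range header a particular Glance release actually honours should be checked against that release's API reference before relying on this.

```python
import os
import urllib.error
import urllib.request


def resume_request(url, dest_path, token=None):
    """Build a GET request asking only for the bytes not yet on disk."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    headers = {"Range": f"bytes={offset}-"}  # standard HTTP range request
    if token:
        headers["X-Auth-Token"] = token  # Keystone token (assumed auth scheme)
    return offset, urllib.request.Request(url, headers=headers)


def resume_download(url, dest_path, token=None, chunk_size=1 << 20):
    """Download into dest_path, continuing a partial file when possible."""
    offset, req = resume_request(url, dest_path, token)
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as exc:
        if exc.code == 416 and offset > 0:
            return offset  # range not satisfiable: file is already complete
        raise
    with resp:
        # 206 means the server honoured the range; 200 means it resent everything.
        resumed = resp.status == 206
        written = offset if resumed else 0
        with open(dest_path, "ab" if resumed else "wb") as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
                written += len(chunk)
    return written
```

Usage would be something like `resume_download("http://glance.example:9292/v2/images/<image_id>/file", "image.qcow2", token=...)`, called again after any interruption. The fallback to `"wb"` on a plain 200 response is deliberate. A server that ignores the range sends the whole image, and appending that to the partial file would corrupt it.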