Description of problem:
I noticed that when there is a problem with instances, they get stuck in the BUILD state forever, with no timeout.
This happened on my setup because of a cinder issue (restarting cinder fixed the problem, and it might be related to the build update), but I also reproduced it in another scenario, in which I launched 10 instances and rebooted the host.
Some of the instances were stuck in the BUILD state forever.
It might be a good idea to add a timeout for the BUILD status and move such instances to the ERROR status.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Launch 10 instances.
2. Reboot the host.
Some VMs move to ERROR; some are stuck in BUILD.
We should have a timeout that moves such instances to the ERROR state.
Additional info: logs
Created attachment 764291
In most cases, if something is wrong with an instance, it should go to ERROR. However, in cases like rebooting the host, instances can stay in BUILD.
Currently, the sync_power_states periodic task does not consider BUILD instances. Adding a timeout (as opposed to just checking state, as the periodic task does now) would be new behavior, so this will require an upstream blueprint.
Moving to 5.0.
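To illustrate the kind of timeout discussed above, here is a minimal sketch of a periodic check that moves instances stuck in BUILD to ERROR once they exceed a time limit. It is not Nova code: the `Instance` class, the `expire_stuck_builds` function, and the `build_timeout` parameter are all hypothetical names chosen for this example, and it assumes each instance records when its build started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

BUILD = "BUILD"
ERROR = "ERROR"


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record; not a Nova class.
    name: str
    status: str
    created_at: datetime  # assumed to mark when the build started


def expire_stuck_builds(instances, build_timeout, now=None):
    """Set instances that have been in BUILD longer than build_timeout to ERROR.

    Returns the names of the instances whose status was changed.
    """
    now = now or datetime.now(timezone.utc)
    expired = []
    for inst in instances:
        # Only BUILD instances are considered; other states are left alone.
        if inst.status == BUILD and now - inst.created_at > build_timeout:
            inst.status = ERROR
            expired.append(inst.name)
    return expired


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    instances = [
        Instance("vm-1", BUILD, now - timedelta(hours=2)),    # stuck
        Instance("vm-2", BUILD, now - timedelta(minutes=1)),  # still building
    ]
    print(expire_stuck_builds(instances, timedelta(hours=1), now=now))
```

A check like this would have to run periodically, and the timeout would need to be generous enough that slow but healthy builds are not marked as failed.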
Vladan, is this something that you have actively looked at, or do I need to move it to 6.0?
Sorry for the late response, Stephen.
I still haven't managed to look at this, so it should be moved to 6.0.
Please report an upstream bug for this. The upstream bug will need to contain a lot more detail about how to reproduce this scenario, though.