Red Hat Bugzilla – Bug 1024003
nova: boot of instance is stuck in scheduling forever if we cannot copy image to swift backend during boot of instance
Last modified: 2016-04-26 15:24:53 EDT
Created attachment 816824
Description of problem:
I am working with a swift backend.
My swift data server does not have enough disk space, but when I create an image with --location the image is created locally and is only copied to swift when we boot an instance.
When I boot an instance, it gets stuck in scheduling forever.
Version-Release number of selected component (if applicable):
[root@opens-vdsb ~(keystone_admin)]# rpm -qa |grep swift
[root@opens-vdsb ~(keystone_admin)]# rpm -qa |grep glance
[root@opens-vdsb ~(keystone_admin)]# rpm -qa |grep nova
Steps to Reproduce:
1. install openstack with swift and make sure your data server does not have a lot of free space
2. create an image using --location
3. boot an instance
Actual results:
instance is stuck in scheduling forever
Expected results:
1. if we have a problem copying the image to swift we should fail the boot of the instance
2. the instance should get a timeout if stuck in scheduling.
Additional info: logs
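For illustration only, the timeout behaviour requested in this report can be sketched as a polling loop with a deadline. This is not nova code: `get_state` is a hypothetical callable standing in for whatever API call reports the instance state, and the state strings and default timings are placeholders.

```python
import time


def wait_until_scheduled(get_state, timeout=300.0, poll_interval=5.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until the instance leaves 'scheduling' or the deadline passes.

    get_state is a hypothetical callable returning the current instance
    state as a lowercase string (an assumption, not a real nova API).
    Returns the first non-scheduling state, or 'timeout' on expiry.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state != "scheduling":
            # The instance moved on (for example to 'active' or 'error').
            return state
        if clock() >= deadline:
            # Still scheduling after the deadline: report it instead of
            # waiting forever.
            return "timeout"
        sleep(poll_interval)
```

A caller could treat the `'timeout'` result as a failed boot and surface it to the user rather than leaving the instance in scheduling indefinitely.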
This seems like a glance issue to me; however, the bug report is incomplete to the point that I cannot tell what is going on here.
No exact command line is provided, and it is very difficult to deduce it from the attached httpd logs, so I can only guess what the reporter actually attempted. Only partial logs are provided: the Nova scheduler logs are not helpful on their own, because the boot process involves several services (api, scheduler, compute, conductor), and we need complete logs to get the whole picture.
Here are a few possible scenarios based on the above:
* 'glance image-create' call with --location returned without an error even though it should have errored out.
* Image creation does not report errors correctly back to Horizon.
* Nova does not handle certain glance errors properly and does not error out the instance (no way to know since we don't have compute or API logs).
* Horizon fails to report some nova API errors back to the user properly.
To name only a few. It would be very helpful if the reporter could provide full logs and also a more detailed description of how Horizon reacted both when creating the image and when booting the instance.
I tried to reproduce it with the help of @Dafna, but we were not able to.
There are some suspicions that this might be related to the lack of space on the hypervisor node. If that is the case - I'd say that this is not really a bug since we do not guarantee any kind of graceful failures when nova runs out of basic resources.
I will leave a needinfo on @Dafna until this can be confirmed.
We have a problem: if I create the image with --location rather than --copy-from, the image is created locally and we do not try to upload it to the store at all (which I think is a bug, but not in nova).
The original issue was booting an instance when the image cannot be uploaded to swift because of space issues on the swift data servers - not on the local host.
After a conversation with Dafna it seems that the crux of the issue is that there are cases when booting an instance can fail and leave it in the SCHEDULING state. This is indeed an issue we need to address (if it is in fact real).
Based on the conversation it seems we need to change the title of this bug as it does not seem to be related to Swift at all.
Since it is not clear to me how this issue can be reproduced, I will leave a needinfo flag on this and, once we get a reproducer, adjust the title to reflect what the bug is about.
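To make the suspected failure mode concrete (a boot that fails but leaves the instance in the SCHEDULING state), here is a toy sketch of the behaviour we would want instead. It is not nova's implementation: `boot_instance`, `fetch_image`, `spawn`, `ImageFetchError` and the dict-based instance are all hypothetical names used only for illustration.

```python
class ImageFetchError(Exception):
    """Hypothetical error raised when the image cannot be retrieved."""


def boot_instance(instance, fetch_image, spawn):
    """Toy boot flow: any failure moves the instance to 'error'.

    instance is a plain dict; fetch_image and spawn are hypothetical
    callables standing in for the image download and hypervisor spawn.
    """
    instance["state"] = "scheduling"
    try:
        image = fetch_image(instance["image_id"])
        spawn(instance, image)
    except Exception as exc:
        # Record the failure instead of leaving the instance in
        # 'scheduling' with no indication of what went wrong.
        instance["state"] = "error"
        instance["fault"] = str(exc)
        return instance
    instance["state"] = "active"
    return instance
```

The point of the sketch is only that every exit path sets a terminal state; where such handling would belong in the real services cannot be determined without a reproducer.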
I tried to reproduce this in the latest build, but the instances no longer get stuck in scheduling.
Closing this bug - if we encounter this issue again we will reopen it.