Bug 974051
Summary: openstack-nova: instances state moves to 'shutoff' when we have time-outs on create snapshots for several instances

| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Version | unspecified |
| Target Milestone | --- |
| Target Release | 4.0 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Dafna Ron <dron> |
| Assignee | Vladan Popovic <vpopovic> |
| QA Contact | Ami Jeain <ajeain> |
| CC | dallan, dron, jkt, ndipanov, yeylon |
| Flags | vpopovic: needinfo+ |
| Doc Type | Bug Fix |
| Type | Bug |
| Regression | --- |
| Last Closed | 2013-10-23 09:59:48 UTC |
I struggled with this issue a lot, trying to reproduce it in 3.0, and didn't always get the "model server went away" error after lots of testing with small instances (64MB mem / 1GB storage). I got it a while ago but can't get the same behaviour again. Could you please tell me more about how I could actually reproduce this in Grizzly? Which flavor did you use? Which image did you use to get this error? Everything else would be more than welcome. In 4.0 I never got this issue after numerous tests.

I have only worked with Havana and not Grizzly... flavour was 1 (tiny), these images no longer exist, not sure what else to give you... two computes -> 10 instances on each -> try to create snapshots for each instance.

I'm sorry, but I cannot reproduce this on my local setup after countless attempts. I had lots of issues getting myself into a situation where I could test this, and it probably requires more resources than I have on my laptop, but after managing that, I couldn't get the instances to go into the shutoff state. Could you please provide me with access to machines where I can reproduce and debug this?

After speaking to Ami today I realised that I did open this bug on Grizzly. I tested this with Vladan on Havana and it no longer reproduced there, so I think we can close this.

I agree, the described behaviour is unreachable by doing snapshots of 10 instances running on 2 hosts, so I guess we can close this bug. However, we managed to reproduce this behaviour with Dafna on Havana by snapshotting multiple instances and then opening a VNC console for one of them. All of the instances that were on that node went into the Shutoff state. Dafna, please correct me if I'm wrong. More investigation is needed on this issue. When we manage to reproduce it in 100% of the cases, I suggest opening another bug and describing the steps to reproduce in detail.
For now the traceback shows only this:

```
_volume_snapshot_create /usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py:1594
    % image_id, instance=instance)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 309, in decorated_function
    *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2293, in snapshot_instance
    task_states.IMAGE_SNAPSHOT)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2324, in _snapshot_instance
    update_task_state)
  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1374, in snapshot
    virt_dom.managedSave(0)
  File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 187, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 147, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 76, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 863, in managedSave
    if ret == -1: raise libvirtError('virDomainManagedSave() failed', dom=self)
libvirtError: internal error received hangup / error event on socket
```

I'm closing this bug now because it's not reproducible in 4.0.
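The traceback shows the blocking `virt_dom.managedSave(0)` call running through eventlet's thread pool while the service-group driver separately reports "model server went away" on an RPC timeout. A minimal, hypothetical sketch (not Nova code; the interval and timeout values are assumptions) of how a long blocking hypervisor call can starve a periodic heartbeat past its timeout:

```python
import time

# Hypothetical illustration: a service sends a heartbeat on a fixed
# interval, and the server declares the service gone once no heartbeat
# has arrived within SERVICE_TIMEOUT. Values below are made up.
HEARTBEAT_INTERVAL = 0.1   # seconds between heartbeats (assumed)
SERVICE_TIMEOUT = 0.3      # server-side liveness window (assumed)

last_heartbeat = time.monotonic()

def blocking_managed_save(duration):
    """Stand-in for a blocking hypervisor call such as managedSave(0)."""
    time.sleep(duration)

def heartbeat_missed():
    """True once the last heartbeat is older than the liveness window."""
    return time.monotonic() - last_heartbeat > SERVICE_TIMEOUT

# While the worker is idle, heartbeats are refreshed on time.
time.sleep(HEARTBEAT_INTERVAL)
last_heartbeat = time.monotonic()
assert not heartbeat_missed()

# A long blocking save starves the loop: no heartbeat is refreshed,
# so by the time the call returns the liveness window has elapsed.
blocking_managed_save(0.5)
print("missed:", heartbeat_missed())  # prints "missed: True"
```

This is only meant to make the failure shape concrete: the snapshot itself may succeed, yet the service looks dead from the server's point of view in the meantime.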
Created attachment 760591 [details] logs

Description of problem: I launched 10 instances and tried creating snapshots for all of them once the instances were running. We get errors from nova:

```
2013-06-13 11:32:27.171 2966 ERROR nova.servicegroup.drivers.db [-] model server went away
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db Timeout: Timeout while waiting on RPC response.
```

and then some of the instances change state to shutoff.

Version-Release number of selected component (if applicable):
- openstack-nova-compute-2013.1.1-4.el6ost.noarch
- openstack-nova-api-2013.1.1-4.el6ost.noarch
- libvirt-0.10.2-18.el6_4.5.x86_64
- qemu-img-rhev-0.12.1.2-2.355.el6_4.4.x86_64
- qemu-kvm-rhev-0.12.1.2-2.355.el6_4.4.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Create an image and launch 10 instances from the image on two different hosts.
2. Create snapshots for each of the instances.

Actual results: we get time-out errors in the compute log, and some of the instances move to shutoff and need a soft reboot.
Expected results: instances should not move to shutoff.

Additional info:

```
[root@opens-vdsb ~(keystone_admin)]# nova list
+--------------------------------------+-------------------------------------------+---------+---------------------------+
| ID                                   | Name                                      | Status  | Networks                  |
+--------------------------------------+-------------------------------------------+---------+---------------------------+
| 1b1eb170-3bdb-496e-bcaa-2a3cd5078e06 | HAHA-1b1eb170-3bdb-496e-bcaa-2a3cd5078e06 | ACTIVE  | novanetwork=192.168.32.19 |
| 2e19d87f-44ac-4525-bccd-41f2d0350a92 | HAHA-2e19d87f-44ac-4525-bccd-41f2d0350a92 | SHUTOFF | novanetwork=192.168.32.17 |
| 7f4df8e2-77fc-495e-acea-467b039e1ccd | HAHA-7f4df8e2-77fc-495e-acea-467b039e1ccd | SHUTOFF | novanetwork=192.168.32.3  |
| 80fbf669-e8c5-422a-b2ff-90c34e09d8ec | HAHA-80fbf669-e8c5-422a-b2ff-90c34e09d8ec | ACTIVE  | novanetwork=192.168.32.4  |
| 9f0cb489-56d3-4258-93e6-0c927cf70352 | HAHA-9f0cb489-56d3-4258-93e6-0c927cf70352 | ACTIVE  | novanetwork=192.168.32.2  |
| a512a696-13f3-4e48-8154-d8de3ad35af4 | HAHA-a512a696-13f3-4e48-8154-d8de3ad35af4 | SHUTOFF | novanetwork=192.168.32.16 |
| d965ae10-6175-4c57-93f9-47b57c7a2907 | HAHA-d965ae10-6175-4c57-93f9-47b57c7a2907 | ACTIVE  | novanetwork=192.168.32.14 |
| da936d2d-0273-4f47-b6cb-24ea24577317 | HAHA-da936d2d-0273-4f47-b6cb-24ea24577317 | ACTIVE  | novanetwork=192.168.32.15 |
| fa7ea571-a0c1-4cef-9788-ef1e0cf1bf8d | HAHA-fa7ea571-a0c1-4cef-9788-ef1e0cf1bf8d | SHUTOFF | novanetwork=192.168.32.18 |
| 12893599-1418-4cf0-b74d-5f2200418a74 | haha10                                    | ACTIVE  | novanetwork=192.168.32.5  |
+--------------------------------------+-------------------------------------------+---------+---------------------------+
```
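For reference, the reproduction steps in the description amount to roughly the following (a sketch, not a tested script: the flavor name, instance names, and `<image-id>` are placeholders, and the era-appropriate `nova` CLI syntax is assumed; this cannot run outside a live OpenStack deployment):

```shell
# Boot 10 instances from the same image (names and image ID are placeholders)
for i in $(seq 1 10); do
    nova boot --flavor m1.tiny --image <image-id> "snap-test-$i"
done

# Once all instances are ACTIVE, snapshot each one in quick succession
for i in $(seq 1 10); do
    nova image-create "snap-test-$i" "snap-test-$i-snapshot"
done

# Watch for instances dropping to SHUTOFF while the snapshots run
watch nova list
```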