Bug 974045
Summary: | openstack-nova: 'model server went away' ERROR after creating more than 2 snapshots from different instances | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Dafna Ron <dron> |
Component: | openstack-nova | Assignee: | Vladan Popovic <vpopovic> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Haim <hateya> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | unspecified | CC: | abaron, dallan, jkt, ndipanov, shedoh, yeylon |
Target Milestone: | --- | | |
Target Release: | 4.0 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | storage | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-10-18 14:33:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
I was unable to reproduce this behaviour in upstream Havana after many tries. I created one snapshot for every instance in a loop, so there was no pause between the calls, and the snapshots were created without problems. While testing on RHOS 4.0 I ran into an issue with qemu-img-rhev (https://bugzilla.redhat.com/show_bug.cgi?id=1016896). After fixing this, the snapshots were created fine and there was no error at all. I did get this behaviour in Grizzly, though, after creating snapshots for a few machines. I'll investigate it and try to apply the patches that fix this issue.
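The back-to-back loop described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the helper names `snapshot_commands` and `snapshot_all` are made up here, and the `image-create` subcommand is an assumption, since the report only shows the command as 'nova <server> <name>'.

```python
import subprocess
from typing import Callable, List, Sequence


def snapshot_commands(servers: Sequence[str], prefix: str = "snap") -> List[List[str]]:
    """Build one snapshot command per server.

    Assumption: snapshots are taken with `nova image-create <server> <name>`;
    the report itself does not spell out the subcommand.
    """
    return [["nova", "image-create", server, f"{prefix}-{server}"] for server in servers]


def snapshot_all(
    servers: Sequence[str],
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[int]:
    """Issue the commands one after another, with no pause, and collect exit codes."""
    return [run(cmd) for cmd in snapshot_commands(servers)]
```

Passing a different `run` callable makes it possible to dry-run the loop (for example, just printing each command) before pointing it at a real deployment.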
Created attachment 760570
logs

Description of problem:
I have 10 instances running on two different hosts. When trying to create snapshots from the 10 running instances, we constantly get 'model server went away' ERRORs.

Version-Release number of selected component (if applicable):
openstack-nova-compute-2013.1.1-4.el6ost.noarch
openstack-nova-api-2013.1.1-4.el6ost.noarch
libvirt-0.10.2-18.el6_4.5.x86_64
qemu-img-rhev-0.12.1.2-2.355.el6_4.4.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.4.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an image and launch 10 instances from the image.
2. Once the instances are running, create a snapshot for each of the instances.

Actual results:
After the 3rd snapshot we start getting "model server went away" ERRORs in the log, the 'nova <server> <name>' commands take a long time to return, and even compute.log stops updating for a few minutes.

Expected results:
We should be able to create several snapshots for different instances at the same time without timeouts.
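To see how often and when the error shows up in a longer compute.log, the ERROR entries can be picked out with a small script. The sketch below assumes the line layout seen in this report's log excerpt (timestamp, PID, level, logger name, message); the names `ENTRY` and `find_errors` are illustrative only.

```python
import re
from typing import Iterable, List, Tuple

# Start of a log entry as it appears in this report:
# "<date> <time.millis> <pid> <LEVEL> <logger> <message>"
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)


def find_errors(
    lines: Iterable[str], needle: str = "model server went away"
) -> List[Tuple[str, str]]:
    """Return (timestamp, logger) for every ERROR entry whose message contains needle."""
    hits = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        # TRACE lines and lines in any other layout are skipped.
        if match and match.group("level") == "ERROR" and needle in match.group("msg"):
            hits.append((match.group("ts"), match.group("logger")))
    return hits
```

Calling `find_errors(open(path))` on a log file yields one tuple per occurrence, which makes gaps of several minutes between entries easy to spot.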
Additional info:

2013-06-13 11:32:27.171 2966 ERROR nova.servicegroup.drivers.db [-] model server went away
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py", line 92, in _report_state
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     service.service_ref, state_catalog)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 627, in service_update
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return self.conductor_rpcapi.service_update(context, service, values)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 365, in service_update
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return self.call(context, msg, version='1.34')
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/proxy.py", line 80, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return rpc.call(context, self._get_topic(topic), msg, timeout)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/__init__.py", line 140, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     return _get_impl().call(CONF, context, topic, msg, timeout)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 610, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     rpc_amqp.get_connection_pool(conf, Connection))
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 612, in call
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     rv = list(rv)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 554, in __iter__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self.done()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self.gen.next()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 551, in __iter__
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     self._iterator.next()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 435, in iterconsume
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     yield self.ensure(_error_callback, _consume)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 379, in ensure
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     error_callback(e)
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 420, in _error_callback
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db     raise rpc_common.Timeout()
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db Timeout: Timeout while waiting on RPC response.
2013-06-13 11:32:27.171 2966 TRACE nova.servicegroup.drivers.db