I think it would be hard to monitor an HA/large-scale environment, as the scenario below shows. Instead of dealing with this issue in a CLI-only solution, I think we need a broader solution covering all commands when running in a scale environment.

+++ This bug was initially created as a clone of Bug #1033188 +++

Description of problem:
I have a two-compute setup and was trying to attach a volume to an instance running on compute #2 from compute #1 (the command was run from compute#1). The action failed in libvirt on compute #2 because of selinux, and all I could see on compute #1 was that the volume was attached. I think that if a volume action runs on a different compute and fails, we should get some sort of message on the compute we ran the command from. Alternatively, we should log that the request was sent to compute X, to allow easy debugging.

Version-Release number of selected component (if applicable):
openstack-cinder-2013.2-2.el6ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. In a two-compute setup with glusterfs, set fuse to 0 on the second compute (setsebool -P virt_use_fusefs=0).
2. Boot an instance on compute#2.
3. Create a volume.
4. Run the following command from compute#1 against the instance running on compute#2: nova volume-attach <instance> <volume> /dev/vdc

Actual results:
The attach fails because of selinux, but no error is reported on compute#1, nor is there any indication of which compute the request was sent to.

Expected results:
We should either log the failure on compute#1 or indicate which compute the request was sent to.

Additional info:

This is from compute#2 - the ERROR is clearly reported:

2013-11-21 14:59:00.372+0000: 28993: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized
2013-11-21 15:00:04.009+0000: 28993: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized
2013-11-21 15:02:47.382+0000: 28996: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2013-11-21 15:04:27.390+0000: 28997: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2013-11-21 15:08:04.638+0000: 28993: error : qemuMonitorJSONCheckError:357 : internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2013-11-21 17:08:05.350 29477 ERROR nova.openstack.common.rpc.amqp [req-90daa905-9b91-4260-9d4b-4ed1cf0ee0b8 24b77982be8049ee9cd5ad7bed913565 7eb59aa89e8944d098554ff6f5a4cf88] Exception during message handling
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     **args)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/exception.py", line 90, in wrapped
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     payload)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/exception.py", line 73, in wrapped
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     return f(self, context, *args, **kw)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 243, in decorated_function
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     pass
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 229, in decorated_function
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     return function(self, context, *args, **kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 271, in decorated_function
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     e, sys.exc_info())
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 258, in decorated_function
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     return function(self, context, *args, **kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3638, in attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     context, instance, mountpoint)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3633, in attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     mountpoint, instance)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3680, in _attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     connector)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3670, in _attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     encryption=encryption)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1105, in attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     disk_dev)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1092, in attach_volume
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     virt_dom.attachDeviceFlags(conf.to_xml(), flags)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 187, in doit
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     result = proxy_call(self._autowrap, f, *args, **kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 147, in proxy_call
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     rv = execute(f,*args,**kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 76, in tworker
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     rv = meth(*args,**kwargs)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 419, in attachDeviceFlags
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk2' could not be initialized
2013-11-21 17:08:05.350 29477 TRACE nova.openstack.common.rpc.amqp
2013-11-21 17:08:05.352 29477 DEBUG qpid.messaging.io.raw [-] SENT[5966680]: '\x0f\x00\x00;\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\x01\x00\x00)01801d71-64fe-44c8-99f5-f99f88b9e700:1115\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' writeable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:480
2013-11-21 17:08:05.353 29477 DEBUG qpid.messaging.io.raw [-] READ[5966680]: '\x0f\x00\x00;\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x01\x00\x00)01801d71-64fe-44c8-99f5-f99f8

This is on compute#1 - no error, just the request:

2013-11-21 16:54:25.835 19854 DEBUG nova.openstack.common.rpc.amqp [req-b773a8d2-175a-4b8c-aead-7c5dca33dadf 24b77982be8049ee9cd5ad7bed913565 7eb59aa89e8944d098554ff6f5a4cf88] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:553
2013-11-21 16:54:25.836 19854 DEBUG nova.openstack.common.rpc.amqp [req-b773a8d2-175a-4b8c-aead-7c5dca33dadf 24b77982be8049ee9cd5ad7bed913565 7eb59aa89e8944d098554ff6f5a4cf88] MSG_ID is b2921a9060e148ec9186e478717672e8 multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:556
2013-11-21 16:54:25.836 19854 DEBUG nova.openstack.common.rpc.amqp [req-b773a8d2-175a-4b8c-aead-7c5dca33dadf 24b77982be8049ee9cd5ad7bed913565 7eb59aa89e8944d098554ff6f5a4cf88] UNIQUE_ID is 677b4b67f7a64feebb87378496a87337. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:341
2013-11-21 16:54:25.839 19854 DEBUG qpid.messaging.io.ops [-] SENT[3afc290]: MessageTransfer(destination='amq.topic', id=serial(0), sync=True, headers=(DeliveryProperties(routing_key='topic/nova/conductor'), MessageProperties(content_type='amqp/map', application_headers={'qpid.subject': 'topic/nova/conductor'})), payload='\x00\x00\x06+\x00\x00\x00\x02\x0coslo.message\x95\x06\x04{"_context_roles": ["admin"], "_msg_id": "b2921a9060e148ec9186e478717672e8", "_context_quota_class": null, "_context_request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "_context_service_catalog": [{"endpoints_links": [], "endpoints": [{"adminURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "region": "RegionOne", "publicURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "id": "46f73e2ea31540a5aed0daa8ebc82857", "internalURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88"}], "type": "volume", "name": "cinder"}], "_context_tenant": "7eb59aa89e8944d098554ff6f5a4cf88", "args": {"values": {"instance_uuid": "d1fa2f73-b81d-47f3-b30e-68ed069a849e", "finish_time": "2013-11-21T14:54:25.835276", "request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "result": "Success", "event": "compute_terminate_instance"}}, "_unique_id": "677b4b67f7a64feebb87378496a87337", "_context_timestamp": "2013-11-21T14:54:18.767872", "_context_user_id": "24b77982be8049ee9cd5ad7bed913565", "_context_project_name": "admin", "_context_read_deleted": "no", "_reply_q": "reply_82fcbb14702642ffbf4a0d5e9c745598", "_context_auth_token": "ea5a64ace812ebaa1e84934d65c1eac8", "namespace": null, "_context_instance_lock_checked": false, "_context_is_admin": true, "version": "1.25", "_context_project_id": "7eb59aa89e8944d098554ff6f5a4cf88", "_context_user": "24b77982be8049ee9cd5ad7bed913565", "_context_user_name": "admin", "method": "action_event_finish", "_context_remote_address": "10.35.160.133"}\x0coslo.version\x95\x00\x032.0') write_op /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:686
2013-11-21 16:54:25.839 19854 DEBUG qpid.messaging [-] SENT[3d9e638]: Message(properties={'qpid.subject': 'topic/nova/conductor'}, content={'oslo.message': '{"_context_roles": ["admin"], "_msg_id": "b2921a9060e148ec9186e478717672e8", "_context_quota_class": null, "_context_request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "_context_service_catalog": [{"endpoints_links": [], "endpoints": [{"adminURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "region": "RegionOne", "publicURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "id": "46f73e2ea31540a5aed0daa8ebc82857", "internalURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88"}], "type": "volume", "name": "cinder"}], "_context_tenant": "7eb59aa89e8944d098554ff6f5a4cf88", "args": {"values": {"instance_uuid": "d1fa2f73-b81d-47f3-b30e-68ed069a849e", "finish_time": "2013-11-21T14:54:25.835276", "request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "result": "Success", "event": "compute_terminate_instance"}}, "_unique_id": "677b4b67f7a64feebb87378496a87337", "_context_timestamp": "2013-11-21T14:54:18.767872", "_context_user_id": "24b77982be8049ee9cd5ad7bed913565", "_context_project_name": "admin", "_context_read_deleted": "no", "_reply_q": "reply_82fcbb14702642ffbf4a0d5e9c745598", "_context_auth_token": "ea5a64ace812ebaa1e84934d65c1eac8", "namespace": null, "_context_instance_lock_checked": false, "_context_is_admin": true, "version": "1.25", "_context_project_id": "7eb59aa89e8944d098554ff6f5a4cf88", "_context_user": "24b77982be8049ee9cd5ad7bed913565", "_context_user_name": "admin", "method": "action_event_finish", "_context_remote_address": "10.35.160.133"}', 'oslo.version': '2.0'}) send /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:1280
2013-11-21 16:54:25.840 19854 DEBUG qpid.messaging.io.raw [-] SENT[3afc290]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\x01\x00\x00(27cb1792-1439-4b48-b1ea-7910b7dd1d1f:850\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x01\x00\x1c\x00\x01\x00\x00\x00\x00\x00\x00\x04\x01\x01\x01\x01\x00\tamq.topic\x03\x02\x00f\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x19\x04\x01\x00\x02\x14topic/nova/conductor\x00\x00\x009\x04\x03\x10\x01\x08amqp/map\x00\x00\x00(\x00\x00\x00\x01\x0cqpid.subject\x95\x00\x14topic/nova/conductor\x07\x03\x06;\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x06+\x00\x00\x00\x02\x0coslo.message\x95\x06\x04{"_context_roles": ["admin"], "_msg_id": "b2921a9060e148ec9186e478717672e8", "_context_quota_class": null, "_context_request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "_context_service_catalog": [{"endpoints_links": [], "endpoints": [{"adminURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "region": "RegionOne", "publicURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88", "id": "46f73e2ea31540a5aed0daa8ebc82857", "internalURL": "http://10.35.160.133:8776/v1/7eb59aa89e8944d098554ff6f5a4cf88"}], "type": "volume", "name": "cinder"}], "_context_tenant": "7eb59aa89e8944d098554ff6f5a4cf88", "args": {"values": {"instance_uuid": "d1fa2f73-b81d-47f3-b30e-68ed069a849e", "finish_time": "2013-11-21T14:54:25.835276", "request_id": "req-b773a8d2-175a-4b8c-aead-7c5dca33dadf", "result": "Success", "event": "compute_terminate_instance"}}, "_unique_id": "677b4b67f7a64feebb87378496a87337", "_context_timestamp": "2013-11-21T14:54:18.767872", "_context_user_id": "24b77982be8049ee9cd5ad7bed913565", "_context_project_name": "admin", "_context_read_deleted": "no", "_reply_q": "reply_82fcbb14702642ffbf4a0d5e9c745598", "_context_auth_token": "ea5a64ace812ebaa1e84934d65c1eac8", "namespace": null, "_context_instance_lock_checked": false, "_context_is_admin": true, "version": "1.25", "_context_project_id": "7eb59aa89e8944d098554ff6f5a4cf88", "_context_user": "24b77982be8049ee9cd5ad7bed913565", "_context_user_name": "admin", "method": "action_event_finish", "_context_remote_address": "10.35.160.133"}\x0coslo.version\x95\x00\x032.0' writeable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:480
2013-11-21 16:54:25.841 19854 DEBUG qpid.messaging.io.raw [-] READ[3afc290]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x01\x00\x00(27cb1792-1439-4b48-b1ea-7910b7dd1d1f:850\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x02\n\x03\x00\x00\x00\x0f\x00\x00\x12\x00\x00\x00\x00\x00\x00\x00\x00\x02\n\x03\x00\x00\x00\x0f\x00\x00\x1a\x00\x00\x00\x00\x00\x00\x00\x00\x02\n\x01\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00' readable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:416

--- Additional comment from Dafna Ron on 2013-11-21 11:33:33 EST ---

Since the logs are too large, I will upload them to a VM on QE's production and will save them there.
--- Additional comment from Dafna Ron on 2013-11-21 11:39:05 EST ---

The logs are under 10.35.161.231:/home/Bug1033188/

--- Additional comment from Xavier Queralt on 2013-11-25 05:30:24 EST ---

I think your assumption is wrong and that nova is already logging in the right place. The service receiving the attach request from the client is nova-api, which forwards the request to the compute node hosting the specified instance. That node (compute2) is where the request is actually executed, and it makes sense to log the error on that host. The api service only takes care of validating the request (parsing the json/xml, ensuring that all the fields are valid, ensuring that the volume and the instance exist, and reserving the volume so it cannot be attached a second time). Besides, we don't want to block the API service waiting for the result and/or the error, as that is not a very RESTful approach. Remember that you can always get the host where the instance is running from the output of "nova show <instance_id>".

Take, for example, a cluster with multiple nova-api services spread across different hosts. When you make a request using the nova client or the API, you don't know on which host the request was initially processed until you go through all the hosts and check the api logs. Doesn't it, in this case, make more sense to log the error on the compute host, which you can identify easily?

Another option would be to make the nova client report an error if the attachment was unsuccessful. This could be done by implementing a "--poll" option for this command that checks the attachment status periodically until the volume either moves to the "attached" state (successful attachment) or back to the "available" state (unsuccessful attachment). If that is the way to go, I suggest moving this bug to python-novaclient as a feature request.

--- Additional comment from Dafna Ron on 2013-11-25 06:01:43 EST ---

Well, first, nova volume-attach has no --poll :)

usage: nova volume-attach <server> <volume> <device>

Attach a volume to a server.

Positional arguments:
  <server>  Name or ID of server.
  <volume>  ID of the volume to attach.
  <device>  Name of the device e.g. /dev/vdb. Use "auto" for autoassign (if supported)

Second, you are talking about the technical issues in fixing this bug, and I am talking about the user experience of debugging. I don't think this should be corrected for one single command; it should be a global change in how we monitor and report issues. I understand that this may not be a simple or easy fix, but the user experience of debugging an issue (especially when you have a large environment) is very time-consuming and complicated.

Currently, if I try to perform an action like attaching a volume while the instance is running on a different compute, and there is a problem completing the action, neither Horizon nor the log reports an ERROR at all. So the user attaches a volume from Horizon -> it reports success -> the volume is not actually attached -> the user goes to the log on his main server (since we can have 1000 computes but would use one main one) -> no error is reported. This is not a correct debugging flow.

I would say that the least we can do, if we direct the command to a different compute, is to log very clearly that the command was redirected to that compute.
Personally, I would like us to come up with a nice solution for HA and large-scale environments (like a centralized log server, or perhaps a log entry in a load server), since I think this will become a bigger issue once large-scale environments are deployed.

--- Additional comment from Xavier Queralt on 2013-11-25 07:40:05 EST ---

(In reply to Dafna Ron from comment #4)
> Well, first, nova volume-attach has no --poll :)
>
> usage: nova volume-attach <server> <volume> <device>
>
> Attach a volume to a server.
>
> Positional arguments:
>   <server>  Name or ID of server.
>   <volume>  ID of the volume to attach.
>   <device>  Name of the device e.g. /dev/vdb. Use "auto" for autoassign (if
>             supported)

Sorry, I should have been more specific. I was trying to say that it should be implemented in the nova client and, although I didn't mention it, in the dashboard as well.

> Second, you are talking about the technical issues in fixing this bug, and I
> am talking about the user experience of debugging. I don't think this should
> be corrected for one single command; it should be a global change in how we
> monitor and report issues. I understand that this may not be a simple or
> easy fix, but the user experience of debugging an issue (especially when you
> have a large environment) is very time-consuming and complicated.

It's easy to fix by making the API call block until it gets an answer from the compute node. I'm just saying that I don't think that's the right way to do it, and that it should instead be done by extending the clients so they can check and report the status of the action.

> Currently, if I try to perform an action like attaching a volume while the
> instance is running on a different compute, and there is a problem completing
> the action, neither Horizon nor the log reports an ERROR at all. So the user
> attaches a volume from Horizon -> it reports success -> the volume is not
> actually attached -> the user goes to the log on his main server (since we
> can have 1000 computes but would use one main one) -> no error is reported.
> This is not a correct debugging flow.

The main idea behind having multiple, redundant services talking through the messaging queue is to not have a "main" server. So your assumption wouldn't hold in a real deployment, and nothing prevents you from having 1000 api nodes too.

Don't get me wrong, I completely agree with you that we should improve the error reporting for cases like the one you reported. But considering that we're talking to a REST API, I think it's the client's duty to query the status of an action before reporting its result. The error should be reported by the nova client or the dashboard, and if the problem needs further debugging, the sysadmin should know how to get the host where the instance is running (either from the database or using the "nova show" command) and fetch the logs from that compute node.

> I would say that the least we can do, if we direct the command to a different
> compute, is to log very clearly that the command was redirected to that
> compute.
>
> Personally, I would like us to come up with a nice solution for HA and
> large-scale environments (like a centralized log server, or perhaps a log
> entry in a load server), since I think this will become a bigger issue once
> large-scale environments are deployed.

Sure, I've used services for aggregating/parsing logs in the past and it helps a lot in big deployments.
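To make the "--poll" idea discussed above concrete, here is a rough sketch of the client-side loop (illustrative only; get_volume_status is a hypothetical helper standing in for however the client fetches the volume record from the API, and 'in-use' is the status cinder reports for a successfully attached volume):

    # Rough sketch of a client-side "--poll" loop (illustrative only).
    import time

    def get_volume_status(volume_id):
        # Hypothetical helper: would query the volume API here.
        raise NotImplementedError

    def poll_attachment(volume_id, interval=2, timeout=300):
        # 'in-use' means the attach succeeded; falling back to
        # 'available' means it failed and was rolled back.
        deadline = time.time() + timeout
        while time.time() < deadline:
            status = get_volume_status(volume_id)
            if status == 'in-use':
                return  # attach succeeded
            if status == 'available':
                raise RuntimeError('attach failed; check the logs on the '
                                   'compute host running the instance')
            time.sleep(interval)
        raise RuntimeError('volume %s did not finish attaching in time'
                           % volume_id)

Independently of polling, the compute host to inspect can be read from "nova show <instance>" (the OS-EXT-SRV-ATTR:host field, visible to admin users), which answers the "which compute did the request go to" part of the report.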
--- Additional comment from Dafna Ron on 2013-11-25 07:51:00 EST ---

I think we are pretty much in agreement ;)

1. This is a bug with user-experience impact.
2. The solution should not be to block the API with sync requests, even though that would be a simple fix.
3. We need to give some thought to possible solutions. Are we working on an async task system? Perhaps it could be configured to query tasks in a scale environment?

--- Additional comment from Dafna Ron on 2013-11-25 08:51:52 EST ---

Following the chat with Xavier, I am moving this bug to the CLI component and cloning an RFE to improve monitoring for HA/large-scale environments.
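As a pointer for the RFE: the centralized-log-server idea mentioned above can be approximated with standard syslog forwarding, so errors like the one on compute#2 become searchable in one place. A minimal Python sketch (illustrative only; 'loghost' is a placeholder for the aggregation server, and nova's own use_syslog option plus rsyslog forwarding achieves much the same without code changes):

    # Illustrative sketch only: ship a service's log records to a central
    # syslog collector. 'loghost' is a placeholder for the log server.
    import logging
    import logging.handlers

    log = logging.getLogger('nova')
    log.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=('loghost', 514))
    handler.setFormatter(
        logging.Formatter('%(name)s %(levelname)s %(message)s'))
    log.addHandler(handler)

    log.error('this ERROR would now be visible on the central collector')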