Nova: nova reset-state returns "code": 500, "details": "No valid host was found."

Environment: (RDO : All-in-one)
-------------------------------
openstack-cinder-2013.1-2.el6.noarch
openstack-nova-compute-2013.1-2.el6.noarch
openstack-nova-cert-2013.1-2.el6.noarch
openstack-nova-network-2013.1-2.el6.noarch
openstack-nova-api-2013.1-2.el6.noarch

Description:
-------------
I attempted to recover instances that had switched into the ERROR state (using 'nova reset-state') and got "code": 500, "details": "No valid host was found."
The instances originally switched into the ERROR state because of a lack of physical resources on the compute node. Before running 'nova reset-state', I deleted a few running instances to make sure there were enough resources to recover the instances in ERROR status.

Scenario:
----------
1) Boot multiple instances until there are not enough resources on the compute node and instances go into the 'ERROR' state.
2) Delete a few running instances in order to free enough resources to run/recover the instances in the ERROR state.
3) Run 'nova reset-state <Instance_in_error_status ID>'.

Results:
---------
Even when there are free resources on the compute node, nova reset-state fails with "code": 500, "details": "No valid host was found."

nova --debug reset-state: --------------------------- [root@puma03 ~(keystone_admin)]# nova --debug reset-state fb650165-e928-4cc0-ac2d-030ae4901d84 REQ: curl -i http://10.35.160.15:35357/v2.0/tokens -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "6823ab1090104ba8"}}}' INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15 DEBUG (connectionpool:295) "POST /v2.0/tokens HTTP/1.1" 200 2730 RESP: [200] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 'content-type': 'application/json', 'content-length': '2730', 'vary': 'X-Auth-Token'} RESP BODY: {"access": {"token": {"issued_at": "2013-05-06T12:47:56.381048", "expires": "2013-05-07T12:47:56Z", "id": "3706ca9b9dd04f82ba3cf7eb7965ea17", "tenant": {"description": "admin tenant", "enabled": true, "id": "17b6e92789f443d8aac2a296c4050c60", "name": "admin"}}, "serviceCatalog": [{"endpoints": [{"adminURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60", "region": "RegionOne", "internalURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60", "id": "d4ae188a27974a2cbe377cf41bea8597", "publicURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8080", "region": "RegionOne", "internalURL": "http://10.35.160.15:8080", "id": "bb11d78c8af34b4b996fbfe8a5f6b343", "publicURL": "http://10.35.160.15:8080"}], "endpoints_links": [], "type": "s3", "name": "swift_s3"}, {"endpoints": [{"adminURL": "http://10.35.160.15:9292", "region": "RegionOne", "internalURL": "http://10.35.160.15:9292", "id": "3974ce6830d44def9f479928e3030a7e", "publicURL": "http://10.35.160.15:9292"}], "endpoints_links": [], "type": "image", "name": "glance"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60", 
"region": "RegionOne", "internalURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60", "id": "1b8e5dae239046af846c89f5d767c2ad", "publicURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "volume", "name": "cinder"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8773/services/Admin", "region": "RegionOne", "internalURL": "http://10.35.160.15:8773/services/Cloud", "id": "cc24e1b277084c82a3fb362ab453c3ed", "publicURL": "http://10.35.160.15:8773/services/Cloud"}], "endpoints_links": [], "type": "ec2", "name": "nova_ec2"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8080/", "region": "RegionOne", "internalURL": "http://10.35.160.15:8080/v1/AUTH_17b6e92789f443d8aac2a296c4050c60", "id": "e92a3e939aeb4a77a3294ec9ead3024e", "publicURL": "http://10.35.160.15:8080/v1/AUTH_17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "object-store", "name": "swift"}, {"endpoints": [{"adminURL": "http://10.35.160.15:35357/v2.0", "region": "RegionOne", "internalURL": "http://10.35.160.15:5000/v2.0", "id": "8b8650a8f46a4e31bfad7744bb4ac00c", "publicURL": "http://10.35.160.15:5000/v2.0"}], "endpoints_links": [], "type": "identity", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id": "0933909f78974a0c91042bbd2d5c3c19", "roles": [{"name": "admin"}], "name": "admin"}, "metadata": {"is_admin": 0, "roles": ["8c1b9dbe10dd4c569a4a03511acd7fa0"]}}} REQ: curl -i http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84 -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3706ca9b9dd04f82ba3cf7eb7965ea17" INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15 DEBUG (connectionpool:295) "GET /v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84 HTTP/1.1" 200 1495 RESP: [200] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 
'x-compute-request-id': 'req-df2810ec-0cd7-4088-aea3-993d07b60fbb', 'content-type': 'application/json', 'content-length': '1495'} RESP BODY: {"server": {"status": "ERROR", "updated": "2013-05-06T11:33:18Z", "hostId": "", "OS-EXT-SRV-ATTR:host": null, "addresses": {}, "links": [{"href": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84", "rel": "self"}, {"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84", "rel": "bookmark"}], "key_name": "oskey", "image": {"id": "65fa81cc-e139-4c45-ae61-a72db44ae4e3", "links": [{"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/images/65fa81cc-e139-4c45-ae61-a72db44ae4e3", "rel": "bookmark"}]}, "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "error", "OS-EXT-SRV-ATTR:instance_name": "instance-00000052", "OS-EXT-SRV-ATTR:hypervisor_hostname": null, "flavor": {"id": "2", "links": [{"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/flavors/2", "rel": "bookmark"}]}, "id": "fb650165-e928-4cc0-ac2d-030ae4901d84", "security_groups": [{"name": "default"}], "OS-EXT-AZ:availability_zone": null, "user_id": "0933909f78974a0c91042bbd2d5c3c19", "name": "fedo~-fb650165-e928-4cc0-ac2d-030ae4901d84", "created": "2013-05-06T11:33:05Z", "tenant_id": "17b6e92789f443d8aac2a296c4050c60", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "fault": {"message": "NoValidHost", "code": 500, "details": "No valid host was found. 
\n", "created": "2013-05-06T11:33:18Z"}, "OS-EXT-STS:power_state": 0, "config_drive": "", "metadata": {}}} REQ: curl -i http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84/action -X POST -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: 3706ca9b9dd04f82ba3cf7eb7965ea17" -d '{"os-resetState": {"state": "error"}}' INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15 DEBUG (connectionpool:295) "POST /v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84/action HTTP/1.1" 202 0 RESP: [202] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 'content-length': '0', 'content-type': 'text/html; charset=UTF-8'} RESP BODY:
Created attachment 744138 [details] compute.log
I would like to add two more 'nova reset-state' scenarios that should be investigated:
1) Run 'nova reset-state' on an instance in status 'Active' --> the instance switches to status 'ERROR'.
2) After attempting 'nova migrate' (when there is only one compute node) --> the instance switches into the ERROR state --> then attempt to recover using 'nova reset-state' --> the instance remains in status ERROR.
I don't expect this to work. If it failed because it couldn't be scheduled, there's no actual instance anywhere to recover. The only thing you should be able to do with it is delete it.

We should not be returning 500 though.
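To illustrate the point about scheduling, here is a minimal sketch of how one might tell from the server record that the instance never landed on a host. The field names come from the GET /servers response in the debug output above; the helper name `never_scheduled` and the trimmed-down `server` dict are hypothetical, made up for illustration only.

```python
def never_scheduled(server: dict) -> bool:
    """Return True if the record has no host and carries a NoValidHost fault.

    Hypothetical helper: it only inspects fields seen in the API response
    quoted in this report; it is not part of nova or python-novaclient.
    """
    fault = server.get("fault") or {}
    has_no_host = server.get("OS-EXT-SRV-ATTR:host") is None
    return has_no_host and fault.get("message") == "NoValidHost"


# Trimmed-down copy of the server record from the debug output above.
server = {
    "status": "ERROR",
    "hostId": "",
    "OS-EXT-SRV-ATTR:host": None,
    "fault": {"message": "NoValidHost", "code": 500},
}

if never_scheduled(server):
    print("Never scheduled: nothing to recover, delete the instance instead")
```

Note that the 500 here is the fault code stored on the instance record from the original boot failure; the reset-state action request itself got a 202 in the debug output.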
(In reply to comment #3)
> I don't expect this to work. If it failed because it couldn't be scheduled,
> there's no actual instance anywhere to recover. The only thing you should
> be able to do with it is delete it.
>
> We should not be returning 500 though.

What about the scenarios in comment #2?
Ping - what's the status here?
Omri, can you please open a bug in Launchpad with all the relevant details, as this is an upstream issue?

Also, once you open the issue, please link the bug here and close this RDO bug with resolution UPSTREAM.

Thanks.
> 1) Attempt 'nova reset-state' on instance that in status 'Active' --> will
> switch the instance to status 'ERROR'.
>
> 2) After attempt of 'nova migrate' (when there only one compute node) --> the
> instance will switch into ERROR state --> then attempt to recover using 'nova
> reset-state' --> the instance remains in status ERROR.

This is the documented behavior of "nova reset-state". That command explicitly puts the instance into the ERROR state unless you provide the "--active" flag.
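As a rough sketch of what that means on the wire: the debug output above shows the client POSTing an "os-resetState" action with state "error" when no flag is given. The helper below (a hypothetical name, `reset_state_body`) builds that request body; the "active" value for the --active case is an assumption based on the flag's name and does not appear in the log in this report.

```python
import json


def reset_state_body(active: bool = False) -> str:
    """Build the JSON body for the os-resetState server action.

    Hypothetical helper for illustration. The default mirrors the request
    seen in the debug output ("error"); "active" is assumed for --active.
    """
    state = "active" if active else "error"
    return json.dumps({"os-resetState": {"state": state}})


# Default: same body as the request in the debug output above.
print(reset_state_body())
# With the --active flag (assumed payload value).
print(reset_state_body(active=True))
```

So a plain 'nova reset-state <id>' on an instance already in ERROR is effectively a no-op, and on an ACTIVE instance it moves it to ERROR, which matches both scenarios quoted above.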
closed, no need for needinfo.