Bug 960024 - Nova: nova reset-state returns "code": 500, "details": "No valid host was found."
Summary: Nova: nova reset-state returns "code": 500, "details": "No valid host was found."
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: RDO
Classification: Community
Component: openstack-nova
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Omri Hochman
QA Contact: Ami Jeain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-05-06 12:56 UTC by Omri Hochman
Modified: 2018-05-02 10:52 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-03-18 14:42:22 UTC
Embargoed:


Attachments
compute.log (1.78 MB, text/x-log)
2013-05-06 13:10 UTC, Omri Hochman

Description Omri Hochman 2013-05-06 12:56:59 UTC
Nova: nova reset-state returns "code": 500, "details": "No valid host was found."


Environment: (RDO :  All-in-one) 
-------------------------------
openstack-cinder-2013.1-2.el6.noarch
openstack-nova-compute-2013.1-2.el6.noarch
openstack-nova-cert-2013.1-2.el6.noarch
openstack-nova-network-2013.1-2.el6.noarch
openstack-nova-api-2013.1-2.el6.noarch


Description: 
-------------
I attempted to recover instances that had switched into ERROR (using 'nova reset-state') and got "code": 500, "details": "No valid host was found."

The instances originally went into the ERROR state because the compute node lacked physical resources. Before running 'nova reset-state', I deleted a few running instances to make sure there were enough resources to recover the instances in ERROR status.

Scenario: 
----------
1) Boot multiple instances until the compute node runs out of resources and new instances go into the 'ERROR' state.
2) Delete a few running instances in order to free enough resources to run/recover the instances in the ERROR state.
3) Attempt 'nova reset-state <Instance_in_error_status ID>'.

Results: 
---------
Even though there are free resources on the compute node, nova reset-state fails with "code": 500, "details": "No valid host was found."


nova --debug reset-state:
---------------------------
[root@puma03 ~(keystone_admin)]# nova --debug reset-state fb650165-e928-4cc0-ac2d-030ae4901d84

REQ: curl -i http://10.35.160.15:35357/v2.0/tokens -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "6823ab1090104ba8"}}}'

INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15
DEBUG (connectionpool:295) "POST /v2.0/tokens HTTP/1.1" 200 2730
RESP: [200] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 'content-type': 'application/json', 'content-length': '2730', 'vary': 'X-Auth-Token'}
RESP BODY: {"access": {"token": {"issued_at": "2013-05-06T12:47:56.381048", "expires": "2013-05-07T12:47:56Z", "id": "3706ca9b9dd04f82ba3cf7eb7965ea17", "tenant": {"description": "admin tenant", "enabled": true, "id": "17b6e92789f443d8aac2a296c4050c60", "name": "admin"}}, "serviceCatalog": [{"endpoints": [{"adminURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60", "region": "RegionOne", "internalURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60", "id": "d4ae188a27974a2cbe377cf41bea8597", "publicURL": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "compute", "name": "nova"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8080", "region": "RegionOne", "internalURL": "http://10.35.160.15:8080", "id": "bb11d78c8af34b4b996fbfe8a5f6b343", "publicURL": "http://10.35.160.15:8080"}], "endpoints_links": [], "type": "s3", "name": "swift_s3"}, {"endpoints": [{"adminURL": "http://10.35.160.15:9292", "region": "RegionOne", "internalURL": "http://10.35.160.15:9292", "id": "3974ce6830d44def9f479928e3030a7e", "publicURL": "http://10.35.160.15:9292"}], "endpoints_links": [], "type": "image", "name": "glance"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60", "region": "RegionOne", "internalURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60", "id": "1b8e5dae239046af846c89f5d767c2ad", "publicURL": "http://10.35.160.15:8776/v1/17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "volume", "name": "cinder"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8773/services/Admin", "region": "RegionOne", "internalURL": "http://10.35.160.15:8773/services/Cloud", "id": "cc24e1b277084c82a3fb362ab453c3ed", "publicURL": "http://10.35.160.15:8773/services/Cloud"}], "endpoints_links": [], "type": "ec2", "name": "nova_ec2"}, {"endpoints": [{"adminURL": "http://10.35.160.15:8080/", "region": "RegionOne", "internalURL": 
"http://10.35.160.15:8080/v1/AUTH_17b6e92789f443d8aac2a296c4050c60", "id": "e92a3e939aeb4a77a3294ec9ead3024e", "publicURL": "http://10.35.160.15:8080/v1/AUTH_17b6e92789f443d8aac2a296c4050c60"}], "endpoints_links": [], "type": "object-store", "name": "swift"}, {"endpoints": [{"adminURL": "http://10.35.160.15:35357/v2.0", "region": "RegionOne", "internalURL": "http://10.35.160.15:5000/v2.0", "id": "8b8650a8f46a4e31bfad7744bb4ac00c", "publicURL": "http://10.35.160.15:5000/v2.0"}], "endpoints_links": [], "type": "identity", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id": "0933909f78974a0c91042bbd2d5c3c19", "roles": [{"name": "admin"}], "name": "admin"}, "metadata": {"is_admin": 0, "roles": ["8c1b9dbe10dd4c569a4a03511acd7fa0"]}}}


REQ: curl -i http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84 -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: 3706ca9b9dd04f82ba3cf7eb7965ea17"

INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15
DEBUG (connectionpool:295) "GET /v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84 HTTP/1.1" 200 1495
RESP: [200] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 'x-compute-request-id': 'req-df2810ec-0cd7-4088-aea3-993d07b60fbb', 'content-type': 'application/json', 'content-length': '1495'}
RESP BODY: {"server": {"status": "ERROR", "updated": "2013-05-06T11:33:18Z", "hostId": "", "OS-EXT-SRV-ATTR:host": null, "addresses": {}, "links": [{"href": "http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84", "rel": "self"}, {"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84", "rel": "bookmark"}], "key_name": "oskey", "image": {"id": "65fa81cc-e139-4c45-ae61-a72db44ae4e3", "links": [{"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/images/65fa81cc-e139-4c45-ae61-a72db44ae4e3", "rel": "bookmark"}]}, "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "error", "OS-EXT-SRV-ATTR:instance_name": "instance-00000052", "OS-EXT-SRV-ATTR:hypervisor_hostname": null, "flavor": {"id": "2", "links": [{"href": "http://10.35.160.15:8774/17b6e92789f443d8aac2a296c4050c60/flavors/2", "rel": "bookmark"}]}, "id": "fb650165-e928-4cc0-ac2d-030ae4901d84", "security_groups": [{"name": "default"}], "OS-EXT-AZ:availability_zone": null, "user_id": "0933909f78974a0c91042bbd2d5c3c19", "name": "fedo~-fb650165-e928-4cc0-ac2d-030ae4901d84", "created": "2013-05-06T11:33:05Z", "tenant_id": "17b6e92789f443d8aac2a296c4050c60", "OS-DCF:diskConfig": "MANUAL", "accessIPv4": "", "accessIPv6": "", "fault": {"message": "NoValidHost", "code": 500, "details": "No valid host was found. \n", "created": "2013-05-06T11:33:18Z"}, "OS-EXT-STS:power_state": 0, "config_drive": "", "metadata": {}}}


REQ: curl -i http://10.35.160.15:8774/v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84/action -X POST -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: 3706ca9b9dd04f82ba3cf7eb7965ea17" -d '{"os-resetState": {"state": "error"}}'

INFO (connectionpool:203) Starting new HTTP connection (1): 10.35.160.15
DEBUG (connectionpool:295) "POST /v2/17b6e92789f443d8aac2a296c4050c60/servers/fb650165-e928-4cc0-ac2d-030ae4901d84/action HTTP/1.1" 202 0
RESP: [202] {'date': 'Mon, 06 May 2013 12:47:56 GMT', 'content-length': '0', 'content-type': 'text/html; charset=UTF-8'}
RESP BODY:

Comment 1 Omri Hochman 2013-05-06 13:10:02 UTC
Created attachment 744138 [details]
compute.log

Comment 2 Omri Hochman 2013-05-06 13:11:24 UTC
I would like to add two more 'nova reset-state' scenarios that should be investigated:

1) Attempt 'nova reset-state' on an instance that is in status 'Active' --> the instance switches to status 'ERROR'.

2) Attempt 'nova migrate' when there is only one compute node --> the instance switches into the ERROR state --> then attempt to recover using 'nova reset-state' --> the instance remains in status ERROR.

Comment 3 Russell Bryant 2013-05-06 15:09:34 UTC
I don't expect this to work.  If it failed because it couldn't be scheduled, there's no actual instance anywhere to recover.  The only thing you should be able to do with it is delete it.

We should not be returning 500 though.
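
The point about scheduling can be checked against the server record in the description, where the GET response shows an empty "hostId", a null "OS-EXT-SRV-ATTR:host", and a stored fault with code 500. A minimal Python sketch of that check, assuming the response body has already been parsed into a dict; the helper name `was_never_scheduled` is hypothetical and is not part of novaclient:

```python
# Trimmed copy of the "server" object from the GET response in the description.
server = {
    "status": "ERROR",
    "hostId": "",
    "OS-EXT-SRV-ATTR:host": None,
    "OS-EXT-STS:vm_state": "error",
    "fault": {
        "message": "NoValidHost",
        "code": 500,
        "details": "No valid host was found. \n",
    },
}


def was_never_scheduled(server):
    """Hypothetical helper: True when the record has no compute host assigned."""
    return not server.get("OS-EXT-SRV-ATTR:host") and not server.get("hostId")


fault = server.get("fault", {})
print("never scheduled:", was_never_scheduled(server))
print("stored fault:", fault.get("code"), fault.get("message"))
```

In the same trace, the os-resetState POST itself returned 202; the 500 appears in the fault stored on the server record from the original scheduling failure.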

Comment 4 Omri Hochman 2013-05-07 14:21:56 UTC
(In reply to comment #3)
> I don't expect this to work.  If it failed because it couldn't be scheduled,
> there's no actual instance anywhere to recover.  The only thing you should
> be able to do with it is delete it.
> 
> We should not be returning 500 though.

What about the scenarios in comment #2 ?

Comment 5 Kashyap Chamarthy 2013-12-11 17:41:15 UTC
Ping - what's the status here?

Comment 6 Kashyap Chamarthy 2014-01-15 14:48:04 UTC
Omri, can you please open a bug in Launchpad with all the relevant details, as this is an upstream issue?

Also: once you open the issue, please link the bug here and close this RDO bug with resolution UPSTREAM.

Thanks.

Comment 7 Lars Kellogg-Stedman 2015-03-18 14:42:22 UTC
> 1) Attempt 'nova reset-state' on instance that in status 'Active' --> will
> switch the instance to status 'ERROR'.
> 
> 2) After attempt of 'nova migrate' (when there only one compute node) --> the
> instance will switch into ERROR state --> then attempt to recover using 'nova
> reset-state' --> the instance remains in status ERROR.

This is the documented behavior of "nova reset-state".  That command explicitly puts the instance into the ERROR state, unless you provide the "--active" flag.
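
The debug trace in the description is consistent with this: the client sent '{"os-resetState": {"state": "error"}}' because no flag was given. A minimal Python sketch of how the action body would differ between the two cases; `build_reset_state_body` is a hypothetical helper written for illustration, not novaclient code, and the "active" body is an assumption based on the "--active" flag:

```python
import json


def build_reset_state_body(active=False):
    """Hypothetical helper: build the os-resetState action body.

    Mirrors the CLI default described above: without --active the
    requested state is "error".
    """
    state = "active" if active else "error"
    return {"os-resetState": {"state": state}}


# Default, as seen in the debug trace in the description.
print(json.dumps(build_reset_state_body()))
# Assumed body when the --active flag is passed.
print(json.dumps(build_reset_state_body(active=True)))
```

On the command line, the second case corresponds to running 'nova reset-state --active <Instance ID>'.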

Comment 8 Amit Ugol 2018-05-02 10:52:05 UTC
closed, no need for needinfo.

