Bug 1325443

Summary: Heat stack fails with failed volume create, however cinder volume actually created it successfully
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: openstack-heat Assignee: Zane Bitter <zbitter>
Status: CLOSED EOL QA Contact: Amit Ugol <augol>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 (Juno) CC: aschultz, jmelvin, jpanda19, mburns, rhel-osp-director-maint, sbaker, shardy, srevivo, therve
Target Milestone: --- Keywords: ZStream
Target Release: 6.0 (Juno) Flags: zbitter: needinfo? (jmelvin)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-28 16:48:47 UTC Type: Bug

Description Jeremy 2016-04-08 20:29:36 UTC
Description of problem: Heat stack fails with a failed volume create, although Cinder actually created the volume successfully.



[root@sfisvlnwh001 ~(keystone_sfdr)]# heat resource-list 8d4850b7-712b-4126-a862-3bb561a73058 | grep -iv comp
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+
| resource_name         | physical_resource_id                 | resource_type                | resource_status | updated_time         |
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+
| cdhrmRootDisk         | e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 | OS::Cinder::Volume           | CREATE_FAILED   | 2016-04-07T21:42:16Z |
| cdhzk0attachopt       | fc804396-93dd-4941-a30b-0c6c076df939 | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:54:54Z |
| cdhauxfoyerattachopt  | ae01037c-9920-4631-8a48-fbf23e425005 | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:58:31Z |
| cdhauxrmattachhome    | 6cfc589a-1b7b-452d-9d72-963deed17c7b | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:59:51Z |
| cdhzk2                | 0b7460d1-6751-493e-b945-a3a21e86d73c | OS::Nova::Server             | DELETE_FAILED   | 2016-04-07T22:01:38Z |
| cdhnn                 | 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c | OS::Nova::Server             | DELETE_FAILED   | 2016-04-07T22:02:41Z |
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+

sfisvlnwh003.statefarm-dss.com  /heat/heat-api.log 
"CREATE_FAILED", "updated_time": "2016-04-07T21:42:16Z", "required_by": ["cdhrm"], "resource_status_reason": "CREATE aborted", "physical_resource_id": "e78886ce-0ea5-499b-b8ef-a8ae6fc49da9", "resource_type": "OS::Cinder::Volume"}, {"resource_name": "cdhzk2", "links": [{"href": "http://10.61.14.126:8004/v1/14d3e5a9d6e


[root@sfisvlnwh001 ~(keystone_sfdr)]# cinder show  e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 |grep status
 status                | available

/cinder/volume.log
2016-04-07 16:08:54.694 2609 INFO cinder.volume.flows.manager.create_volume [req-f8e400b0-a4d3-4774-ac9c-c3e7d0aee3f9 2784e495ac8746069f62440e171b452e 14d3e5a9d6ea45c9a675de6b756bae26 - - -] Volume volume-e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 (e78886ce-0ea5-499b-b8ef-a8ae6fc49da9): created successfully

[root@sfiappnwh003 ~]# lvs | grep e7888
  volume-e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 cinder-volumes -wi-a-----  50.00g                                                    
[root@sfiappnwh003 ~]#
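
The mismatch above (Heat reports CREATE_FAILED while Cinder reports the volume as available) can be checked for every volume in a stack. A minimal Python sketch, assuming the `heat resource-list` table has been captured as text; the helper names `parse_cli_table` and `failed_volumes` are hypothetical and not part of any OpenStack client:

```python
def parse_cli_table(text):
    """Parse an ASCII table, as printed by the OpenStack CLIs, into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def failed_volumes(rows):
    """Return physical IDs of OS::Cinder::Volume resources in CREATE_FAILED."""
    return [
        row["physical_resource_id"]
        for row in rows
        if row["resource_type"] == "OS::Cinder::Volume"
        and row["resource_status"] == "CREATE_FAILED"
    ]


# Rows copied from the resource list in this report.
SAMPLE = """\
+-----------------+--------------------------------------+------------------------------+-----------------+----------------------+
| resource_name   | physical_resource_id                 | resource_type                | resource_status | updated_time         |
+-----------------+--------------------------------------+------------------------------+-----------------+----------------------+
| cdhrmRootDisk   | e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 | OS::Cinder::Volume           | CREATE_FAILED   | 2016-04-07T21:42:16Z |
| cdhzk0attachopt | fc804396-93dd-4941-a30b-0c6c076df939 | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:54:54Z |
| cdhzk2          | 0b7460d1-6751-493e-b945-a3a21e86d73c | OS::Nova::Server             | DELETE_FAILED   | 2016-04-07T22:01:38Z |
+-----------------+--------------------------------------+------------------------------+-----------------+----------------------+
"""

for volume_id in failed_volumes(parse_cli_table(SAMPLE)):
    print(volume_id)
```

Each printed ID can then be passed to `cinder show <id>` to compare Cinder's view of the volume with Heat's.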


Version-Release number of selected component (if applicable):
openstack-heat-api-2014.2.3-9.el7ost.noarch (controller)
openstack-cinder-2014.2.3-11.el7ost.noarch (compute)

How reproducible:
Unknown. I've done the deploy 3 times. One of the three attempts created the volumes, but that attempt also failed to attach them; I was able to attach them manually, successfully, after the deploy.

Steps to Reproduce:
1. Deploy the Heat stack.
2. Note that the volume create resource fails.

Actual results:
Stack creation fails.

Expected results:
stack created successfully

Additional info:

I have already increased the Heat stack timeout; however, the stack still fails the same way. The volume create fails well before the 1-hour mark of the stack deploy.


Sosreports coming soon.

Comment 6 Thomas Hervé 2016-04-12 14:23:03 UTC
The real issue is with the 2 servers that are in DELETE_FAILED. You can see in the logs: 

Went to status ERROR due to "Message: No valid host was found. , Code: 500"

So there is a problem in Nova, or not enough capacity. It goes to delete failed because Heat tries to clean them up, and Nova fails to return properly on that call too.

Heat doesn't behave properly, so I'll see if I can reproduce that, but the core of the problem is Nova failing to create the servers.

Comment 9 Jeremy 2016-04-12 19:30:01 UTC
Thomas,

The instances look like they failed because a different volume was not available to be attached on the compute node. So I still don't understand why these other 2 volumes were not created.

sosreport-ctl002/var/log/nova/nova-conductor.log:2016-04-07 16:03:50.488 9893 ERROR nova.scheduler.utils [req-190c2537-09f1-4b08-954c-9c1969889e25 None] [instance: 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c] Error from last host: sfiappnwh002.statefarm-dss.com (node sfiappnwh002.statefarm-dss.com): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2095, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2226, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c was re-scheduled: iSCSI device not found at [u'/dev/disk/by-path/ip-10.61.1.12:3260-iscsi-iqn.2010-10.org.openstack:volume-0508d31d-f887-4605-9bd6-d790ba47a33f-lun-0']\n"]
sosreport-ctl002/var/log/nova/nova-conductor.log:2016-04-07 16:03:50.723 9893 WARNING nova.scheduler.driver [req-190c2537-09f1-4b08-954c-9c1969889e25 None] [instance: 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c] NoValidHost exception with message: 'No valid host was found.'



sosreport-ctl001/var/log/nova/nova-conductor.log:2016-04-07 16:02:47.334 35886 ERROR nova.scheduler.utils [req-6cead5e6-7a59-4f85-9cc2-9d36dd612368 None] [instance: 0b7460d1-6751-493e-b945-a3a21e86d73c] Error from last host: sfiappnwh009.statefarm-dss.com (node sfiappnwh009.statefarm-dss.com): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2095, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2226, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 0b7460d1-6751-493e-b945-a3a21e86d73c was re-scheduled: iSCSI device not found at [u'/dev/disk/by-path/ip-10.61.1.21:3260-iscsi-iqn.2010-10.org.openstack:volume-b95fa227-5c34-4f92-b6b2-32fb6e2dbcec-lun-0']\n"]
sosreport-ctl001/var/log/nova/nova-conductor.log:2016-04-07 16:02:47.410 35886 WARNING nova.scheduler.driver [req-6cead5e6-7a59-4f85-9cc2-9d36dd612368 None] [instance: 0b7460d1-6751-493e-b945-a3a21e86d73c] NoValidHost exception with message: 'No valid host was found.'
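
Both tracebacks above carry the same kind of reschedule reason. A minimal Python sketch for pulling the instance ID and the missing volume ID out of such a line; the regular expressions are assumptions based only on the two log lines quoted here, and `reschedule_info` is a hypothetical helper name:

```python
import re

# The traceback is logged as a Python list repr, so line breaks appear as a
# literal backslash followed by "n"; the reason runs up to the first of those.
RESCHEDULE_RE = re.compile(
    r"Build of instance (?P<instance>[0-9a-f-]{36}) was re-scheduled: "
    r"(?P<reason>.*?)(?:\\n|$)"
)
VOLUME_RE = re.compile(r"volume-(?P<volume>[0-9a-f-]{36})")


def reschedule_info(line):
    """Return instance ID, reason and volume ID (if any) from a log line, else None."""
    match = RESCHEDULE_RE.search(line)
    if match is None:
        return None
    reason = match.group("reason")
    volume = VOLUME_RE.search(reason)
    return {
        "instance": match.group("instance"),
        "reason": reason,
        "volume": volume.group("volume") if volume else None,
    }


# Tail of the first nova-conductor.log line quoted above (raw string, so the
# trailing \n stays a literal backslash and "n" as in the log).
SAMPLE = r"""RescheduledException: Build of instance 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c was re-scheduled: iSCSI device not found at [u'/dev/disk/by-path/ip-10.61.1.12:3260-iscsi-iqn.2010-10.org.openstack:volume-0508d31d-f887-4605-9bd6-d790ba47a33f-lun-0']\n"""

info = reschedule_info(SAMPLE)
print(info["instance"], info["volume"])
```

The extracted volume IDs can then be matched against the stack's volume resources to see which volume each server was waiting on.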

Comment 11 Thomas Hervé 2016-04-13 08:05:46 UTC
OK, that makes much more sense with the template, thanks. I was wrong indeed.

The issue I see is that Heat is trying to create the server cdhnn *before* the volume cdhnnRootDisk is available. Looking at the timestamps, the volume is available at 16:08:54 but we try to create the server at 16:02:41.

I have no idea how that can happen: the reference to the volume in the block_device_mapping section ought to create a dependency between the 2 resources, so that the server is only created once the volume is available. 

As a short-term measure, I would try to explicitly add a DependsOn on cdhnnRootDisk to see if that makes a difference.
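
To illustrate the expected ordering, here is a minimal Python sketch of how a reference inside block_device_mapping, or an explicit dependency, would normally place the volume before the server. It is not Heat's actual implementation. The template fragment, the device name, the volume size and the helper names are assumptions for illustration, and it uses the HOT-style `get_resource` and `depends_on` keys (Python 3.9+ is needed for `graphlib`):

```python
from graphlib import TopologicalSorter


def find_refs(value):
    """Yield resource names referenced via {"get_resource": name} in a property tree."""
    if isinstance(value, dict):
        if set(value) == {"get_resource"}:
            yield value["get_resource"]
        else:
            for child in value.values():
                yield from find_refs(child)
    elif isinstance(value, list):
        for child in value:
            yield from find_refs(child)


def creation_order(resources):
    """Order resources so that every dependency is created before its dependents."""
    graph = {}
    for name, resource in resources.items():
        # Implicit dependencies: references found in the properties.
        deps = set(find_refs(resource.get("properties", {})))
        # Explicit dependencies: depends_on may be a single name or a list.
        explicit = resource.get("depends_on", [])
        deps.update([explicit] if isinstance(explicit, str) else explicit)
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())


# Hypothetical fragment modelled on the resource names in this report.
TEMPLATE_RESOURCES = {
    "cdhnn": {
        "type": "OS::Nova::Server",
        "depends_on": ["cdhnnRootDisk"],  # the explicit workaround
        "properties": {
            "block_device_mapping": [
                {
                    "device_name": "vda",  # assumed device name
                    "volume_id": {"get_resource": "cdhnnRootDisk"},
                }
            ],
        },
    },
    "cdhnnRootDisk": {
        "type": "OS::Cinder::Volume",
        "properties": {"size": 50},  # assumed size
    },
}

print(creation_order(TEMPLATE_RESOURCES))
```

In this sketch either the `get_resource` reference or the `depends_on` entry alone is enough to order the volume first, which is why the observed ordering is surprising.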

Comment 13 Jyotiranjan Panda 2018-01-11 08:35:34 UTC
I also got this error...


Resource CREATE failed: ResourceInError: resources.VNFM_Instance: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500
