| Summary: | Heat stack fails with failed volume create, however cinder volume actually created it successfully | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jeremy <jmelvin> |
| Component: | openstack-heat | Assignee: | Zane Bitter <zbitter> |
| Status: | CLOSED EOL | QA Contact: | Amit Ugol <augol> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.0 (Juno) | CC: | aschultz, jmelvin, jpanda19, mburns, rhel-osp-director-maint, sbaker, shardy, srevivo, therve |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 6.0 (Juno) | Flags: | zbitter: needinfo? (jmelvin) |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-07-28 16:48:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
The real issue is with the 2 servers that are in DELETE_FAILED. You can see in the logs:

```
Went to status ERROR due to "Message: No valid host was found. , Code: 500"
```

So there is a problem in Nova, or not enough capacity. The servers go to DELETE_FAILED because Heat tries to clean them up, and Nova fails to return properly on that call too. Heat doesn't behave properly there, so I'll see if I can reproduce that, but the core of the problem is Nova failing to create the servers.

Thomas, the instances look like they failed because a different volume was not available to be attached on the compute node. So I still don't understand why these other 2 volumes were not created.

```
sosreport-ctl002/var/log/nova/nova-conductor.log:2016-04-07 16:03:50.488 9893 ERROR nova.scheduler.utils [req-190c2537-09f1-4b08-954c-9c1969889e25 None] [instance: 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c] Error from last host: sfiappnwh002.statefarm-dss.com (node sfiappnwh002.statefarm-dss.com): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2095, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2226, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c was re-scheduled: iSCSI device not found at [u'/dev/disk/by-path/ip-10.61.1.12:3260-iscsi-iqn.2010-10.org.openstack:volume-0508d31d-f887-4605-9bd6-d790ba47a33f-lun-0']\n"]
sosreport-ctl002/var/log/nova/nova-conductor.log:2016-04-07 16:03:50.723 9893 WARNING nova.scheduler.driver [req-190c2537-09f1-4b08-954c-9c1969889e25 None] [instance: 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c] NoValidHost exception with message: 'No valid host was found.'
```
```
sosreport-ctl001/var/log/nova/nova-conductor.log:2016-04-07 16:02:47.334 35886 ERROR nova.scheduler.utils [req-6cead5e6-7a59-4f85-9cc2-9d36dd612368 None] [instance: 0b7460d1-6751-493e-b945-a3a21e86d73c] Error from last host: sfiappnwh009.statefarm-dss.com (node sfiappnwh009.statefarm-dss.com): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2095, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2226, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 0b7460d1-6751-493e-b945-a3a21e86d73c was re-scheduled: iSCSI device not found at [u'/dev/disk/by-path/ip-10.61.1.21:3260-iscsi-iqn.2010-10.org.openstack:volume-b95fa227-5c34-4f92-b6b2-32fb6e2dbcec-lun-0']\n"]
sosreport-ctl001/var/log/nova/nova-conductor.log:2016-04-07 16:02:47.410 35886 WARNING nova.scheduler.driver [req-6cead5e6-7a59-4f85-9cc2-9d36dd612368 None] [instance: 0b7460d1-6751-493e-b945-a3a21e86d73c] NoValidHost exception with message: 'No valid host was found.'
```

OK, that makes much more sense with the template, thanks. I was wrong indeed. The issue I see is that Heat is trying to create the server cdhnn *before* the volume cdhnnRootDisk is available. Looking at the timestamps, the volume is available at 16:08:54 but we try to create the server at 16:02:41. I have no idea how that can happen: the reference to the volume in the block_device_mapping section ought to create a dependency between the 2 resources, so that the server is only created once the volume is available. As a short-term measure, I would try to explicitly add a DependsOn on cdhnnRootDisk to see if that makes a difference.

I also got this error:

```
Resource CREATE failed: ResourceInError: resources.VNFM_Instance: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500
```
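The suggested workaround (adding an explicit dependency from the server on its root volume) might look roughly like the sketch below in HOT syntax. The resource names come from the report; the type, size, flavor, and image values are placeholders for illustration, not the actual template:

```yaml
# Hypothetical sketch of the suggested workaround, not the customer's template:
# force cdhnn to wait for cdhnnRootDisk even if the implicit
# block_device_mapping dependency is not being honoured.
resources:
  cdhnnRootDisk:
    type: OS::Cinder::Volume
    properties:
      size: 50                     # placeholder size
  cdhnn:
    type: OS::Nova::Server
    depends_on: cdhnnRootDisk      # explicit dependency (HOT syntax)
    properties:
      flavor: m1.large             # placeholder values
      image: rhel-7
      block_device_mapping:
        - device_name: vda
          volume_id: { get_resource: cdhnnRootDisk }
```

With `depends_on` in place, Heat should not start creating `cdhnn` until `cdhnnRootDisk` reaches CREATE_COMPLETE, which is the ordering the `block_device_mapping` reference ought to have produced implicitly.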
Description of problem:
Heat stack fails with a failed volume create; however, Cinder actually created the volume successfully.

```
[root@sfisvlnwh001 ~(keystone_sfdr)]# heat resource-list 8d4850b7-712b-4126-a862-3bb561a73058 | grep -iv comp
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+
| resource_name         | physical_resource_id                 | resource_type                | resource_status | updated_time         |
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+
| cdhrmRootDisk         | e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 | OS::Cinder::Volume           | CREATE_FAILED   | 2016-04-07T21:42:16Z |
| cdhzk0attachopt       | fc804396-93dd-4941-a30b-0c6c076df939 | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:54:54Z |
| cdhauxfoyerattachopt  | ae01037c-9920-4631-8a48-fbf23e425005 | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:58:31Z |
| cdhauxrmattachhome    | 6cfc589a-1b7b-452d-9d72-963deed17c7b | OS::Cinder::VolumeAttachment | CREATE_FAILED   | 2016-04-07T21:59:51Z |
| cdhzk2                | 0b7460d1-6751-493e-b945-a3a21e86d73c | OS::Nova::Server             | DELETE_FAILED   | 2016-04-07T22:01:38Z |
| cdhnn                 | 3f057eb2-d7d8-4cc9-8cbe-4a796a759f3c | OS::Nova::Server             | DELETE_FAILED   | 2016-04-07T22:02:41Z |
+-----------------------+--------------------------------------+------------------------------+-----------------+----------------------+
```

From sfiappnwh003.statefarm-dss.com /heat/heat-api.log:

```
"CREATE_FAILED", "updated_time": "2016-04-07T21:42:16Z", "required_by": ["cdhrm"], "resource_status_reason": "CREATE aborted", "physical_resource_id": "e78886ce-0ea5-499b-b8ef-a8ae6fc49da9", "resource_type": "OS::Cinder::Volume"}, {"resource_name": "cdhzk2", "links": [{"href": "http://10.61.14.126:8004/v1/14d3e5a9d6e
```

Yet Cinder reports the volume as available, and the LV exists on the backend:

```
[root@sfisvlnwh001 ~(keystone_sfdr)]# cinder show e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 | grep status
status | available
```

From /cinder/volume.log:

```
2016-04-07 16:08:54.694 2609 INFO cinder.volume.flows.manager.create_volume [req-f8e400b0-a4d3-4774-ac9c-c3e7d0aee3f9 2784e495ac8746069f62440e171b452e 14d3e5a9d6ea45c9a675de6b756bae26 - - -] Volume volume-e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 (e78886ce-0ea5-499b-b8ef-a8ae6fc49da9): created successfully
```

```
[root@sfiappnwh003 ~]# lvs | grep e7888
volume-e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 cinder-volumes -wi-a----- 50.00g
```

Version-Release number of selected component (if applicable):
openstack-heat-api-2014.2.3-9.el7ost.noarch (controller)
openstack-cinder-2014.2.3-11.el7ost.noarch (compute)

How reproducible:
Unknown. I've done the deploy 3 times; 1 of 3 created the volumes, but that attempt also failed to attach the volumes, which I was able to do manually, and successfully, after the deploy.

Steps to Reproduce:
1. Deploy the heat stack.
2. Note that the volume create resource fails.

Actual results:
Stack creation fails.

Expected results:
Stack created successfully.

Additional info:
I have already increased the heat stack timeout, but the stack still fails the same way; the volume create fails well before the 1-hour mark of the stack deploy. Sosreports coming soon.
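To repeat the cross-check above on other stacks, it helps to pull the CREATE_FAILED resources out of Heat's view mechanically and compare each physical ID against `cinder show`. The helper below is a hypothetical sketch, not part of any OpenStack client: it parses the ASCII table printed by `heat resource-list` and returns the failed resources by name and physical ID:

```python
# Hypothetical helper: extract CREATE_FAILED rows from `heat resource-list`
# output so each physical_resource_id can be checked with `cinder show`.

def failed_resources(table_text):
    """Return {resource_name: physical_resource_id} for CREATE_FAILED rows."""
    failed = {}
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +---+ border rows
        cells = [c.strip() for c in line.strip("|").split("|")]
        # expected columns: name, physical id, type, status, updated_time
        if len(cells) == 5 and cells[3] == "CREATE_FAILED":
            failed[cells[0]] = cells[1]
    return failed

# Sample row taken from the resource listing in this report:
sample = """\
+---------------+--------------------------------------+--------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type      | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------+-----------------+----------------------+
| cdhrmRootDisk | e78886ce-0ea5-499b-b8ef-a8ae6fc49da9 | OS::Cinder::Volume | CREATE_FAILED   | 2016-04-07T21:42:16Z |
+---------------+--------------------------------------+--------------------+-----------------+----------------------+
"""
print(failed_resources(sample))
# → {'cdhrmRootDisk': 'e78886ce-0ea5-499b-b8ef-a8ae6fc49da9'}
```

For the stack in this report, the output would list cdhrmRootDisk and the three failed volume attachments, each of which can then be compared against Cinder's own status.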