Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1296502

Summary: Volume migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found"
Product: Red Hat OpenStack
Reporter: lkuchlan <lkuchlan>
Component: openstack-cinder
Assignee: Tom Barron <tbarron>
Status: CLOSED NOTABUG
QA Contact: nlevinki <nlevinki>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.0 (Juno)
CC: eharney, scohen, sgotliv, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-05 12:28:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs (flags: none)

Description lkuchlan 2016-01-07 11:52:04 UTC
Created attachment 1112424 [details]
logs

Description of problem:
Volume migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found".
Another issue is that there is no indication of the failure from the CLI.

* I used 2 NetApp NFS back ends

Version-Release number of selected component (if applicable):
openstack-cinder-2014.2.3-11.el7ost.noarch
python-cinder-2014.2.3-11.el7ost.noarch
python-cinderclient-1.1.1-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Migrate a volume to a new back end using the "migrate" command (see the example below)
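
For example, roughly (placeholder volume ID and hostname, not copied verbatim from this setup):

# pick the volume to move
cinder list
# migrate it to the second back end; the destination host is in the form
# <hostname>@<backend-name>, as reported by 'cinder service-list'
cinder migrate <volume-id> <hostname>@Netapp-nfs2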


Actual results:
The migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found".
There is no indication of the failure from the CLI.

Expected results:
The migrate operation should succeed.
If the migration has failed, there should be an indication of the failure from the CLI.

Comment 2 Sean Cohen 2016-01-07 13:54:50 UTC
*** Bug 1296501 has been marked as a duplicate of this bug. ***

Comment 3 Sergey Gotliv 2016-01-24 11:18:54 UTC
It seems like you have a permissions issue with one of your shares (see the log snippet below), and after several retries the scheduler gives up on finding another backend on which to create a target volume.

4069d1393e3c4d21b23756c08fd4aaae - - -] Exception creating vol volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7 on share 10.35.64.11:/vol/vol_rhos_auto_iscsi. Details: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf truncate -s 1G /var/lib/cinder/mnt/c5c8033a9dac8f5331e8791240305d43/volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7
Exit code: 1
Stdout: u''
Stderr: '/bin/truncate: cannot open \xe2\x80\x98/var/lib/cinder/mnt/c5c8033a9dac8f5331e8791240305d43/volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7\xe2\x80\x99 for writing: Permission denied\n'
2016-01-07 10:05:55.217 3027 WARNING cinder.volume.manager [req-f5de438d-b884-411b-b5e1-66cfd14d7c3e cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (8d546527-9023-44fc-8d84-8e381734e6cf) transitioned into state 'FAILURE'
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager Traceback (most recent call last):
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 34, in _execute_task
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     result = task.execute(**arguments)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 624, in execute
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     **volume_spec)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 598, in _create_raw_volume
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     return self.driver.create_volume(volume_ref)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/nfs.py", line 1403, in create_volume
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     data=msg % (volume['name']))
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7 could not be created on shares.

Comment 4 Tom Barron 2016-02-05 12:28:27 UTC
The permissions issue on one of the shares at 2016-01-07 10:05:55 is prior to a restart of all the cinder services at 2016-01-07 10:40:48.  The failed migration is nine minutes after the restart.  So the permissions issue isn't the explanation for this failure in volume migration.

That said, both (1) the failure with log message "Failed to schedule_migrate_volume_to_host: No valid host was found" and (2) no indication of the failure from the CLI when the migrate command was issued are expected here.

There is no bug.

Dealing with the second issue first, the migrate command interacts with cinder via a REST API, returning immediately with a 202.  It is expected that the client will poll cinder for results. 'cinder list' will show an error when the migration fails, and then examination of the logs for errors will tell the story.  (In Mitaka, asynchronous error reporting has been proposed to make this more friendly without breaking the overall REST async command setup.)
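
For example, a rough sketch of that polling (volume ID is a placeholder; the exact fields shown and the log path depend on the release and on admin credentials):

# the migrate call returns 202 immediately; check back for the outcome
cinder show <volume-id>     # look at 'status' and the migration status field
cinder list                 # an error state appears here once the migration fails
grep "No valid host" /var/log/cinder/scheduler.log   # typical scheduler log location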

With respect to the first issue, we can see from the scheduler log that the attempt to migrate volume ac595e14-329e-4961-9dc4-82443330c8a3 to backend cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02 fails because, even though there is enough space (the capacity filter passes), that backend does not match the extra specs in the volume type on this volume (the capability filter fails):

2016-01-07 10:49:03.529 11490 DEBUG cinder.scheduler.filters.capacity_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Sufficient free space for volume creation on host cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02 (requested / avail): 1/94.0 host_passes /usr/lib/python2.7/site-packages/cinder/scheduler/filters/capacity_filter.py:68
2016-01-07 10:49:03.529 11490 DEBUG cinder.openstack.common.scheduler.filters.capabilities_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] extra_spec requirement 'netapp-nfs1' does not match 'netapp-nfs2' _satisfies_extra_specs /usr/lib/python2.7/site-packages/cinder/openstack/common/scheduler/filters/capabilities_filter.py:55
2016-01-07 10:49:03.530 11490 DEBUG cinder.openstack.common.scheduler.filters.capabilities_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] host 'cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02': free_capacity_gb: 94.99, pools: None fails resource_type extra_specs requirements host_passes /usr/lib/python2.7/site-packages/cinder/openstack/common/scheduler/filters/capabilities_filter.py:68
2016-01-07 10:49:03.531 11490 DEBUG cinder.scheduler.filter_scheduler [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Filtered [host 'cougar08.scl.lab.tlv.redhat.com@Netapp-nfs1#10.35.64.11:/vol/nlevinki01': free_capacity_gb: 92.99, pools: None] _get_weighted_candidates /usr/lib/python2.7/site-packages/cinder/scheduler/filter_scheduler.py:300
2016-01-07 10:49:03.533 11490 ERROR cinder.scheduler.manager [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Failed to schedule_migrate_volume_to_host: No valid host was found. Cannot place volume ac595e14-329e-4961-9dc4-82443330c8a3 on cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2

The capability filter fails because the volume has an extra spec with key-value 'volume_backend_name': 'netapp-nfs1'.  There is a hint of this in the name of the extra spec requirement that is cited in the failure message, but the proof is found by searching backward in the log for when volume ac595e14-329e-4961-9dc4-82443330c8a3 was created:

2016-01-07 10:43:46.112 11490 DEBUG cinder.scheduler.manager [req-58224d97-16a6-404c-9864-791d620b7366 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Task 'cinder.scheduler.flows.create_volume.ExtractSchedulerSpecTask;volume:create' (3adb8fce-db60-4633-b592-f76151932be7) transitioned into state 'SUCCESS' with result '{'request_spec': {u'source_replicaid': None, u'volume_properties': {u'status': u'creating', u'volume_type_id': u'fb45f12a-ffc4-4361-a72b-f0fa3eef252e', u'display_name': u'image-volume', u'availability_zone': u'nova', u'consistencygroup_id': None, u'reservations': [u'1125d433-74f1-46b4-89d4-031245130c32', u'1a4c3e6c-fa7c-4d0e-986a-e9d3ebfd0e25', u'e4025765-0e32-4981-82a7-512e03cf0d46', u'057e7b5d-e83b-4185-b85f-469beedaf9ef'], u'volume_admin_metadata': [], u'attach_status': u'detached', u'display_description': None, u'metadata': {}, u'source_replicaid': None, u'encryption_key_id': None, u'volume_metadata': [], u'replication_status': u'disabled', u'snapshot_id': None, u'source_volid': None, u'user_id': u'cd8c93bf3ee346da9763687eaeec8d6c', u'project_id': u'4069d1393e3c4d21b23756c08fd4aaae', u'id': u'ac595e14-329e-4961-9dc4-82443330c8a3', u'size': 1}, u'volume_type': {u'name': u'NetappNfs1', u'qos_specs_id': None, u'deleted': False, u'created_at': u'2016-01-07T08:04:01.000000', u'updated_at': None, u'extra_specs': {u'volume_backend_name': u'netapp-nfs1'}, u'deleted_at': None, u'id': u'fb45f12a-ffc4-4361-a72b-f0fa3eef252e'}, u'image_id': u'f8ad6041-df99-4f25-b31c-919f638019a8', u'snapshot_id': None, u'consistencygroup_id': None, u'source_volid': None, u'volume_id': u'ac595e14-329e-4961-9dc4-82443330c8a3'}}' _task_receiver /usr/lib/python2.7/site-packages/cinder/flow_utils.py:113

Looking inside this log message one can see that the volume was created with a volume type named NetappNfs1 which has extra specs that require volume_backend_name 'netapp-nfs1'.
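
This can also be confirmed from the CLI (a sketch; output columns vary by client version):

cinder show ac595e14-329e-4961-9dc4-82443330c8a3 | grep volume_type   # should report NetappNfs1
cinder extra-specs-list   # NetappNfs1 -> {'volume_backend_name': 'netapp-nfs1'}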

With this volume type/extra specs, this volume cannot be migrated via the migrate command since that command honors the volume's volume type.

One should be able to use the 'retype' command for a move of this kind.  This is "scenario 2" in the documentation at:

http://docs.openstack.org/developer/cinder/devref/migration.html
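
For instance, a rough sketch of that retype path (the 'NetappNfs2' type name is hypothetical; 'netapp-nfs2' is the backend name seen in the capability filter log above):

# a volume type whose extra specs point at the second back end
cinder type-create NetappNfs2
cinder type-key NetappNfs2 set volume_backend_name=netapp-nfs2
# retype the volume, allowing cinder to migrate it between back ends
cinder retype --migration-policy on-demand ac595e14-329e-4961-9dc4-82443330c8a3 NetappNfs2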