Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1296502

Summary: Volume migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found"
Product: Red Hat OpenStack
Reporter: lkuchlan <lkuchlan>
Component: openstack-cinder
Assignee: Tom Barron <tbarron>
Status: CLOSED NOTABUG
QA Contact: nlevinki <nlevinki>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.0 (Juno)
CC: eharney, scohen, sgotliv, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-05 12:28:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs (flags: none)

Description lkuchlan 2016-01-07 11:52:04 UTC
Created attachment 1112424 [details]
logs

Description of problem:
Volume migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found".
Another issue is that there is no indication of the failure from the CLI.

* I used 2 NetApp NFS back ends

Version-Release number of selected component (if applicable):
openstack-cinder-2014.2.3-11.el7ost.noarch
python-cinder-2014.2.3-11.el7ost.noarch
python-cinderclient-1.1.1-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Migrate a volume to a new back end using the "migrate" command (see the example below)
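
For example, roughly (placeholder volume ID and hostname, not copied verbatim from this setup):

# pick the volume to move
cinder list
# migrate it to the second back end; the destination host is in the form
# <hostname>@<backend-name>, as reported by 'cinder service-list'
cinder migrate <volume-id> <hostname>@Netapp-nfs2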


Actual results:
The migration fails with "Failed to schedule_migrate_volume_to_host: No valid host was found".
There is no indication of the failure from the CLI.

Expected results:
The migrate operation should succeed.
If the migration has failed, there should be an indication of the failure from the CLI.

Comment 2 Sean Cohen 2016-01-07 13:54:50 UTC
*** Bug 1296501 has been marked as a duplicate of this bug. ***

Comment 3 Sergey Gotliv 2016-01-24 11:18:54 UTC
It seems like you have a permissions issue with one of your shares (see the log snippet below), and after several retries the scheduler gives up on finding another backend on which to create a target volume.

4069d1393e3c4d21b23756c08fd4aaae - - -] Exception creating vol volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7 on share 10.35.64.11:/vol/vol_rhos_auto_iscsi. Details: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf truncate -s 1G /var/lib/cinder/mnt/c5c8033a9dac8f5331e8791240305d43/volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7
Exit code: 1
Stdout: u''
Stderr: '/bin/truncate: cannot open \xe2\x80\x98/var/lib/cinder/mnt/c5c8033a9dac8f5331e8791240305d43/volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7\xe2\x80\x99 for writing: Permission denied\n'
2016-01-07 10:05:55.217 3027 WARNING cinder.volume.manager [req-f5de438d-b884-411b-b5e1-66cfd14d7c3e cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (8d546527-9023-44fc-8d84-8e381734e6cf) transitioned into state 'FAILURE'
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager Traceback (most recent call last):
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 34, in _execute_task
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     result = task.execute(**arguments)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 624, in execute
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     **volume_spec)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 598, in _create_raw_volume
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     return self.driver.create_volume(volume_ref)
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/nfs.py", line 1403, in create_volume
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager     data=msg % (volume['name']))
2016-01-07 10:05:55.217 3027 TRACE cinder.volume.manager VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume volume-fe24f1ed-f826-4ffd-8411-91aa8db516f7 could not be created on shares.

Comment 4 Tom Barron 2016-02-05 12:28:27 UTC
The permissions issue on one of the shares at 2016-01-07 10:05:55 is prior to a restart of all the cinder services at 2016-01-07 10:40:48.  The failed migration is nine minutes after the restart.  So the permissions issue isn't the explanation for this failure in volume migration.

That said, both (1) the failure with log message "Failed to schedule_migrate_volume_to_host: No valid host was found" and (2) no indication of the failure from the CLI when the migrate command was issued are expected here.

There is no bug.

Dealing with the second issue first, the migrate command interacts with cinder via a REST API, returning immediately with a 202.  It is expected that the client will poll cinder for results. 'cinder list' will show an error when the migration fails, and then examination of the logs for errors will tell the story.  (In Mitaka, asynchronous error reporting has been proposed to make this more friendly without breaking the overall REST async command setup.)
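
For example, a rough sketch of that polling (volume ID is a placeholder; the exact fields shown and the log path depend on the release and on admin credentials):

# the migrate call returns 202 immediately; check back for the outcome
cinder show <volume-id>     # look at 'status' and the migration status field
cinder list                 # an error state appears here once the migration fails
grep "No valid host" /var/log/cinder/scheduler.log   # typical scheduler log location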

With respect to the first issue, we can see from the scheduler log that the attempt to migrate volume ac595e14-329e-4961-9dc4-82443330c8a3 to backend cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02 fails because, even though there is enough space (the capacity filter passes), that backend does not match the extra specs in the volume type on this volume (the capability filter fails):

2016-01-07 10:49:03.529 11490 DEBUG cinder.scheduler.filters.capacity_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Sufficient free space for volume creation on host cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02 (requested / avail): 1/94.0 host_passes /usr/lib/python2.7/site-packages/cinder/scheduler/filters/capacity_filter.py:68
2016-01-07 10:49:03.529 11490 DEBUG cinder.openstack.common.scheduler.filters.capabilities_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] extra_spec requirement 'netapp-nfs1' does not match 'netapp-nfs2' _satisfies_extra_specs /usr/lib/python2.7/site-packages/cinder/openstack/common/scheduler/filters/capabilities_filter.py:55
2016-01-07 10:49:03.530 11490 DEBUG cinder.openstack.common.scheduler.filters.capabilities_filter [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] host 'cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2#10.35.64.11:/vol/nlevinki02': free_capacity_gb: 94.99, pools: None fails resource_type extra_specs requirements host_passes /usr/lib/python2.7/site-packages/cinder/openstack/common/scheduler/filters/capabilities_filter.py:68
2016-01-07 10:49:03.531 11490 DEBUG cinder.scheduler.filter_scheduler [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Filtered [host 'cougar08.scl.lab.tlv.redhat.com@Netapp-nfs1#10.35.64.11:/vol/nlevinki01': free_capacity_gb: 92.99, pools: None] _get_weighted_candidates /usr/lib/python2.7/site-packages/cinder/scheduler/filter_scheduler.py:300
2016-01-07 10:49:03.533 11490 ERROR cinder.scheduler.manager [req-de0c2eb4-f0a3-49cc-8b7b-af6f512c1848 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Failed to schedule_migrate_volume_to_host: No valid host was found. Cannot place volume ac595e14-329e-4961-9dc4-82443330c8a3 on cougar08.scl.lab.tlv.redhat.com@Netapp-nfs2

The capability filter fails because the volume has an extra spec with key-value 'volume_backend_name': 'netapp-nfs1'.  There is a hint of this in the name of the extra spec requirement that is cited in the failure message, but the proof is found by searching backward in the log for when volume ac595e14-329e-4961-9dc4-82443330c8a3 was created:

2016-01-07 10:43:46.112 11490 DEBUG cinder.scheduler.manager [req-58224d97-16a6-404c-9864-791d620b7366 cd8c93bf3ee346da9763687eaeec8d6c 4069d1393e3c4d21b23756c08fd4aaae - - -] Task 'cinder.scheduler.flows.create_volume.ExtractSchedulerSpecTask;volume:create' (3adb8fce-db60-4633-b592-f76151932be7) transitioned into state 'SUCCESS' with result '{'request_spec': {u'source_replicaid': None, u'volume_properties': {u'status': u'creating', u'volume_type_id': u'fb45f12a-ffc4-4361-a72b-f0fa3eef252e', u'display_name': u'image-volume', u'availability_zone': u'nova', u'consistencygroup_id': None, u'reservations': [u'1125d433-74f1-46b4-89d4-031245130c32', u'1a4c3e6c-fa7c-4d0e-986a-e9d3ebfd0e25', u'e4025765-0e32-4981-82a7-512e03cf0d46', u'057e7b5d-e83b-4185-b85f-469beedaf9ef'], u'volume_admin_metadata': [], u'attach_status': u'detached', u'display_description': None, u'metadata': {}, u'source_replicaid': None, u'encryption_key_id': None, u'volume_metadata': [], u'replication_status': u'disabled', u'snapshot_id': None, u'source_volid': None, u'user_id': u'cd8c93bf3ee346da9763687eaeec8d6c', u'project_id': u'4069d1393e3c4d21b23756c08fd4aaae', u'id': u'ac595e14-329e-4961-9dc4-82443330c8a3', u'size': 1}, u'volume_type': {u'name': u'NetappNfs1', u'qos_specs_id': None, u'deleted': False, u'created_at': u'2016-01-07T08:04:01.000000', u'updated_at': None, u'extra_specs': {u'volume_backend_name': u'netapp-nfs1'}, u'deleted_at': None, u'id': u'fb45f12a-ffc4-4361-a72b-f0fa3eef252e'}, u'image_id': u'f8ad6041-df99-4f25-b31c-919f638019a8', u'snapshot_id': None, u'consistencygroup_id': None, u'source_volid': None, u'volume_id': u'ac595e14-329e-4961-9dc4-82443330c8a3'}}' _task_receiver /usr/lib/python2.7/site-packages/cinder/flow_utils.py:113

Looking inside this log message one can see that the volume was created with a volume type named NetappNfs1 which has extra specs that require volume_backend_name 'netapp-nfs1'.
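
This can also be confirmed from the CLI (a sketch; output columns vary by client version):

cinder show ac595e14-329e-4961-9dc4-82443330c8a3 | grep volume_type   # should report NetappNfs1
cinder extra-specs-list   # NetappNfs1 -> {'volume_backend_name': 'netapp-nfs1'}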

With this volume type/extra specs, this volume cannot be migrated via the migrate command since that command honors the volume's volume type.

One should be able to use the 'retype' command for a move of this kind.  This is "scenario 2" in the documentation at:

http://docs.openstack.org/developer/cinder/devref/migration.html
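
For instance, a rough sketch of that retype path (the 'NetappNfs2' type name is hypothetical; 'netapp-nfs2' is the backend name seen in the capability filter log above):

# a volume type whose extra specs point at the second back end
cinder type-create NetappNfs2
cinder type-key NetappNfs2 set volume_backend_name=netapp-nfs2
# retype the volume, allowing cinder to migrate it between back ends
cinder retype --migration-policy on-demand ac595e14-329e-4961-9dc4-82443330c8a3 NetappNfs2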