Description of problem:
Migrating a VM's volumes from one NAS storage backend to another using cinder retype --migration-policy on-demand. The VM has three volumes, but only two are migrated and updated in MariaDB successfully. The third volume's data is migrated, but MariaDB is not updated with the new name_id, volume_type and status.
Version-Release number of selected component (if applicable):
openstack-cinder-2014.1.5-8.el7ost.noarch Mon Jul 25 13:53:43 2016
openstack-cinder-doc-2014.1.5-8.el7ost.noarch Mon Jul 25 13:53:44 2016
python-cinder-2014.1.5-8.el7ost.noarch Mon Jul 25 13:53:43 2016
python-cinderclient-1.0.9-2.el7ost.noarch Mon Jul 25 13:53:41 2016
python-cinderclient-doc-1.0.9-2.el7ost.noarch Mon Jul 25 13:53:44 2016
Executed within the script attached to this BZ:
commands.getoutput("cinder --os-volume-api-version 2 retype --migration-policy on-demand %s %s" % (volumeid, volume_type))
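For context, a minimal sketch (not part of the attached script) of how the fields reported as stale (status, volume_type, name_id) could be checked after each retype call. It assumes admin credentials, the v2 'cinder show' table output, and that the migrated name_id is exposed as os-vol-mig-status-attr:name_id:

import commands

def show_field(volumeid, field):
    # Parse one property out of the table printed by 'cinder show'.
    out = commands.getoutput(
        "cinder --os-volume-api-version 2 show %s" % volumeid)
    for line in out.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) > 2 and cols[1] == field:
            return cols[2]
    return None

# volumeid is the same variable passed to the retype call above.
for field in ("status", "volume_type", "os-vol-mig-status-attr:name_id"):
    print field, "=", show_field(volumeid, field)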
Actual results:
From the /proc fd output we can see that all three of the VM's volumes are migrated, however only two are updated in MariaDB (a sketch of this kind of fd check is included under Additional info below).

Expected results:
All three volumes are migrated and updated in MariaDB.

Additional info:
This behavior is observed in one out of five VMs.
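For reference, a minimal sketch of the kind of /proc fd check referred to under Actual results. The pgrep pattern is an assumption, as is the assumption that the data copy is performed by the cinder-volume process (or a child of it):

import os
import commands

def cinder_volume_open_files():
    # List the files currently held open by cinder-volume, to see which
    # volume images are actually being copied during the migration.
    files = {}
    for pid in commands.getoutput("pgrep -f cinder-volume").split():
        fd_dir = "/proc/%s/fd" % pid
        if not os.path.isdir(fd_dir):
            continue
        for fd in os.listdir(fd_dir):
            try:
                files.setdefault(pid, []).append(
                    os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                pass  # fd closed between listdir() and readlink()
    return files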
Possibly related to this upstream bug, which sounds like the same symptom and has a proposed fix:
Checking the libvirt output against the nova output, we saw that different volumes appear to be attached (a sketch of this cross-check follows the database output below).
nova shows connected:
Looking at the database, it appears that the device that is stuck attaching is the migrated volume that is stuck retyping:
MariaDB [cinder]> select * from volumes where id LIKE '%f0721'\G
*************************** 1. row ***************************
created_at: 2017-03-02 05:47:12
updated_at: 2017-03-02 06:12:14
scheduled_at: 2017-03-02 05:47:12
launched_at: 2017-03-02 06:12:13
migration_status: target:6e8693e5-6979-4a7a-8f65-95b0ff8b8bdb <----
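For reference, a minimal sketch of the libvirt vs. nova cross-check described above. It assumes admin credentials, that it runs on the compute node hosting the instance, and that instance_name is the libvirt domain name (OS-EXT-SRV-ATTR:instance_name in an admin 'nova show'):

import commands

def compare_attachments(server_id, instance_name):
    # What nova believes is attached to the instance...
    print commands.getoutput("nova volume-attachments %s" % server_id)
    # ...versus the block devices the libvirt domain actually has.
    print commands.getoutput("virsh domblklist %s" % instance_name)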
I suspect that the failure here is that nova does not correctly communicate to cinder that the volume has been successfully attached, although I didn't see any obvious sign of this in the nova logs.
The customer's controller logs are lacking, so I have requested that they upload them again.
Red Hat OpenStack Platform version 5 is now End-of-Life, and as such will not have further updates. See https://access.redhat.com/support/policy/updates/openstack/platform/ for full support lifecycle details.
After much effort and internal testing, we have determined that this operation, and live volume migration in general, is not stable in the OSP 5 release. We did not come up with any viable fixes, as a great deal of rework was done to improve this area in later releases.
It's worth noting at least the suggestions around the keystone token timer and other settings that could time out these very long and problematic operations. For example, the token expiration time can be raised to 2 days (172800 seconds) by executing something like this on each keystone node:
$ openstack-config --set /etc/keystone/keystone.conf token expiration 172800
$ service openstack-keystone restart
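If useful, a small sketch for confirming the value actually in effect after the restart; it simply reads the config file and assumes the option was set explicitly as above:

import ConfigParser

cfg = ConfigParser.ConfigParser()
cfg.read("/etc/keystone/keystone.conf")
# [token]/expiration is read by keystone in seconds; 172800 s = 48 hours.
print cfg.get("token", "expiration")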
That said, the only recommendation here is to do the migration offline (with the volumes detached) and to look at alternatives for moving the data. A rough sketch of the offline approach follows.
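For what it is worth, a minimal sketch of the offline variant of the same operation, assuming the volume can stay detached for the duration of the copy. The nova/cinder CLI calls are the standard ones; the polling helper and interval are arbitrary:

import time
import commands

def wait_for_status(volume_id, wanted, interval=30):
    # Poll 'cinder show' until the volume reaches the wanted status.
    while True:
        out = commands.getoutput("cinder show %s" % volume_id)
        for line in out.splitlines():
            cols = [c.strip() for c in line.split("|")]
            if len(cols) > 2 and cols[1] == "status" and cols[2] == wanted:
                return
        time.sleep(interval)

def offline_retype(server_id, volume_id, volume_type):
    # Detach, retype (which triggers the migration), then re-attach.
    commands.getoutput("nova volume-detach %s %s" % (server_id, volume_id))
    wait_for_status(volume_id, "available")
    commands.getoutput(
        "cinder --os-volume-api-version 2 retype "
        "--migration-policy on-demand %s %s" % (volume_id, volume_type))
    # Note: immediately after the call the status may briefly still read
    # 'available'; checking volume_type as well would be more robust.
    wait_for_status(volume_id, "available")
    commands.getoutput("nova volume-attach %s %s auto" % (server_id, volume_id))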