Bug 1440700 - Unable to live migrate Nova instance with attached NFS backed Cinder volume
Summary: Unable to live migrate Nova instance with attached NFS backed Cinder volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: z3
Target Release: 11.0 (Ocata)
Assignee: Alan Bishop
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-10 10:10 UTC by Marius Cornea
Modified: 2017-10-31 17:37 UTC (History)
11 users

Fixed In Version: openstack-tripleo-heat-templates-6.1.0-1.el7ost, puppet-tripleo-6.5.0-1.el7ost, puppet-cinder-10.3.1-1.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, some cinder volume operations would fail when using the NFS backend. This was because cinder's NFS backend driver implements enhanced NAS security features that are enabled by default. These features require non-standard configuration changes in nova's libvirt, and without these changes, some cinder volume operations would fail. This update introduces TripleO settings to control the NFS driver's NAS secure features, and disables the features by default. As a result, cinder volume operations no longer fail when using the NFS backend.
Clone Of:
Environment:
Last Closed: 2017-10-31 17:37:35 UTC
Target Upstream Version:
Embargoed:


Attachments
Cinder config files and logs (107.76 KB, application/x-gzip)
2017-09-11 08:56 UTC, Tzach Shefi


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1688332 0 None None None 2017-05-04 15:53:56 UTC
OpenStack gerrit 462663 0 None MERGED Add support for Cinder "NAS secure" driver params 2021-01-28 09:35:14 UTC
OpenStack gerrit 462665 0 None MERGED Add support for Cinder "NAS secure" driver params 2021-01-28 09:35:14 UTC
OpenStack gerrit 462667 0 None MERGED Add support for Cinder "NAS secure" driver params 2021-01-28 09:35:14 UTC
Red Hat Bugzilla 1503214 0 urgent CLOSED OSP11 -> OSP12 upgrade: unable to migrate instance with cinder volume attached when cinder uses an NFS backend 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2017:3098 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 director Bug Fix Advisory 2017-10-31 21:33:28 UTC

Internal Links: 1503214

Description Marius Cornea 2017-04-10 10:10:57 UTC
Description of problem:
Live migrating Nova instances that have an NFS-backed Cinder volume attached fails.

In /var/log/cinder/volume.log:

2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Driver initialize connection failed (error: Unexpected error while running command.
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Exit code: 1
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Stdout: u''
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Stderr: u"qemu-img: Could not open '/var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627': Could not open '/var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627': Permission denied\n").
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server 
 

Version-Release number of selected component (if applicable):
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.0-4.el7ost.noarch
openstack-cinder-10.0.0-4.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with NFS backend for Cinder
2. Create volume
3. Launch instance and attach the volume to it
4. Live migrate instance

Actual results:
Live migration doesn't work.

Expected results:
Live migration succeeds.

Additional info:
It appears that the volume file is accessible only to the qemu user/group:

[root@overcloud-controller-0 heat-admin]# ls -l /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627
-rw-rw----. 1 qemu qemu 1073741824 Apr  9 23:24 /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627

The same operation works fine on OSP10.
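The qemu:qemu 0660 ownership above is the visible symptom of cinder's NAS secure mode. A simplified sketch (not cinder's actual code) of what the NFS driver's nas_secure_file_permissions option controls: with the secure mode on, the volume file stays at 0660 (owner/group only), so a service running as a different user, like the cinder process invoking qemu-img here, gets "Permission denied"; with it off, the file is opened up to 0666.

```python
import os
import stat
import tempfile

def set_rw_permissions(path, nas_secure_file_permissions):
    """Simplified sketch of the RemoteFS driver behavior behind
    nas_secure_file_permissions: 'secure' keeps the volume file at
    0660 (owner/group only), insecure widens it to 0666 so other
    services running as a different user can still read/write it."""
    mode = 0o660 if nas_secure_file_permissions else 0o666
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

With the secure default, only the file's owner and group (qemu:qemu after an attach) can touch the file, which is exactly what the qemu-img "Permission denied" error shows.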

Comment 6 Marius Cornea 2017-04-10 18:10:56 UTC
As a workaround, add this environment file to the deployment:

parameter_defaults:
  ControllerExtraConfig:
    cinder::config::cinder_config:
      tripleo_nfs/nas_secure_file_operations:
        value: false

Removing the blocker flag.

Comment 9 Lon Hohberger 2017-09-06 19:59:15 UTC
According to our records, this should be resolved by openstack-tripleo-heat-templates-6.1.0-2.el7ost.  This build is available now.

Comment 10 Lon Hohberger 2017-09-06 19:59:20 UTC
According to our records, this should be resolved by puppet-tripleo-6.5.0-5.el7ost.  This build is available now.

Comment 11 Lon Hohberger 2017-09-06 19:59:24 UTC
According to our records, this should be resolved by puppet-cinder-10.3.1-1.el7ost.  This build is available now.
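With the fixed builds above, the NAS secure options can be set directly from a TripleO environment file instead of via the ControllerExtraConfig workaround. A minimal sketch; the parameter names are assumed from the gerrit change titles, so verify them against your templates (valid values should mirror the cinder driver options: 'auto', 'true', 'false'):

```yaml
parameter_defaults:
  # Assumed parameter names; both default to disabling the NAS
  # secure features, matching the behavior this fix introduces.
  CinderNasSecureFileOperations: 'false'
  CinderNasSecureFilePermissions: 'false'
```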

Comment 12 Tzach Shefi 2017-09-11 08:55:23 UTC
Alan, I hit an issue and failed to verify; I got stuck on Cinder create and didn't even reach migration yet.

Versions:
openstack-tripleo-heat-templates-6.2.0-3.el7ost.noarch
puppet-tripleo-6.5.0-8.el7ost.noarch
puppet-cinder-10.3.1-1.el7ost.noarch

This is the file I added to overcloud_deploy to enable NFS for Cinder (and Glance by mistake; not needed for this bug):

parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  CinderNfsMountOptions: 'retry=1'
  CinderNfsServers: '10.35.160.111:/export/ins_cinder'

  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: '10.35.160.111:/export/ins_glance'

The shares work and the deployment finished, but Cinder create fails.

cinder.conf relevant bits:
enabled_backends = tripleo_nfs

[tripleo_nfs]
volume_backend_name=tripleo_nfs
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/shares-nfs.conf
nfs_mount_options=retry=1
nas_secure_file_operations=False         -> good, these are added by default
nas_secure_file_permissions=False


Volume in error state:
[stack@undercloud-0 ~]$ cinder list

| 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 | error


Volume.log errors

2017-09-11 08:23:11.243 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".
2017-09-11 08:23:21.252 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".
2017-09-11 08:23:29.242 101043 DEBUG oslo_service.periodic_task [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-09-11 08:23:29.242 101043 DEBUG oslo_service.periodic_task [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Running periodic task VolumeManager._report_driver_status run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-09-11 08:23:29.243 101043 WARNING cinder.volume.manager [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Update driver status failed: (config name tripleo_nfs) is uninitialized.
2017-09-11 08:23:31.254 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".


# mount | grep 10.35.160   -> only the Glance mount is shown, no Cinder mount

10.35.160.111:/export/ins_glance on /var/lib/glance/images type nfs4 (rw,relatime,context=system_u:object_r:glance_var_lib_t:s0,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.108,local_lock=none,addr=10.35.160.111)


Exports on the server are alive:
# showmount -e 10.35.160.111
Export list for 10.35.160.111:
/export/ins_cinder *
/export/ins_glance *   


Manual mount to /mnt/cinder fails; oddly, it says it's already mounted but I don't see it:

# mount 10.35.160.111:/export/ins_cinder /mnt/cinder
mount.nfs: /mnt/cinder is busy or already mounted

# mount | grep cinder
Nada, nothing.

I figured that with a single backend there's no need to manually run cinder type-create, but maybe there is. Creating without a type didn't work, so I created a type and set its backend; that didn't help either:

[stack@undercloud-0 ~]$ cinder type-create nfs
| 3f1636c0-94ac-4ebb-9612-a0524d815b07 | nfs  | -           | True      |
[stack@undercloud-0 ~]$ cinder type-key nfs set volume_backend_name=tripleo_nfs
[stack@undercloud-0 ~]$ cinder extra-specs-list 
| 3f1636c0-94ac-4ebb-9612-a0524d815b07 | nfs  | {'volume_backend_name': 'tripleo_nfs'} |

[stack@undercloud-0 ~]$ cinder create 1 --volume-type nfs
Vol also in error state. 


grep -ir 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 /var/log/cinder/


/var/log/cinder/scheduler.log:2017-09-11 08:09:52.170 78621 DEBUG cinder.volume.flows.common [req-1cae1c50-bf66-4a08-9b96-07da3a559872 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] Setting Volume 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 to error due to: No valid backend was found. No weighed backends available error_out /usr/lib/python2.7/site-packages/cinder/volume/flows/common.py:85
/var/log/cinder/cinder-api.log:2017-09-11 08:09:52.008 109865 DEBUG cinder.volume.api [req-1cae1c50-bf66-4a08-9b96-07da3a559872 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] Task 'cinder.volume.flows.api.create_volume.EntryCreateTask;volume:create' (7ab55d59-0058-47a7-98dc-58935833e55b) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'volume': Volume(_name_id=None,admin_metadata=<?>,attach_status='detached',availability_zone='nova',bootable=False,cluster=<?>,cluster_name=None,consistencygroup=<?>,consistencygroup_id=None,created_at=2017-09-11T08:09:51Z,deleted=False,deleted_at=None,display_description=None,display_name=None,ec2_id=None,encryption_key_id=None,glance_metadata=<?>,group=<?>,group_id=None,host=None,id=9270f2b6-2bbb-4be8-9d56-ba484a2dd722,launched_at=None,metadata={},migration_status=None,multiattach=False,previous_status=None,project_id='f780603a47df4840aad0b77583907364',provider_auth=None,provider_geometry=None,provider_id=None,provider_location=None,replication_driver_data=None,replication_extended_status=None,replication_status=None,scheduled_at=None,size=1,snapshot_id=None,snapshots=<?>,source_volid=None,status='creating',terminated_at=None,updated_at=None,user_id='b9da1699e5524941bcadf3f2393a2792',volume_attachment=<?>,volume_type=<?>,volume_type_id=None), 'volume_properties': VolumeProperties(attach_status='detached',availability_zone='nova',cgsnapshot_id=None,consistencygroup_id=None,display_description=None,display_name=None,encryption_key_id=None,group_id=None,group_type_id=<?>,metadata={},multiattach=False,project_id='f780603a47df4840aad0b77583907364',qos_specs=None,replication_status=<?>,reservations=['7e316b1c-f354-4e08-9332-b374c89cde0c','b712d2b8-d0a3-4883-b299-42cfba3eac2c'],size=1,snapshot_id=None,source_replicaid=None,source_volid=None,status='creating',user_id='b9da1699e5524941bcadf3f2393a2792',volume_type_id=None), 'volume_id': '9270f2b6-2bbb-4be8-9d56-ba484a2dd722'}' 
_task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
/var/log/cinder/cinder-api.log:2017-09-11 08:09:52.075 109865 INFO cinder.api.openstack.wsgi [req-1a84f97b-7821-4c19-bd26-8660ecc5a8bd b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] GET http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722
/var/log/cinder/cinder-api.log:2017-09-11 08:09:52.147 109865 INFO cinder.api.openstack.wsgi [req-1a84f97b-7821-4c19-bd26-8660ecc5a8bd b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722 returned with HTTP 200
/var/log/cinder/cinder-api.log:2017-09-11 08:22:18.431 109865 INFO cinder.api.openstack.wsgi [req-2432adf6-96a9-4e52-81cf-570fd415b608 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] GET http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722
/var/log/cinder/cinder-api.log:2017-09-11 08:22:18.495 109865 INFO cinder.api.openstack.wsgi [req-2432adf6-96a9-4e52-81cf-570fd415b608 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722 returned with HTTP 200


And this smoking-gun bit:
/var/log/cinder/volume.log:2017-09-11 07:42:09.966 101043 ERROR cinder.volume.drivers.remotefs [req-90bc1c8d-5424-44f3-917b-773bc84dcd38 - - - - -] Exception during mounting NFS mount failed for share 10.35.160.111:/export/ins_cinder. Error - {'pnfs': u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf mount -t nfs -o retry=1,vers=4,minorversion=1 10.35.160.111:/export/ins_cinder /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38\nExit code: 32\nStdout: u''\nStderr: u'mount.nfs: /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38 is busy or already mounted\\n'", 'nfs': u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf mount -t nfs -o retry=1 10.35.160.111:/export/ins_cinder /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38\nExit code: 32\nStdout: u''\nStderr: u'mount.nfs: /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38 is busy or already mounted\\n'"}
/var/log/secure:Sep 11 03:


Let me know, should I keep the system up for you to access?

Comment 13 Tzach Shefi 2017-09-11 08:56:02 UTC
Created attachment 1324370 [details]
Cinder config files and logs

Comment 14 Tzach Shefi 2017-09-13 13:01:40 UTC
Verified.

A booted instance with an attached NFS-backed Cinder volume was successfully migrated to a second compute.

I retested on the same undercloud.
Versions:
openstack-tripleo-heat-templates-6.2.0-3.el7ost.noarch
puppet-tripleo-6.5.0-8.el7ost.noarch
puppet-cinder-10.3.1-1.el7ost.noarch

This time I didn't configure Glance NFS or Cinder's nfs_mount_options=retry=1.

NFS heat template used:
$ cat nfs11Cinder.yaml
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  CinderNfsMountOptions: ''
  CinderNfsServers: 'W.X.Y.Z:/export/ins_cinder'



cinder.conf relevant bits:
[tripleo_nfs]
volume_backend_name=tripleo_nfs
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/shares-nfs.conf
nfs_mount_options=
nas_secure_file_operations=False
nas_secure_file_permissions=False

Basic Cinder sanity (create volume) worked; moving on.

1. Cinder create worked
$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | available | -    | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
 


2. Boot an instance
$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | ACTIVE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

3. Attach vol to instance
nova volume-attach 8e7d2ee0-e106-45f6-8bc0-5d544035997d f15212c4-94f9-4dad-a40e-253f82412ffa auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | f15212c4-94f9-4dad-a40e-253f82412ffa |
| serverId | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
| volumeId | f15212c4-94f9-4dad-a40e-253f82412ffa |
+----------+--------------------------------------+

4. Now we see an attached volume:
cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | in-use | -    | 1    | -           | false    | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

5. Migrate the instance with an attached volume (the verification step):
$openstack server migrate inst1

$ nova list
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State       | Power State | Networks                          |
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | RESIZE | resize_migrating | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+

$ nova list
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status        | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | VERIFY_RESIZE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+

$ nova resize-confirm inst1

6. Post-migration, inst1 is alive and has an attached volume. Ergo, verified :)
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | ACTIVE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
[stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | in-use | -    | 1    | -           | false    | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

Comment 17 errata-xmlrpc 2017-10-31 17:37:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3098

