Description of problem:
Live migrating Nova instances that have an NFS-backed Cinder volume attached fails.

In /var/log/cinder/volume.log:

2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Driver initialize connection failed (error: Unexpected error while running command.
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Exit code: 1
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Stdout: u''
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server Stderr: u"qemu-img: Could not open '/var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627': Could not open '/var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627': Permission denied\n").
2017-04-10 09:56:28.895 75343 ERROR oslo_messaging.rpc.server

Version-Release number of selected component (if applicable):
puppet-cinder-10.3.0-1.el7ost.noarch
python-cinder-10.0.0-4.el7ost.noarch
openstack-cinder-10.0.0-4.el7ost.noarch

How reproducible: 100%

Steps to Reproduce:
1. Deploy OSP11 with an NFS backend for Cinder.
2. Create a volume.
3. Launch an instance and attach the volume to it.
4. Live migrate the instance.

Actual results: Live migration fails.

Expected results: Live migration succeeds.

Additional info:
It appears that the volume file is accessible only to the qemu user/group:

[root@overcloud-controller-0 heat-admin]# ls -l /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627
-rw-rw----. 1 qemu qemu 1073741824 Apr 9 23:24 /var/lib/cinder/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-6a9840ca-c3cd-4903-aa44-9ad751ece627

The same operation works fine on OSP10.
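The ls output above is the whole story: the backing file is mode 0660 and owned qemu:qemu, while the qemu-img info probe runs as the cinder service user, who is neither the owner nor in the group. A minimal sketch of that permission arithmetic, run on a throwaway file rather than the real backing file:

```shell
# Throwaway demonstration of the failure mode: with nas_secure_file_operations
# enabled, the NFS backing file ends up owned qemu:qemu with mode 0660, and the
# cinder service user (neither owner nor group member) cannot open it for
# "qemu-img info".
vol=$(mktemp /tmp/volume-XXXXXX)   # stand-in for the real backing file
chmod 0660 "$vol"

mode=$(stat -c '%a' "$vol")
echo "mode=$mode"        # prints mode=660

# The third ("other") permission digit is 0: no read bit for anyone else.
other=${mode#??}
echo "other=$other"      # prints other=0

rm -f "$vol"
```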
As a workaround, add this environment file to the deployment:

parameter_defaults:
  ControllerExtraConfig:
    cinder::config::cinder_config:
      tripleo_nfs/nas_secure_file_operations:
        value: false

Removing the blocker flag.
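After redeploying with the workaround, the flag should show up under the [tripleo_nfs] section of /etc/cinder/cinder.conf on the controllers. A minimal sketch of the check, run here against a hypothetical here-doc stand-in rather than a live controller:

```shell
# Verify that the workaround actually landed in cinder.conf. A here-doc
# stand-in is used here; on a real controller, point cfg at
# /etc/cinder/cinder.conf instead.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[tripleo_nfs]
volume_backend_name=tripleo_nfs
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nas_secure_file_operations=False
EOF

val=$(sed -n 's/^nas_secure_file_operations=//p' "$cfg")
echo "nas_secure_file_operations=$val"   # prints nas_secure_file_operations=False
rm -f "$cfg"
```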
According to our records, this should be resolved by openstack-tripleo-heat-templates-6.1.0-2.el7ost. This build is available now.
According to our records, this should be resolved by puppet-tripleo-6.5.0-5.el7ost. This build is available now.
According to our records, this should be resolved by puppet-cinder-10.3.1-1.el7ost. This build is available now.
Alan, I hit an issue and failed to verify: I got stuck on "cinder create" and didn't even reach migration yet.

Versions:
openstack-tripleo-heat-templates-6.2.0-3.el7ost.noarch
puppet-tripleo-6.5.0-8.el7ost.noarch
puppet-cinder-10.3.1-1.el7ost.noarch

This is the file I added to overcloud_deploy to enable NFS for Cinder (and, by mistake, Glance — not needed for this bug):

parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  CinderNfsMountOptions: 'retry=1'
  CinderNfsServers: '10.35.160.111:/export/ins_cinder'
  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: '10.35.160.111:/export/ins_glance'

The shares work and the deployment finished, but "cinder create" fails.

cinder.conf relevant bits:

enabled_backends = tripleo_nfs

[tripleo_nfs]
volume_backend_name=tripleo_nfs
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/shares-nfs.conf
nfs_mount_options=retry=1
nas_secure_file_operations=False
nas_secure_file_permissions=False

(Good: these are added by default now.)

The volume is in error state:

[stack@undercloud-0 ~]$ cinder list
| 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 | error

volume.log errors:

2017-09-11 08:23:11.243 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".
2017-09-11 08:23:21.252 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".
2017-09-11 08:23:29.242 101043 DEBUG oslo_service.periodic_task [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-09-11 08:23:29.242 101043 DEBUG oslo_service.periodic_task [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Running periodic task VolumeManager._report_driver_status run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2017-09-11 08:23:29.243 101043 WARNING cinder.volume.manager [req-3fd33190-6c7f-4b47-9649-db0965a0e9b9 - - - - -] Update driver status failed: (config name tripleo_nfs) is uninitialized.
2017-09-11 08:23:31.254 101043 ERROR cinder.service [-] Manager for service cinder-volume hostgroup@tripleo_nfs is reporting problems, not sending heartbeat. Service will appear "down".

Only the Glance mount shows up; there is no Cinder mount:

# mount | grep 10.35.160
10.35.160.111:/export/ins_glance on /var/lib/glance/images type nfs4 (rw,relatime,context=system_u:object_r:glance_var_lib_t:s0,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.108,local_lock=none,addr=10.35.160.111)

The exports on the server are alive:

# showmount -e 10.35.160.111
Export list for 10.35.160.111:
/export/ins_cinder *
/export/ins_glance *

A manual mount to /mnt/cinder fails; oddly, it says the target is already mounted, but I don't see it:

# mount 10.35.160.111:/export/ins_cinder /mnt/cinder
mount.nfs: /mnt/cinder is busy or already mounted
# mount | grep cinder
(nothing)

Since it's a single backend I figured there was no need to manually define a volume type with "cinder type-create", but maybe there is.
That didn't work, so I created a type and set its backend; that didn't help either:

[stack@undercloud-0 ~]$ cinder type-create nfs
| 3f1636c0-94ac-4ebb-9612-a0524d815b07 | nfs | - | True |
[stack@undercloud-0 ~]$ cinder type-key nfs set volume_backend_name=tripleo_nfs
[stack@undercloud-0 ~]$ cinder extra-specs-list
| 3f1636c0-94ac-4ebb-9612-a0524d815b07 | nfs | {'volume_backend_name': 'tripleo_nfs'} |
[stack@undercloud-0 ~]$ cinder create 1 --volume-type nfs

That volume also lands in error state.

grep -ir 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 /var/log/cinder/

/var/log/cinder/scheduler.log:2017-09-11 08:09:52.170 78621 DEBUG cinder.volume.flows.common [req-1cae1c50-bf66-4a08-9b96-07da3a559872 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] Setting Volume 9270f2b6-2bbb-4be8-9d56-ba484a2dd722 to error due to: No valid backend was found. No weighed backends available error_out /usr/lib/python2.7/site-packages/cinder/volume/flows/common.py:85
/var/log/cinder/cinder-api.log:2017-09-11 08:09:52.008 109865 DEBUG cinder.volume.api [req-1cae1c50-bf66-4a08-9b96-07da3a559872 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] Task 'cinder.volume.flows.api.create_volume.EntryCreateTask;volume:create' (7ab55d59-0058-47a7-98dc-58935833e55b) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'volume':
Volume(_name_id=None,admin_metadata=<?>,attach_status='detached',availability_zone='nova',bootable=False,cluster=<?>,cluster_name=None,consistencygroup=<?>,consistencygroup_id=None,created_at=2017-09-11T08:09:51Z,deleted=False,deleted_at=None,display_description=None,display_name=None,ec2_id=None,encryption_key_id=None,glance_metadata=<?>,group=<?>,group_id=None,host=None,id=9270f2b6-2bbb-4be8-9d56-ba484a2dd722,launched_at=None,metadata={},migration_status=None,multiattach=False,previous_status=None,project_id='f780603a47df4840aad0b77583907364',provider_auth=None,provider_geometry=None,provider_id=None,provider_location=None,replication_driver_data=None,replication_extended_status=None,replication_status=None,scheduled_at=None,size=1,snapshot_id=None,snapshots=<?>,source_volid=None,status='creating',terminated_at=None,updated_at=None,user_id='b9da1699e5524941bcadf3f2393a2792',volume_attachment=<?>,volume_type=<?>,volume_type_id=None), 'volume_properties': VolumeProperties(attach_status='detached',availability_zone='nova',cgsnapshot_id=None,consistencygroup_id=None,display_description=None,display_name=None,encryption_key_id=None,group_id=None,group_type_id=<?>,metadata={},multiattach=False,project_id='f780603a47df4840aad0b77583907364',qos_specs=None,replication_status=<?>,reservations=['7e316b1c-f354-4e08-9332-b374c89cde0c','b712d2b8-d0a3-4883-b299-42cfba3eac2c'],size=1,snapshot_id=None,source_replicaid=None,source_volid=None,status='creating',user_id='b9da1699e5524941bcadf3f2393a2792',volume_type_id=None), 'volume_id': '9270f2b6-2bbb-4be8-9d56-ba484a2dd722'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183 /var/log/cinder/cinder-api.log:2017-09-11 08:09:52.075 109865 INFO cinder.api.openstack.wsgi [req-1a84f97b-7821-4c19-bd26-8660ecc5a8bd b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] GET http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722 
/var/log/cinder/cinder-api.log:2017-09-11 08:09:52.147 109865 INFO cinder.api.openstack.wsgi [req-1a84f97b-7821-4c19-bd26-8660ecc5a8bd b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722 returned with HTTP 200
/var/log/cinder/cinder-api.log:2017-09-11 08:22:18.431 109865 INFO cinder.api.openstack.wsgi [req-2432adf6-96a9-4e52-81cf-570fd415b608 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] GET http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722
/var/log/cinder/cinder-api.log:2017-09-11 08:22:18.495 109865 INFO cinder.api.openstack.wsgi [req-2432adf6-96a9-4e52-81cf-570fd415b608 b9da1699e5524941bcadf3f2393a2792 f780603a47df4840aad0b77583907364 - default default] http://10.0.0.104:8776/v2/f780603a47df4840aad0b77583907364/volumes/9270f2b6-2bbb-4be8-9d56-ba484a2dd722 returned with HTTP 200

And this smoking-gun bit:

/var/log/cinder/volume.log:2017-09-11 07:42:09.966 101043 ERROR cinder.volume.drivers.remotefs [req-90bc1c8d-5424-44f3-917b-773bc84dcd38 - - - - -] Exception during mounting NFS mount failed for share 10.35.160.111:/export/ins_cinder.
Error - {'pnfs': u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf mount -t nfs -o retry=1,vers=4,minorversion=1 10.35.160.111:/export/ins_cinder /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38\nExit code: 32\nStdout: u''\nStderr: u'mount.nfs: /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38 is busy or already mounted\\n'", 'nfs': u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf mount -t nfs -o retry=1 10.35.160.111:/export/ins_cinder /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38\nExit code: 32\nStdout: u''\nStderr: u'mount.nfs: /var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38 is busy or already mounted\\n'"}
/var/log/secure:Sep 11 03:

Let me know, should I keep the system up for you to access?
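For the "busy or already mounted" symptom where `mount` shows nothing, comparing against the kernel's own table in /proc/mounts can help distinguish a stale mount from a corrupted mount table. A small sketch (the mountpoint path is the one from the log above; the lazy-unmount step is left commented out since it needs root):

```shell
# When mount.nfs says "busy or already mounted" but `mount` shows nothing,
# check the kernel's own mount table in /proc/mounts for a stale entry.
MNT=/var/lib/cinder/mnt/47266020eacec99097bdec49f2451d38   # path from the log

state=$(grep -F "$MNT" /proc/mounts || echo "not in /proc/mounts")
echo "$state"

# If a stale entry does show up, a lazy unmount usually clears it
# (needs root, so left commented out here):
# umount -l "$MNT"
```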
Created attachment 1324370 [details] Cinder config files and logs
Verified: a booted instance with an attached NFS-backed Cinder volume was successfully migrated to a second compute. I retested on the same undercloud.

Versions:
openstack-tripleo-heat-templates-6.2.0-3.el7ost.noarch
puppet-tripleo-6.5.0-8.el7ost.noarch
puppet-cinder-10.3.1-1.el7ost.noarch

This time I didn't configure Glance NFS, nor Cinder's nfs_mount_options=retry=1.

NFS heat template used:

$ cat nfs11Cinder.yaml
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  CinderNfsMountOptions: ''
  CinderNfsServers: 'W.X.Y.Z:/export/ins_cinder'

cinder.conf:

[tripleo_nfs]
volume_backend_name=tripleo_nfs
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/shares-nfs.conf
nfs_mount_options=
nas_secure_file_operations=False
nas_secure_file_permissions=False

Basic Cinder sanity (create volume) worked; moving on.

1. cinder create worked:

$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | available | -    | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

2. Boot an instance:

$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | ACTIVE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+

3. Attach the volume to the instance:

$ nova volume-attach 8e7d2ee0-e106-45f6-8bc0-5d544035997d f15212c4-94f9-4dad-a40e-253f82412ffa auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| id       | f15212c4-94f9-4dad-a40e-253f82412ffa |
| serverId | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
| volumeId | f15212c4-94f9-4dad-a40e-253f82412ffa |
+----------+--------------------------------------+

4. Now we see an attached volume:

$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | in-use | -    | 1    | -           | false    | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

5. Migrate the instance with the attached volume (the actual verification):

$ openstack server migrate inst1
$ nova list
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State       | Power State | Networks                          |
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | RESIZE | resize_migrating | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------------+-------------+-----------------------------------+
$ nova list
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status        | Task State | Power State | Networks                          |
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | VERIFY_RESIZE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+---------------+------------+-------------+-----------------------------------+
$ nova resize-confirm inst1

6. Post-migration, inst1 is alive and still has its attached volume.
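The migrate / poll-for-VERIFY_RESIZE / resize-confirm sequence above can be scripted with a small polling helper; this is a generic sketch (wait_for_status is invented here, not an OpenStack client command):

```shell
# Generic polling helper for the migrate / wait / confirm sequence.
# wait_for_status is invented for this sketch; it is not an OpenStack command.
wait_for_status() {
    want=$1; shift                 # $1 = expected status
    for _ in $(seq 1 30); do       # remaining args = command printing status
        got=$("$@")
        [ "$got" = "$want" ] && return 0
        sleep 2
    done
    return 1
}

# Real usage would look roughly like:
#   openstack server migrate inst1
#   wait_for_status VERIFY_RESIZE sh -c "nova list | awk '/inst1/ {print \$6}'"
#   nova resize-confirm inst1
```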
Argo verified :)

[stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                          |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
| 8e7d2ee0-e106-45f6-8bc0-5d544035997d | inst1 | ACTIVE | -          | Running     | internal=192.168.0.3, 10.10.10.12 |
+--------------------------------------+-------+--------+------------+-------------+-----------------------------------+
[stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| f15212c4-94f9-4dad-a40e-253f82412ffa | in-use | -    | 1    | -           | false    | 8e7d2ee0-e106-45f6-8bc0-5d544035997d |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3098