Bug 1503214 - OSP11 -> OSP12 upgrade: unable to migrate instance with cinder volume attached when cinder uses an NFS backend
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: x86_64 Linux
Priority: urgent Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: Ollie Walsh
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks:
 
Reported: 2017-10-17 11:19 EDT by Marius Cornea
Modified: 2018-02-05 14:15 EST

See Also:
Fixed In Version: puppet-tripleo-7.4.3-5.el7ost openstack-tripleo-heat-templates-7.0.3-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 17:15:46 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
compute sosreport (10.24 MB, application/x-xz)
2017-10-17 11:31 EDT, Marius Cornea


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1730533 None None None 2017-11-06 19:36 EST
OpenStack gerrit 518548 None None None 2017-11-09 07:22 EST
OpenStack gerrit 518554 None None None 2017-11-09 07:23 EST
Red Hat Product Errata RHEA-2017:3462 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-15 20:43:25 EST

Description Marius Cornea 2017-10-17 11:19:20 EDT
Description of problem:
OSP11 -> OSP12 upgrade: unable to live migrate instance with cinder volume attached when cinder uses an NFS backend 

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with NFS storage backend for Cinder and Glance

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
-e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e ~/openstack_deployment/environments/nfs-storage.yaml

[stack@undercloud-0 ~]$ cat  ~/openstack_deployment/environments/nfs-storage.yaml 
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  NovaEnableRbdBackend: false

  CinderNfsMountOptions: 'rw,sync'
  CinderNfsServers: '10.0.0.254:/srv/nfs/cinder'

  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: '10.0.0.254:/srv/nfs/glance'

2. On the overcloud spawn an instance, create a cinder volume and attach it to the instance:

(overcloud) [stack@undercloud-0 ~]$ openstack server show 7c99d3ce-174e-4456-ade5-603538ddcd62
+-------------------------------------+-----------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                               |
+-------------------------------------+-----------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                                                |
| OS-EXT-SRV-ATTR:host                | compute-r00-00.redhat.local                                                                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-r00-00.redhat.local                                                                         |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005                                                                                   |
| OS-EXT-STS:power_state              | Running                                                                                             |
| OS-EXT-STS:task_state               | None                                                                                                |
| OS-EXT-STS:vm_state                 | active                                                                                              |
| OS-SRV-USG:launched_at              | 2017-10-17T11:45:39.000000                                                                          |
| OS-SRV-USG:terminated_at            | None                                                                                                |
| accessIPv4                          |                                                                                                     |
| accessIPv6                          |                                                                                                     |
| addresses                           | stack-4e-tenant_net_ext_tagged-4c5dxaeejyka-private_network-hcrf7lquthxh=10.10.10.10, 172.16.18.142 |
| config_drive                        |                                                                                                     |
| created                             | 2017-10-17T11:45:28Z                                                                                |
| flavor                              | v1-1G-5G (88303552-0ee2-47fe-8def-27771c2951d6)                                                     |
| hostId                              | 8e1b9e16d10b7ce4f930a3b327b93e00c65be4e52854150c333b5986                                            |
| id                                  | 7c99d3ce-174e-4456-ade5-603538ddcd62                                                                |
| image                               | Fedora (2fb4ad83-2ba3-464a-9c5b-03e29a1b77fc)                                                       |
| key_name                            | userkey                                                                                             |
| name                                | st--4e-instance-tu2664t5uf5y-my_instance-k3em7xci6qbi                                               |
| progress                            | 0                                                                                                   |
| project_id                          | 36228a3612bd48f89eaca2cdcd42d658                                                                    |
| properties                          |                                                                                                     |
| security_groups                     | name='server_security_group'                                                                        |
| status                              | ACTIVE                                                                                              |
| updated                             | 2017-10-17T13:47:37Z                                                                                |
| user_id                             | 72e1014d77af4fb48a2dc80cbb9d1777                                                                    |
| volumes_attached                    | id='66587ea1-d9fe-4ca8-b788-f0e3e573841e'                                                           |
+-------------------------------------+-----------------------------------------------------------------------------------------------------+

3. Upgrade undercloud to OSP12

4. Run major-upgrade-composable-steps-docker upgrade step to OSP12

5. Upgrade one of the compute nodes from the environment:

upgrade-non-controller.sh --upgrade compute-r01-01

6. Reboot the node that was upgraded in step 5

7. Live migrate the instance created in step 2 off the next compute node that is going to be upgraded:

nova host-evacuate-live --block-migrate compute-r00-00.redhat.local

Actual results:
Migration fails: the instance cannot be migrated off compute-r00-00.redhat.local.

Expected results:
Migration succeeds.

Additional info:
Checking /var/log/nova/nova-compute.log on compute-r00-00.redhat.local we can see:

2017-10-17 15:02:20.967 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Live Migration failure: Cannot access storage file '/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e' (as uid:107, gid:107): No such file or directory
2017-10-17 15:02:21.220 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Migration operation has aborted
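
The failing path can be pulled out of the log and checked directly. The following is a minimal sketch and not part of the original report: the helper name failed_storage_paths is hypothetical, and it only matches the "Cannot access storage file" message format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct path reported in
# "Live Migration failure: Cannot access storage file '<path>'" log lines.
failed_storage_paths() {
    sed -n "s/.*Live Migration failure: Cannot access storage file '\([^']*\)'.*/\1/p" "$1" | sort -u
}

# Example run against the log named in this report; skipped if unreadable.
log=/var/log/nova/nova-compute.log
if [ -r "$log" ]; then
    failed_storage_paths "$log" | while IFS= read -r path; do
        ls -l "$path"    # shows whether the file exists and who owns it
    done
fi
```

Running the same check on both the source and the destination compute node may help tell which side cannot see the file.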

Checking the mounts we can see the mount is available:

[root@compute-r00-00 heat-admin]# grep cinder /proc/mounts 
10.0.0.254:/srv/nfs/cinder /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.133,local_lock=none,addr=10.0.0.254 0 0

[root@compute-r00-00 heat-admin]# ls -l /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 17 11:49 /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e

Attaching the sosreport from the compute node that the instance cannot be migrated from.
Comment 1 Marius Cornea 2017-10-17 11:31 EDT
Created attachment 1339780
compute sosreport
Comment 2 Marius Cornea 2017-10-17 11:31:50 EDT
I can provide a reproducing environment, please let me know when someone can look into this issue. Thanks!
Comment 5 Eric Harney 2017-10-24 11:56:39 EDT
You need to set nas_secure_file_operations=False and nas_secure_file_permissions=False in the backend's section of cinder.conf, and then this should work.
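
One way to confirm what the backend section currently contains is sketched below. It is an illustration only: the helper name ini_get is hypothetical, the section name tripleo_nfs and the path /etc/cinder/cinder.conf are assumptions that may not match a given deployment, and the parser handles only simple "key = value" lines.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY inside [SECTION] of an INI file.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[.*\][ \t]*$/ {              # section header such as [DEFAULT]
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section && /^[ \t]*[#;]/ { next }  # skip comment lines
        in_section {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Assumed config path and backend section name; adjust both to the deployment.
conf=${CINDER_CONF:-/etc/cinder/cinder.conf}
backend=${CINDER_BACKEND:-tripleo_nfs}
if [ -r "$conf" ]; then
    for opt in nas_secure_file_operations nas_secure_file_permissions; do
        printf '%s = %s\n' "$opt" "$(ini_get "$conf" "$backend" "$opt")"
    done
fi
```

An empty value means the option is not set in that section, so cinder falls back to its own default.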
Comment 6 Marius Cornea 2017-10-24 12:42:34 EDT
(In reply to Eric Harney from comment #5)
> You need to set nas_secure_file_operations=False and
> nas_secure_file_permissions=False in the backend's section of cinder.conf,
> and then this should work.

If I understand correctly, these should already be the defaults in OSP11 thanks to the patches attached to bug 1440700, which is verified. Is this a regression introduced in OSP12?
Comment 7 Alan Bishop 2017-10-24 13:08:01 EDT
I suspect the problem is the same as bug #1491597. I tried to look at the setup but it looks like it's been redeployed.

Marius, can you try this without Glance using NFS for its backend?
Comment 9 Alan Bishop 2017-10-26 11:40:05 EDT
I'd like someone from DFG:Compute to take a look at this. There may be a
storage issue, but I need help connecting the dots.

I signed onto Marius's system in comment #8 and here's what I found.

- The nas_secure settings are correct (set to False).

- The "Live Migration failure: Cannot access storage file '<blah>' ...: No
  such file or directory" error is seen on the compute node that's still
  running OSP-11 (happens to be compute-r01-01).

- As noted in the BZ description, the specified volume _is_ present, so the
  "No such file or directory" error seems confusing.

- Then I looked on compute-r00-00, which is now running OSP-12. In the
  nova_compute container, there are no entries at all in
  /var/lib/nova/mnt. But there's an entry in the container's nova-compute.log:

/var/log/containers/nova/nova-compute.log:2017-10-25 18:44:21.714 1 WARNING
nova.virt.libvirt.volume.mount [req-204643d9-8df7-42ab-a786-e5d0c2148ac1
f73300d9077b4fde91f601b5465ed9d0 c3ff2f236e3649409fb67f9fb87ecd2b - - -]
Request to remove attachment (volume-<blah>) from
/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128, but we don't think it's in
use.

- I don't see any NFS mounts on the OSP-12 compute node, or logs indicating it
  tried to mount something and failed.
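
The host-versus-container comparison described above can be sketched as follows. This is an illustration rather than the commands actually used: the helper name nfs_mounts_under is hypothetical, and the container example assumes the docker CLI is available on the OSP-12 compute node.

```shell
#!/bin/sh
# Hypothetical helper: list NFS mounts whose mount point starts with DIR.
# Prints "source mountpoint fstype" for each match.
# Usage: nfs_mounts_under DIR [MOUNTS_FILE]   ("-" reads standard input)
nfs_mounts_under() {
    awk -v dir="$1" '$3 ~ /^nfs/ && index($2, dir) == 1 { print $1, $2, $3 }' "${2:-/proc/mounts}"
}

# On the host:
#   nfs_mounts_under /var/lib/nova/mnt
# Inside the nova_compute container (assumes the docker CLI is available):
#   docker exec nova_compute cat /proc/mounts | nfs_mounts_under /var/lib/nova/mnt -
```

If the host lists the cinder share but the container prints nothing, the mount is not visible to the containerized nova-compute service, which would match the empty /var/lib/nova/mnt observed above.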
Comment 27 errata-xmlrpc 2017-12-13 17:15:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
