Bug 1503214

Summary: OSP11 -> OSP12 upgrade: unable to migrate instance with cinder volume attached when cinder uses an NFS backend
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Ollie Walsh <owalsh>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 12.0 (Pike)
CC: abishop, berrange, cschwede, dasmith, dbecker, eglynn, gszasz, jschluet, kchamart, lyarwood, maandre, mbooth, mburns, mcornea, morazi, ohochman, owalsh, rhel-osp-director-maint, sbauza, scohen, sferdjao, sgordon, srevivo, vromanso
Target Milestone: rc
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: x86_64
OS: Linux
Fixed In Version: puppet-tripleo-7.4.3-5.el7ost openstack-tripleo-heat-templates-7.0.3-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 22:15:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Description          Flags
compute sosreport    none

Description Marius Cornea 2017-10-17 15:19:20 UTC
Description of problem:
OSP11 -> OSP12 upgrade: unable to live migrate instance with cinder volume attached when cinder uses an NFS backend 

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy OSP11 with NFS storage backend for Cinder and Glance

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
-e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e ~/openstack_deployment/environments/nfs-storage.yaml

[stack@undercloud-0 ~]$ cat  ~/openstack_deployment/environments/nfs-storage.yaml 
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  NovaEnableRbdBackend: false

  CinderNfsMountOptions: 'rw,sync'
  CinderNfsServers: ''

  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: ''

2. On the overcloud, spawn an instance, create a cinder volume, and attach it to the instance:

(overcloud) [stack@undercloud-0 ~]$ openstack server show 7c99d3ce-174e-4456-ade5-603538ddcd62
| Field                               | Value                                                                                               |
| OS-DCF:diskConfig                   | MANUAL                                                                                              |
| OS-EXT-AZ:availability_zone         | nova                                                                                                |
| OS-EXT-SRV-ATTR:host                | compute-r00-00.redhat.local                                                                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-r00-00.redhat.local                                                                         |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005                                                                                   |
| OS-EXT-STS:power_state              | Running                                                                                             |
| OS-EXT-STS:task_state               | None                                                                                                |
| OS-EXT-STS:vm_state                 | active                                                                                              |
| OS-SRV-USG:launched_at              | 2017-10-17T11:45:39.000000                                                                          |
| OS-SRV-USG:terminated_at            | None                                                                                                |
| accessIPv4                          |                                                                                                     |
| accessIPv6                          |                                                                                                     |
| addresses                           | stack-4e-tenant_net_ext_tagged-4c5dxaeejyka-private_network-hcrf7lquthxh=, |
| config_drive                        |                                                                                                     |
| created                             | 2017-10-17T11:45:28Z                                                                                |
| flavor                              | v1-1G-5G (88303552-0ee2-47fe-8def-27771c2951d6)                                                     |
| hostId                              | 8e1b9e16d10b7ce4f930a3b327b93e00c65be4e52854150c333b5986                                            |
| id                                  | 7c99d3ce-174e-4456-ade5-603538ddcd62                                                                |
| image                               | Fedora (2fb4ad83-2ba3-464a-9c5b-03e29a1b77fc)                                                       |
| key_name                            | userkey                                                                                             |
| name                                | st--4e-instance-tu2664t5uf5y-my_instance-k3em7xci6qbi                                               |
| progress                            | 0                                                                                                   |
| project_id                          | 36228a3612bd48f89eaca2cdcd42d658                                                                    |
| properties                          |                                                                                                     |
| security_groups                     | name='server_security_group'                                                                        |
| status                              | ACTIVE                                                                                              |
| updated                             | 2017-10-17T13:47:37Z                                                                                |
| user_id                             | 72e1014d77af4fb48a2dc80cbb9d1777                                                                    |
| volumes_attached                    | id='66587ea1-d9fe-4ca8-b788-f0e3e573841e'                                                           |

3. Upgrade undercloud to OSP12

4. Run major-upgrade-composable-steps-docker upgrade step to OSP12

5. Upgrade one of the compute nodes from the environment:

upgrade-non-controller.sh --upgrade compute-r01-01

6. Reboot the node that was upgraded in step 5

7. Migrate the instance created in step 2 off the next compute node that is going to be upgraded:

nova host-evacuate-live --block-migrate compute-r00-00.redhat.local

Actual results:
Migration fails: the instance cannot be migrated off compute-r00-00.redhat.local.

Expected results:
Migration succeeds.

Additional info:
Checking /var/log/nova/nova-compute.log on compute-r00-00.redhat.local we can see:

2017-10-17 15:02:20.967 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Live Migration failure: Cannot access storage file '/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e' (as uid:107, gid:107): No such file or directory
2017-10-17 15:02:21.220 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Migration operation has aborted
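To list which storage files libvirt reported as inaccessible, the error lines above can be filtered out of the log. The following is only a sketch: failed_storage_paths is a hypothetical helper name, and it assumes the message format shown above ("Cannot access storage file '<path>'").

```shell
# Sketch only: failed_storage_paths is a hypothetical helper, not a nova tool.
# It assumes the libvirt error text seen in this report:
#   Cannot access storage file '<path>' (as uid:..., gid:...)
# $1 = path to a nova-compute.log file
failed_storage_paths() {
    # Print only the quoted path from matching lines, once per distinct path.
    sed -n "s/.*Cannot access storage file '\([^']*\)'.*/\1/p" "$1" | sort -u
}
```

For example: failed_storage_paths /var/log/nova/nova-compute.log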

Checking the mounts we can see the mount is available:

[root@compute-r00-00 heat-admin]# grep cinder /proc/mounts
 /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=,local_lock=none,addr= 0 0

[root@compute-r00-00 heat-admin]# ls -l /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 17 11:49 /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e
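The two manual checks above (is the share mounted, and is the volume file visible under it) could be combined into one helper so that they can be repeated on each compute node. This is a rough sketch; check_volume_mount is a hypothetical name, and the optional third argument exists only so that a saved copy of a mounts table can be inspected instead of the live /proc/mounts.

```shell
# Sketch only: check_volume_mount is a hypothetical helper, not part of nova or cinder.
# $1 = mount point, $2 = volume file name, $3 = mounts table (default: /proc/mounts)
check_volume_mount() {
    local mount_point="$1" volume="$2" table="${3:-/proc/mounts}"

    # Field 2 of each mounts-table line is the mount point.
    if ! awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$table"; then
        echo "not-mounted"
        return 1
    fi

    # The share is mounted; now check that the volume file is visible under it.
    if [ ! -e "$mount_point/$volume" ]; then
        echo "mounted-but-file-missing"
        return 2
    fi

    echo "ok"
}
```

For example: check_volume_mount /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e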

Attaching the sosreport from the compute node where the instance can't be migrated from.

Comment 1 Marius Cornea 2017-10-17 15:31:03 UTC
Created attachment 1339780 [details]
compute sosreport

Comment 2 Marius Cornea 2017-10-17 15:31:50 UTC
I can provide a reproducing environment, please let me know when someone can look into this issue. Thanks!

Comment 5 Eric Harney 2017-10-24 15:56:39 UTC
You need to set nas_secure_file_operations=False and nas_secure_file_permissions=False in the backend's section of cinder.conf, and then this should work.
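One way to confirm what the backend section currently contains is to read the two options straight out of cinder.conf. The sketch below makes several assumptions: ini_get is a hypothetical helper, the section name tripleo_nfs is a guess at the backend's section name and should be replaced with the one in your deployment, and the parser handles only simple "key = value" lines.

```shell
# Sketch only: ini_get is a hypothetical helper for simple INI files.
# It does not handle values that themselves contain "=".
# $1 = config file, $2 = section name, $3 = option name
ini_get() {
    awk -F '=' -v section="$2" -v key="$3" '
        /^\[/ { in_section = ($0 == "[" section "]"); next }   # track current section
        in_section {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)                     # trim the option name
            gsub(/^[ \t]+|[ \t]+$/, "", v)                     # trim the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}
```

For example, with the assumed section name: ini_get /etc/cinder/cinder.conf tripleo_nfs nas_secure_file_operations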

Comment 6 Marius Cornea 2017-10-24 16:42:34 UTC
(In reply to Eric Harney from comment #5)
> You need to set nas_secure_file_operations=False and
> nas_secure_file_permissions=False in the backend's section of cinder.conf,
> and then this should work.

If I understand correctly, these should already be set as defaults in OSP11 by the patches attached to bug 1440700, which is verified. Is this a regression introduced in OSP12?

Comment 7 Alan Bishop 2017-10-24 17:08:01 UTC
I suspect the problem is the same as bug #1491597. I tried to look at the setup but it looks like it's been redeployed.

Marius, can you try this without Glance using NFS for its backend?

Comment 9 Alan Bishop 2017-10-26 15:40:05 UTC
I'd like someone from DFG:Compute to take a look at this. There may be a
storage issue, but I need help connecting the dots.

I signed onto Marius's system in comment #8 and here's what I found.

- The nas_secure settings are correct (set to False).

- The "Live Migration failure: Cannot access storage file '<blah>' ...: No
  such file or directory" error is seen on the compute node that's still
  running OSP-11 (happens to be compute-r01-01).

- As noted in the BZ description, the specified volume _is_ present, so the
  "No such file or directory" error seems confusing.

- Then I looked on compute-r00-00, which is now running OSP-12. In the
  nova_compute container, there are no entries at all in
  /var/lib/nova/mnt. But there's an entry in the container's nova-compute.log:

/var/log/containers/nova/nova-compute.log:2017-10-25 18:44:21.714 1 WARNING
nova.virt.libvirt.volume.mount [req-204643d9-8df7-42ab-a786-e5d0c2148ac1
f73300d9077b4fde91f601b5465ed9d0 c3ff2f236e3649409fb67f9fb87ecd2b - - -]
Request to remove attachment (volume-<blah>) from
/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128, but we don't think it's in

- I don't see any NFS mounts on the OSP-12 compute node, or logs indicating it
  tried to mount something and failed.
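The host-versus-container comparison described above can be made repeatable by counting NFS entries in a mounts table. This is a sketch under assumptions: count_nfs_mounts is a hypothetical helper, and the container name nova_compute is taken from the observation above.

```shell
# Sketch only: count_nfs_mounts is a hypothetical helper.
# $1 = mounts table to inspect (default: /proc/mounts)
count_nfs_mounts() {
    # Field 3 of a mounts-table line is the filesystem type; match nfs and nfs4 only.
    awk '$3 ~ /^nfs4?$/ { n++ } END { print n + 0 }' "${1:-/proc/mounts}"
}
```

On the host, count_nfs_mounts with no argument reads /proc/mounts. For the container's view, the table could first be saved with something like "docker exec nova_compute cat /proc/mounts > /tmp/container-mounts" and then passed as the argument.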

Comment 27 errata-xmlrpc 2017-12-13 22:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.