Description of problem:
OSP11 -> OSP12 upgrade: unable to live migrate an instance with a Cinder volume attached when Cinder uses an NFS backend.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with an NFS storage backend for Cinder and Glance:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/
openstack overcloud deploy --templates $THT \
  -r ~/openstack_deployment/roles/roles_data.yaml \
  -e $THT/environments/network-isolation.yaml \
  -e $THT/environments/network-management.yaml \
  -e ~/openstack_deployment/environments/nodes.yaml \
  -e ~/openstack_deployment/environments/network-environment.yaml \
  -e ~/openstack_deployment/environments/disk-layout.yaml \
  -e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
  -e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
  -e ~/openstack_deployment/environments/neutron-settings.yaml \
  -e ~/openstack_deployment/environments/nfs-storage.yaml

[stack@undercloud-0 ~]$ cat ~/openstack_deployment/environments/nfs-storage.yaml
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  NovaEnableRbdBackend: false
  CinderNfsMountOptions: 'rw,sync'
  CinderNfsServers: '10.0.0.254:/srv/nfs/cinder'
  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: '10.0.0.254:/srv/nfs/glance'

2.
On the overcloud, spawn an instance, create a Cinder volume, and attach it to the instance:

(overcloud) [stack@undercloud-0 ~]$ openstack server show 7c99d3ce-174e-4456-ade5-603538ddcd62
+-------------------------------------+-----------------------------------------------------+
| Field                               | Value                                               |
+-------------------------------------+-----------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL |
| OS-EXT-AZ:availability_zone         | nova |
| OS-EXT-SRV-ATTR:host                | compute-r00-00.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-r00-00.redhat.local |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005 |
| OS-EXT-STS:power_state              | Running |
| OS-EXT-STS:task_state               | None |
| OS-EXT-STS:vm_state                 | active |
| OS-SRV-USG:launched_at              | 2017-10-17T11:45:39.000000 |
| OS-SRV-USG:terminated_at            | None |
| accessIPv4                          | |
| accessIPv6                          | |
| addresses                           | stack-4e-tenant_net_ext_tagged-4c5dxaeejyka-private_network-hcrf7lquthxh=10.10.10.10, 172.16.18.142 |
| config_drive                        | |
| created                             | 2017-10-17T11:45:28Z |
| flavor                              | v1-1G-5G (88303552-0ee2-47fe-8def-27771c2951d6) |
| hostId                              | 8e1b9e16d10b7ce4f930a3b327b93e00c65be4e52854150c333b5986 |
| id                                  | 7c99d3ce-174e-4456-ade5-603538ddcd62 |
| image                               | Fedora (2fb4ad83-2ba3-464a-9c5b-03e29a1b77fc) |
| key_name                            | userkey |
| name                                | st--4e-instance-tu2664t5uf5y-my_instance-k3em7xci6qbi |
| progress                            | 0 |
| project_id                          | 36228a3612bd48f89eaca2cdcd42d658 |
| properties                          | |
| security_groups                     | name='server_security_group' |
| status                              | ACTIVE |
| updated                             | 2017-10-17T13:47:37Z |
| user_id                             | 72e1014d77af4fb48a2dc80cbb9d1777 |
| volumes_attached                    | id='66587ea1-d9fe-4ca8-b788-f0e3e573841e' |
+-------------------------------------+-----------------------------------------------------+

3. Upgrade the undercloud to OSP12.
4.
Run the major-upgrade-composable-steps-docker upgrade step to OSP12.
5. Upgrade one of the compute nodes in the environment:

upgrade-non-controller.sh --upgrade compute-r01-01

6. Reboot the node that was upgraded in step 5.
7. Live migrate the instance created in step 2 off the next compute node that is going to be upgraded:

nova host-evacuate-live --block-migrate compute-r00-00.redhat.local

Actual results:
Migration cannot be performed; the instance cannot be migrated off compute-r00-00.redhat.local.

Expected results:
Migration succeeds.

Additional info:
Checking /var/log/nova/nova-compute.log on compute-r00-00.redhat.local we can see:

2017-10-17 15:02:20.967 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Live Migration failure: Cannot access storage file '/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e' (as uid:107, gid:107): No such file or directory
2017-10-17 15:02:21.220 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Migration operation has aborted

Checking the mounts we can see the share is mounted and the volume file is present:

[root@compute-r00-00 heat-admin]# grep cinder /proc/mounts
10.0.0.254:/srv/nfs/cinder /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.133,local_lock=none,addr=10.0.0.254 0 0
[root@compute-r00-00 heat-admin]# ls -l /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 17 11:49 /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e

Attaching the sosreport from the compute node the instance can't be migrated from.
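The "No such file or directory" failure can be reasoned about from the mount path itself. The following is a hedged sketch: the md5-of-share-string naming convention and the /var/lib/nova/mnt base are inferred from the paths seen in this report and typical Nova NFS behavior, not quoted from an official API, and the readability test merely stands in for the access check libvirt performs as uid 107 (qemu) on the destination host.

```shell
# Nova mounts each NFS share under a directory named after the md5 of the
# share string, so the 93dfa4... directory above should be derivable from
# the CinderNfsServers value (assumption: the string is hashed verbatim).
share='10.0.0.254:/srv/nfs/cinder'
vol='volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e'
hash=$(printf '%s' "$share" | md5sum | awk '{print $1}')
path="/var/lib/nova/mnt/$hash/$vol"
echo "expected volume path: $path"

# Rough stand-in for the pre-migration access check that fails in the log:
if [ -r "$path" ]; then
    echo "accessible"
else
    echo "MISSING: No such file or directory"
fi
```

On a host where the share is actually mounted the second branch should not trigger; on the node producing the libvirt error above it would, which is what makes the healthy-looking /proc/mounts output on the source so confusing.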
Created attachment 1339780 [details] compute sosreport
I can provide a reproducing environment; please let me know when someone can look into this issue. Thanks!
You need to set nas_secure_file_operations=False and nas_secure_file_permissions=False in the backend's section of cinder.conf, and then this should work.
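Concretely, the advice above corresponds to a backend section along these lines. This is a sketch, not the exact configuration from this deployment: the section name tripleo_nfs and the non-nas_secure option values are assumptions based on a typical TripleO NFS backend.

```ini
[tripleo_nfs]
volume_backend_name = tripleo_nfs
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/shares-nfs.conf
nfs_mount_options = rw,sync
# The two settings from comment #5. With the nas_secure options enabled,
# the NFS driver creates volume files with restrictive ownership and
# permissions, which can leave qemu unable to access the file during
# live migration.
nas_secure_file_operations = False
nas_secure_file_permissions = False
```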
(In reply to Eric Harney from comment #5)
> You need to set nas_secure_file_operations=False and
> nas_secure_file_permissions=False in the backend's section of cinder.conf,
> and then this should work.

If I understand correctly, these should already be set as defaults in OSP11 by the patches attached to bug 1440700, which is verified. Is this a regression introduced in OSP12?
I suspect the problem is the same as bug #1491597. I tried to look at the setup, but it looks like it has been redeployed. Marius, can you try this without Glance using NFS for its backend?
I'd like someone from DFG:Compute to take a look at this. There may be a storage issue, but I need help connecting the dots. I signed onto Marius's system in comment #8 and here's what I found.

- The nas_secure settings are correct (set False).
- The "Live Migration failure: Cannot access storage file '<blah>' ...: No such file or directory" error is seen on the compute node that's still running OSP-11 (happens to be compute-r01-01).
- As noted in the BZ description, the specified volume _is_ present, so the "No such file or directory" error seems confusing.
- Then I looked on compute-r00-00, which is now running OSP-12. In the nova_compute container, there are no entries at all in /var/lib/nova/mnt. But there's an entry in the container's nova-compute.log:

/var/log/containers/nova/nova-compute.log:2017-10-25 18:44:21.714 1 WARNING nova.virt.libvirt.volume.mount [req-204643d9-8df7-42ab-a786-e5d0c2148ac1 f73300d9077b4fde91f601b5465ed9d0 c3ff2f236e3649409fb67f9fb87ecd2b - - -] Request to remove attachment (volume-<blah>) from /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128, but we don't think it's in use.

- I don't see any NFS mounts on the OSP-12 compute node, or logs indicating it tried to mount something and failed.
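The missing-mounts observation above boils down to comparing the NFS shares visible on the host with those visible inside the nova_compute container. A hedged sketch follows; the container name, the docker invocation, and the use of /proc/mounts are assumptions about this containerized OSP-12 setup, and the inputs are inlined sample data rather than live command output.

```shell
# Compare NFS shares mounted on the host against those visible inside the
# nova_compute container. In a live run the two inputs would come from:
#   grep nfs /proc/mounts                          > host_mounts
#   docker exec nova_compute grep nfs /proc/mounts > cont_mounts
# Here they are inlined: one share on the host, none in the container,
# mirroring the observation in this comment.
host_mounts=$(mktemp); cont_mounts=$(mktemp)
cat > "$host_mounts" <<'EOF'
10.0.0.254:/srv/nfs/cinder /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw 0 0
EOF
: > "$cont_mounts"   # container listed no NFS mounts at all

h=$(mktemp); c=$(mktemp)
awk '{print $1}' "$host_mounts" | sort > "$h"
awk '{print $1}' "$cont_mounts" | sort > "$c"
echo "mounted on host but missing in container:"
comm -23 "$h" "$c"
rm -f "$host_mounts" "$cont_mounts" "$h" "$c"
```

Any share printed by the comm line is one the containerized nova-compute cannot see, which would produce exactly the "Cannot access storage file" symptom during live migration.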
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462