Description of problem:
OSP11 -> OSP12 upgrade: unable to live migrate an instance with a Cinder volume attached when Cinder uses an NFS backend.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.2-0.20171007062244.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with an NFS storage backend for Cinder and Glance:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/
openstack overcloud deploy --templates $THT \
  -r ~/openstack_deployment/roles/roles_data.yaml \
  -e $THT/environments/network-isolation.yaml \
  -e $THT/environments/network-management.yaml \
  -e ~/openstack_deployment/environments/nodes.yaml \
  -e ~/openstack_deployment/environments/network-environment.yaml \
  -e ~/openstack_deployment/environments/disk-layout.yaml \
  -e ~/openstack_deployment/environments/scheduler_hints_env.yaml \
  -e ~/openstack_deployment/environments/ips-from-pool-all.yaml \
  -e ~/openstack_deployment/environments/neutron-settings.yaml \
  -e ~/openstack_deployment/environments/nfs-storage.yaml

[stack@undercloud-0 ~]$ cat ~/openstack_deployment/environments/nfs-storage.yaml
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: false
  CinderEnableNfsBackend: true
  NovaEnableRbdBackend: false
  CinderNfsMountOptions: 'rw,sync'
  CinderNfsServers: '10.0.0.254:/srv/nfs/cinder'
  GlanceBackend: 'file'
  GlanceNfsEnabled: true
  GlanceNfsShare: '10.0.0.254:/srv/nfs/glance'

2.
On the overcloud, spawn an instance, create a Cinder volume, and attach it to the instance:

(overcloud) [stack@undercloud-0 ~]$ openstack server show 7c99d3ce-174e-4456-ade5-603538ddcd62
+-------------------------------------+-----------------------------------------------------+
| Field                               | Value                                               |
+-------------------------------------+-----------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL |
| OS-EXT-AZ:availability_zone         | nova |
| OS-EXT-SRV-ATTR:host                | compute-r00-00.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-r00-00.redhat.local |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005 |
| OS-EXT-STS:power_state              | Running |
| OS-EXT-STS:task_state               | None |
| OS-EXT-STS:vm_state                 | active |
| OS-SRV-USG:launched_at              | 2017-10-17T11:45:39.000000 |
| OS-SRV-USG:terminated_at            | None |
| accessIPv4                          | |
| accessIPv6                          | |
| addresses                           | stack-4e-tenant_net_ext_tagged-4c5dxaeejyka-private_network-hcrf7lquthxh=10.10.10.10, 172.16.18.142 |
| config_drive                        | |
| created                             | 2017-10-17T11:45:28Z |
| flavor                              | v1-1G-5G (88303552-0ee2-47fe-8def-27771c2951d6) |
| hostId                              | 8e1b9e16d10b7ce4f930a3b327b93e00c65be4e52854150c333b5986 |
| id                                  | 7c99d3ce-174e-4456-ade5-603538ddcd62 |
| image                               | Fedora (2fb4ad83-2ba3-464a-9c5b-03e29a1b77fc) |
| key_name                            | userkey |
| name                                | st--4e-instance-tu2664t5uf5y-my_instance-k3em7xci6qbi |
| progress                            | 0 |
| project_id                          | 36228a3612bd48f89eaca2cdcd42d658 |
| properties                          | |
| security_groups                     | name='server_security_group' |
| status                              | ACTIVE |
| updated                             | 2017-10-17T13:47:37Z |
| user_id                             | 72e1014d77af4fb48a2dc80cbb9d1777 |
| volumes_attached                    | id='66587ea1-d9fe-4ca8-b788-f0e3e573841e' |
+-------------------------------------+-----------------------------------------------------+

3. Upgrade the undercloud to OSP12.
4.
Run the major-upgrade-composable-steps-docker upgrade step to OSP12.
5. Upgrade one of the compute nodes in the environment:

upgrade-non-controller.sh --upgrade compute-r01-01

6. Reboot the node that was upgraded in step 5.
7. Live migrate the instance created in step 2 off the next compute node that is going to be upgraded:

nova host-evacuate-live --block-migrate compute-r00-00.redhat.local

Actual results:
Migration cannot be performed; the instance cannot be migrated off compute-r00-00.redhat.local.

Expected results:
Migration succeeds.

Additional info:
Checking /var/log/nova/nova-compute.log on compute-r00-00.redhat.local we can see:

2017-10-17 15:02:20.967 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Live Migration failure: Cannot access storage file '/var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e' (as uid:107, gid:107): No such file or directory
2017-10-17 15:02:21.220 19336 ERROR nova.virt.libvirt.driver [req-4a28ab1f-1f1a-4e5e-baa6-070d36a98f48 72e1014d77af4fb48a2dc80cbb9d1777 36228a3612bd48f89eaca2cdcd42d658 - - -] [instance: 7c99d3ce-174e-4456-ade5-603538ddcd62] Migration operation has aborted

Checking the mounts we can see the share is mounted and the volume file is present:

[root@compute-r00-00 heat-admin]# grep cinder /proc/mounts
10.0.0.254:/srv/nfs/cinder /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.133,local_lock=none,addr=10.0.0.254 0 0
[root@compute-r00-00 heat-admin]# ls -l /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e
-rw-rw-rw-. 1 qemu qemu 1073741824 Oct 17 11:49 /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128/volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e

Attaching the sosreport from the compute node the instance can't be migrated from.
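The "No such file or directory" failure can be reasoned about from the mount path itself. The following is a hedged sketch: the md5-of-share-string naming convention and the /var/lib/nova/mnt base are inferred from the paths seen in this report and typical Nova NFS behavior, not quoted from an official API, and the readability test merely stands in for the access check libvirt performs as uid 107 (qemu) on the destination host.

```shell
# Nova mounts each NFS share under a directory named after the md5 of the
# share string, so the 93dfa4... directory above should be derivable from
# the CinderNfsServers value (assumption: the string is hashed verbatim).
share='10.0.0.254:/srv/nfs/cinder'
vol='volume-66587ea1-d9fe-4ca8-b788-f0e3e573841e'
hash=$(printf '%s' "$share" | md5sum | awk '{print $1}')
path="/var/lib/nova/mnt/$hash/$vol"
echo "expected volume path: $path"

# Rough stand-in for the pre-migration access check that fails in the log:
if [ -r "$path" ]; then
    echo "accessible"
else
    echo "MISSING: No such file or directory"
fi
```

On a host where the share is actually mounted the second branch should not trigger; on the node producing the libvirt error above it would, which is what makes the healthy-looking /proc/mounts output on the source so confusing.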
Created attachment 1339780 [details] compute sosreport
I can provide a reproducing environment; please let me know when someone can look into this issue. Thanks!
You need to set nas_secure_file_operations=False and nas_secure_file_permissions=False in the backend's section of cinder.conf, and then this should work.
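Concretely, the advice above corresponds to a backend section along these lines. This is a sketch, not the exact configuration from this deployment: the section name tripleo_nfs and the non-nas_secure option values are assumptions based on a typical TripleO NFS backend.

```ini
[tripleo_nfs]
volume_backend_name = tripleo_nfs
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/shares-nfs.conf
nfs_mount_options = rw,sync
# The two settings from comment #5. With the nas_secure options enabled,
# the NFS driver creates volume files with restrictive ownership and
# permissions, which can leave qemu unable to access the file during
# live migration.
nas_secure_file_operations = False
nas_secure_file_permissions = False
```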
(In reply to Eric Harney from comment #5)
> You need to set nas_secure_file_operations=False and
> nas_secure_file_permissions=False in the backend's section of cinder.conf,
> and then this should work.

If I understand correctly, these should already be set as defaults in OSP11 by the patches attached to bug 1440700, which is verified. Is this a regression introduced in OSP12?
I suspect the problem is the same as bug #1491597. I tried to look at the setup, but it looks like it has been redeployed. Marius, can you try this without Glance using NFS for its backend?
I'd like someone from DFG:Compute to take a look at this. There may be a storage issue, but I need help connecting the dots. I signed onto Marius's system in comment #8 and here's what I found.

- The nas_secure settings are correct (set False).
- The "Live Migration failure: Cannot access storage file '<blah>' ...: No such file or directory" error is seen on the compute node that's still running OSP-11 (happens to be compute-r01-01).
- As noted in the BZ description, the specified volume _is_ present, so the "No such file or directory" error seems confusing.
- Then I looked on compute-r00-00, which is now running OSP-12. In the nova_compute container, there are no entries at all in /var/lib/nova/mnt. But there's an entry in the container's nova-compute.log:

/var/log/containers/nova/nova-compute.log:2017-10-25 18:44:21.714 1 WARNING nova.virt.libvirt.volume.mount [req-204643d9-8df7-42ab-a786-e5d0c2148ac1 f73300d9077b4fde91f601b5465ed9d0 c3ff2f236e3649409fb67f9fb87ecd2b - - -] Request to remove attachment (volume-<blah>) from /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128, but we don't think it's in use.

- I don't see any NFS mounts on the OSP-12 compute node, or logs indicating it tried to mount something and failed.
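The missing-mounts observation above boils down to comparing the NFS shares visible on the host with those visible inside the nova_compute container. A hedged sketch follows; the container name, the docker invocation, and the use of /proc/mounts are assumptions about this containerized OSP-12 setup, and the inputs are inlined sample data rather than live command output.

```shell
# Compare NFS shares mounted on the host against those visible inside the
# nova_compute container. In a live run the two inputs would come from:
#   grep nfs /proc/mounts                          > host_mounts
#   docker exec nova_compute grep nfs /proc/mounts > cont_mounts
# Here they are inlined: one share on the host, none in the container,
# mirroring the observation in this comment.
host_mounts=$(mktemp); cont_mounts=$(mktemp)
cat > "$host_mounts" <<'EOF'
10.0.0.254:/srv/nfs/cinder /var/lib/nova/mnt/93dfa45819ccd57c0cb9b93cd07c9128 nfs4 rw 0 0
EOF
: > "$cont_mounts"   # container listed no NFS mounts at all

h=$(mktemp); c=$(mktemp)
awk '{print $1}' "$host_mounts" | sort > "$h"
awk '{print $1}' "$cont_mounts" | sort > "$c"
echo "mounted on host but missing in container:"
comm -23 "$h" "$c"
rm -f "$host_mounts" "$cont_mounts" "$h" "$c"
```

Any share printed by the comm line is one the containerized nova-compute cannot see, which would produce exactly the "Cannot access storage file" symptom during live migration.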
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462