Bug 1665553

Summary: Unable to migrate VM's using ceph storage - Unsafe migration: Migration without shared storage is unsafe
Product: Red Hat Enterprise Linux 7
Reporter: amashah
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: gaojianan <jgao>
Severity: high
Docs Contact:
Priority: high
Version: 7.6
CC: adevolder, alitke, amashah, bailey, coli, fdeutsch, fjin, guido.langenbach, hhan, jdenemar, jiyan, jsuchane, juzhang, mprivozn, mtessun, ncredi, pelauter, xianwang, xiaohli, xuzhang, yalzhang
Target Milestone: rc
Keywords: Upstream, ZStream
Target Release: 7.6
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-4.5.0-11.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1672178 (view as bug list)
Environment:
Last Closed: 2019-08-06 13:14:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1672178
Attachments:
Description Flags
code coverage 100% none

Description amashah 2019-01-11 18:28:47 UTC
Description of problem:

Unable to migrate VMs using ceph storage.

This seems similar to the following bug, but here ceph storage is in use:

-  https://bugzilla.redhat.com/show_bug.cgi?id=1632711


Version-Release number of selected component (if applicable):
4.2.7-5

How reproducible:

Uncertain; resources to reproduce in a test environment are unavailable. The issue comes from a customer support case.

Steps to Reproduce:
1. Create a VM.
2. Place the VM's storage on a ceph storage domain.
3. Try to migrate the VM to another host.

Actual results:

Unable to migrate VMs.


Expected results:

VMs should be able to migrate.


Additional info:

mount info:

~~~
192.168.x.x:/ovirt/data on /rhev/data-center/mnt/192.168.x.x:_ovirt_data type ceph (rw,noatime,name=cephfs,secret=<hidden>,acl,wsize=16777216)
~~~

stat info:

~~~
[root@ovirt04 ~]# stat -f /rhev/data-center/mnt/192.168.x.x:_ovirt_data
   File: "/rhev/data-center/mnt/192.168.x.x:_ovirt_data"
     ID: 16e4002558925fd0 Namelen: 255     Type: ceph
Block size: 4194304    Fundamental block size: 4194304
Blocks: Total: 32740938   Free: 31486538   Available: 31486538
Inodes: Total: 15989285   Free: -1
~~~


error from host:

~~~
2018-12-24 12:44:57,619-0600 ERROR (migsrc/d28597b1) [virt.vm] (vmId='d28597b1-d133-44fd-ab5f-1bc4ff8280e4') Unsafe migration: Migration without shared storage is unsafe (migration:290)
2018-12-24 12:44:57,864-0600 ERROR (migsrc/d28597b1) [virt.vm] (vmId='d28597b1-d133-44fd-ab5f-1bc4ff8280e4') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 509, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 587, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1779, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Unsafe migration: Migration without shared storage is unsafe
~~~
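
For context, this error is raised by libvirt's pre-migration safety check, which vdsm reaches through the Python binding's migrateToURI3(): roughly speaking, libvirt refuses the migration unless it recognizes the disks' storage as shared, or the caller passes VIR_MIGRATE_UNSAFE (which vdsm does not). Below is a minimal C sketch of the equivalent call; the domain name and destination URI are placeholders, and the flags only approximate what vdsm passes.

~~~
#include <stdio.h>
#include <libvirt/libvirt.h>
#include <libvirt/virterror.h>

int main(void)
{
    /* "demo" and the destination URI are placeholders. */
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom;
    unsigned int flags = VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER;

    if (!conn)
        return 1;

    if (!(dom = virDomainLookupByName(conn, "demo"))) {
        virConnectClose(conn);
        return 1;
    }

    /* With disks on storage libvirt does not recognize as shared, this call
     * fails with "Unsafe migration: Migration without shared storage is
     * unsafe" unless VIR_MIGRATE_UNSAFE is added to the flags. */
    if (virDomainMigrateToURI3(dom, "qemu+ssh://dst.example.com/system",
                               NULL, 0, flags) < 0)
        fprintf(stderr, "migration failed: %s\n", virGetLastErrorMessage());

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
~~~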

Comment 3 Martin Tessun 2019-01-14 09:29:34 UTC
Hi Jarda,

this looks like ceph is not correctly identified as shared storage:
192.168.x.x:/ovirt/data on /rhev/data-center/mnt/192.168.x.x:_ovirt_data type ceph (rw,noatime,name=cephfs,secret=<hidden>,acl,wsize=16777216)

 and the error in libvirt:
libvirtError: Unsafe migration: Migration without shared storage is unsafe

As you can see, the ceph storage is shared.

Thanks for looking into this.

Comment 4 Han Han 2019-01-16 03:14:06 UTC
(In reply to Martin Tessun from comment #3)
> Hi Jarda,
> 
> this looks like ceph is not correctly identified as shared storage:
> 192.168.x.x:/ovirt/data on /rhev/data-center/mnt/192.168.x.x:_ovirt_data
> type ceph (rw,noatime,name=cephfs,secret=<hidden>,acl,wsize=16777216)
> 
>  and the error in libvirt:
> libvirtError: Unsafe migration: Migration without shared storage is unsafe
> 
> As you can see, the ceph storage is shared.
> 
> Thanks for looking into this.

Currently libvirt doesn't identify cephfs as a shared FS:

int virFileIsSharedFS(const char *path)
{
    return virFileIsSharedFSType(path,
                                 VIR_FILE_SHFS_NFS |
                                 VIR_FILE_SHFS_GFS2 |
                                 VIR_FILE_SHFS_OCFS |
                                 VIR_FILE_SHFS_AFS |
                                 VIR_FILE_SHFS_SMB |
                                 VIR_FILE_SHFS_CIFS);
}

The currently recognized shared filesystems are nfs, gfs2 (including glusterfs.fuse), ocfs, afs, smb, and cifs.
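
For context, virFileIsSharedFSType() works roughly by calling statfs() on the path and comparing the reported f_type against the magic constants of the filesystems selected by those flags. cephfs reports a magic (0x00c36400 in the kernel sources) that is not on that list, so a cephfs mount is treated as local storage. A minimal sketch that prints the value libvirt would be comparing (the path argument is just an example):

~~~
#include <stdio.h>
#include <sys/vfs.h>

/* Print the filesystem magic (statfs f_type) for a path. This is the value
 * libvirt's shared-FS detection compares against its list of known magics;
 * on a cephfs mount it prints 0xc36400, which is not on libvirt's list,
 * so the mount is treated as local (non-shared) storage. */
int main(int argc, char **argv)
{
    struct statfs sb;
    const char *path = argc > 1 ? argv[1] : ".";  /* e.g. a /rhev/data-center/mnt/... mount */

    if (statfs(path, &sb) < 0) {
        perror("statfs");
        return 1;
    }
    printf("%s: f_type = 0x%lx\n", path, (unsigned long)sb.f_type);
    return 0;
}
~~~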

Comment 5 Han Han 2019-01-17 02:28:14 UTC
Hi Michal,
I think it is time to add cephfs migration support in libvirt, since RHV already supports this storage.

Comment 6 Jiri Denemark 2019-01-17 11:28:01 UTC
*** Bug 1667037 has been marked as a duplicate of this bug. ***

Comment 8 Michal Privoznik 2019-01-24 08:51:25 UTC
Hi,

can you please attach /proc/mounts from the host that has ceph mounted? Thanks.

Comment 9 Guido Langenbach 2019-01-24 15:55:54 UTC
We have the same problem. This is the output from one of our hosts:

[root@server ~]# cat /proc/mounts 
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,seclabel,nosuid,size=98508412k,nr_inodes=24627103,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,seclabel,nosuid,nodev 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,seclabel,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,seclabel,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,seclabel,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,seclabel,nosuid,nodev,noexec,relatime,pids 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/mapper/rhel-root / xfs rw,seclabel,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=32450 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
hugetlbfs /dev/hugepages1G hugetlbfs rw,seclabel,relatime,pagesize=1G 0 0
/dev/mapper/gluster_vg_sda4-gluster_lv_engine /gluster_bricks/engine xfs rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=6144,noquota 0 0
/dev/sda2 /boot xfs rw,seclabel,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota 0 0
/dev/sda1 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro 0 0
/dev/mapper/gluster_vg_sda4-gluster_lv_isos /gluster_bricks/isos xfs rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=6144,noquota 0 0
/dev/mapper/gluster_vg_sda4-gluster_lv_vmstore /gluster_bricks/vmstore xfs rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=6144,noquota 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
192.168.1.10:/engine /rhev/data-center/mnt/glusterSD/192.168.1.10:_engine fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
192.168.1.10:/vmstore /rhev/data-center/mnt/glusterSD/192.168.1.10:_vmstore fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
192.168.1.10:/isos /rhev/data-center/mnt/glusterSD/192.168.1.10:_isos fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
192.168.1.11:6789:/ /rhev/data-center/mnt/192.168.1.11:6789:_ ceph rw,relatime,name=admin,secret=<hidden>,acl,wsize=16777216 0 0
tmpfs /run/user/1174800007 tmpfs rw,seclabel,nosuid,nodev,relatime,size=19704212k,mode=700,uid=1174800007,gid=1174800007 0 0

Comment 11 Michal Privoznik 2019-01-25 15:36:09 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2019-January/msg01008.html

Comment 12 Michal Privoznik 2019-01-28 14:08:03 UTC
Patch pushed upstream:

commit 6dd2a2ae6386b1d51edcc9a434f56d7f9dc2cb35
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jan 24 09:52:42 2019 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Jan 28 14:56:21 2019 +0100

    virfile: Detect ceph as shared FS
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1665553
    
    Ceph can be mounted just like any other filesystem and in fact is
    a shared and cluster filesystem. The filesystem magic constant
    was taken from kernel sources as it is not in magic.h yet.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>

v5.0.0-130-g6dd2a2ae63
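
For reference, a compilable sketch of the shape of the change, not the verbatim patch: the cephfs magic constant is taken from the kernel sources (it is not in <linux/magic.h>) and added to the set of filesystems treated as shared. The helper is_shared_fs() below is a simplified stand-in for libvirt's virFileIsSharedFS()/virFileIsSharedFSType() and checks only a subset of the real magic list.

~~~
#include <stdio.h>
#include <sys/vfs.h>
#include <linux/magic.h>              /* NFS_SUPER_MAGIC, SMB_SUPER_MAGIC, ... */

/* cephfs magic from the kernel's ceph_fs.h; not in <linux/magic.h>, which is
 * why the libvirt patch defines it itself. */
#ifndef CEPH_SUPER_MAGIC
# define CEPH_SUPER_MAGIC 0x00c36400
#endif

/* Simplified stand-in for virFileIsSharedFS(): 1 if @path sits on a
 * filesystem treated as shared, 0 if not, -1 on error. */
static int is_shared_fs(const char *path)
{
    struct statfs sb;

    if (statfs(path, &sb) < 0)
        return -1;

    switch ((unsigned long)sb.f_type) {
    case NFS_SUPER_MAGIC:
    case SMB_SUPER_MAGIC:
    case CEPH_SUPER_MAGIC:            /* the case this fix effectively adds */
        return 1;
    default:
        return 0;
    }
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    printf("%s: shared = %d\n", path, is_shared_fs(path));
    return 0;
}
~~~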

Comment 15 gaojianan 2019-05-20 08:14:28 UTC
Verified on:
libvirt-4.5.0-17.virtcov.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.6.x86_64

Test VM migration on a directly mounted cephfs:
1. Mount cephfs on dst and src host:
# mount -t ceph 10.73.224.204:6789:/ /mnt/ceph -o name=admin,secret=AQCAgd5cFyMQLhAAHaz6w+WKy5LvmKjmRAViEg==

2. Prepare a running VM whose image is on /mnt/ceph:
# virsh domblklist demo                                                                                                                         
Target     Source
------------------------------------------------
hda        /mnt/ceph/RHEL-7.5-x86_64-latest.qcow2

3. Migrate the VM to the destination host:
# virsh migrate demo --live qemu+ssh://10.66.144.87/system --verbose
Migration: [100 %]

4. Migrate back:
# virsh migrate demo --live qemu+ssh://10.73.196.199/system --verbose
Migration: [100 %]

5. Check read/write in the VM:
# echo xx>xx
# cat xx
xx

Worked as expected.

Comment 16 gaojianan 2019-05-20 08:15:55 UTC
Created attachment 1571183 [details]
code coverage 100%

Comment 18 errata-xmlrpc 2019-08-06 13:14:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294