Bug 1640465

Summary: [RHHI] Hosted Engine migration fails in gluster storage domain
Product: Red Hat Enterprise Linux 7
Reporter: bipin <bshetty>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: gaojianan <jgao>
Severity: urgent
Priority: urgent
Version: 7.6
CC: akarlsso, amukherj, bshetty, fjin, guillaume.pavese, hannsj_uhl, jdenemar, jsuchane, lsurette, matthew.piechota, mprivozn, msheena, mtessun, nicholas.j.natale, pagranat, rcyriac, sabose, salmy, sasundar, stirabos, xuzhang, yalzhang
Target Milestone: pre-dev-freeze
Keywords: Regression, Upstream, ZStream
Target Release: 7.6
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-4.5.0-11.el7
Doc Type: Bug Fix
Doc Text:
Migration of the Hosted Engine virtual machine from one host to another fails because libvirt incorrectly identifies FUSE-mounted Gluster volumes as being part of a shared file system, and as such does not identify the Hosted Engine virtual machine storage as being in need of migration. There is currently no workaround for this issue.
Clones: 1640467 1641798 (view as bug list)
Last Closed: 2019-08-06 13:14:02 UTC
Type: Bug
oVirt Team: Gluster
Bug Blocks: 1641798, 1651787, 1657156
Attachments:
  logs_migration_unsafe (no flags)
  code coverage, missed 1 line (no flags)

Description bipin 2018-10-18 07:02:48 UTC
Description of problem:
======================
While migrating the Hosted Engine VM to the other hosts available in the cluster, the migration fails.

Version-Release number of selected component (if applicable):
============================================================
redhat-release-virtualization-host-4.2-7.3
libvirt-4.5.0-10.el7_6.2.x86_64
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch

How reproducible:
================
100%

Steps to Reproduce:
==================
1. Deploy RHHI (Hosted Engine setup with 3 hosts)
2. Ensure all hosts are capable of hosting the HE
3. Migrate the HE to any of the hosts within the cluster

Actual results:
==============
Migration to the other hosts fails

Expected results:
================
The migration should be successful


Additional info:
===============

This issue looks similar to bug 1632711.
Also to note, the application VMs are able to migrate, but the Hosted Engine VM fails to migrate.

PS: The libvirt changes are applied.

Comment 4 Sahina Bose 2018-10-18 07:18:32 UTC
Which version of libvirt is used?

Comment 7 Sahina Bose 2018-10-18 07:42:34 UTC
We run into a similar issue even after applying the build that fixes bug 1635705 - this time only for the Hosted Engine VM (other VMs running on the RHV environment migrate without issues).
Could you take a look?

Comment 8 Michal Privoznik 2018-10-18 08:17:39 UTC
(In reply to Sahina Bose from comment #7)
> We run into similar issue even after applying build that fixes bug 1635705 -
> this time only for the Hosted Engine VM (other VMs running on the RHV
> environment migrate without issues)
> Could you take a look

Is the fixed package installed on both source and destination? If so, can you please share output of "virsh domblklist --details" for the domain that is failing and also attach mount table from both the source and destination?

Comment 16 bipin 2018-10-18 11:17:05 UTC
Please find the attached logs

Comment 20 Michal Privoznik 2018-10-18 12:56:09 UTC
Yes, this is a libvirt bug. While fixing bug 1632711 I did not realize that a dir in the path can be a symlink. That said, at this point it would be far easier if vdsm could just pass the VIR_MIGRATE_UNSAFE flag (which suppresses all checks like these).
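The failure mode described above can be illustrated with a short, self-contained Python sketch (hypothetical paths; this only mimics the prefix match that virFileIsSharedFixFUSE performs against the mount table, it is not libvirt's actual C code):

```python
import os
import tempfile

def is_on_mount(path, mount_point):
    """Prefix-match a path against a mount point, canonicalizing first.

    Canonicalizing with realpath() is the essence of the upstream fix
    (commit c0790e3a09): without it, a path that reaches the mount
    through a symlink never matches the mount table entry.
    """
    return os.path.realpath(path).startswith(mount_point)

# Build a toy layout: a fake "mount point" plus a symlink into it,
# like /var/lib/libvirt/images/glusterfs -> /mnt/mount/glusterfs.
tmp = os.path.realpath(tempfile.mkdtemp())
mount = os.path.join(tmp, "mnt")            # stands in for the FUSE mount
os.makedirs(os.path.join(mount, "images"))
link = os.path.join(tmp, "images")          # symlink living outside the mount
os.symlink(os.path.join(mount, "images"), link)

image = os.path.join(link, "A.qcow2")       # path as seen in the VM config
print(image.startswith(mount))    # raw path never matches the mount entry
print(is_on_mount(image, mount))  # canonicalized path matches
```

With the raw path, the prefix match fails and the storage is wrongly treated as non-shared, which is exactly why libvirt refused the migration as unsafe.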

Comment 25 Michal Privoznik 2018-10-18 15:05:46 UTC
Patch posted upstream:

https://www.redhat.com/archives/libvir-list/2018-October/msg00943.html

Comment 26 Michal Privoznik 2018-10-19 11:19:08 UTC
And I've just pushed the patch upstream:

commit c0790e3a09f57da0bd25c7eac4a35ed6e7e9e858
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Oct 18 14:57:19 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Fri Oct 19 13:15:54 2018 +0200

    virfile: Take symlink into account in virFileIsSharedFixFUSE
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1640465
    
    Weirdly enough, there can be symlinks in the path we are trying
    to fix. If it is the case our clever algorithm that finds matches
    against mount table won't work. Canonicalize path at the
    beginning then.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>

v4.8.0-117-gc0790e3a09

Comment 39 Ryan Barry 2018-11-08 11:36:45 UTC
*** Bug 1644679 has been marked as a duplicate of this bug. ***

Comment 40 Polina 2018-11-14 15:05:42 UTC
Created attachment 1505710 [details]
logs_migration_unsafe

I see the same migration error happen for regular VMs (not HE environments) on build 4.3.

hypervisor-lynx18_vdsm.log:2018-10-11 18:47:26,188+0300 ERROR (migsrc/59b67fb0) [virt.vm] (vmId='59b67fb0-0d84-4c80-99a5-506d1bd263b2') migration destination error: Error creating the requested VM (migration:282)
hypervisor-lynx19_vdsm.log:2018-10-11 18:47:33,301+0300 ERROR (migsrc/a7703a0b) [virt.vm] (vmId='a7703a0b-c023-4576-a935-9efbd658cdf2') Unsafe migration: Migration without shared storage is unsafe (migration:282)
hypervisor-lynx19_vdsm.log:2018-10-11 18:47:33,881+0300 ERROR (migsrc/a7703a0b) [virt.vm] (vmId='a7703a0b-c023-4576-a935-9efbd658cdf2') Failed to migrate (migration:450)

attached logs logs_migration_unsafe.tar.gz

Comment 41 Michal Privoznik 2018-11-14 15:54:12 UTC
(In reply to Polina from comment #40)
> attached logs logs_migration_unsafe.tar.gz

libvirt logs are missing. Anyway, I suggest waiting for a build that contains the fixes for this bug and re-running your scenario.

Comment 44 gaojianan 2019-05-14 08:04:10 UTC
Verified on 
libvirt-4.5.0-16.virtcov.el7.x86_64
qemu-kvm-rhev-2.12.0-27.el7.x86_64

Scenario 1
1. Mount glusterfs on dst and src host:
# mount -t glusterfs 10.66.4.119:/gvnew /mnt/mount

2. Make symlinks on dst and src hosts:
# ln -s /mnt/mount/glusterfs /var/lib/libvirt/images/glusterfs

3. Prepare a running VM whose image is in the symlinked dir:
# virsh domblklist jgao                                                                                                                        
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/glusterfs/A.qcow2

# virsh -k0 -K0 migrate jgao qemu+ssh://10.66.144.87/system --verbose                   
Migration: [100 %]

Migrate back:
# virsh -k0 -K0 migrate jgao qemu+ssh://10.73.196.199/system --verbose 
Migration: [100 %]

Check R/W on VM:
# echo xx>xx
# cat xx
xx


Scenario 2
1. Mount glusterfs on dst and src host:
# mount -t glusterfs 10.66.4.119:/gvnew /mnt/mount

2. Make symlinks on dst and src hosts:
# ln -s /mnt/mount/glusterfs /var/lib/libvirt/images/glusterfs

3. Create a backing chain using a relative path:
# cd /var/lib/libvirt/images

# qemu-img create gls.copy -b glusterfs/A.qcow2 -f qcow2 -o backing_fmt=qcow2
Formatting 'gls.copy', fmt=qcow2 size=10737418240 backing_file=glusterfs/A.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16

Set the VM image as the top layer of the backing chain:
# virsh domblklist jgao                                                               
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/gls.copy


4. Start the VM and migrate with --copy-storage-inc:
# virsh start jgao                                                               
Domain jgao started

# virsh -k0 -K0 migrate jgao qemu+ssh://10.66.144.87/system --verbose --copy-storage-inc
Migration: [100 %]

Migrate back:
# virsh -k0 -K0 migrate jgao qemu+ssh://10.73.196.199/system --verbose --copy-storage-inc
Migration: [100 %]


Works as expected

Comment 45 gaojianan 2019-05-14 08:07:42 UTC
Created attachment 1568313 [details]
code coverage , missed 1 line

Comment 47 errata-xmlrpc 2019-08-06 13:14:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294