Description of problem:
-----------------------
Virtual Machines (VMs) created with the gfapi access mechanism are not highly available. VMs depend on the first/primary gluster volfile server to obtain the volfiles. When the first/primary server is down, starting the VMs fails.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.1.5
glusterfs-3.8.4
vdsm-4.19.28-1
qemu-kvm-rhev-2.9.0-16

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add 3 RHEL 7.4 nodes to a converged 4.1 cluster (with virt & gluster capability enabled)
2. Enable libgfapi in the engine (LibgfApiSupported=true)
3. Create a gluster replica 3 volume and use it as an RHV data domain (GlusterFS type)
4. Create VMs and install an OS
5. Move the first node (primary volfile server) to maintenance, stopping gluster services
6. Stop the VMs
7. Start the VMs

Actual results:
---------------
Unable to start the VMs

Expected results:
-----------------
VMs should start even when the first node (primary volfile server) is unavailable

Additional info:
----------------
Error message as seen in the events when starting a VM in the absence of the first node (primary volfile server):

<snip>
VM appvm01 is down with error. Exit message: failed to initialize gluster connection (src=0x7f55080198c0 priv=0x7f550800be30): Transport endpoint is not connected.
</snip>
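For step 2 above, enabling libgfapi is done on the engine host with engine-config. A minimal sketch, assuming a 4.1 cluster level (the --cver value here is an assumption for this setup; check your cluster's compatibility version):

```shell
# Sketch: enable libgfapi support at the 4.1 cluster level, then restart
# the engine so the change takes effect.
engine-config -s LibgfApiSupported=true --cver=4.1
systemctl restart ovirt-engine
```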
Following is the observation from the XML definition of the VM. There are *no* additional volfile servers mentioned. So every time the VM starts, the QEMU process fetches the volfile from the primary volfile server (in this case, 10.70.36.73), and if it is unavailable, QEMU fails to start the VM with 'Transport endpoint is not connected'.

We should pass the additional mount options of the GlusterFS storage domain, i.e. 'backup-volfile-servers', as fallback hosts, so that QEMU can query those servers too when fetching volfiles.

<disk type='network' device='disk' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
  <source protocol='gluster' name='vmstore/051c9cd5-807c-4131-97e7-db306a7b3142/images/98b106c6-b2b3-4a94-8178-b912242567a1/895a694a-11a7-4327-bc13-55ab08805cb3'>
    <host name='10.70.36.73' port='0'/>
  </source>
  <backingStore/>
  <target dev='sda' bus='scsi'/>
  <serial>98b106c6-b2b3-4a94-8178-b912242567a1</serial>
  <boot order='2'/>
  <alias name='scsi0-0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
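For illustration, once libvirt accepts multiple gluster hosts for a disk source, the fallback volfile servers could be expressed as additional <host> elements. This is a sketch, not output from a real VM; the two extra IP addresses are hypothetical stand-ins for the other replica peers:

```xml
<source protocol='gluster' name='vmstore/051c9cd5-807c-4131-97e7-db306a7b3142/images/98b106c6-b2b3-4a94-8178-b912242567a1/895a694a-11a7-4327-bc13-55ab08805cb3'>
  <host name='10.70.36.73' port='0'/>
  <host name='10.70.36.74' port='0'/>  <!-- hypothetical peer address -->
  <host name='10.70.36.75' port='0'/>  <!-- hypothetical peer address -->
</source>
```

With such a definition, QEMU could fall back to the remaining servers for the volfile when the primary is down, mirroring what 'backup-volfile-servers' does for FUSE mounts.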
This issue breaks the high availability of virtual machines when VMs are stopped and started again, and affects Red Hat's hyperconverged product (RHHI 1.1).
Denis, can you take a look at this? Shouldn't multiple hosts have been used in the domain xml?
It is not yet supported by libvirt. Please check the following bug: https://bugzilla.redhat.com/show_bug.cgi?id=1465810
The bug is currently targeted for 7.5, so defer this one to 4.3?
(In reply to Michal Skrivanek from comment #5)
> the bug is currently targeted for 7.5, so defer this one to 4.3?

I think we need to work with the qemu/libvirt team to see if this bug can make it earlier than 7.5, as VM HA is a critical requirement for RHHI.

Denis, is it possible to choose an online host to pass to the domain XML while creating the XML from vdsm?
Manually or automatically? Atm VDSM just takes the first host from the list of hosts supplied by the engine. I think we can order that list in any way we like.
(In reply to Denis Chaplygin from comment #7)
> Manually or automatically?

Automatically

> Atm VDSM just takes the first host from the list of hosts, supplied by the
> engine. I think we can order that list in any way we like.

Instead of picking the first host, pick a host that's online?
VDSM has no idea about the state of other hosts. I can remove hosts which are in a "non UP" state on the engine side.
I investigated that issue and discovered that the host list is not sent by the engine within the libvirt XML during VM creation. Instead, it is sent only on storage domain mount and then cached by vdsm. During VM creation, vdsm takes that cached list and uses the first host from it, to work around libvirt bug [1].

Unfortunately, at run time vdsm has no idea whether the gluster bricks are alive or not. Even worse, a peer status call uses just the first host from the list mentioned above, and if it is not available, no status data is returned.

Thus, we could either modify the vdsm API and the engine libvirt builder to send host information during VM creation, or wait for the libvirt bug to be fixed and just remove that single-host workaround.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1465810
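The "pick an online host" idea discussed above could look roughly like this. This is a hypothetical sketch, not actual vdsm or engine code; the `Host` type and its status strings are made-up stand-ins for whatever host-state information the engine could supply:

```python
from dataclasses import dataclass

@dataclass
class Host:
    address: str
    status: str  # hypothetical states, e.g. "up", "maintenance", "non_responsive"

def pick_volfile_server(hosts):
    """Return the address of the first host reported as up.

    Falls back to the first host in the list, which is what the
    current single-host workaround effectively does."""
    for host in hosts:
        if host.status == "up":
            return host.address
    # No host is known to be up: keep today's behaviour.
    return hosts[0].address if hosts else None

# With the primary volfile server in maintenance, a live peer is chosen instead:
hosts = [Host("10.70.36.73", "maintenance"),
         Host("10.70.36.74", "up"),
         Host("10.70.36.75", "up")]
```

The key design point is that the ordering/filtering has to happen on the engine side, since (as noted above) vdsm itself has no view of the other hosts' states.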
(In reply to Denis Chaplygin from comment #10)
> Thus, we could either modify vdsm api and engine libvirt builder to send
> host information during vm creation or wait for libvirt bug to be fixed and
> just remove that single host workaround.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1465810

Denis, these options need to be discussed with a wider audience. But how costly is it to modify the vdsm API and the engine libvirt builder to send host info during VM creation?
Is there a workaround for this, i.e. to manually set the proper gluster server somewhere? The only thing I could do was to put the gluster domain into maintenance, edit it, and change the host there. It gets pretty painful if we have to take down the whole gluster domain just to start a VM...
Closing this, as no action has been taken in a long time. Please reopen if required.
Things seem to be moving in qemu: https://bugzilla.redhat.com/show_bug.cgi?id=1465810

Time to reopen?
Users on ovirt-users report a 4x~5x performance improvement with libgfapi.