Bug 891085 (engine_as_fence_proxy)

Summary: [RFE] [engine]: Add the ability to the engine to serve as a fencing proxy
Product: [oVirt] ovirt-engine
Component: RFEs
Status: CLOSED DEFERRED
Severity: medium
Priority: medium
Version: ---
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Keywords: FutureFeature
Type: Bug
Doc Type: Enhancement
oVirt Team: Infra
Reporter: Tareq Alayan <talayan>
Assignee: Nobody's working on this, feel free to take it <nobody>
CC: bsettle, bugs, lpeer, mgoldboi, mtessun, Rhev-m-bugs, srevivo, ykaul
Flags: ylavi: ovirt-future?
       rule-engine: planning_ack?
       rule-engine: devel_ack?
       rule-engine: testing_ack?
Clones: 1373957 (view as bug list)
Bug Blocks: 1148638, 1373957
Last Closed: 2020-04-01 14:46:31 UTC

Description Tareq Alayan 2013-01-01 16:21:15 UTC
Description of problem:

The current implementation of PM (power management) proxy selection is based on selecting a host from the DC that is in 'UP' status.

This implementation is not robust enough, since in some cases a host stays in Non Responsive status because no proxy host in 'UP' status is available in the DC.

Once this is implemented, FenceProxyDefaultPreferences could look like this:
RHEVM,CLUSTER,DC

In addition, add the ability for RHEVM to check whether vdsmd is running and the fence-agents package is installed on localhost (if not, we will ignore that option and continue to the next one).
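
A minimal sketch of the proposed order of preference and the localhost checks (the actual engine is Java; the helper names here are hypothetical and for illustration only):

    import shutil
    import subprocess

    def localhost_can_proxy():
        """Check the two localhost prerequisites named above:
        vdsmd is running and fence-agents is installed."""
        vdsmd_running = subprocess.run(
            ["systemctl", "is-active", "--quiet", "vdsmd"]
        ).returncode == 0
        # fence-agents ships the fence_* commands; looking one up is a
        # stand-in for a package query such as `rpm -q fence-agents`.
        agents_installed = shutil.which("fence_ipmilan") is not None
        return vdsmd_running and agents_installed

    def select_fence_proxy(preferences, up_hosts_in_cluster, up_hosts_in_dc):
        """Walk FenceProxyDefaultPreferences (e.g. "RHEVM,CLUSTER,DC")
        in order and return the first usable proxy, or None."""
        for source in preferences.split(","):
            if source == "RHEVM" and localhost_can_proxy():
                return "localhost"  # the engine host itself acts as proxy
            elif source == "CLUSTER" and up_hosts_in_cluster:
                return up_hosts_in_cluster[0]
            elif source == "DC" and up_hosts_in_dc:
                return up_hosts_in_dc[0]
        return None  # no proxy available; the host stays Non Responsive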


additional info:
This issue is derived from https://bugzilla.redhat.com/show_bug.cgi?id=747305

Comment 1 Itamar Heim 2014-05-04 10:26:36 UTC
Barak - IIRC we did some analysis on this a version or two ago; worth documenting the findings.

Comment 2 Barak 2014-09-01 13:00:49 UTC
There were several thoughts here:
- directly execute fence-agents on the engine host
  cons:
   * an entirely different code path for handling fencing
   * was NACKed, as this is bad behavior for an application that resides in
     an application server
- install a local vdsm on the engine host that will serve as a fencing proxy only
  cons:
    * collides with All-In-One
- create a new lightweight VDSM that will serve only fencing requests:
  cons:
    * still collides with All-In-One, or we'd have to listen on a different port
    * vdsm is still not ready in terms of modular builds


Post 3.5 this is actually possible: once you have an all-in-one-like deployment (i.e., the engine host is actually a hypervisor), you can set the proxy-selection policy to other_dc.

Comment 3 Yaniv Kaul 2015-11-25 11:29:57 UTC
(In reply to Barak from comment #2)

> - create a new lightweight VDSM that will serve only fencing requests:
>   cons:
>     * still collides with All-In-One, or we'd have to listen on a different port
>     * vdsm is still not ready in terms of modular builds

A micro service for fencing sounds like a good idea. Probably dockerized already.
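
(For illustration only: such a micro service could be little more than an endpoint that runs the requested agent from the fence-agents package and returns its exit status. A hypothetical, heavily simplified sketch; the port, field names, and wire format below are invented, though real fence agents do read their options as key=value lines on stdin:)

    import json
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class FenceHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Expected (invented) request body, e.g.:
            #   {"agent": "fence_ipmilan",
            #    "options": "ip=10.0.0.5\nusername=admin\naction=status\n"}
            length = int(self.headers["Content-Length"])
            body = json.loads(self.rfile.read(length))
            # Run the fence agent, feeding its options on stdin as
            # key=value lines, the standard fence-agents convention.
            result = subprocess.run(
                [body["agent"]], input=body["options"].encode(),
                capture_output=True, timeout=60,
            )
            self.send_response(200)
            self.end_headers()
            self.wfile.write(json.dumps({"rc": result.returncode}).encode())

    HTTPServer(("", 8686), FenceHandler).serve_forever()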

Comment 4 Oved Ourfali 2015-11-25 11:35:35 UTC
(In reply to Yaniv Kaul from comment #3)
> (In reply to Barak from comment #2)
> 
> > - create a new lightweight VDSM that will serve only fencing requests:
> >   cons:
> >     * still collides with All-In-One, or we'd have to listen on a different port
> >     * vdsm is still not ready in terms of modular builds
> 
> A micro service for fencing sounds like a good idea. Probably dockerized
> already.

The entire fencing capability is based on VDSM and, underneath, on the fence-agents package.

That's why a lightweight VDSM was proposed here (dockerized would be nice).
Do we think/want to pursue it in 4.0? Sounds a bit premature to me.

What do you think?

Just adding that in large clusters the use case is less important, as you'll probably find a host that is up. The exception is losing communication to all hosts in the cluster, in which case the fencing policy prevents fencing anyway, since more than 50% of the hosts are non-responsive (the threshold is configurable per cluster).

Therefore, I'm reducing the severity to medium.

Comment 5 Yaniv Kaul 2015-11-25 11:45:32 UTC
(In reply to Oved Ourfali from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > (In reply to Barak from comment #2)
> > 
> > > - create a new lightweight VDSM that will serve only fencing requests:
> > >   cons:
> > >     * still collides with All-In-One, or we'd have to listen on a different port
> > >     * vdsm is still not ready in terms of modular builds
> > 
> > A micro service for fencing sounds like a good idea. Probably dockerized
> > already.
> 
> The entire fencing capability is based on VDSM and, underneath, on the
> fence-agents package.
> 
> That's why a lightweight VDSM was proposed here (dockerized would be nice).
> Do we think/want to pursue it in 4.0? Sounds a bit premature to me.
> 
> What do you think?

We do want to begin splitting VDSM into micro-services. This one seems like a good candidate, since it's pretty much isolated and should not require any privileges or changes to the docker image / selinux / ...

4.0 - may or may not happen. It's certainly not a hot item for 4.0. We just need to begin the container story somewhere.
 
> 
> Just adding that in large clusters the use case is less important, as
> you'll probably find a host that is up. The exception is losing
> communication to all hosts in the cluster, in which case the fencing
> policy prevents fencing anyway, since more than 50% of the hosts are
> non-responsive (the threshold is configurable per cluster).
> 
> Therefore, I'm reducing the severity to medium.
Indeed.

Comment 6 Oved Ourfali 2016-02-04 13:28:50 UTC
*** Bug 1303111 has been marked as a duplicate of this bug. ***

Comment 9 Michal Skrivanek 2020-03-19 15:42:27 UTC
We didn't get to this bug for more than two years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed, so I'm suggesting we close it.
If you feel this needs to be addressed and want to work on it, please remove the cond nack and retarget accordingly.

Comment 10 Michal Skrivanek 2020-04-01 14:46:31 UTC
OK, closing. Please reopen if this is still relevant or you want to work on it.
