Description of problem:

The current implementation of PM (power management) proxy selection picks a host from the DC that is in 'UP' status. This is not robust enough: in some cases a host stays in non-responsive status because no proxy in 'UP' status is available in the DC.

After this is implemented, FenceProxyDefaultPreferences can look like this: RHEVM,CLUSTER,DC

In addition, add the ability for RHEVM to check whether vdsmd is running and the fence-agents package is installed on localhost (if not, we will ignore that option and continue to the next one).

Additional info:
This issue is derived from https://bugzilla.redhat.com/show_bug.cgi?id=747305
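To make the intent concrete, here is a minimal sketch (in Java, since the engine is Java-based) of how such a preference walk could look. This is not the engine's actual code; FenceProxySelector, FenceProxy, localhostCanFence() and findUpHostIn() are hypothetical names, and the real lookups are stubbed out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * A minimal sketch of the proposed selection order, NOT the engine's real
 * implementation. All class and method names here are hypothetical.
 */
public class FenceProxySelector {

    /** Hypothetical handle to a host that can run the fence agent. */
    public record FenceProxy(String hostName) {}

    // e.g. configured as "RHEVM,CLUSTER,DC"
    private final List<String> preferences;

    public FenceProxySelector(String fenceProxyDefaultPreferences) {
        this.preferences = Arrays.asList(fenceProxyDefaultPreferences.split(","));
    }

    public Optional<FenceProxy> selectProxy() {
        for (String source : preferences) {
            if ("RHEVM".equals(source.trim())) {
                // Proposed new option: use the engine machine itself, but only
                // if vdsmd is running and fence-agents is installed; otherwise
                // ignore it and fall through to the next preference.
                if (localhostCanFence()) {
                    return Optional.of(new FenceProxy("localhost"));
                }
            } else {
                // "CLUSTER" / "DC": existing behavior, pick any host in UP status.
                Optional<FenceProxy> proxy = findUpHostIn(source.trim());
                if (proxy.isPresent()) {
                    return proxy;
                }
            }
        }
        return Optional.empty(); // no proxy found; host stays non-responsive
    }

    private boolean localhostCanFence() {
        // Stub: would check the vdsmd service and the fence-agents package.
        return false;
    }

    private Optional<FenceProxy> findUpHostIn(String scope) {
        // Stub: would query hosts in the given scope for one in UP status.
        return Optional.empty();
    }
}
```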
Barak - IIRC we did some analysis on this a version or two ago; it's worth documenting the findings here.
There were several thoughts here:

- Directly execute fence-agents on the engine host (a rough sketch follows below).
  Cons:
  * an entirely different code path for handling fencing
  * was nacked, as running fence agents directly is bad behavior for an application that resides in an application server

- Install a local VDSM on the engine host that will serve as a fencing proxy only.
  Cons:
  * collides with All-In-One

- Create a new lightweight VDSM that will serve only fencing requests.
  Cons:
  * still collides with All-In-One, or we'll have to listen on a different port
  * VDSM is still not ready in terms of modular builds

Post 3.5 this is actually possible: once you have an all-in-one-like deployment (i.e. the engine host is actually a hypervisor), you can set the proxy-selection policy to other_dc.
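For illustration, here is a rough sketch of what the first (nacked) option would amount to: the engine process invoking a fence agent binary directly. The fence_ipmilan agent and its -a/-l/-p/-o options come from the fence-agents package; the address and credentials below are placeholders, and error handling is reduced to the exit code.

```java
import java.io.IOException;

public class DirectFenceExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        // fence_ipmilan ships with the fence-agents package; "-o status"
        // queries power state without rebooting anything.
        Process p = new ProcessBuilder(
                "fence_ipmilan",
                "-a", "192.0.2.10",   // management (IPMI) address - placeholder
                "-l", "admin",        // login - placeholder
                "-p", "secret",       // password - placeholder
                "-o", "status")
            .inheritIO()
            .start();
        int exitCode = p.waitFor();
        System.out.println("fence agent exit code: " + exitCode);
    }
}
```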
(In reply to Barak from comment #2)
> - Create a new lightweight VDSM that will serve only fencing requests.
>   Cons:
>   * still collides with All-In-One, or we'll have to listen on a different port
>   * VDSM is still not ready in terms of modular builds

A microservice for fencing sounds like a good idea. Probably dockerized already.
(In reply to Yaniv Kaul from comment #3)
> A microservice for fencing sounds like a good idea. Probably dockerized
> already.

The entire fencing capability is based on VDSM and, underneath it, on the fence-agents package. That's why a lightweight VDSM was proposed here (dockerized would be nice).

Do we think/want to pursue it in 4.0? Sounds a bit premature to me. What do you think?

Just adding that in large clusters the use case is less important, as you'll probably find a host that is up. Unless you lose communication to all hosts in the cluster, and in that case the fencing policy prevents fencing, as more than 50% of the hosts are non-responsive (configurable per cluster).

Therefore, I'm reducing the severity to medium.
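For readers unfamiliar with that policy, here is a hedged sketch of the threshold check being described. The class, method names, and exact comparison are illustrative, not the engine's real implementation; only the idea (skip fencing when too many hosts are non-responsive) comes from the comment above.

```java
public class FencingPolicy {
    /**
     * Skip fencing when the share of non-responsive hosts exceeds the
     * configured per-cluster threshold (default 50%), since that usually
     * points at an engine-side connectivity problem rather than failed hosts.
     */
    public static boolean fencingAllowed(int totalHosts, int nonResponsive, double threshold) {
        if (totalHosts == 0) {
            return false; // nothing to fence
        }
        return ((double) nonResponsive / totalHosts) <= threshold;
    }

    public static void main(String[] args) {
        // 6 of 10 hosts non-responsive -> above the 50% threshold, fencing skipped
        System.out.println(fencingAllowed(10, 6, 0.5)); // false
    }
}
```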
(In reply to Oved Ourfali from comment #4)
> That's why a lightweight VDSM was proposed here (dockerized would be nice).
> Do we think/want to pursue it in 4.0? Sounds a bit premature to me.
>
> What do you think?

We do want to begin splitting VDSM into microservices. This one seems like a good candidate, since it's pretty much isolated and should not require any privileges or changes to the docker image / selinux / ...

4.0 - may or may not happen; it's certainly not a hot item for 4.0. We just need to begin the container story somewhere.

> Just adding that in large clusters the use case is less important, as you'll
> probably find a host that is up. Unless you lose communication to all hosts
> in the cluster, and in that case the fencing policy prevents fencing, as more
> than 50% of the hosts are non-responsive (configurable per cluster).
>
> Therefore, I'm reducing the severity to medium.

Indeed.
*** Bug 1303111 has been marked as a duplicate of this bug. ***
We haven't gotten to this bug for more than 2 years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed, so I suggest closing it. If you feel this needs to be addressed and want to work on it, please remove the cond nack and retarget accordingly.
OK, closing. Please reopen if this is still relevant and you want to work on it.