1389209 – [TechPreview] add support for managing qdevice heuristics

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1389209 - [TechPreview] add support for managing qdevice heuristics

Summary: [TechPreview] add support for managing qdevice heuristics

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pcs
Sub Component:
Version:	7.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Tomas Jelinek
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:	1413573
Blocks:	1280348
TreeView+	depends on / blocked

Reported:	2016-10-27 08:05 UTC by Tomas Jelinek
Modified:	2022-06-27 21:37 UTC (History)
CC List:	9 users (show)
Fixed In Version:	pcs-0.9.162-1.el7
Doc Type:	No Doc Update
Doc Text:	Use documentation from corosync bug 1413573.
Clone Of:
Environment:
Last Closed:	2018-04-10 15:37:49 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
proposed fix + tests (264.95 KB, patch) 2017-11-13 15:40 UTC, Tomas Jelinek	no flags	Details \| Diff
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1413573	medium	CLOSED	[RFE][TechPreview] qdevice: Include support for heuristics	2022-06-27 21:37:01 UTC
Red Hat Issue Tracker	KCSOPP-1481	None	None	None	2022-06-27 21:37:19 UTC
Red Hat Product Errata	RHBA-2018:0866	None	None	None	2018-04-10 15:38:37 UTC

Internal Links: 1413573

Description Tomas Jelinek 2016-10-27 08:05:42 UTC

Qdevice support was added to pcs based on bz1158805. As a follow up we should implement support for managing qdevice heuristics once they are available in underlying cluster components.

Comment 5 Jan Pokorný [poki] 2017-11-09 17:28:51 UTC

Tomáši,

for the sake of user's convenience, my hopes are that one will be
able to specify commands to execute without artificial key qualifiers
like "exec_foo=test -f /var/run/.ultimatelock", but the rest of
the command-line not in -*, --* or exec_*=* (like mere
"test -f /var/run/.ultimatelock") will be interpreted directly
as the heuristic commands, for which the respective exec_* keys
will get assigned automaticall by pcs, like it is done for
various in-CIB identifiers already.

Nonetheless, exposing these keys later on will be needed so that
one can selectively drop particular commands.

Comment 6 Jan Pokorný [poki] 2017-11-09 17:31:14 UTC

re [comment 5]:

Actually, replace "exec_*=*" above with "{exec_*,mode,interval,...}=*",
you get the point.

Comment 7 Tomas Jelinek 2017-11-10 09:13:12 UTC

(In reply to Jan Pokorný from comment #5)
> Tomáši,
> 
> for the sake of user's convenience, my hopes are that one will be
> able to specify commands to execute without artificial key qualifiers
> like "exec_foo=test -f /var/run/.ultimatelock", but the rest of
> the command-line not in -*, --* or exec_*=* (like mere
> "test -f /var/run/.ultimatelock") will be interpreted directly
> as the heuristic commands, for which the respective exec_* keys
> will get assigned automaticall by pcs, like it is done for
> various in-CIB identifiers already.
> 
> Nonetheless, exposing these keys later on will be needed so that
> one can selectively drop particular commands.

They are not artificial. It is how corosync does it and documents it, so pcs can refer to corosync manpage for explanation. Users need to refer to commands using those keys anyway for updating and removing them. Your proposal would make the add command syntax inconsistent with the update command syntax. Moreover it would make it harder to extend the command in the future.

Comment 8 Jan Pokorný [poki] 2017-11-10 13:53:17 UTC

Where only to start:

1. they are indeed artificial from user's POV, they bring nothing
   semantically important, this is just a syntax hack to make it
   fit the key-value way of configuring corosync, it could have
   been sort of an array just as well (with index being the key
   then)

2. yes, they need to refer to them *only* when updating/removing,
   but you have the same situation at various places when configuring
   CIB already (default way of creating location constraints will
   not allow you to provide a custom identifier for that constraint,
   for instance), but you can get them easily if their listing
   does the right job of stating them along

3. "update" syntax is rather a nonsense here, you'll want to add
   a new command instead of the existing one?  fine, drop the old
   one, add new one...

4. you basically say you'll rather sacrifice user's convenience
   in favour of easy implementation, "nice"

Comment 9 Jan Pokorný [poki] 2017-11-10 13:55:55 UTC

Note that the situation would be different if the commands themselves
had some properties attached to them, which is not the case, though.

Comment 11 Jan Pokorný [poki] 2017-11-10 14:43:11 UTC

re [comment 8]:

> 3. "update" syntax is rather a nonsense here, you'll want to add
>   a new command instead of the existing one?  fine, drop the old
>   one, add new one...

Apparently, that's a bad piece of advice as the command to remove
may be the last one (which pcs should guard against properly
using the --force convenience?), so a fixed recipe:
add new one, drop old one, done.

Comment 12 Jan Pokorný [poki] 2017-11-11 11:21:21 UTC

re [comment 11]:

Actually, for the sake of highly desirably atomicity (no extra
reloads and related complications), this idea seems best to me
for situations like that:

  pcs quorum device heuristics [for <model>] replace exec_<id> \
    [exec_<id>...] with <raw-command-or-exec-assignment> \
    [<raw-command-or-exec-assignment>...]

Comment 13 Jan Friesse 2017-11-13 07:52:40 UTC

(In reply to Jan Pokorný from comment #8)
> Where only to start:
> 
> 1. they are indeed artificial from user's POV, they bring nothing
>    semantically important, this is just a syntax hack to make it

I must partly agree and partly disagree. "exec_" part brings nothing, that's for sure. But the name after the underscore gives user ability to name heuristics command as they like and (not yet implemented) refer to it by these name. So in future qdevice-tool will display more detailed statistics like (this is example, precise format yet to define):

Exec  Result (5 sec ago)  Result (10 sec ago)   Result (25 sec ago)
----  ------------------  -------------------   -------------------
test  Pass                Pass                  Fail
foo   Fail                Pass                  Unknown
ping  Unknown             Pass                  Unknown
...

So how exactly should be the name part autogenerated?

>    fit the key-value way of configuring corosync, it could have
>    been sort of an array just as well (with index being the key
>    then)

What would not allow user to have short, easy to remember name of long command and it is exactly the reason why it's not implemented that way. Actually from my POV this is much cleaner then what is happening with nodelist (what I would call syntax hack).

Comment 14 Jan Pokorný [poki] 2017-11-13 10:04:43 UTC

> So in future qdevice-tool will display more detailed statistics like (this is
> example, precise format yet to define):
> 
> Exec  Result (5 sec ago)  Result (10 sec ago)   Result (25 sec ago)
> ----  ------------------  -------------------   -------------------
> test  Pass                Pass                  Fail
> foo   Fail                Pass                  Unknown
> ping  Unknown             Pass                  Unknown

Ah, I see, this was an untold story so far.

Still, some would prefer the command-line shown in full, especially when
checking the output only in case of troubles, it would help to uncover
stale ping target no longer and what not right away.

> So how exactly should be the name part autogenerated?

Excerpt from the command's digest, for instance.


My point was, just as noone is interested in the index column in the table of
a relational database (such as on insert ~ definition of heuristic command)
unless there's a need to update/drop or some other place need to refer to it
unambiguously, except the inambiguity constraint is likely to be relaxed here
as defining two exactly same heuristics does make no sense and should be marked
an error by the configuration tool.  Then, if you'd need to drop particular
heuristic command, you could either provide it in full (style of iproute2's
address removal from the interface) or look the "index" up by the means of
detailed list command.

Current design requires one to provide the "index" unconditionally (yes, it
should be able to do so), and I am asking for the convenient way be available
as well, just as it currently exists with various CIB configuring commands
likewise allowing you to avoid the identifiers that need to be present under
the hood, but in basic scenarios, they are just usability killing burden
on the user, like if the overall configuration wasn't hard to grasp already.

Comment 15 Tomas Jelinek 2017-11-13 15:40:11 UTC

Created attachment 1351610 [details]
proposed fix + tests

Updated commands:
* adding a quorum device:
pcs quorum device add [<generic options>] model <device model> [<model options>] [heuristics <heuristics options including exec_ list>]
* changing a quorum device:
pcs quorum device update [<generic options>] [model <model options>] [heuristics <heuristics options including exec_ list>]
* displaying configuration:
pcs quorum config

New commands:
* remove all heuristics settings:
pcs quorum device heuristics remove

Comment 17 Tomas Jelinek 2017-11-13 15:58:01 UTC

(In reply to Jan Pokorný from comment #8)
> 4. you basically say you'll rather sacrifice user's convenience
>    in favour of easy implementation, "nice"
No, I do not.


(In reply to Jan Pokorný from comment #12)
> re [comment 11]:
> 
>   pcs quorum device heuristics [for <model>] replace exec_<id> \
>     [exec_<id>...] with <raw-command-or-exec-assignment> \
>     [<raw-command-or-exec-assignment>...]

So you do not like
pcs quorum device update heuristics exec_<id>=<command>
because it has this exec_<id> part.
Now your proposed command has it as well. On top of that it presents a whole new syntax inconsistent with the rest of pcs. And how do you cover adding, removing and changing heuristics and other quorum device settings in one command / step with this new command? Because that is what the current command does.


What I see as a possible improvement:
pcs quorum device add [<generic options>] model <device model> [<model options>] [heuristics <heuristics options> exec <id>=<command> [<id>=<command>]...]
which is not that different. And it may cause a confusion due to discrepancy with corosync manpages. Being able to find documentation and map it to the command is from my point of view more important than saving some typing.

Comment 18 Jan Friesse 2017-11-13 16:32:41 UTC

(In reply to Jan Pokorný from comment #14)
> > So in future qdevice-tool will display more detailed statistics like (this is
> > example, precise format yet to define):
> > 
> > Exec  Result (5 sec ago)  Result (10 sec ago)   Result (25 sec ago)
> > ----  ------------------  -------------------   -------------------
> > test  Pass                Pass                  Fail
> > foo   Fail                Pass                  Unknown
> > ping  Unknown             Pass                  Unknown
> 
> Ah, I see, this was an untold story so far.
> 
> Still, some would prefer the command-line shown in full, especially when

That is expected to be somewhere.

> checking the output only in case of troubles, it would help to uncover
> stale ping target no longer and what not right away.

Output of command is nice to have for sure.

> 
> > So how exactly should be the name part autogenerated?
> 
> Excerpt from the command's digest, for instance.

Ugh. Digest may be ok for computer but I don't see it as a good name for "user" to use. I just don't believe there is really majority of people who likes names like 24f77cd2b40d037670548cc08031c36b0269b6c6dc6cc5a9d78be2193cae181a495fb15307c099b8fb6cf352a4cce86276193550b5ece69ce1468997e35281a1 (this is real command, are you able to find out what it is?) more than ping (or ping_localhost).

> 
> 
> My point was, just as noone is interested in the index column in the table of
> a relational database (such as on insert ~ definition of heuristic command)
> unless there's a need to update/drop or some other place need to refer to it
> unambiguously, except the inambiguity constraint is likely to be relaxed here
> as defining two exactly same heuristics does make no sense and should be
> marked
> an error by the configuration tool.  Then, if you'd need to drop particular
> heuristic command, you could either provide it in full (style of iproute2's
> address removal from the interface) or look the "index" up by the means of
> detailed list command.

Full command is not that great idea because what if there are two equal commands (what of course doesn't make too much sense as long as command is not doing some kind of locking to be able to do it's internal ordering = magic)?

> 
> Current design requires one to provide the "index" unconditionally (yes, it
> should be able to do so), and I am asking for the convenient way be available
> as well, just as it currently exists with various CIB configuring commands
> likewise allowing you to avoid the identifiers that need to be present under
> the hood, but in basic scenarios, they are just usability killing burden
> on the user, like if the overall configuration wasn't hard to grasp already.

Comment 19 Jan Pokorný [poki] 2017-11-13 17:33:00 UTC

In [comment 17]:

> (In reply to Jan Pokorný from comment #8)
>> 4. you basically say you'll rather sacrifice user's convenience
>>    in favour of easy implementation, "nice"
>> 
> No, I do not.

That's debatable for simple use cases.

* * *

> So you do not like [...] because it has this exec_<id> part.  Now your
> proposed command has it as well.

Not sure I am understood, in comment [14] I claim it's OK to have it
supported, but that alone is not sufficient for the least effort
configuration:

> Current design requires one to provide the "index" unconditionally
> (yes, it should be able to do so), and I am asking for the convenient
> way be available as well, 

* * *

> On top of that it presents a whole new syntax inconsistent with the
> rest of pcs.

Sure, simply justified with the fact that while you can batch the
changes in CIB easily (though there's a room for simplification, at
least for the purpose of coherent scripting: [bug 1359057]), hence
making desired "replace foo with bar" automic from the perspective
of committed changes, there's no such a provision for corosync
configuration.  For that reason, you'd preferably be able to do that
in one sweep, and then it's OK to bring the new configuration syntax
pattern in.

> And how do you cover adding, removing and changing heuristics and
> other quorum device settings in one command / step with this new
> command? Because that is what the current command does.

Ok, I admit I thought you were originally talking about an update
with the granularity of particular heuristic command in [comment 7].
It would indeed have been a heavy overkill assuming there are no real
properties of the heuristic command, but also one of the very few
reasons to make users face the artificial identifier in the first place.

The current update command could as well follow the idea from
[comment 12]:

  pcs quorum device update [<generic options>] [model <model options>]
    [heuristics <heuristics options>]
    [{ <raw-command-or-exec-assignment>
      | drop <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]
      | replace <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]
        with <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]}]

  where <raw-command-or-exec-assignment> encapsulates
  { exec_<id>=<command> | command }

Example:

  pcs quorum device update heuristics replace 'ping -qc1 lokal' \
    with 'ping -qc1 remotal' 'ping -qc1 kontejneral'

with the obvious meaning.

First branch of heuristics options is used merely for adding new, the
second one for removing existing, and replace-with one for combining
the former.  Note that this first branch could be fully compatible with
with "pcs quorum device add", and it's obvious one cannot drop nor
replace when initially adding the device as such, and it should not
ever happen that "drop" or "replace" (verbatim) are the intended
programs to run -- for systemd enabled systems using stock
corosync-qdevice.service, it would actually mean "/drop" and "/replace"
executables, respectively, and any sane admin hardly ever reaches that.
Ditto for "/exec_*=*" possible ambiguity...

* * *

> What I see as a possible improvement

That doesn't touch the principial problem I raised, and agreed, this
would not be for the better.

Comment 20 Jan Pokorný [poki] 2017-11-13 18:35:53 UTC

re [comment 18]:

>>> So how exactly should be the name part autogenerated?
>> 
>> Excerpt from the command's digest, for instance.
>
> Ugh. Digest may be ok for computer but I don't see it as a good name
> for "user" to use. I just don't believe there is really majority of
> people who likes names like
> 24f77cd2b40d037670548cc08031c36b0269b6c6dc6cc5a9d78be2193cae181a495fb15307c099b8fb6cf352a4cce86276193550b5ece69ce1468997e35281a1

Ugh.  "Excerpt" word must have been missed, just a tiny part of the real
digest was meant, and it can be base64-encoded form of the raw digest
bytes.  Hence rather something like exec_yOhm (for "/usr/bin/ping -qc1
localhost").

> (this is real command, are you able to find out what it is?) more than
> ping (or ping_localhost).

That's what I'd use detailed heuristic commands listing for, but would
rather prefer if raw commands where the default when facing the user in
the first place :)

Even in the logs it would be preferable to have the commands shown in
full, for auditability reasons (to know what got executed when, as the
configuration can change in time while perhaps the identifiers can stay
the same).


(FTR, this is where docker/moby friendliness excel -- need to uniquely
identify the running container instance?  we won't bother the user to
crack her head figuring out more or less transient identifier, we'll
just combine two easy-to-remember wordlist items, adjective + well
known scientist IIRC, and the user will then figure out this artificial
identifier easily when required as an input to other commands;
IMHO, we should strive for these simplifications in user interface
as well where it makes sense, getting beyond throwing the minuscule
internal details on the user at any opportunity)

* * *

>> except the inambiguity constraint is likely to be relaxed here as
>> defining two exactly same heuristics does make no sense and should be
>> marked an error by the configuration tool.  Then, if you'd need to
>> drop particular heuristic command, you could either provide it in
>> full (style of iproute2's address removal from the interface) or look
>> the "index" up by the means of detailed list command.
> 
> Full command is not that great idea because what if there are two
> equal commands

Again, it was likely missed that exactly the same commands defined
as heuristics should be actively prevented on creation, and the
replace/drop logic should be ready to perform a deduplication when
such a case witnessed (e.g. coming from a manual edit).

My reasoning behind that is the documention, corosync-qdevice(8), does
not specify (amongst other things) how exactly the commands are to be
run, so one can expect both parallel and sequential executions (latter
in either deterministic or nedeterministic way), so one cannot rely on
any precise timing, each heuristic command should be idempotent to each
other, and running two identical idempotent commands (possibly in
interleave) makes no practical sense if a single failure is enough to
claim it a global failure for the particular node (is it like this?).
One is still free to exercise a serialized set of actions by enclosing
them into a standalone script to be run as this command, though.

> (what of course doesn't make too much sense as long as
> command is not doing some kind of locking to be able to do it's
> internal ordering = magic)?

Why would anyone go this crazy if you get the serialization
automatically for commands in the non-forking flow of a shell script?


Looks like this whole thing needs some reality-checking.

Comment 21 Jan Pokorný [poki] 2017-11-13 18:43:31 UTC

re [comment 19]:

Missed an iteration in "add" case in the proposed syntax, after fix:

  pcs quorum device update [<generic options>] [model <model options>]
    [heuristics <heuristics options>]
    [{  <raw-command-or-exec-assignment>
        [<raw-command-or-exec-assignment>...]
      | drop <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]
      | replace <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]
        with <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]}]

Comment 22 Jan Pokorný [poki] 2017-11-13 18:55:12 UTC

re [comment 18]:

> (In reply to Jan Pokorný from comment #14)
>>> So in future qdevice-tool will display more detailed statistics like (this is
>>> example, precise format yet to define):
>>> 
>>> Exec  Result (5 sec ago)  Result (10 sec ago)   Result (25 sec ago)
>>> ----  ------------------  -------------------   -------------------
>>> test  Pass                Pass                  Fail
>>> foo   Fail                Pass                  Unknown
>>> ping  Unknown             Pass                  Unknown
>> 
>> Ah, I see, this was an untold story so far.
>> 
>> Still, some would prefer the command-line shown in full, especially when
> 
> That is expected to be somewhere.

You are now talking about this key-derived identifier, right?
I'd say, not necessarily, definitely not necessary for only a single
command configured.

>>> checking the output only in case of troubles, it would help to uncover
>>> stale ping target no longer and what not right away.
> 
> Output of command is nice to have for sure.

Oh, I meant checking output of qdevice-tool (or the like).

Comment 23 Jan Pokorný [poki] 2017-11-13 20:21:58 UTC

Also noticed that one could, in theory, use "heuristics_use_execvp=on"
in options passed directly to corosync-qdevice daemon, which restores
the system(3)-like behaviour wrt. $PATH traversal, and would pose
a risk of legitimately entered unqualified (and only unqualified)
drop/replace/with/exec_*=* binaries being incidentally misinterpreted
in the processing of the proposed syntax [comment 21], but that would
just mangle the heuristic commands configured and could be prevented:

- first and foremost, by not changing that "advanced setting", because
  it "shouldn't be generally used because most of the options are not
  safe to change" as advised in the man page

- by sticking with using exec_foo=bar syntax

- by keeping the sane full-qualification of the executables to run,
  otherwise one can get screwed up easily when adding something into
  /usr/local/{s,}bin (in a default systemd-inited deployment) anyway

- by testing the procedure on the staging cluster first, more so
  when already skating on thin ice (when enabling that option on)

Comment 24 Jan Pokorný [poki] 2017-11-13 22:32:53 UTC

re [comment 21]:

Last fix, s/raw-command-or-exec-assignment/raw-command-or-exec-id/
at some places, hence hopefully a final proposal:

  pcs quorum device update [<generic options>] [model <model options>]
    [heuristics <heuristics options>]
    [{  <raw-command-or-exec-assignment>
        [<raw-command-or-exec-assignment>...]
      | drop <raw-command-or-exec-id>
          [<raw-command-or-exec-id>...]
      | replace <raw-command-or-exec-id>
          [<raw-command-or-exec-id>...]
        with <raw-command-or-exec-assignment>
          [<raw-command-or-exec-assignment>...]}]

Comment 25 Jan Friesse 2017-11-14 08:31:43 UTC

(In reply to Jan Pokorný from comment #20)
> re [comment 18]:
> 
> >>> So how exactly should be the name part autogenerated?
> >> 
> >> Excerpt from the command's digest, for instance.
> >
> > Ugh. Digest may be ok for computer but I don't see it as a good name
> > for "user" to use. I just don't believe there is really majority of
> > people who likes names like
> > 24f77cd2b40d037670548cc08031c36b0269b6c6dc6cc5a9d78be2193cae181a495fb15307c099b8fb6cf352a4cce86276193550b5ece69ce1468997e35281a1
> 
> Ugh.  "Excerpt" word must have been missed, just a tiny part of the real
> digest was meant, and it can be base64-encoded form of the raw digest
> bytes.  Hence rather something like exec_yOhm (for "/usr/bin/ping -qc1
> localhost").

Still don't think exec_yOhm is better than exec_ping_localhost.

> 
> > (this is real command, are you able to find out what it is?) more than
> > ping (or ping_localhost).
> 
> That's what I'd use detailed heuristic commands listing for, but would
> rather prefer if raw commands where the default when facing the user in
> the first place :)
> 
> Even in the logs it would be preferable to have the commands shown in
> full, for auditability reasons (to know what got executed when, as the
> configuration can change in time while perhaps the identifiers can stay
> the same).
> 
> 
> (FTR, this is where docker/moby friendliness excel -- need to uniquely
> identify the running container instance?  we won't bother the user to
> crack her head figuring out more or less transient identifier, we'll
> just combine two easy-to-remember wordlist items, adjective + well
> known scientist IIRC, and the user will then figure out this artificial
> identifier easily when required as an input to other commands;
> IMHO, we should strive for these simplifications in user interface
> as well where it makes sense, getting beyond throwing the minuscule
> internal details on the user at any opportunity)
> 
> * * *
> 
> >> except the inambiguity constraint is likely to be relaxed here as
> >> defining two exactly same heuristics does make no sense and should be
> >> marked an error by the configuration tool.  Then, if you'd need to
> >> drop particular heuristic command, you could either provide it in
> >> full (style of iproute2's address removal from the interface) or look
> >> the "index" up by the means of detailed list command.
> > 
> > Full command is not that great idea because what if there are two
> > equal commands
> 
> Again, it was likely missed that exactly the same commands defined
> as heuristics should be actively prevented on creation, and the
> replace/drop logic should be ready to perform a deduplication when
> such a case witnessed (e.g. coming from a manual edit).

I don't see any reason for doing this.

> 
> My reasoning behind that is the documention, corosync-qdevice(8), does
> not specify (amongst other things) how exactly the commands are to be
> run, so one can expect both parallel and sequential executions (latter
> in either deterministic or nedeterministic way), so one cannot rely on

Yep, agree, this is missing bit of information. I'll add such node.

> any precise timing, each heuristic command should be idempotent to each
> other, and running two identical idempotent commands (possibly in
> interleave) makes no practical sense if a single failure is enough to
> claim it a global failure for the particular node (is it like this?).
> One is still free to exercise a serialized set of actions by enclosing
> them into a standalone script to be run as this command, though.
> 
> > (what of course doesn't make too much sense as long as
> > command is not doing some kind of locking to be able to do it's
> > internal ordering = magic)?
> 
> Why would anyone go this crazy if you get the serialization
> automatically for commands in the non-forking flow of a shell script?

I don't know and I don't care. I'm giving user the tool. If they use tool to cut their finger, it's their problem.

> 
> 
> Looks like this whole thing needs some reality-checking.

Last sentence is really only part where I can fully agree. Let customers/gss/qa test it, and will see what they think. It's TechPreview so we still have space to do incompatible changes.

Comment 26 Jan Pokorný [poki] 2017-11-14 10:43:12 UTC

re [comment 25]:

> Yep, agree, this is missing bit of information. I'll add such node.

Appreciated!  If the user is to make any more sophisticated sequences,
"the mode of executing" is indeed missing.  On the other hand, it might
be good to consider, if for instance the commands cannot be run in
parallel (if not already) in the future and put a future-anticipating
warning about the expected non-interference amongst the commands.

What's also missing is the most important bit -- how will the commands
influence the decisions (what is failure and if single failure wins).

> I don't know and I don't care. I'm giving user the tool.

Yes, it's understandable from "I am myself curious what use cases
will the userbase come up with" (as opposed to "our mission is to
guide the user through the cluster configuration management", which
is the only surface I am scratching here) POV.

> Let customers/gss/qa test it, and will see what they think.
> It's TechPreview so we still have space to do incompatible
> changes.

Except this development mode is upside-down, no upstream first that
could have something to say on the usability before it gets settled,
and once established syntax is really hard to change.

Comment 27 Jan Friesse 2017-11-14 14:42:14 UTC

(In reply to Jan Pokorný from comment #26)
> re [comment 25]:
> 
> > Yep, agree, this is missing bit of information. I'll add such node.
> 
> Appreciated!  If the user is to make any more sophisticated sequences,
> "the mode of executing" is indeed missing.  On the other hand, it might
> be good to consider, if for instance the commands cannot be run in
> parallel (if not already) in the future and put a future-anticipating
> warning about the expected non-interference amongst the commands.
> 
> What's also missing is the most important bit -- how will the commands
> influence the decisions (what is failure and if single failure wins).

From the man page:

"
When  all  commands  finish  successfully
(their return error code is zero) on time, heuristics have passed, otherwise they have failed.
"

How exactly it influences the algorithm is described in "algorithm" section.


> 
> > I don't know and I don't care. I'm giving user the tool.
> 
> Yes, it's understandable from "I am myself curious what use cases
> will the userbase come up with" (as opposed to "our mission is to
> guide the user through the cluster configuration management", which
> is the only surface I am scratching here) POV.

Nice try, but just pure fallacy.

> 
> > Let customers/gss/qa test it, and will see what they think.
> > It's TechPreview so we still have space to do incompatible
> > changes.
> 
> Except this development mode is upside-down, no upstream first that
> could have something to say on the usability before it gets settled,
> and once established syntax is really hard to change.

Actually not. When I say customers, I mean not only downstream customers (so RHEL/Fedora) but also upstream customers (= users of corosync)

Comment 28 Jan Pokorný [poki] 2017-11-14 15:48:47 UTC

Re [comment 27]

> "
> When  all  commands  finish  successfully
> (their return error code is zero) on time, heuristics have passed, otherwise
> they have failed.
> "

Sorry, must have overlooked it this time (was looking just at
exec_NAME now).  Have seen that before.

> How exactly it influences the algorithm is described in "algorithm"
> section.

Fair enough and good to know.

The other missing bit is circumstances under which the command gets
executed -- are environment variables passed over, what are the
privileges of the process launched, stuff like that.

>>> I don't know and I don't care. I'm giving user the tool.
>> 
>> Yes, it's understandable from "I am myself curious what use cases
>> will the userbase come up with" (as opposed to "our mission is to
>> guide the user through the cluster configuration management", which
>> is the only surface I am scratching here) POV.
> 
> Nice try, but just pure fallacy.

I think configuration tool paying attention to what is being
configured is the Right Thing, otherwise it's the fallacy to think
the tool will do substantially better (in form of some assistance)
than manual edit + propagate process.
Re [comment 27]

> "
> When  all  commands  finish  successfully
> (their return error code is zero) on time, heuristics have passed, otherwise
> they have failed.
> "

Sorry, must have overlooked it this time (was looking just at
exec_NAME now).  Have seen that before.

> How exactly it influences the algorithm is described in "algorithm"
> section.

Fair enough and good to know.

The other missing bit is circumstances under which the command gets
executed -- are environment variables passed over, what are the
privileges of the process launched, stuff like that.

>>> I don't know and I don't care. I'm giving user the tool.
>> 
>> Yes, it's understandable from "I am myself curious what use cases
>> will the userbase come up with" (as opposed to "our mission is to
>> guide the user through the cluster configuration management", which
>> is the only surface I am scratching here) POV.
> 
> Nice try, but just pure fallacy.

I think configuration tool paying attention to what is being
configured is the Right Thing, otherwise it's the fallacy to think
the tool will do substantially better (in form of some assistance)
than manual edit + propagate process.
Re [comment 27]

> "
> When  all  commands  finish  successfully
> (their return error code is zero) on time, heuristics have passed, otherwise
> they have failed.
> "

Sorry, must have overlooked it this time (was looking just at
exec_NAME now).  Have seen that before.

> How exactly it influences the algorithm is described in "algorithm"
> section.

Fair enough and good to know.

The other missing bit is circumstances under which the command gets
executed -- are environment variables passed over, what are the
privileges of the process launched, stuff like that.

>>> I don't know and I don't care. I'm giving user the tool.
>> 
>> Yes, it's understandable from "I am myself curious what use cases
>> will the userbase come up with" (as opposed to "our mission is to
>> guide the user through the cluster configuration management", which
>> is the only surface I am scratching here) POV.
> 
> Nice try, but just pure fallacy.

I think configuration tool actually ghpaying attention to what is being
configured is the Right Thing, otherwise it's the fallacy to think
this configurator will do substantially better (in form of some
assistance) than manual edit + propagate process.

>>> Let customers/gss/qa test it, and will see what they think.
>>> It's TechPreview so we still have space to do incompatible
>>> changes.
>> 
>> Except this development mode is upside-down, no upstream first that
>> could have something to say on the usability before it gets settled,
>> and once established syntax is really hard to change.
> 
> Actually not. When I say customers, I mean not only downstream customers (so
> RHEL/Fedora) but also upstream customers (= users of corosync)

But corosync doesn't need to worry about incompatible changes that 
much, as the configuration tool can abstract a lot from the actual
low-level form.

Comment 29 Jan Pokorný [poki] 2017-11-15 20:52:33 UTC

To summarize my points from above discussion concisely for posterity
(my last entry here):

I claim the point of the configuration tool is to make that very act
substantially easier and less error-prone than doing it either by hand
or with a trivial custom automation.  As I identified some points that
could be done and are not (below), I revealed that there's very
little value added with the tool on top of providing unified facade
also for configuring this new subdomain of the overall space:

1. identifiers for the heuristic commands are pedantically required
   even when user could do without them easily in this case, especially
   when configuring just a single heuristic command
   (see final proposal of the syntax in [comment 24] + ambiguity
   discussing bottom of [comment ]9) + heuristics_use_execvp
   discussing [comment 23])

2. no checking for nonsensical redundancy in heuristic commands and
   preventing them (unless overridden with --force?,
   see [comment 20], section starting "except the inambiguity")

3. no guarding for fully qualified executables being specified
   (unless overridden with --force?;
   corollary of [comment 19], "First branch" part, and [comment 23])

4. no action (informative and/or effective drop of heuristics part)
   happening upon target state becoming "no commands defined"
   -- user should at least be informed no heuristic commands remain
   (are configured initially), hence heuristics as such are out of
   any quorum equations (for now)
   (in my now rather subjective view, heuristics is a weak entity,
   i.e., no constituting commands = no heuristics block at all,
   on creation/update, which would also mandate to have at least
   a single command given when configuring heuristics from scratch,
   but that's also reasonable in my eyes)


All these could be uncovered with proper analytical review.  And
without such an assistance (based on intimate knowledge of the subject
under control) provided, the tool remains just a dummy forwarder of
the master's orders.  I admit it's able to do remarkably better at
other places, but here, it's simply not cutting it (and sadly, I don't
believe QEs would ever figure out these, or attempted to reach any
domain-specific critical thinking for that matter, on their own,
but that's another story).

Comment 30 Ondrej Mular 2017-11-16 07:06:54 UTC

After fix:
[root@rhel75-node1 ~]# rpm -q pcs
pcs-0.9.162-1.el7.x86_64

> create quorum device
[root@rhel75-node1 ~]# pcs qdevice setup model net --start
Quorum device 'net' initialized
Starting quorum device...
quorum device started

> add quorum device with heuristics into the cluster
[root@rhel75-node1 ~]# pcs quorum device add model net host=rhel75-node1 algorithm=ffsplit  heuristics 'exec_ping=/usr/bin/ping -1 -c 1 "127.0.0.1"'
Setting up qdevice certificates on nodes...
rhel75-node1: Succeeded
rhel75-node2: Succeeded
Enabling corosync-qdevice...
rhel75-node1: not enabling corosync-qdevice: corosync is not enabled
rhel75-node2: not enabling corosync-qdevice: corosync is not enabled
Sending updated corosync.conf to nodes...
rhel75-node1: Succeeded
rhel75-node2: Succeeded
Corosync configuration reloaded
Starting corosync-qdevice...
rhel75-node1: corosync-qdevice started
rhel75-node2: corosync-qdevice started

> verify that it has been setup correctly
[root@rhel75-node2 ~]# pcs quorum config
Options:
Device:
  votes: 1
  Model: net
    algorithm: ffsplit
    host: rhel75-node1
  Heuristics:
    exec_ping: /usr/bin/ping -1 -c 1 "127.0.0.1"

> update heuristics options
[root@rhel75-node2 ~]# pcs quorum device update heuristics exec_ping= "exec_ls= /usr/bin/test -f /tmp/test"
Sending updated corosync.conf to nodes...
rhel75-node1: Succeeded
rhel75-node2: Succeeded
Corosync configuration reloaded
Reloading qdevice configuration on nodes...
rhel75-node2: corosync-qdevice stopped
rhel75-node1: corosync-qdevice stopped
rhel75-node2: corosync-qdevice started
rhel75-node1: corosync-qdevice started

> verify changes
[root@rhel75-node1 ~]# pcs config 
Cluster Name: rhel75-cluster
Corosync Nodes:
 rhel75-node1 rhel75-node2
Pacemaker Nodes:
 rhel75-node1 rhel75-node2

Resources:

Stonith Devices:
 Resource: fencing (class=stonith type=fence_xvm)
  Operations: monitor interval=60s (fencing-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: rhel75-cluster
 dc-version: 1.1.18-5.el7-1a4ef7d180
 have-watchdog: false

Quorum:
  Options:
  Device:
    votes: 1
    Model: net
      algorithm: ffsplit
      host: rhel75-node1
    Heuristics:
      exec_ls: /usr/bin/test -f /tmp/test

> remove qdevice heuristics
[root@rhel75-node1 ~]# pcs quorum device heuristics remove
Sending updated corosync.conf to nodes...
rhel75-node1: Succeeded
rhel75-node2: Succeeded
Corosync configuration reloaded
Reloading qdevice configuration on nodes...
rhel75-node2: corosync-qdevice stopped
rhel75-node1: corosync-qdevice stopped
rhel75-node2: corosync-qdevice started
rhel75-node1: corosync-qdevice started
[root@rhel75-node1 ~]# pcs quorum config
Options:
Device:
  votes: 1
  Model: net
    algorithm: ffsplit
    host: rhel75-node1

Comment 32 Tomas Jelinek 2017-11-16 10:11:31 UTC

(In reply to Jan Pokorný from comment #29)
> 4. no action (informative and/or effective drop of heuristics part)
>    happening upon target state becoming "no commands defined"
>    -- user should at least be informed no heuristic commands remain
>    (are configured initially)

Really?

# pcs quorum device add model net host=rh74-node3 algorithm=ffsplit heuristics mode=on
Warning: No exec_NAME options are specified, so heuristics are effectively disabled
{snip}

# pcs quorum device update heuristics mode=sync
Warning: No exec_NAME options are specified, so heuristics are effectively disabled
{snip}

Comment 33 Jan Pokorný [poki] 2017-11-16 18:18:04 UTC

re [comment 32]:
Sorry, you are right and I am wrong here, though I have surely
tested the command, likely the eyes did not spot what they should.
Warnings are already present.

But this is an interesting behaviour, related to the possibility
to have something executed despite configuring something entirely else,
for sure:

$ pcs quorum device update model \
  "host=$(printf \
    "localhost\n}\nheuristics {\nexec_bar: /usr/bin/sh -c 'echo reboot>>/root/.profile||:'\n")

A bit worrisome enablement of unintended propagation, possibly from
higher management layer, likely pre-existing, but only now with new
interpretation of heuristics in corosync becoming serious.
As serious as asking for quick fix.

Anyway, when I temporarily violated the silence already,
re [comment 29], item 1.:
- s/bottom of [comment ]9)/bottom of [comment 19]/
- insert a subidea of having automatic identifier generated
  as exec_<limited substring of the command's basename>_<short hash>

Howgh.

Comment 35 Tomas Jelinek 2017-11-20 11:25:17 UTC

(In reply to Jan Pokorný from comment #33)
> But this is an interesting behaviour, related to the possibility
> to have something executed despite configuring something entirely else,
> for sure:
> 
> $ pcs quorum device update model \
>   "host=$(printf \
>     "localhost\n}\nheuristics {\nexec_bar: /usr/bin/sh -c 'echo
> reboot>>/root/.profile||:'\n")
> 
> A bit worrisome enablement of unintended propagation, possibly from
> higher management layer, likely pre-existing, but only now with new
> interpretation of heuristics in corosync becoming serious.
> As serious as asking for quick fix.

This is indeed an issue. However, anybody who can set "host" can set "exec_bar" as well. So even once checks for {}:\n characters are implemented in pcs, it will be still possible to do:
$ pcs quorum device update heuristics "exec_bar=/usr/bin/sh -c 'echo reboot>>/root/.profile||:'"

Filed as bz1515193.

Comment 36 Jan Pokorný [poki] 2017-11-20 14:35:44 UTC

[comment 35]:

While I agree this lack of input sanitization/command injection is not
CVE-worthy, for the reasons mentioned in the first sentence, it would
be shortsighted to dismiss an apparent risk[*1][*2], so my hopes are
the prioritization will reflect that, preferably this is reconciled in
the same point version as when the support for heuristics is delivered.

[*1] Because pcs does not support any kind of access control granularity
     beside pacemaker config/CIB proper, administrators may resort to
     conditionally allow auxiliary operators configure some aspects
     of corosync.conf and preventing others through pcs by the means
     of sudo + wildcard white/black-listing in sudoers -- and pcs
     allowing for such configuration injections will, in turn, subvert
     the reasonable and security and sanity assumptions.  Which is bad.

[*2] Higher levels of configuration management building upon pcs may
     likewise count of pcs being secure and sane to guarantee their
     own safety.  Hence, pcs needs to be robust to cater all such
     use cases gracefully.

Comment 40 errata-xmlrpc 2018-04-10 15:37:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0866

Note You need to log in before you can comment on or make changes to this bug.