Bug 1376856

Summary: pmlogger pmcd-restart persistence breaks pmmgr assumptions
Product: Fedora
Reporter: Frank Ch. Eigler <fche>
Component: pcp
Assignee: Frank Ch. Eigler <fche>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 29
CC: fche, lberk, mgoodwin, nathans, pcp
Last Closed: 2019-03-05 04:22:45 UTC
Type: Bug

Description Frank Ch. Eigler 2016-09-16 15:29:58 UTC
With the new pmlogger persistence, when pmlogger detects that its target pmcd has died and come back, it performs a "validating metrics" pass and keeps going.  This sounds nice for some purposes.

However, consider the case where the remote pmcd came back because new PMDAs were installed.  What if we want to log those new PMDAs?  The pmlogger manager (whether the cron script or pmmgr) now does not know that anything happened, so the opportunity to rerun pmlogconf is missed, and the new metrics are not auto-logged until the next natural cycle (daily?).

Please add an option to pmlogger to suppress this auto-reconnection, or reject it if the pmda suite appears to have changed.
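To make the request concrete, here is a minimal Python sketch of the decision such an option would encode, assuming pmlogger could obtain the set of PMDA names before and after pmcd returns. The policy values (ALWAYS, NEVER, IF_UNCHANGED) and the function should_resume are hypothetical illustrations; they are not existing pmlogger flags or PCP APIs.

```python
from enum import Enum


class ReconnectPolicy(Enum):
    """Hypothetical pmlogger option values; not real pmlogger flags."""
    ALWAYS = "always"           # current behaviour: validate metrics, keep going
    NEVER = "never"             # always exit so the manager reruns pmlogconf
    IF_UNCHANGED = "unchanged"  # keep going only if the PMDA set is identical


def should_resume(policy: ReconnectPolicy,
                  pmdas_before: frozenset[str],
                  pmdas_after: frozenset[str]) -> bool:
    """Decide whether a logger should resume after its pmcd comes back.

    Returning False means the logger should exit, so that its manager
    (cron script or pmmgr) notices, reruns pmlogconf, and starts a new logger.
    """
    if policy is ReconnectPolicy.ALWAYS:
        return True
    if policy is ReconnectPolicy.NEVER:
        return False
    # IF_UNCHANGED: any PMDA added or removed invalidates the old config.
    return pmdas_before == pmdas_after


before = frozenset({"linux", "proc", "xfs"})
after = before | {"nginx"}  # a new PMDA appeared across the pmcd restart
print(should_resume(ReconnectPolicy.IF_UNCHANGED, before, after))  # False
```

Comparing whole sets keeps the check cheap and catches both added and removed PMDAs; a finer-grained variant could compare the metric namespace instead.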

Comment 1 Nathan Scott 2016-09-16 20:27:09 UTC
| target pmcd dying
[...]
| the remote pmcd came back because new PMDAs were installed

PMDA installation often doesn't cause a pmcd restart (just a SIGHUP), and restarts will happen less and less as $force_restart is gradually removed from the code base, so this line of argument is flawed.

This needs to be tackled in pmmgr really, if at all. The same problem also affects pmmgr's use of pmie, and in a much worse way due to the nature of some rules spanning large time intervals.  pmie has always auto-reconnected too, of course, so this is not something that should be blamed on the recent improvements to pmlogger.

Comment 2 Frank Ch. Eigler 2016-09-16 20:33:05 UTC
(In reply to Nathan Scott from comment #1)
> | target pmcd dying
> [...]
> | the remote pmcd came back because new PMDAs were installed
> 
> PMDA installation often doesn't cause a pmcd restart (just sighup) - and
> increasingly will happen less and less as $force_restart gets slowly removed
> from the code base, so this line of argument is flawed.

Whatever the shape of the argument, the underlying issue is real.  If pmlogconf is the premier way of autoconfiguring loggers, changes such as PMDAs coming and going must be reflected in pmlogconf getting run.  Whether this is done by pmmgr (predictably) or by pmlogger_check (unpredictably), either is better than the new status quo (not at all).


> The same problem also affects pmmgr's use of pmie,

No.  pmie is hardly configured at all; pmieconf is perfunctory.  And multi-host pmie rules are simply outside pmmgr's host-focused definition.

Comment 4 Nathan Scott 2016-09-18 14:51:19 UTC
(In reply to Frank Ch. Eigler from comment #2)
> > [...]
> > PMDA installation often doesn't cause a pmcd restart (just sighup) - and
> > increasingly will happen less and less as $force_restart gets slowly removed
> > from the code base, so this line of argument is flawed.
> 
> Whatever shape the argument, the underlying issue is real.  If pmlogconf is
> the premier way of autoconfiguring loggers, changes such as pmdas coming and
> going must be reflected in pmlogconf getting run.  Whether this is done by
> pmmgr (predictably), or by pmlogger_check (unpredictably), it is worse than
> the new status quo (not at all).

I don't think you followed completely.  For PMDAs configured in the ideal way (i.e. not requiring a pmcd restart, which currently means all PMDAs running as $PCP_USER), pmcd was already not being restarted.  In this situation, the change is (and always has been) encoded in the PDU exchange between pmcd and pmlogger, pmlogger records a mark record, and pmcd is not restarted; it never was.

So, the assertion that this is a new regression is incorrect; the pmmgr assumption that pmcd would be restarted on PMDA install has always been incorrect.

However, this *is* fixable in pmmgr.  If pmmgr maintains a connection to pmcd, it would be able to detect both PMDA reconfiguration messages and loss of connection to pmcd - and could act in the way you feel is desirable here.
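A minimal Python sketch of the event-handling side of that suggestion follows. The event kinds (FETCH_OK, PMDA_RECONFIG, CONNECTION_LOST, RECONNECTED) and the HostMonitor class are hypothetical placeholders for whatever libpcp would actually report to a manager holding a pmcd connection; the recorded action strings are labels, not real command lines.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class PmcdEvent(Enum):
    """Placeholder event kinds; not real libpcp constants."""
    FETCH_OK = auto()         # ordinary fetch, nothing changed
    PMDA_RECONFIG = auto()    # pmcd reported PMDAs added, dropped or restarted
    CONNECTION_LOST = auto()  # pmcd went away
    RECONNECTED = auto()      # pmcd came back


@dataclass
class HostMonitor:
    """Hypothetical per-host state a manager like pmmgr might keep."""
    host: str
    connected: bool = True
    actions: list[str] = field(default_factory=list)

    def handle(self, event: PmcdEvent) -> None:
        if event is PmcdEvent.CONNECTION_LOST:
            self.connected = False
        elif event is PmcdEvent.RECONNECTED:
            self.connected = True
            # pmcd may have restarted with a different PMDA set.
            self.reconfigure()
        elif event is PmcdEvent.PMDA_RECONFIG:
            # PMDA set changed without a pmcd restart (e.g. after SIGHUP).
            self.reconfigure()

    def reconfigure(self) -> None:
        # Record placeholder action labels; a real manager would rerun
        # pmlogconf and restart pmlogger for this host here.
        self.actions.append(f"pmlogconf {self.host}")
        self.actions.append(f"restart pmlogger {self.host}")


mon = HostMonitor("example-host")
for ev in (PmcdEvent.FETCH_OK, PmcdEvent.PMDA_RECONFIG,
           PmcdEvent.CONNECTION_LOST, PmcdEvent.RECONNECTED):
    mon.handle(ev)
print(len(mon.actions))  # 4
```

Treating a reconnect the same as an explicit reconfiguration covers both paths discussed above: PMDA changes signalled in-band and pmcd restarts.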

Finally, while pmieconf does not actively probe today, there's every reason to think it could become more host-probe-dynamic like pmlogconf in the future.  

You continue to dismiss the situation where people write their own rules, rather than use pmieconf, but in practice for many pmie deployments this is the norm.

Comment 6 Frank Ch. Eigler 2016-09-20 15:29:30 UTC
> I don't think you followed completely.

On the contrary, I followed completely.

> So, the assertion this is a new regression is incorrect - the pmmgr
> assumption that pmcd would be restarted on PMDA install has always been
> incorrect.

Since more cases now trigger this "incorrect" assumption, it is obviously
both "new" and a "regression".

> However, this *is* fixable in pmmgr.  If pmmgr maintains a connection to
> pmcd, it would be able to detect both PMDA reconfiguration messages and loss
> of connection to pmcd - and could act in the way you feel is desirable here.

Yes, that could work, assuming a cooperative libpcp.

It would not help those users stuck with service-pmlogger.  I guess they are
to fall further behind in responsiveness.


> [...]
> You continue to dismiss the situation where people write their own rules,
> rather than use pmieconf, but in practice for many pmie deployments this is
> the norm.

I don't dismiss it in the abstract.  This case is simply not relevant to
pmmgr's normal host-targeted model.  People with such pmie rules can run
them with service-pmie (or even with service-pmmgr, linked to a dummy
host).  It is off topic.

Comment 7 Fedora End Of Life 2017-02-28 10:19:14 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 8 Fedora End Of Life 2018-05-03 08:28:40 UTC
This message is a reminder that Fedora 26 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 26. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '26'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 26 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 9 Jan Kurik 2018-08-14 10:22:12 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle.
Changing version to '29'.

Comment 10 Nathan Scott 2019-03-05 04:22:45 UTC
pmmgr is essentially unmaintained in upstream PCP, and the PCP dev team at Red Hat plans no further work on it due to far more pressing priorities.  Closing out - feel free to contribute upstream, however, and have changes flow back through the usual PCP update channels.