Bug 1376856 - pmlogger pmcd-restart persistence breaks pmmgr assumptions
Status: NEW
Product: Fedora
Classification: Fedora
Component: pcp
Version: 26
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Frank Ch. Eigler
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-09-16 11:29 EDT by Frank Ch. Eigler
Modified: 2017-08-23 11:00 EDT

Type: Bug


Description Frank Ch. Eigler 2016-09-16 11:29:58 EDT
With the new pmlogger persistence, when pmlogger detects its target pmcd dying and coming back, it does a "validating metrics" pass and keeps going.  This sounds nice for some purposes.

However, consider the case where the remote pmcd came back because new PMDAs were installed, and we want to log those new PMDAs.  Now the pmlogger manager (whether the cron script or pmmgr) does not know that anything happened, so pmlogconf opportunities are missed and the new metrics are not auto-logged until the next natural cycle time (day?).

Please add an option to pmlogger to suppress this auto-reconnection, or reject it if the pmda suite appears to have changed.
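
A minimal sketch of the second option, assuming the set of PMDAs can be captured as a list of agent names (for example the instance names of a metric such as pmcd.agent.status) once before the disconnect and again after the reconnect.  The function names and the snapshot format are hypothetical, not pmlogger code:

```python
def pmda_suite_changed(before, after):
    """Compare two snapshots of PMDA names (hypothetical format: iterables of str).

    Returns (added, removed) as sorted lists; both empty means the suite is unchanged.
    """
    before_set, after_set = set(before), set(after)
    added = sorted(after_set - before_set)
    removed = sorted(before_set - after_set)
    return added, removed


def should_keep_logging(before, after):
    """Accept the auto-reconnection only if no PMDA came or went."""
    added, removed = pmda_suite_changed(before, after)
    return not added and not removed
```

If should_keep_logging() returned False, pmlogger would exit instead of carrying on, which lets whatever manages it (the cron script or pmmgr) notice and rerun pmlogconf.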
Comment 1 Nathan Scott 2016-09-16 16:27:09 EDT
| target pmcd dying
[...]
| the remote pmcd came back because new PMDAs were installed

PMDA installation often doesn't cause a pmcd restart (just a SIGHUP), and restarts will become rarer as $force_restart gets slowly removed from the code base, so this line of argument is flawed.

This needs to be tackled in pmmgr really, if at all. The same problem also affects pmmgr's use of pmie, and in a much worse way due to the nature of some rules spanning large time intervals.  pmie has always auto-reconnected too, of course, so this is not something that should be blamed on the recent improvements to pmlogger.
Comment 2 Frank Ch. Eigler 2016-09-16 16:33:05 EDT
(In reply to Nathan Scott from comment #1)
> | target pmcd dying
> [...]
> | the remote pmcd came back because new PMDAs were installed
> 
> PMDA installation often doesn't cause a pmcd restart (just sighup) - and
> increasingly will happen less and less as $force_restart gets slowly removed
> from the code base, so this line of argument is flawed.

Whatever the shape of the argument, the underlying issue is real.  If pmlogconf is the premier way of autoconfiguring loggers, changes such as PMDAs coming and going must be reflected in pmlogconf getting run.  Whether this is done by pmmgr (predictably) or by pmlogger_check (unpredictably), either is better than the new status quo (not at all).


> The same problem also affects pmmgr's use of pmie,

No.  PMIE is hardly configured at all - pmieconf is perfunctory.  And multi-host pmie rules are simply outside pmmgr's host-focused definition.
Comment 4 Nathan Scott 2016-09-18 10:51:19 EDT
(In reply to Frank Ch. Eigler from comment #2)
> > [...]
> > PMDA installation often doesn't cause a pmcd restart (just sighup) - and
> > increasingly will happen less and less as $force_restart gets slowly removed
> > from the code base, so this line of argument is flawed.
> 
> Whatever shape the argument, the underlying issue is real.  If pmlogconf is
> the premier way of autoconfiguring loggers, changes such as pmdas coming and
> going must be reflected in pmlogconf getting run.  Whether this is done by
> pmmgr (predictably), or by pmlogger_check (unpredictably), it is better than
> the new status quo (not at all).

I don't think you followed completely.  For PMDAs configured in the ideal way (i.e. without a pmcd restart, which currently means all PMDAs running as $PCP_USER), pmcd was already not being restarted.  In this situation, the change is (and always has been) encoded in the PDU exchange between pmcd and pmlogger, pmlogger records a mark record, and there is no restart of pmcd and never was.

So, the assertion this is a new regression is incorrect - the pmmgr assumption that pmcd would be restarted on PMDA install has always been incorrect.

However, this *is* fixable in pmmgr.  If pmmgr maintains a connection to pmcd, it would be able to detect both PMDA reconfiguration messages and loss of connection to pmcd - and could act in the way you feel is desirable here.
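
A rough sketch of that decision logic, assuming pmmgr polls pmcd over its own connection and sees either a connection loss or the PMCD state-change flags that libpcp reports from pmFetch (PMCD_ADD_AGENT, PMCD_RESTART_AGENT, PMCD_DROP_AGENT).  The flag values below are named after the pmapi.h constants but should be treated as illustrative, and next_action with its action strings is a hypothetical helper, not part of pmmgr:

```python
# Illustrative bit flags named after the PMCD state-change flags in pmapi.h.
PMCD_ADD_AGENT = 1 << 0
PMCD_RESTART_AGENT = 1 << 1
PMCD_DROP_AGENT = 1 << 2


def next_action(connected, state_flags):
    """Decide what a manager should do after one poll of pmcd.

    connected   -- False if the poll failed because the connection was lost
    state_flags -- bitwise OR of the PMCD_* flags seen on a successful poll
    """
    if not connected:
        # pmcd went away: reconfigure once it is reachable again.
        return "reconfigure-on-reconnect"
    if state_flags & (PMCD_ADD_AGENT | PMCD_DROP_AGENT):
        # A PMDA came or went without a pmcd restart: rerun pmlogconf now.
        return "reconfigure-now"
    # No change, or only an agent restart (assumed here not to alter the metric set).
    return "none"
```

Treating an agent restart as a no-op is a policy assumption made for this sketch; a stricter manager could reconfigure on any nonzero flag.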

Finally, while pmieconf does not actively probe today, there's every reason to think it could become more host-probe-dynamic like pmlogconf in the future.  

You continue to dismiss the situation where people write their own rules, rather than use pmieconf, but in practice for many pmie deployments this is the norm.
Comment 6 Frank Ch. Eigler 2016-09-20 11:29:30 EDT
> I don't think you followed completely.

On the contrary, I followed completely.

> So, the assertion this is a new regression is incorrect - the pmmgr
> assumption that pmcd would be restarted on PMDA install has always been
> incorrect.

Since more cases now trigger this "incorrect" assumption, it is obviously
both "new" and a "regression".

> However, this *is* fixable in pmmgr.  If pmmgr maintains a connection to
> pmcd, it would be able to detect both PMDA reconfiguration messages and loss
> of connection to pmcd - and could act in the way you feel is desirable here.

Yes, that could work, assuming a cooperative libpcp.

It would not help those users stuck with service-pmlogger.  I guess they are
to further fall behind in terms of responsiveness.


> [...]
> You continue to dismiss the situation where people write their own rules,
> rather than use pmieconf, but in practice for many pmie deployments this is
> the norm.

I don't dismiss it in the abstract.  This case is simply not relevant to
pmmgr's normal host-targeted model.  It is off topic.  People with such
pmie rules can run them with service-pmie (or even with service-pmmgr,
linked to a dummy host).  It is off topic.
Comment 7 Fedora End Of Life 2017-02-28 05:19:14 EST
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.
