Bug 1188193

Summary:	pmlogger is spontaneously restarted after systemctl stop
Product:	[Fedora] Fedora	Reporter:	Marius Vollmer <mvollmer>
Component:	pcp	Assignee:	Nathan Scott <nathans>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	21	CC:	brolley, fche, lberk, mgoodwin, nathans, pcp, scox
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-02-02 15:56:59 UTC	Type:	Bug
Bug Blocks:	1185740

Description Marius Vollmer 2015-02-02 10:10:46 UTC

Description of problem:

Executing systemctl stop pmlogger stops the primary pmlogger, as expected.  However, some time later a pmlogger process spontaneously reappears.

Steps to Reproduce:
1. systemctl enable pmlogger
2. systemctl start pmlogger
3. systemctl stop pmlogger
4. sleep 1h or so
5. pgrep pmlogger

Actual results:

pgrep finds a pmlogger process (and it is writing to /var/log/pcp/pmlogger/...)

Expected results:

pgrep doesn't find a pmlogger process.

Additional info:

It's probably pmlogger_check.

Comment 1 Frank Ch. Eigler 2015-02-02 15:56:59 UTC

This is expected behavior with the classic service-pmlogger.  There are cron jobs
running every 30 mins or so that restart any dead pmloggers, related to the
less-frequent log-rotation cron jobs.  If you wish to disable pmlogger, you must
systemctl disable it, not just systemctl stop it.

(With pmmgr, this would not happen.)
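A minimal sketch of the advice above, assuming the unit is named pmlogger as in the report. The SYSTEMCTL variable and the function name are hypothetical, added only so the commands can be dry-run; they are not part of systemd or PCP.

```shell
#!/bin/sh
# Sketch only: stop pmlogger and also disable it, so that (per the comment
# above) the periodic cron check no longer restarts it.
# SYSTEMCTL is a hypothetical override hook for dry runs, not a systemd feature.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

stop_pmlogger_for_good() {
    # Stop the running logger first, then disable the unit.
    # The second command runs only if the first one succeeded.
    $SYSTEMCTL stop pmlogger && $SYSTEMCTL disable pmlogger
}
```

Running stop_pmlogger_for_good as root and then checking with pgrep pmlogger after the next cron interval should show whether the logger stays down on a given system.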

Comment 2 Nathan Scott 2015-02-03 03:09:01 UTC

> There are cron jobs running every 30 mins or so that restart any dead
> pmloggers, related to the less-frequent log-rotation cron jobs.
> If you wish to disable pmlogger, you must
> systemctl disable it, not just systemctl stop it.

FWLIW, this is not quite correct - the /etc/pcp/pmlogger/control file defines a set of expected hosts to be monitored by pmlogger processes.  It is the combination of an entry (or entries) in this file and the pmlogger service enablement state that defines whether the cron and init scripts will start pmlogger(s).

By default, we enable a localhost entry in the pmlogger control file (which is the pmlogger you are observing, Marius) but sysadmins can and do certainly add remote monitoring for other hosts too.  IOW, please take care if/when disabling this service.
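The rule described above can be modelled roughly as follows. This is a simplified sketch, not the actual pmlogger_check logic: the function names are hypothetical, and it assumes that control-file entries are the non-blank lines that start with neither '#' (comments) nor '$' (variable settings).

```shell
#!/bin/sh
# Simplified model of the rule described above; NOT the real pmlogger_check code.
# Assumption: an "entry" is any non-blank line that does not start with
# '#' (comment) or '$' (variable setting).

# count_control_entries FILE: print the number of host entries in FILE.
count_control_entries() {
    grep -c -v -E '^[[:space:]]*($|#|\$)' "$1"
}

# cron_would_start STATE FILE: succeed only if the service state is
# "enabled" and FILE is readable and lists at least one host entry.
cron_would_start() {
    [ "$1" = "enabled" ] || return 1
    [ -r "$2" ] || return 1
    [ "$(count_control_entries "$2")" -gt 0 ]
}
```

For example, cron_would_start "$(systemctl is-enabled pmlogger)" /etc/pcp/pmlogger/control would succeed on a default install with the service enabled, which matches the restart seen in this report.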

cheers.

Comment 3 Marius Vollmer 2015-02-03 07:57:51 UTC

(In reply to Frank Ch. Eigler from comment #1)
> This is expected behavior with the classic service-pmlogger.

It is not, however, expected behavior of a systemd unit.

When switching data collection on/off via Cockpit, we will both start/stop and enable/disable the pmlogger service, so this bug will not affect us much there.

But we probably also want to point out problems with pmlogger, and it would be nice to use the normal mechanisms for that: systemd unit status, including some lines from the journal.  We would just point to the generic systemd UI for this and be done.

Comment 4 Frank Ch. Eigler 2015-02-03 12:19:55 UTC

This is an example of the general class of problem I was pointing out at

https://github.com/cockpit-project/cockpit/pull/1689#issuecomment-71824146

whereby the system pmlogger.service does more stuff than you
need/expect.  The system pmmgr gives you more control, and
a cockpit-specific pmmgr would give you complete control.

Comment 5 Nathan Scott 2015-02-03 22:20:52 UTC

> It is not, however, expected behavior of a systemd unit.

There's a mismatch between what some of the PCP services (pmlogger, pmie, and pmmgr) do and the facilities systemd provides - PCP needs to be able to control services monitoring multiple hosts, and systemd unit files have only a notion of localhost service.

> whereby the system pmlogger.service does more stuff than you need/expect.

pmmgr has many of the same issues (in fact, it also tries to do all of pmie service management in addition to pmlogger, so one could argue in reverse that it does far more than you need/expect relative to the regular pmlogger scripts - *shrug*).

Anyway, fundamentally, controlling distributed services is a hard problem that doesn't really fit well into either the old-school init or systemd models, and that's the root issue here I think (and yeah, I understand the Cockpit folks are interested in the localhost case only so far).

Comment 6 Marius Vollmer 2015-02-05 11:50:13 UTC

(In reply to Frank Ch. Eigler from comment #4)
> whereby the system pmlogger.service does more stuff than you
> need/expect.

Arguably, this is a case of pmlogger doing less than I expect.

Comment 7 Frank Ch. Eigler 2015-02-08 16:42:55 UTC

Marius, the thing is that the "service pmlogger" in general does
more than localhost logging, and for whatever historical reasons,
it has cron jobs to back up exit-prone individual subtasks.  If
you want to piggyback on "service pmlogger", you need to control
*both* explicit and implicit restarts.

With "service pmmgr" (systemwide pmmgr), no cron jobs are used,
so shutdown/restart works more like what you expect, but again
systemwide pmmgr in general does more than localhost logging.

A private pmmgr-based service would let you opt out of those
general cases and give you full control (and still some help
in terms of log rotation etc).

(A private pmlogger-based service is probably too much work.)

Each choice has pros & cons.

Comment 8 Nathan Scott 2015-02-09 00:41:55 UTC

> [...] exit-prone individual subtasks

This is fixable BTW, and increasingly it looks like something we
should tackle (pmlogger reconnect) - kenj is hacking in the area
currently, so this will likely soon become a reality.

> A private pmmgr-based service would let you opt out of those
> general cases and give you full control (and still some help
> in terms of log rotation etc).

Private pmlogger-based setups are possible too (see -c option to
pmlogger_check and friends) & without the need for more daemons.

> (A private pmlogger-based service is probably too much work.)

It's approx the same as pmmgr, but in principle I agree - both are
more work than necessary.  pmmgr also misses out on local-context
opportunities (IOW when operating with no pmcd, and pmlogger as
the only PCP daemon) that the GSS folks are interested in, which
may well be of interest to the Cockpit folks also.

cheers.

Comment 9 Marius Vollmer 2015-02-09 09:03:28 UTC

(In reply to Frank Ch. Eigler from comment #7)
> Marius, the thing is that the "service pmlogger" in general does
> more than localhost logging, and for whatever historical reasons,
> it has cron jobs to back up exit-prone individual subtasks.

I understand.  I appreciate that it is not trivial to put multiple, independent processes behind a single systemd service, each with their own independent success/failure state.

> If you want to piggyback on "service pmlogger", you need to control
> *both* explicit and implicit restarts.

I wouldn't call it piggybacking.  We want to do the right thing, not the easy thing, and we want to help pcp do the right thing as well, for everyone.

(The easy thing is the old "cockpit-logger" plus "cockpit-logger-janitor" services which just reuse the pmlogger binary and control it with very little extra code on top.)

> With "service pmmgr" (systemwide pmmgr), no cron jobs are used,
> so shutdown/restart works more like what you expect, but again
> systemwide pmmgr in general does more than localhost logging.

I still have to seriously look at pmmgr.

> A private pmmgr-based service would let you opt out of those
> general cases and give you full control (and still some help
> in terms of log rotation etc).

We want to opt into the more complex case, so that a knowledgeable person can configure Cockpit's use of PCP along with his/her other needs.

We would only go back to our own private pmlogger service if we can't get a good enough user experience out of the system pmlogger without too many workarounds. 

I think we are still mostly good (since we will always enable/start and disable/stop pmlogger.service at the same time), but catching pmlogger failures is awkward and we can't in good faith point people to the pmlogger.service UI because it will not do what they expect.

Anyway, this is off-topic for this bug report, sorry for rambling on.  I'll try to summarize this more coherently later.

> (A private pmlogger-based service is probably too much work.)

(I think we did it with "cockpit-logger.service", no?  That was about one day of work after learning enough about pmlogger.  Less work than grappling with the system pmlogger, actually. :-)