Bug 1185764 - pmlogger.service status does not reflect reality

Summary: pmlogger.service status does not reflect reality

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	pcp
Sub Component:
Version:	21
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Nathan Scott
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1185740

Reported:	2015-01-26 08:59 UTC by Marius Vollmer
Modified:	2016-11-25 13:33 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-12-02 07:59:39 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Marius Vollmer 2015-01-26 08:59:47 UTC

Description of problem:

When pmlogger.service fails to start a working pmlogger process, this is not reflected in the systemd unit state.  There aren't any journal entries either.

Steps to Reproduce:
1. Misconfigure the system so that pmlogger will fail to start
   # systemctl stop pmcd
2. Start pmlogger
   # systemctl start pmlogger

Actual results:

systemctl status reports pmlogger as running, but pmlogger has actually failed.

# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
   Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled)
   Active: active (exited) since Mon 2015-01-26 10:57:04 EET; 2s ago
     Docs: man:pmlogger(1)
  Process: 9912 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
  Process: 10090 ExecStart=/usr/share/pcp/lib/pmlogger start (code=exited, status=0/SUCCESS)
 Main PID: 10090 (code=exited, status=0/SUCCESS)

# cat /var/log/pcp/pmlogger/f21.cockpit.lan/pmlogger.log
Log for pmlogger on f21.cockpit.lan started Mon Jan 26 10:57:05 2015

pmlogger: Cannot connect to PMCD on host "local:": Connection refused

Log finished Mon Jan 26 10:57:05 2015

Expected results:

systemctl status pmlogger should report pmlogger as failed.

Comment 1 Nathan Scott 2015-01-28 04:19:36 UTC

The situation is more convoluted than reflected here, I think.  pmlogger can be configured to monitor (potentially many) remote systems and does not necessarily have to be configured to record from the local host.  It's not always just a case of chkconfig pmlogger on, service start, and one daemon results - multiple loggers or none at all may need to be started (depends on the contents of the /etc/pcp/pmlogger/control configuration file).

In summary, "it's complicated".  There are cron scripts active which verify that the pmloggers that are meant to be running are running, based on the contents of the control file - so if the unfortunate case arises whereby pmlogger is configured to monitor locally and no local pmcd has been started yet, the situation will resolve itself in due course.

We can improve the situation further, however; there's some upstream work being considered that would make this problem scenario go away entirely (some precursor work to enabling pmlogger automatic pmcd reconnection).  I had not considered that work in light of this problem though (so, thanks!) - perhaps we should be prioritising that work more highly.

cheers.

Comment 2 Frank Ch. Eigler 2015-01-28 19:10:34 UTC

see also http://oss.sgi.com/bugzilla/show_bug.cgi?id=1096

Comment 3 Marius Vollmer 2015-02-24 12:17:45 UTC

> The situation is more convoluted than reflected here, I think.  pmlogger can
> be configured to monitor (potentially many) remote systems and does not 
> necessarily have to be configured to record from the local host.

Are you saying that it is impossible to say what the status of pmlogger.service is because it might consist of multiple processes that can each fail independently?

I think http://oss.sgi.com/bugzilla/show_bug.cgi?id=1096 is a most excellent list of improvements.

Comment 4 Marius Vollmer 2015-02-24 12:29:09 UTC

In addition to the case where pmlogger.service is active but no pmlogger process is running, it is also possible that pmlogger.service is inactive but there is in fact a pmlogger process running.

Steps from bug 1188193:

1. systemctl enable pmlogger
2. systemctl start pmlogger
3. systemctl stop pmlogger
4. sleep 1h or so
5. pgrep pmlogger

Comment 5 Frank Ch. Eigler 2015-02-24 14:11:33 UTC

Marius, in this case, note that the pmlogger service is still -enabled-,
and so the periodic cron jobs feel entitled to restart/keep-running
pmlogger jobs listed in the control file.

Comment 6 Mark Goodwin 2015-02-24 22:46:09 UTC

observation - perhaps it would help if pmlogger.service was split out into pmlogger.service and pmlogger-farm.service (or some such name). The former would just deal with a single pmlogger monitoring the localhost (aka primary pmlogger). The latter would manage logging one or more remote hosts (if enabled).

Thoughts?

Comment 7 Marius Vollmer 2015-02-25 07:45:19 UTC

(In reply to Frank Ch. Eigler from comment #5)
> Marius, in this case, note that the pmlogger service is still -enabled-,
> and so the periodic cron jobs feel entitled to restart/keep-running
> pmlogger jobs listed in the control file.

Yeah, I know.  If that is how people expect PCP to work, fine, but the current integration with systemd is still useless and arguably harmful since it adds confusion and frustration for someone who knows systemd but not pcp.  IMO.

If the cron job feels entitled to start pmlogger, it should do that via "systemctl start pmlogger" or "service pmlogger start" and not behind systemd's back.
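
A minimal sketch of the cron-side behaviour suggested here, in shell: the periodic job asks systemd to start the unit instead of launching pmlogger itself. The function name is hypothetical, this is not how the current PCP cron scripts work, and it assumes the unit state actually tracks the logger process (which, per this bug, it does not yet):

```shell
#!/bin/sh
# Hypothetical cron-side check: let systemd own the pmlogger lifecycle.
# Only act if the admin has enabled the unit, and never exec pmlogger
# directly, so that `systemctl status pmlogger` stays truthful.
ensure_pmlogger_via_systemd() {
    if ! systemctl is-enabled --quiet pmlogger; then
        echo "skip: pmlogger.service not enabled"
        return 0
    fi
    if systemctl is-active --quiet pmlogger; then
        echo "ok: pmlogger.service already active"
        return 0
    fi
    echo "starting: pmlogger.service"
    systemctl start pmlogger
}
```

With this approach an enabled-but-stopped unit (the case from comment 4) would still be restarted by cron, but the restart would be visible in the unit state and the journal.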

Comment 8 Marius Vollmer 2015-02-25 07:51:35 UTC

(In reply to Mark Goodwin from comment #6)
> observation - perhaps it would help if pmlogger.service was split out into
> pmlogger.service and pmlogger-farm.service (or some such name).

I think so.  Ignoring any compatibility concerns and without any knowledge of how pmlogger is actually configured in detail, I would try to use unit templates and instantiate one per pmlogger process.
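
To make the unit-template idea concrete, here is a sketch in shell that maps each logger entry in a pmlogger control file to an instance of a hypothetical template unit `pmlogger@.service`. It assumes the control-file layout host, primary flag, socks flag, directory, arguments (with `#` comments and `$VAR=value` lines), and that host names need no systemd escaping; both the template name and the helper are illustrative only:

```shell
#!/bin/sh
# Hypothetical: print one systemd template instance per logger entry
# in a pmlogger control file read from stdin.
# Assumed line layout: host primary(y/n) socks(y/n) directory args...
# Blank lines, '#' comments and '$VAR=value' lines are skipped.
control_to_instances() {
    while read -r host primary socks dir args; do
        case "$host" in
            ''|'#'*|'$'*) continue ;;
        esac
        # The primary logger (normally the local host) is flagged 'y'.
        if [ "$primary" = "y" ]; then
            echo "pmlogger@$host.service (primary)"
        else
            echo "pmlogger@$host.service"
        fi
    done
}
```

Under such a design, each instance (e.g. `systemctl start pmlogger@web01`) would carry its own unit state, so a logger that cannot reach its pmcd would show up as that one instance failing rather than being hidden inside a single "active (exited)" unit.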

Frank has filed http://oss.sgi.com/bugzilla/show_bug.cgi?id=1096, so I assume you know about these issues.  Why are we even discussing this beyond "patches welcome"?

I would be willing to spend a few days producing and testing some patches for http://oss.sgi.com/bugzilla/show_bug.cgi?id=1096.  Are you willing to take them?

Comment 9 Nathan Scott 2015-02-25 20:29:24 UTC

(In reply to Mark Goodwin from comment #6)
> observation - perhaps it would help if pmlogger.service was split out into
> pmlogger.service and pmlogger-farm.service (or some such name). The former
> would just deal with a single pmlogger monitoring the localhost (aka primary
> pmlogger). The latter would manage logging one or more remote hosts (if
> enabled).
> 
> Thoughts?

This issue goes away once we have pmlogger local context support and the default logger using that, doesn't it Mark?  (IOW, there is no dependence on a running pmcd at all then, for the default logger, and no dependence between start scripts, etc, etc).  It would be a good idea to bump that work up the priority list & not complicate the scripts/configuration further by splitting 'em, I think.

cheers.

Comment 10 Marius Vollmer 2015-02-26 07:39:48 UTC

> This issue goes away once we have pmlogger local context support and default
> logger using that, doesn't it Mark? 

Can a local context use all pmdas?

Comment 11 Nathan Scott 2015-02-26 08:47:04 UTC

(In reply to Marius Vollmer from comment #10)
> > This issue goes away once we have pmlogger local context support and default
> > logger using that, doesn't it Mark? 
> 
> Can a local context use all pmdas?

All DSO PMDAs (which are usually the most important ones, like the kernel PMDAs) ... but not all PMDAs.

Comment 12 Fedora End Of Life 2015-11-04 13:26:16 UTC

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2015-12-02 07:59:47 UTC

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 14 Stef Walter 2016-11-25 13:33:02 UTC

Hey Marius, this is a RHEL bug too right? Maybe we should file it there?
