A somewhat repeatable failure affects my f19 workstation,
wherein if pmcd is started during normal system boot,
pmdasystemd can often fail (see logs below). If pmcd is
restarted later, no further problems appear.
Log for pmdasystemd on very.elastic.org started Thu Mar 6 07:48:06 2014
[Thu Mar 6 07:48:17] pmdasystemd(1002) Critical: __pmdaSetupPDU: PMDA pmdasystemd send creds: Broken pipe
[Thu Mar 6 07:48:17] pmdasystemd(1002) Error: PMDA Initialisation Failed
Log finished Thu Mar 6 07:48:17 2014
[Thu Mar 6 07:48:09] pmcd(987) Warning: pduread: timeout (after 3.000 sec) while attempting to read 12 bytes out of 12 in HDR on fd=17
pmcd: error at initial PDU exchange with systemd PMDA: Timeout waiting for a response from PMCD
journalctl -b extracts:
Mar 06 07:47:40 very.elastic.org systemd-journal: Journal started
Mar 06 07:47:42 very.elastic.org systemd-journal: Journal stopped
Mar 06 07:47:48 very.elastic.org systemd-journal: Allowing runtime journal files to grow to 197.6M.
Mar 06 07:47:48 very.elastic.org systemd-journald: Received SIGTERM
Mar 06 07:47:48 very.elastic.org systemd-journal: Journal started
Mar 06 07:48:05 very.elastic.org systemd-journal: Allowing system journal files to grow to 4.0G.
Mar 06 07:48:09 very.elastic.org systemd-journal: Forwarding to syslog missed 90 messages.
Mar 06 07:48:11 very.elastic.org ntpd: 0.0.0.0 c61c 0c clock_step -0.546935 s
Mar 06 07:48:10 very.elastic.org ntpd: 0.0.0.0 c614 04 freq_mode
Mar 06 07:48:10 very.elastic.org systemd: Time has been changed
Mar 06 07:48:11 very.elastic.org ntpd: 0.0.0.0 c618 08 no_sys_peer
It appears to be during this clock-changing, journald-restarty time that
pmcd happens to start pmdasystemd. For reasons unknown, pmdasystemd
appears to pmcd to take longer than 3.0s to reply to some early init
packet, and in accordance with http://oss.sgi.com/bugzilla/show_bug.cgi?id=1036
forever gives up on the pmda. :-(
I'm not sure what the best fix might be. Delaying systemd journal
connections, switching to multithreading, ... or letting pmcd wait
a little longer during startup?
Possibly a situation where PM_ERR_PMDANOTREADY might help us?
See src/pmdas/rpm/rpm.c::notready() for an example, or pmdasample.
pcpfans.git branch fche/systemd-slow contains commit 82a8aa97,
which should fix this problem. I've been unable to reproduce
the original problem on a test computer. (Unfortunately,
since the problem is intermittent and relies on systemd/journald
doing some mysterious concurrent activity, this is not proof
that the problem is gone.)
Thanks for giving it a spin Frank.
If the problem is observed again, it would be really helpful to have the tail of pmcd.log and the pmda's systemd.log files attached to the bug.
Ken, certainly will report if/when the problem reappears.
Log extracts from the original failures are here at comment #0.
There are 2 different timeout scenarios, and I think the connect_pmcd() change will almost certainly fix one of them (the one documented in Comment 0 above).
But if the PMDA fails early in life again it will be important to know if this is the _other_ scenario, or my brave assertion is false.
Hence the request for the logs if a new failure is observed.
pcp-3.9.4-1.fc20 has been submitted as an update for Fedora 20.
pcp-3.9.4-1.fc19 has been submitted as an update for Fedora 19.
pcp-3.9.4-1.el6 has been submitted as an update for Fedora EPEL 6.
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing pcp-3.9.4-1.el6'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
pcp-3.9.4-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.9.5-1.fc19 has been submitted as an update for Fedora 19.
pcp-3.9.5-1.fc20 has been submitted as an update for Fedora 20.
pcp-3.9.5-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.9.5-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.