Description of problem: localhost:~> date ; pmcollectl -f ./foo -c 3 -i 5s ; date ; Thu Oct 8 16:40:11 EEST 2015 Thu Oct 8 16:40:26 EEST 2015 localhost:~> pmval kernel.all.nprocs -U $(awk '/^Archive:/ {print $3}' foo) metric: kernel.all.nprocs archive: 6NxYlW.localhost.localdomain host: localhost.localdomain start: Thu Oct 8 16:40:11 2015 end: Thu Oct 8 16:40:21 2015 semantics: instantaneous value units: none samples: all 16:40:16.776 538 16:40:21.772 536 pmval: pmFetchArchive: End of PCP archive log zsh: exit 1 pmval kernel.all.nprocs -U $(awk '/^Archive:/ {print $3}' foo) localhost:~> This might be more of an issue when starting recording with e.g. 5m or 10m interval when an issue is ongoing. Seen with both pmcollectl and pmrep.
I wonder if this is a product of how pmsubsys.py handles get_stats() and getting the difference between metric fetches for rate interpolation (which aiui, requires two sets of metrics with different, non-zero timestamps). It could be that pmcollectl is collecting three samples, however due to pmsubsys.py, (which requires rate calculation/conversion), only outputs/returns two values (skipping the first). While required for pmcollectl to properly format complete output in live situations, I can see how it would be advantageous to report all the values for metrics of 'instantaneous value' types in archives. If this is indeed the case, I'm not entirely sure if/how to change pmcollectl (or perhaps pmsubsys.py) to fix this.
Hmm, this is an interesting one. Lukas is on the right path, except the "problem" is actually in pmlogger! Turns out (news to me) that pmlogger interprets the 'log every N interval' to be starting at the time the first interval elapsed, and not starting at the time pmlogger starts. IOW, pmlogger (which is doing the recording here, below the pmRecordControl API) does not issue a fetch until the first interval passes, except for the pmcd.pmlogger.* control metrics which it always fetches & logs immediately. So, if we look at the timestamps associated with the archives being recorded here ("pmdumplog -a" is the best tool for this, it doesn't interpolate or rate convert or anything like that, which the other tools usually will), we see no "kernel.all.nprocs" value until the first interval passes, then two, with the last being right at the pmlogger termination time. It may be this is by-design from pmloggers POV (so that we do not issue a potentially large number of fetches all at once, for every fetch group - not sure) - Ken? We can work around this though if that's the case - from some experimentation, if pmcollectl creates a log-once fetch group in addition to the 'log every N seconds' group for pmlogger, that gives the anticipated behaviour. OTOH, maybe pmlogger behaviour is not as desired - will wait to hear from Ken.
Created attachment 1081847 [details] Force pmlogger to record an initial sample
pmlogger has always been thus ... or rather ever since it was changed to use __pmAFregister() to control the scheduling of fetches and the delay comes from the semantics of __pmAFregister() and is documented in the man page there. And I think pmlogger was changed to use __pmAFregister() in 2004 or thereabouts (I can find one circa 2004 archive that definitely does not have this behaviour). So this is definitely not a recent change in behaviour, and the first time that anyone has noticed in at least a decade. In pmlogger there was no attempt to stagger start and avoid a thundering herd ... pmie does this but pmlogger simply delays the thundering herd for one delta time period! So no tricky performance justification for the current behaviour. Nathan's patch works for the pmcollectl case, but it is probably appropriate to consider if pmlogger should be changed so that all users of pmlogger would see an initial sample soon after the configuration file had been parsed and then the current behaviour would take over, and the pmcollectl patch would not be required. Biggest fallout risk would be in QA I think. I've looked at the pmlogger code and the change there is low risk (and would be the similar to Nathan's pmcollectl patch, because the alternative of changing the __pmAFregister() semantics is just too scary).
(In reply to Ken McDonell from comment #4) > pmlogger has always been thus ... or rather ever since it was changed [...] > > Biggest fallout risk would be in QA I think. I've looked at the pmlogger > code and the change there is low risk (and would be the similar to Nathan's > pmcollectl patch, because the alternative of changing the __pmAFregister() > semantics is just too scary). Heh. Thanks for the details Ken! If it gets out of hand with QA, lemme know and we can go with that simpler patch (no risk, but probably not as nice a fix).
Turns out that it is _easier_ to fix in libpcp by extending the __pmAF*() family with __pmAFsetup() which is like __pmAFregister() but takes an extra argument to specify the offset to the first fetch. Then change pmlogger to call __pmAFsetup() with an offset of 0 after parsing the config file and when receiving a PDU from pmlc to turn logging on or change the logging delta for some metric(s). But ... as foreshadowed, this produces 24 new QA failures and these will need to be triaged (some appear simple, but others will take considerable effort).
Fixed in commit c362945 in my tree (and follow on QA commits).
pcp-3.10.8-1.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-3fc0cf98a1
pcp-3.10.8-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-b1cbfc4842
pcp-3.10.8-1.fc23.1 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-8f41fd0048
pcp-3.10.8-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-60b17897b4
pcp-3.10.8-1.fc23.1 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-8f41fd0048
pcp-3.10.8-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'yum --enablerepo=epel-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-60b17897b4
pcp-3.10.8-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-b1cbfc4842
pcp-3.10.8-1.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-3fc0cf98a1
pcp-3.10.9-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-d08245c076
pcp-3.10.9-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-2b40815137
pcp-3.10.9-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-2ac90519bc
pcp-3.10.9-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-2b40815137
pcp-3.10.9-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'dnf --enablerepo=updates-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-d08245c076
pcp-3.10.9-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with $ su -c 'yum --enablerepo=epel-testing update pcp' You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-2ac90519bc
pcp-3.10.9-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.10.9-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.11.0-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-57b7efb2d7
pcp-3.11.0-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-57b7efb2d7
pcp-3.11.1-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-5b519318e0
pcp-3.11.1-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.