Bug 1269921 - pmRecordControl misses the first sample
Summary: pmRecordControl misses the first sample
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: 22
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-10-08 13:43 UTC by Marko Myllynen
Modified: 2016-04-06 14:56 UTC (History)
8 users (show)

Fixed In Version: pcp-3.10.9-1.fc23 pcp-3.10.9-1.fc22 pcp-3.11.1-1.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-06 14:56:00 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Force pmlogger to record an initial sample (750 bytes, patch)
2015-10-12 01:54 UTC, Nathan Scott
no flags Details | Diff

Description Marko Myllynen 2015-10-08 13:43:57 UTC
Description of problem:
localhost:~> date ; pmcollectl -f ./foo -c 3 -i 5s ; date ;
Thu Oct  8 16:40:11 EEST 2015
Thu Oct  8 16:40:26 EEST 2015
localhost:~> pmval kernel.all.nprocs -U $(awk '/^Archive:/ {print $3}' foo)

metric:    kernel.all.nprocs
archive:   6NxYlW.localhost.localdomain
host:      localhost.localdomain
start:     Thu Oct  8 16:40:11 2015
end:       Thu Oct  8 16:40:21 2015
semantics: instantaneous value
units:     none
samples:   all
16:40:16.776        538
16:40:21.772        536

pmval: pmFetchArchive: End of PCP archive log
zsh: exit 1     pmval kernel.all.nprocs -U $(awk '/^Archive:/ {print $3}' foo)
localhost:~> 

This might be more of an issue when starting recording with e.g. 5m or 10m interval when an issue is ongoing.

Seen with both pmcollectl and pmrep.

Comment 1 Lukas Berk 2015-10-08 14:42:33 UTC
I wonder if this is a product of how pmsubsys.py handles get_stats() and getting the difference between metric fetches for rate interpolation (which aiui, requires two sets of metrics with different, non-zero timestamps).

It could be that pmcollectl is collecting three samples, however due to pmsubsys.py, (which requires rate calculation/conversion), only outputs/returns two values (skipping the first).  While required for pmcollectl to properly format complete output in live situations, I can see how it would be advantageous to report all the values for metrics of 'instantaneous value' types in archives.  If this is indeed the case, I'm not entirely sure if/how to change pmcollectl (or perhaps pmsubsys.py) to fix this.

Comment 2 Nathan Scott 2015-10-12 01:52:53 UTC
Hmm, this is an interesting one.  Lukas is on the right path, except the "problem" is actually in pmlogger!  Turns out (news to me) that pmlogger interprets the 'log every N interval' to be starting at the time the first interval elapsed, and not starting at the time pmlogger starts.

IOW, pmlogger (which is doing the recording here, below the pmRecordControl API) does not issue a fetch until the first interval passes, except for the pmcd.pmlogger.* control metrics which it always fetches & logs immediately.

So, if we look at the timestamps associated with the archives being recorded here ("pmdumplog -a" is the best tool for this, it doesn't interpolate or rate convert or anything like that, which the other tools usually will), we see no "kernel.all.nprocs" value until the first interval passes, then two, with the last being right at the pmlogger termination time.

It may be this is by-design from pmloggers POV (so that we do not issue a potentially large number of fetches all at once, for every fetch group - not sure) - Ken?

We can work around this though if that's the case - from some experimentation, if pmcollectl creates a log-once fetch group in addition to the 'log every N seconds' group for pmlogger, that gives the anticipated behaviour.

OTOH, maybe pmlogger behaviour is not as desired - will wait to hear from Ken.

Comment 3 Nathan Scott 2015-10-12 01:54:20 UTC
Created attachment 1081847 [details]
Force pmlogger to record an initial sample

Comment 4 Ken McDonell 2015-10-12 06:22:10 UTC
pmlogger has always been thus ... or rather ever since it was changed to use __pmAFregister() to control the scheduling of fetches and the delay comes from the semantics of __pmAFregister() and is documented in the man page there.

And I think pmlogger was changed to use __pmAFregister() in 2004 or thereabouts (I can find one circa 2004 archive that definitely does not have this behaviour).  So this is definitely not a recent change in behaviour, and the first time that anyone has noticed in at least a decade.

In pmlogger there was no attempt to stagger start and avoid a thundering herd ... pmie does this but pmlogger simply delays the thundering herd for one delta time period!  So no tricky performance justification for the current behaviour.

Nathan's patch works for the pmcollectl case, but it is probably appropriate to consider if pmlogger should be changed so that all users of pmlogger would see an initial sample soon after the configuration file had been parsed and then the current behaviour would take over, and the pmcollectl patch would not be required.

Biggest fallout risk would be in QA I think.  I've looked at the pmlogger code and the change there is low risk (and would be the similar to Nathan's pmcollectl patch, because the alternative of changing the __pmAFregister() semantics is just too scary).

Comment 5 Nathan Scott 2015-10-12 07:43:39 UTC
(In reply to Ken McDonell from comment #4)
> pmlogger has always been thus ... or rather ever since it was changed [...]
> 
> Biggest fallout risk would be in QA I think.  I've looked at the pmlogger
> code and the change there is low risk (and would be the similar to Nathan's
> pmcollectl patch, because the alternative of changing the __pmAFregister()
> semantics is just too scary).

Heh.  Thanks for the details Ken!  If it gets out of hand with QA, lemme know and we can go with that simpler patch (no risk, but probably not as nice a fix).

Comment 6 Ken McDonell 2015-10-12 22:54:18 UTC
Turns out that it is _easier_ to fix in libpcp by extending the __pmAF*() family with __pmAFsetup() which is like __pmAFregister() but takes an extra argument to specify the offset to the first fetch.

Then change pmlogger to call __pmAFsetup() with an offset of 0 after parsing the config file and when receiving a PDU from pmlc to turn logging on or change the logging delta for some metric(s).

But ... as foreshadowed, this produces 24 new QA failures and these will need to be triaged (some appear simple, but others will take considerable effort).

Comment 7 Ken McDonell 2015-10-15 07:22:10 UTC
Fixed in commit c362945 in my tree (and follow on QA commits).

Comment 8 Fedora Update System 2015-10-29 21:51:20 UTC
pcp-3.10.8-1.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-3fc0cf98a1

Comment 9 Fedora Update System 2015-10-29 21:51:22 UTC
pcp-3.10.8-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-b1cbfc4842

Comment 10 Fedora Update System 2015-10-29 21:51:37 UTC
pcp-3.10.8-1.fc23.1 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-8f41fd0048

Comment 11 Fedora Update System 2015-10-29 21:51:39 UTC
pcp-3.10.8-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-60b17897b4

Comment 12 Fedora Update System 2015-11-01 06:57:16 UTC
pcp-3.10.8-1.fc23.1 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-8f41fd0048

Comment 13 Fedora Update System 2015-11-01 16:48:07 UTC
pcp-3.10.8-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'yum --enablerepo=epel-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-60b17897b4

Comment 14 Fedora Update System 2015-11-02 00:25:55 UTC
pcp-3.10.8-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-b1cbfc4842

Comment 15 Fedora Update System 2015-11-02 02:54:11 UTC
pcp-3.10.8-1.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-3fc0cf98a1

Comment 16 Fedora Update System 2015-12-17 01:55:21 UTC
pcp-3.10.9-1.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-d08245c076

Comment 17 Fedora Update System 2015-12-17 01:55:55 UTC
pcp-3.10.9-1.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-2b40815137

Comment 18 Fedora Update System 2015-12-17 02:18:42 UTC
pcp-3.10.9-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-2ac90519bc

Comment 19 Fedora Update System 2015-12-17 10:27:06 UTC
pcp-3.10.9-1.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-2b40815137

Comment 20 Fedora Update System 2015-12-17 10:27:23 UTC
pcp-3.10.9-1.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-d08245c076

Comment 21 Fedora Update System 2015-12-17 12:49:30 UTC
pcp-3.10.9-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'yum --enablerepo=epel-testing update pcp'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2015-2ac90519bc

Comment 22 Fedora Update System 2016-01-05 21:58:19 UTC
pcp-3.10.9-1.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 23 Fedora Update System 2016-01-05 22:54:56 UTC
pcp-3.10.9-1.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2016-02-03 03:57:21 UTC
pcp-3.11.0-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-57b7efb2d7

Comment 25 Fedora Update System 2016-02-03 22:17:55 UTC
pcp-3.11.0-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-57b7efb2d7

Comment 26 Fedora Update System 2016-03-22 00:18:12 UTC
pcp-3.11.1-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-5b519318e0

Comment 27 Fedora Update System 2016-04-06 14:54:32 UTC
pcp-3.11.1-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.