Bug 1586051

Summary: Backport instance domain setup problem fix
Product: Red Hat Enterprise Linux 7
Reporter: Lukas Zapletal <lzap>
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED NOTABUG
QA Contact: qe-baseos-tools-bugs
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.5
CC: brolley, fche, lberk, mbenitez, mgoodwin, mhulan, nathans, patrickm, sjagtap
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 08:56:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1579876
Attachments:
  Patch against 3.12.2 (flags: none)

Description Lukas Zapletal 2018-06-05 12:01:40 UTC
Created attachment 1447846: Patch against 3.12.2

Hello,

this is a request to backport the following upstream fix into RHEL 7.5:

pmdammv: resolve an instance domain setup problem
Several people (marko, lzap, tallpaul) have reported this
one, finally got to the bottom of it.  The symptoms are
"Unknown or illegal instance domain identifier" errors on
indom lookups, sometimes.  Root cause was a logic error
in pmdammv indom setup code incorrectly overwriting count
and offset local variables while parsing mappings.
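
To make the class of bug described above concrete, here is a minimal sketch in C. It is not the pmdammv code: the types, the function names (find_indoms_buggy, find_indoms_fixed) and the sample tables are all hypothetical, and the real MMV layout and the real fix are in the upstream commit. The sketch only assumes what the commit message says, namely that shared count and offset locals were overwritten while walking the sections of a mapping. It also shows one way such a bug could appear only "sometimes": the result depends on the order of the sections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section types and TOC entry; NOT the real MMV layout. */
enum toc_type { TOC_INDOMS = 1, TOC_INSTANCES = 2 };

struct toc_entry {
    enum toc_type type;
    int           count;   /* number of items in the section */
    size_t        offset;  /* byte offset of the section */
};

struct section { int count; size_t offset; };

/* Buggy pattern: one pair of locals is overwritten by every entry,
 * so the indom section ends up with the values of the LAST entry. */
static struct section find_indoms_buggy(const struct toc_entry *toc, int ntoc)
{
    struct section indoms = { 0, 0 };
    int count = 0, seen = 0;
    size_t offset = 0;

    assert(toc != NULL);
    for (int i = 0; i < ntoc; i++) {
        count = toc[i].count;      /* overwritten on every iteration */
        offset = toc[i].offset;
        if (toc[i].type == TOC_INDOMS)
            seen = 1;
    }
    if (seen) {
        indoms.count = count;      /* may belong to another section */
        indoms.offset = offset;
    }
    return indoms;
}

/* Fixed pattern: capture count and offset only when the entry matches. */
static struct section find_indoms_fixed(const struct toc_entry *toc, int ntoc)
{
    struct section indoms = { 0, 0 };

    assert(toc != NULL);
    for (int i = 0; i < ntoc; i++) {
        if (toc[i].type == TOC_INDOMS) {
            indoms.count = toc[i].count;
            indoms.offset = toc[i].offset;
        }
    }
    return indoms;
}

/* Hypothetical sample tables: same sections, different order. */
static const struct toc_entry indoms_first[2] = {
    { TOC_INDOMS, 2, 64 }, { TOC_INSTANCES, 5, 128 }
};
static const struct toc_entry indoms_last[2] = {
    { TOC_INSTANCES, 5, 128 }, { TOC_INDOMS, 2, 64 }
};
```

With indoms_first the buggy variant reports the instance section's count and offset for the indom section, while with indoms_last both variants agree, so the failure depends on how a given client lays out its mapping.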

To exercise the fix I've modernized qa/src/indom.c and
used it in new test qa/1422 to tickle the problem using
a canned MMV mapping which is known to expose it.

The original bug report (against Parfait) is this one:

https://github.com/performancecopilot/parfait/issues/53

Upstream patch:
https://github.com/performancecopilot/pcp/commit/816c9cd75a2cad0dc64323897bd811cfd8725810

I have successfully backported the patch and made a build:

https://copr.fedorainfracloud.org/coprs/lzap/pcp/

I am currently testing it. The patch is attached; its diffstat:

   qa/1422                    |  68 ++++++++++
   qa/1422.out                |  33 +++++
   qa/GNUmakefile             |   4 +-
   qa/GNUmakefile.install     |   3 +-
   qa/group                   |   1 +
   qa/mmv/GNUmakefile         |  16 +++
   qa/mmv/GNUmakefile.install |   6 +
   qa/mmv/KeyboardReader.xz   | Bin 0 -> 1220 bytes
   qa/mmv/pytest.xz           | Bin 0 -> 416 bytes
   qa/src/indom.c             | 253 +++++++++----------------------------
   src/pmdas/mmv/src/mmv.c    |  48 ++++---
   11 files changed, 216 insertions(+), 216 deletions(-)

The two binary files (qa/mmv/KeyboardReader.xz and qa/mmv/pytest.xz) are not part of the attached patch.

Comment 8 Frank Ch. Eigler 2018-06-08 15:29:29 UTC
installing pcp-zeroconf makes some logging configuration changes:
- changing the default PMLOGGER_INTERVAL to 10 seconds in /etc/sysconfig/pmlogger
- making more pmlogconf clauses included in default logs (esp: proc metrics)

I don't think any of the default logger configurations record mmv.* metrics, though, so that part must have come from somewhere else. As far as I know, your pmlc instruction would add new mmv logging rather than modify existing mmv logging. You may need to send a 'log mandatory off mmv' first to stop that, but I'm not sure.

Comment 27 Lukas Zapletal 2018-06-20 08:56:00 UTC
Thanks!

Comment 28 Lukas Zapletal 2018-07-20 13:13:21 UTC
For googlers: the error will be fixed in the next version of PCP; however, the fix will not be backported to 7.5.

Comment 29 Lukas Zapletal 2018-07-20 13:15:06 UTC
Just to be clear: the bug is fixed in PCP version 4.1 and later.