Bug 1109539 - hinv.map.lvname instance IDs are not persistent
Summary: hinv.map.lvname instance IDs are not persistent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Mark Goodwin
QA Contact: Miloš Prchlík
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-06-15 06:15 UTC by Mark Goodwin
Modified: 2015-11-19 11:53 UTC (History)
8 users

Fixed In Version: pcp-3.10.5-2.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 11:53:42 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:2096 0 normal SHIPPED_LIVE pcp bug fix and enhancement update 2015-11-19 10:39:13 UTC

Description Mark Goodwin 2014-06-15 06:15:47 UTC
Description of problem: instid<->instname mapping is not persistent


Version-Release number of selected component (if applicable): pcp-3.5.9-1 on RHEL7.0


How reproducible:  easily

Steps to Reproduce:
1. pminfo -f hinv.map.lvname
2. create a new LV or reboot
3. pminfo -f hinv.map.lvname

Actual results: the instid->name mapping changes - it depends on the order of entries returned by readdir(/dev/mapper), which can change as entries appear or are deleted, and across a reboot.

Expected results: persistent mapping (especially important for PCP archive logs)

Additional info: need to use the pmdaCache API

Here's an example:

## initial conditions
# pminfo -f hinv.map.lvname
hinv.map.lvname
    inst [1 or "dm-6"] value "virtvg-snap"
    inst [2 or "dm-4"] value "virtvg-basevol"
    inst [3 or "dm-3"] value "virtvg-basevol-real"
    inst [4 or "dm-7"] value "rootvg-home"
    inst [5 or "dm-2"] value "virtvg-temp"
    inst [6 or "dm-5"] value "virtvg-snap-cow"
    inst [7 or "dm-1"] value "rootvg-root"
    inst [8 or "dm-0"] value "rootvg-swap"

## create a new LV (e.g. a snapshot)
# lvcreate -L10G -n newsnap -s virtvg/basevol
# pminfo -f hinv.map.lvname
hinv.map.lvname
    inst [1 or "dm-9"] value "virtvg-newsnap-cow"
    inst [2 or "dm-8"] value "virtvg-newsnap"
    inst [3 or "dm-6"] value "virtvg-snap"
    inst [4 or "dm-4"] value "virtvg-basevol"
    inst [5 or "dm-3"] value "virtvg-basevol-real"
    inst [6 or "dm-7"] value "rootvg-home"
    inst [7 or "dm-2"] value "virtvg-temp"
    inst [8 or "dm-5"] value "virtvg-snap-cow"
    inst [9 or "dm-1"] value "rootvg-root"
    inst [10 or "dm-0"] value "rootvg-swap"

## after a reboot, it changes yet again
# reboot ....
#  pminfo -f hinv.map.lvname
hinv.map.lvname
    inst [1 or "dm-9"] value "virtvg-newsnap"
    inst [2 or "dm-4"] value "virtvg-basevol"
    inst [3 or "dm-6"] value "virtvg-snap"
    inst [4 or "dm-3"] value "virtvg-basevol-real"
    inst [5 or "dm-8"] value "virtvg-newsnap-cow"
    inst [6 or "dm-2"] value "virtvg-temp"
    inst [7 or "dm-7"] value "rootvg-home"
    inst [8 or "dm-5"] value "virtvg-snap-cow"
    inst [9 or "dm-1"] value "rootvg-root"
    inst [10 or "dm-0"] value "rootvg-swap"

Note that dm-* names are not guaranteed to be persistent across a reboot and the order of entries in readdir(/dev/mapper) is not deterministic. The solution is to use the pmdaCache() API. Patch pending ....
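The idea behind the pmdaCache fix can be sketched in a few lines. This is a rough Python illustration of the concept, not the real API (which is C, in libpcp_pmda); the class name, file layout, and LV names here are made up. The point is that an instance ID is assigned once per external name and saved to disk, so a name keeps its ID no matter what order devices are discovered in:

```python
import json
import os
import tempfile

class PersistentIndom:
    """Toy stand-in for pmdaCache: stable name -> instance-ID mapping."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.ids = json.load(f)
        else:
            self.ids = {}

    def lookup(self, name):
        # Known names always get their original ID back; only genuinely
        # new names are handed a fresh ID.
        if name not in self.ids:
            self.ids[name] = max(self.ids.values(), default=-1) + 1
        return self.ids[name]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.ids, f)

# Demo: discovery order changes across a simulated reboot; IDs do not.
state = os.path.join(tempfile.mkdtemp(), "lvname.cache")
boot1 = PersistentIndom(state)
before = {n: boot1.lookup(n)
          for n in ["rootvg-swap", "rootvg-root", "rootvg-home"]}
boot1.save()

boot2 = PersistentIndom(state)  # "after reboot", names seen in a new order
after = {n: boot2.lookup(n)
         for n in ["rootvg-home", "virtvg-newsnap", "rootvg-swap", "rootvg-root"]}
```

Only the genuinely new LV ("virtvg-newsnap") receives a new ID; the three previously seen names keep theirs, which is exactly the property the unpatched readdir()-order code lacks.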

Comment 1 Frank Ch. Eigler 2014-06-15 12:33:29 UTC
"Expected results: persistent mapping (especially important for PCP archive logs)"

Why is that?  What breaks if indom values change across reboots?  Can we make
the pcp archive-gluing client side more resilient?

Comment 2 Mark Goodwin 2014-06-15 23:52:00 UTC
(In reply to Frank Ch. Eigler from comment #1)
> "Expected results: persistent mapping (especially important for PCP archive
> logs)"
> 
> Why is that?  What breaks if indom values change across reboots?  Can we make
> the pcp archive-gluing client side more resilient?

Values in pmResults are indexed by instance ID (not instance name), so it's never
good if an instance domain update changes the mapping. Archives are affected
across reboots due to pmlogmerge and friends. This is why the pmdaCache API
feature was introduced, adding persistence via opaque keys. See PMDACACHE(3) for details.
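A toy illustration of that point (not the PCP API; the IDs and LV names are invented): per-fetch values are keyed by numeric instance ID, so anything that remembers an ID across a reboot can end up pointing at nothing, or at a different device entirely:

```python
# instid -> name mappings from two boots of the same machine; the LVs
# are identical but the IDs were handed out in a different order.
before = {7: "rootvg-root", 8: "rootvg-swap"}
after = {9: "rootvg-root", 10: "rootvg-swap"}

# A client resolves "rootvg-swap" to its ID on boot 1...
wanted = next(i for i, n in before.items() if n == "rootvg-swap")
assert wanted == 8

# ...and that ID silently means nothing in data recorded after the reboot.
assert after.get(wanted) is None
```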

Comment 3 Frank Ch. Eigler 2014-06-16 00:49:22 UTC
I wasn't talking about changing mappings during the lifespan of a pmcd.  But ...

"Archives are affected across reboots due to pmlogmerge and friends."

Doesn't this represent a shortcoming of pmlogmerge?  The individual meta files
contain both instance numbers and names, so pmlogmerge could do some rewriting
as it goes and have the output be self-consistent.

Comment 4 Mark Goodwin 2014-06-16 01:00:19 UTC
(In reply to Frank Ch. Eigler from comment #3)
> I wasn't talking about changing mappings during the lifespan of a pmcd.  But
> ...
> 
> "Archives are affected across reboots due to pmlogmerge and friends."
> 
> Doesn't this represent a shortcoming of pmlogmerge?  The individual meta
> files
> contain both instance numbers and names, so pmlogmerge could do some
> rewriting
> as it goes and have the output be self-consistent.

yes, pmlogrewrite(1) can re-map instance domains prior to log merge. But that's a heck of a hassle requiring manual intervention compared to having the instance domain persistent in the first place!
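For reference, pmlogrewrite(1) is driven by a control file of rewriting rules. A hypothetical fragment might look something like the following - the indom serial (60.N) and the instance numbers are placeholders for illustration, not values taken from the linux PMDA; see the man page for the exact grammar:

```
# renumber an instance of a hypothetical indom 60.N so that two
# archives agree on instance IDs before they are merged
indom 60.N {
    inst 1 -> 9
}
```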

Comment 5 Frank Ch. Eigler 2014-06-16 10:55:23 UTC
"yes, pmlogrewrite(1) can re-map instance domains prior to log merge. But that's a heck of a hassle requiring manual intervention"

OK, but does it need to be a manual hassle?  Could pmlogmerge detect
mismatching indoms, and do the rewriting itself (or subcontract it to
pmlogrewrite jobs)?

"compared to having the instance domain persistent in the first place!"

If the tools were robust enough, this would be a great optimization.

Comment 6 Nathan Scott 2014-06-17 00:04:16 UTC
This is not simply an archiving problem (so we cannot just fix it there via post-processing, and think we're done here) - it affects all the tools using live mode as well.  In that mode, the protocol simply does not support instance IDs dynamically changing like this.  Think about what happens internally in pmchart, pmie, pmval, etc in live mode, once they've been asked to fetch a named instance.

These requirements are well documented - in the books and man pages, etc.  The pmdaCacheOp(3) man page has extensive discussion (see paragraph #2 - "The semantics of a PCP instance domain require a number of rules to be followed, namely...").

Mark's original statement "need to use the pmdaCache API" is spot on, and is the right way to tackle this.

Comment 7 Frank Ch. Eigler 2014-06-17 00:14:17 UTC
> Think about what happens internally in pmchart, pmie, pmval, etc
> in live mode, once they've been asked to fetch a named instance.

In live mode, during a persistent connection to pmcd, this should
not happen though, should it?


> These requirements are well documented - in the books and man pages, etc. 
> The pmdaCacheOp(3) man page has extensive discussion (see paragraph #2 -
> "The semantics of a PCP instance domain require a number of rules to be
> followed, namely...").

Bullet 4 there says:

   4. It is preferable, although not mandatory, for the association
   between an external instance name and an internal instance
   identifier to be persistent.  This rule is supported by the
   pmdaCache family of routines.

Note "preferable".
 

> Marks original statement "need to use the pmdaCache API" is spot on, and is
> the right way to tackle this.

Yes, it would certainly solve this problem today.  I was thinking ahead to
other problems such as imperfect PMDAs, lost caches, mismatched archives, etc.

Comment 8 Nathan Scott 2014-06-17 01:06:04 UTC
(In reply to Frank Ch. Eigler from comment #7)
> In live mode, during a persistent connection to pmcd, this should
> not happen though, should it?

It should not (in general), but in this case, yes it can happen.  Have a look at the code.

>    4. It is preferable, although not mandatory, for the association
>    between and external instance name  and an  internal instance
>    identifier to be persistent.  This rule is supported by the
>    pmdaCache family of routines.
> 
> Note "preferable".

It's "preferable" because it depends on how clients might use the information, and on whether ensuring this naming persistence is possible, or important enough - it's at the PMDA author's discretion. Sometimes (as was the case here) a simple implementation comes first, and is later updated to use the pmdaCache interfaces once the need for persistence is more widely understood and more experience has been gained with the metric(s).

HTH.

Comment 9 Frank Ch. Eigler 2014-06-17 01:32:47 UTC
(In reply to Nathan Scott from comment #8)
> (In reply to Frank Ch. Eigler from comment #7)
> > In live mode, during a persistent connection to pmcd, this should
> > not happen though, should it?
> 
> It should not (in general), but in this case, yes it can happen.  Have a
> look at the code.

You're right.  If I understand it correctly, it's because in src/pmdas/linux
devmapper.c, refresh* computes a whole new instance domain table every time
it's deemed out-of-date.

An ugly little shell pipeline implicates a few other linux indoms that might
be similarly affected:

% cd src/pmdas/linux
% grep 'case.*_INDOM' *.c | uniq | awk '{print $3}' | cut -f1 -d: | while read dom; do
    echo -n $dom; dom2=`echo $dom | cut -f1 -d_`
    grep -iq "pmdaCache.*"$dom *.c && echo -n " cached" || echo -n " uncached"
    grep -iq "refresh.*"$dom2 *.c && echo -n " refreshed" || echo -n " unrefreshed"; echo
  done | grep uncached.refreshed
CPU_INDOM uncached refreshed
LOADAVG_INDOM uncached refreshed
NFS_INDOM uncached refreshed
SCSI_INDOM uncached refreshed
LV_INDOM uncached refreshed
SLAB_INDOM uncached refreshed

but that includes things like SCSI, which through its refresh_proc_scsi()
function maintains some invariance by only appending to its indom list,
not recreating it from scratch.  So that list includes false positives.

Comment 12 Mark Goodwin 2014-09-05 23:37:06 UTC
I haven't fixed this directly, because the mapping from the persistent dm name to the internal, non-persistent dm-[0-9]* name is now provided by hinv.map.dmname (the reverse direction to the mapping provided by hinv.map.lvname: the persistent dm name is the instance domain and the non-persistent dm-[0-9]* name is the value). The hinv.map.dmname mapping is available for all dm devices, not just lvm devices.
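The direction flip can be shown with a couple of dicts - the device names below are made up for illustration, not real pminfo output:

```python
# hinv.map.lvname keys on the non-persistent kernel name; hinv.map.dmname
# inverts that and keys on the persistent name instead.
lvname = {"dm-0": "rootvg-swap", "dm-1": "rootvg-root"}  # hinv.map.lvname
dmname = {lv: dm for dm, lv in lvname.items()}           # hinv.map.dmname

# Across reboots the kernel may hand out different dm-N minors, but the
# dmname instances ("rootvg-swap", ...) keep their identity; only the
# *values* of the metric change.
assert dmname["rootvg-swap"] == "dm-0"
```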

So can we consider deprecating hinv.map.lvname in favour of hinv.map.dmname? And updating any tools that use the former to use the latter? I think that probably only affects pmatop though I don't know for sure.

Regards
-- Mark

Comment 13 Nathan Scott 2014-09-16 06:30:18 UTC
> So can we consider deprecating hinv.map.lvname in favour of hinv.map.dmname?

Yep, that's my understanding of how we would proceed from the earlier
discussions - leaving the existing metric and indom as-is (un-fixable
in this way, hence we went with a new metric/indom pairing).

> And updating any tools that use the former to use the latter?

*nod*

> I think that probably only affects pmatop though I don't know for sure.

Pretty sure that's the case, yes.  Stan originally added this metric for the
python tools he wrote - so pmcollectl and pmatop are the only possible users AFAICT.

cheers.

Comment 16 Mark Goodwin 2015-06-09 05:09:42 UTC
An upstream patch has been posted for the 3.10.5-1 release:

commit ab4eb4adbfec797f51becf3298f5b78f350dcaed
Author: Mark Goodwin <mgoodwin>
Date:   Fri Jun 5 14:23:37 2015 +1000

    Deprecate hinv.nlv and hinv.map.lvname and the LV_NAME instance domain.
    
    The instance domain for hinv.map.lvname is the dm names e.g. dm-1,
    which are not persistent. The values are the logical names, which
    are persistent. These two metrics are now deprecated in favor of
    hinv.map.dmname, which instead uses the persistent names for the
    instance domain, and the dm names as the value of the mapping.
    
    Note also - hinv.map.dmname is a superset of the lvm instances -
    the dmname instance domain also includes non-lvm devices such as
    dm-multipath, dm-cache, etc.  as well as all lvm logical devices.
    
    The only known user of the deprecated hinv.map.lvname metric is the
    old pmatop command, which itself has been deprecated.
    
        modified:   src/pmdas/linux/GNUmakefile
        modified:   src/pmdas/linux/clusters.h
        deleted:    src/pmdas/linux/devmapper.c
        deleted:    src/pmdas/linux/devmapper.h
        modified:   src/pmdas/linux/help
        modified:   src/pmdas/linux/indom.h
        modified:   src/pmdas/linux/pmda.c
        modified:   src/pmdas/linux/root_linux

Comment 18 mbenitez 2015-07-28 14:12:40 UTC
Hello Nathan,
Could you please verify whether the fix for this bug made it into the PCP build for RHEL 7.2? If so, please add the bug to the errata.
Thanks!
Martha

Comment 19 Nathan Scott 2015-07-29 00:09:48 UTC
Yes, this was included in the rebase, updating the Errata now.

Comment 21 Miloš Prchlík 2015-10-16 17:36:41 UTC
Verified for build pcp-3.10.6-2.el7.

Comment 22 errata-xmlrpc 2015-11-19 11:53:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2096.html

