Bug 1788937

Summary: pmda-lustre fails to start as stat data location changed since lustre 2.12
Product: Red Hat Enterprise Linux 8
Reporter: Piyush Bhoot <pbhoot>
Component: pcp
Assignee: Mark Goodwin <mgoodwin>
Status: CLOSED ERRATA
QA Contact: Jan Kurik <jkurik>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.3
CC: agerstmayr, jkurik, mgoodwin, nathans, patrickm
Target Milestone: alpha
Keywords: Bugfix, Reopened, Triaged
Target Release: 8.3
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcp-5.1.1
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-04 03:00:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1792971
Bug Blocks:

Description Piyush Bhoot 2020-01-08 12:11:42 UTC
Description of problem:

pmda-lustre fails to start as stat data location changed since lustre 2.12

New locations

/sys/kernel/debug/lustre/llite/
/sys/kernel/debug/lnet

Version-Release number of selected component (if applicable):
lustre 2.12
pcp-pmda-lustre-4.3.2-3.el7_7.x86_64.rpm 

How reproducible:
Always

Steps to Reproduce:

lustre]# ./Install
Can't open directory: /proc/fs/lustre/llite/
Can't open directory: /proc/fs/lustre/llite/
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check lustre metrics have appeared ... 1 warnings, 1 metrics and 0 values

Additional info:

1] Install succeeds with new paths

vi /var/lib/pcp/pmdas/lustre/pmdalustre.pl

# llite proc root
#our $LLITE_PATH = "/proc/fs/lustre/llite/";
our $LLITE_PATH = "/sys/kernel/debug/lustre/llite/";

# lnet proc root
#our $LNET_PATH = "/proc/sys/lnet/";
our $LNET_PATH = "/sys/kernel/debug/lnet/";

But there seems to be more to it, as these new paths do not allow non-root access:

https://jira.whamcloud.com/browse/LU-11850

2] On a side note, I had to edit pmdalustre.pl directly, as declaring settings in
 luster.conf (or even lustre.conf) did not help.

cat /var/lib/pcp/pmdas/lustre/luster.conf
LLITE_PATH=/sys/kernel/debug/lustre/llite
LNET_PATH=/sys/kernel/debug/lnet

-------------------------
# Configuration files for overriding the location of LLITE_PATH, etc, mostly for testing purposes
for my $file (pmda_config('PCP_PMDAS_DIR') . '/lustre/luster.conf', 'luster.conf') {
        eval `cat $file` unless ! -f $file;
}
-------------------------

Comment 2 Nathan Scott 2020-01-09 05:36:44 UTC
Assigning to Mark for closer look.

Note the format of luster.conf (shoulda been lustre.conf I think, ugh) is

$LLITE_PATH=/sys/kernel/debug/lustre/llite;

not

LLITE_PATH=/sys/kernel/debug/lustre/llite

(IIRC)
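
To illustrate the format point: because pmdalustre.pl eval's the file contents as Perl, each line needs the $ sigil and a terminating semicolon, and presumably a quoted string value as well. The sketch below writes such a file from the shell. The helper name write_lustre_conf and the scratch directory are hypothetical. The file name follows the luster.conf spelling in the PMDA excerpt in the description, and the trailing slashes mirror the paths the reporter edited into pmdalustre.pl. None of this has been checked against a live Lustre 2.12 host.

```shell
#!/bin/sh
# Sketch: write a Perl-syntax override file for pmdalustre.pl.
# Assumption: values are quoted strings with trailing slashes, mirroring the
# defaults in pmdalustre.pl. write_lustre_conf is a hypothetical helper.
write_lustre_conf() {
    conf_file=$1
    # Quoted 'EOF' keeps the shell from expanding $LLITE_PATH and $LNET_PATH.
    cat > "$conf_file" <<'EOF'
$LLITE_PATH = "/sys/kernel/debug/lustre/llite/";
$LNET_PATH = "/sys/kernel/debug/lnet/";
EOF
}

# Write to a scratch location first so the contents can be inspected.
scratch=$(mktemp -d)
write_lustre_conf "$scratch/luster.conf"
cat "$scratch/luster.conf"
```

Once the contents look right, the file would be copied as root into /var/lib/pcp/pmdas/lustre/ before re-running ./Install.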

Comment 3 Mark Goodwin 2020-02-19 07:59:43 UTC
Piyush, will it be necessary with this customer to support both lustre-2.10.x and lustre-2.12.x with the same PMDA code? It seems there are two current release streams, with RPMs for both v2.10 and v2.12 for el7 available at https://downloads.whamcloud.com/public/lustre/

Thanks

Comment 4 Nathan Scott 2020-02-19 23:24:49 UTC
For this customer, the requirement is to support the version (of the kernel interface) shipped in RHEL-7.

The userspace Lustre version is not relevant here, as PCP talks to the kernel directly so it's that interface that we know needs to be investigated.  From a purist, upstream PCP POV, the PMDA would ideally support both kernel interfaces though.

Comment 5 Mark Goodwin 2020-02-20 06:02:29 UTC
(In reply to Nathan Scott from comment #4)
> For this customer, the requirement is to support the version (of the kernel
> interface) shipped in RHEL-7.

Lustre server kernel modules are not shipped by Red Hat. See https://access.redhat.com/solutions/47031

> 
> The userspace Lustre version is not relevant here, as PCP talks to the
> kernel directly so it's that interface that we know needs to be
> investigated.  From a purist, upstream PCP POV, the PMDA would ideally
> support both kernel interfaces though.

The lustre server kernel modules for el7 are built using a patch against a 3.10 kernel (patch-3.10.0-lustre.patch in the SRPM). This patch is different in the latest Lustre maintenance release streams for el7, which are version 2.10.8 and version 2.12.4. So the PCP PMDA will need to be tested with both versions on a RHEL7 VM.

If that works, the path to the kernel statistics interfaces can be set in either of two ways: (a) exported environment variables that override the default paths, $LUSTRE_LLITE_PATH and $LUSTRE_LNET_PATH, set prior to running the Install script; or (b) a configuration file, '/var/lib/pcp/pmdas/lustre/lustre.conf', which can specify these paths too. Variable assignments in the conf file are eval'd by the PMDA, which is written in Perl, so the conf file should contain Perl code, e.g. for Lustre 2.12.x:

$LLITE_PATH = "/sys/kernel/debug/lustre/llite";
$LNET_PATH = "/sys/kernel/debug/lnet";
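
Option (a) might look like the following in practice. This is a minimal sketch: the two variable names are the ones given above, while the trailing slashes, the default PMDA directory argument, and the wrapper function install_lustre_pmda are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of option (a): export the overrides, then run the PMDA Install script.
# Trailing slashes mirror the defaults in pmdalustre.pl (an assumption).
export LUSTRE_LLITE_PATH="/sys/kernel/debug/lustre/llite/"
export LUSTRE_LNET_PATH="/sys/kernel/debug/lnet/"

# Hypothetical wrapper: run ./Install from the PMDA directory (needs root).
install_lustre_pmda() {
    pmda_dir=${1:-/var/lib/pcp/pmdas/lustre}
    (cd "$pmda_dir" && ./Install)
}

# install_lustre_pmda    # uncomment on a host with pcp-pmda-lustre installed
```

The exports only affect processes started from the same shell, so the Install script has to be launched from that session.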

Other than that, if Lustre 2.12 or later kernel modules are loaded, the PMDA needs to run as the root user in order to access the kernel statistics interfaces. This is not necessary if running Lustre 2.10.x - the /proc interfaces are non-root accessible.

Comment 6 Nathan Scott 2020-02-20 06:11:06 UTC
(In reply to Mark Goodwin from comment #5)
> (In reply to Nathan Scott from comment #4)
> > For this customer, the requirement is to support the version (of the kernel
> > interface) shipped in RHEL-7.
> 
> Lustre server kernel modules are not shipped by Red Hat. See
> https://access.redhat.com/solutions/47031
> 

Aha - thanks for looking into this.

> > 
> > The userspace Lustre version is not relevant here, as PCP talks to the
> > kernel directly so it's that interface that we know needs to be
> > investigated.  From a purist, upstream PCP POV, the PMDA would ideally
> > support both kernel interfaces though.
> 
> The lustre server kernel modules for el7 are built using a patch against a
> 3.10 kernel (patch-3.10.0-lustre.patch in the SRPM). This patch is different
> in the latest Lustre maintenance release streams for el7, which are version
> 2.10.8 and version 2.12.4. So the PCP PMDA will need to be tested with both
> versions on a RHEL7 VM.

OK.  Ideally the different kernel interfaces will be captured, and
we can craft suitable QA regression tests for the PMDA.

> If that works, the path to the kernel statistics interfaces can be set
> either using (a) exported environment variables for overriding the default
> paths: $LUSTRE_LLITE_PATH and $LUSTRE_LNET_PATH, set prior to running the
> Install script. Or (b), a configuration file
> '/var/lib/pcp/pmdas/lustre/lustre.conf' which can specify these paths too.
> Variable assignments in the conf file are eval'd by the PMDA, which is

(c) handle both by detecting which kernel interface file path/format is
available at runtime.  we need to keep the PMDA config file though since
it's supported today.

> written in perl. So the conf file should contain perl code, e.g. for Lustre
> 2.12.x :
> 
> $LLITE_PATH = "/sys/kernel/debug/lustre/llite";
> $LNET_PATH = "/sys/kernel/debug/lnet";

We don't seem to know yet whether the sysfs files have the same or
different formats - or do we?

> Other than that, if Lustre 2.12 or later kernel modules are loaded, the PMDA
> needs to run as the root user in order to access the kernel statistics
> interfaces. This is not necessary if running Lustre 2.10.x - the /proc
> interfaces are non-root accessible.

This also can be handled dynamically (preferably), or just always run as root.

Comment 7 Mark Goodwin 2020-02-20 06:23:35 UTC
(In reply to Nathan Scott from comment #6)
> (In reply to Mark Goodwin from comment #5)
> > (In reply to Nathan Scott from comment #4)
> > > For this customer, the requirement is to support the version (of the kernel
> > > interface) shipped in RHEL-7.
> > 
> > Lustre server kernel modules are not shipped by Red Hat. See
> > https://access.redhat.com/solutions/47031
> > 
> 
> Aha - thanks for looking into this.
> 
> > > 
> > > The userspace Lustre version is not relevant here, as PCP talks to the
> > > kernel directly so it's that interface that we know needs to be
> > > investigated.  From a purist, upstream PCP POV, the PMDA would ideally
> > > support both kernel interfaces though.
> > 
> > The lustre server kernel modules for el7 are built using a patch against a
> > 3.10 kernel (patch-3.10.0-lustre.patch in the SRPM). This patch is different
> > in the latest Lustre maintenance release streams for el7, which are version
> > 2.10.8 and version 2.12.4. So the PCP PMDA will need to be tested with both
> > versions on a RHEL7 VM.
> 
> OK.  Ideally the different kernel interfaces will be captured, and
> we can craft suitable QA regression tests for the PMDA.

yes we can do that


> We don't seem to know yet whether the sysfs files have the same or
> different formats - or do we?

it's not obvious after reading and comparing the two patches. That's why it would be best to install each version and then test.

> 
> > Other than that, if Lustre 2.12 or later kernel modules are loaded, the PMDA
> > needs to run as the root user in order to access the kernel statistics
> > interfaces. This is not necessary if running Lustre 2.10.x - the /proc
> > interfaces are non-root accessible.
> 
> This also can be handled dynamically (preferably), or just always run as
> root.

The usual way is to call pmda->set_user($OS_USER) where $OS_USER can be specified in the conf file. The code to do that is not in the current PMDA src.

Comment 8 Nathan Scott 2020-02-20 06:30:53 UTC
(In reply to Mark Goodwin from comment #7)
> (In reply to Nathan Scott from comment #6)
> 
> > We don't seem to know yet whether the sysfs files have the same or
> > different formats - or do we?
> 
> it's not obvious after reading and comparing the two patches. That's why it
> would be best to install each version and then test.

OK.  Perhaps Piyush could attach files captured from the sysfs interface for
a quick(er) comparison too.

> > 
> > This also can be handled dynamically (preferably), or just always run as
> > root.
> 
> The usual way is to call pmda->set_user($OS_USER) where $OS_USER can be
> specified in the conf file. The code to do that is not in the current PMDA
> src.

There is no config file by default, so the 'usual way' is to handle setup
in the PMDA code.  The code could test for which kernel interface exists
(if any) and adapt its behaviour on the fly (and either call set_user or
not, depending on the API used).  This approach would be user-friendly and
lead to fewer installation errors by default.

Thanks Mark.
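
A rough shell sketch of the runtime detection discussed in this comment follows. The PMDA itself is Perl, so this only illustrates the decision logic, not the eventual fix. The helper names first_existing_dir and required_privilege are hypothetical, and the rule that debugfs paths need root while /proc paths do not is taken from the earlier comments rather than tested here. The probe is assumed to run as root, since debugfs is not visible to other users.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an existing directory.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: map a stats path to the privilege the PMDA would need.
# Assumption from this thread: debugfs needs root, /proc does not.
required_privilege() {
    case $1 in
        /sys/kernel/debug/*) echo root ;;
        *) echo unprivileged ;;
    esac
}

# Probe the Lustre 2.12 location first, then the older /proc location.
if llite=$(first_existing_dir /sys/kernel/debug/lustre/llite/ /proc/fs/lustre/llite/); then
    echo "llite stats under $llite, PMDA runs as: $(required_privilege "$llite")"
else
    echo "no Lustre llite interface found"
fi
```

In the PMDA, the "unprivileged" branch would correspond to calling set_user and the "root" branch to skipping it.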

Comment 9 Nathan Scott 2020-04-30 00:05:43 UTC
As there's no RHEL 7.10 planned, we'll tackle this in RHEL 8 at the next available opportunity.

Comment 15 Jan Kurik 2020-06-17 13:57:11 UTC
Verified on pcp-5.1.1-2.el8 build with lustre-2.12.5 installed.

Comment 18 errata-xmlrpc 2020-11-04 03:00:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcp bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4684