Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1878754

Summary:	Segfault in __pmGetArchiveEnd_ctx due to setting 'f' to null and then using it
Product:	Red Hat Enterprise Linux 7	Reporter:	Paulo Andrade <pandrade>
Component:	pcp	Assignee:	Nathan Scott <nathans>
Status:	CLOSED ERRATA	QA Contact:	Jan Kurik <jkurik>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	7.6	CC:	agerstmayr, jijia, jkurik, mgoodwin, nathans, patrickm
Target Milestone:	rc	Keywords:	Bugfix, Triaged, ZStream
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	pcp-4.3.2-13.el7_9	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:
Clones:	1880403 (view as bug list)		Environment:
Last Closed:	2020-12-15 11:18:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1880403

Description Paulo Andrade 2020-09-14 13:14:11 UTC

The segfault is with pcp-4.1.0-4.el7, but reading the code should
also happen at least with pcp-4.3.2-6.el7.

  The problem is in the pattern:

    found = 0;
    sts = PM_ERR_LOGREC;	/* default error condition */
    f = NULL;
    for (vol = lcp->l_maxvol; vol >= lcp->l_minvol; vol--) {
...
	/*
	 * failure at the physical end of file may be related to a truncted
	 * block flush for a growing archive.  Scan temporal index, and use
	 * last entry at or before end of physical file for this volume
	 */
	logend = (int)sizeof(__pmLogLabel) + 2*(int)sizeof(int);
	for (i = lcp->l_numti - 1; i >= 0; i--) {
	    if (lcp->l_ti[i].ti_vol != vol) {
		if (f != acp->ac_mfp) {
		    __pmFclose(f);
		    f = NULL;
		}
		continue;
	    }
	    if (lcp->l_ti[i].ti_log <= physend) {
		logend = lcp->l_ti[i].ti_log;
		break;
	    }
	}
...

	/*
	 * Now chase it forwards from the last index entry ...
	 *
	 * BUG 357003 - pmchart can't read archive file
	 *	turns out the index may point to the _end_ of the last
	 *	valid record, so if not at start of volume, back up one
	 *	record, then scan forwards.
	 */
	__pmFseek(f, (long)logend, SEEK_SET);
...

  Above it is clear the 'continue' in 'for (i = lcp->l_numti - 1; i >= 0; i--)'
might have set 'f' to NULL, and because it continues only the inner for loop,
it will call __pmFseek with a NULL pointer.
  Maybe this condition can only happen with pcp-4.1.0-4.el7, or with corrupted
data, but still, it should have some guard against this NULL pointer dereference.

Comment 2 Nathan Scott 2020-09-14 21:52:06 UTC

Hi Paulo,

Do you have a reproducible test case for this problem that we can use for regression testing the fix?
(looks like they're invoking pmlogextract there - could we get access to the problematic archive?)

Thanks!

Comment 3 Nathan Scott 2020-09-17 06:57:51 UTC

It's been reproduced and fixed upstream now - commit below - no need for that archive now.

Paulo, this bug has been reported against RHEL7 but there are no future el7 ystream updates planned (i.e. no 7.10) - engineering focus is nowadays on RHEL8.

Does the customer need a zstream update for RHEL7?  (urgent?  a business case will need to be made)
If not, I'll move this BZ over to RHEL8 and ensure its included in the next available release.

thanks!

commit 9020728d5d0ee7507ab96baf673c4d3c4ece98a5
Author: Ken McDonell <kenj.au>
Date:   Thu Sep 17 14:29:38 2020 +1000

    src/libpcp/src/logutil.c: refactor __pmGetArchiveEnd_ctx()
    
    GOTO considered helpful.
    
    Rework to deal with customer-reported problem via Nathan.  We were
    dereferencing a NULL FILE pointer.
    
    The logic is a bit convoluted here, but it is all to do with the
    various ways in which an archive can be corrupted.  The common code
    path is unchanged.
    
    Also qa/1280 (new) reproduced the original problem, and qa/417 was
    remade because it deals with corrupted archives and now has slightly
    different output for one test case.

Comment 4 Jia Jian 2020-09-17 07:20:25 UTC

Hi Nathan,

The cu need a zstream update for RHEL7, let you know. Thanks.

Best regards,

Jian Jia

Comment 5 Paulo Andrade 2020-09-17 12:36:18 UTC

Removing needinfo, as we were waiting for reproducer data from customer,
and Jia Jian already provided the extra information :)

Comment 6 Nathan Scott 2020-09-18 00:39:22 UTC

Thanks folks - please follow the process for requesting a zstream fix next, and we'll chat again when the BZ clones are created.

Comment 7 Nathan Scott 2020-09-23 02:15:48 UTC

Paulo - hmm, it looks like you've created a clone of this BZ yourself?  That's likely to hinder the process at this stage, unfortunately - you'll want to close that clone in order to progress this fix further.

The zstream process is described here: https://mojo.redhat.com/docs/DOC-1021938

The clone(s) are the final stage of approval, and are created automatically by release engineering team folks.  There several steps that must be done before that, however (see above doc, it needs to be followed carefully).

cheers.

Comment 8 Paulo Andrade 2020-09-23 12:35:05 UTC

Nathan, thanks for letting me know about my error.

Jia Jian, can you help us with the justification? I only did check the coredump,
and did not directly talk with the customer, but you might add more data before
actually making the request. Below is initial data.

---

> Business justification (customer impact; how would the defect impact customers’ business? why is it not possible to wait for a fix in Y-Stream?):

There is no planned Y-Stream.


> How many customers are impacted by this defect?

Currently only one known case.


> When was the defect introduced in this major version of RHEL? Is it a regression?

It is a logical bug on special conditions or corrupted files that lead to a segfault.


> What is the minimal patch set to fix this bug? (link, git commit,...)

https://github.com/performancecopilot/pcp/commit/9020728d5d0ee7507ab96baf673c4d3c4ece98a5

Comment 10 Paulo Andrade 2020-09-24 11:57:02 UTC

Request zstream update. Customer is running rhel 7.6 plus updates.

> Business justification (customer impact; how would the defect impact customers’ business? why is it not possible to wait for a fix in Y-Stream?):

There is no planned Y-Stream. Customer is a hardware vendor and does extensive
disk tests using pcp tools before selling their products.


> How many customers are impacted by this defect?

Currently only one known case.


> When was the defect introduced in this major version of RHEL? Is it a regression?

It is a logical bug on special conditions or corrupted files that lead to a segfault.


> What is the minimal patch set to fix this bug? (link, git commit,...)

https://github.com/performancecopilot/pcp/commit/9020728d5d0ee7507ab96baf673c4d3c4ece98a5

Comment 11 Nathan Scott 2020-09-28 22:49:49 UTC

Jian, Paulo,

I'm having a hard time seeing this BZ as satisfying the requirements of a zstream - single customer, not alot in the way of business justification, and for a relatively obscure situation (corrupt archive) - its unlikely many other customers are going to run into this issue.

Also with this being requested for 7.6 that means we need to create and test zstream updates for each subsequent release as well (i.e. this is indirectly a request for zstreams from 7.6 thru 7.9, which is additional work for the PCP dev & QE team), because customers can and do upgrade in which case they still need the fix.

Rather than close this request outright - perhaps we can try a different tack and piggyback on other efforts to reduce the costs.  We are likely to produce a 7.9 zstream update for an unrelated issue in PCP (although that one also is not yet approved) - could the customer be convinced to move up to 7.9 in order that we can back-port and test a single zstream update with both fixes?

thanks.

Comment 13 Nathan Scott 2020-09-29 04:48:11 UTC

OK, thanks Jian - please keep us posted and I'll move it forward with the other planned BZ for rhel-7.9.z.
Note this fix will be included in rhel-8.4.0 as well.

Comment 21 errata-xmlrpc 2020-12-15 11:18:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcp bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5440