Bug 1818710 - pcp-atop is crashing due to an uninitialized value within a sort comparison routine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.9
Assignee: Nathan Scott
QA Contact: Jan Kurik
URL:
Whiteboard:
Duplicates: 1842480
Depends On:
Blocks: 1851849
Reported: 2020-03-30 07:36 UTC by Nitin Kumar Bansal
Modified: 2023-12-15 17:35 UTC (History)

Fixed In Version: pcp-4.3.2-12
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1851849
Environment:
Last Closed: 2020-09-29 19:25:12 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2020:3869  0        None      None    None     2020-09-29 19:25:19 UTC

Comment 7 Nathan Scott 2020-04-03 04:34:28 UTC
OK, thanks Divya - I'll continue to look into it.

Comment 10 Nathan Scott 2020-04-07 04:29:52 UTC
Status: I've been unable to reproduce the problem locally since those earlier changes, which makes finding a fix much more difficult.  Do they/you have any insight into what might trigger the problem?  All working fine here.  :(

Comment 11 Nathan Scott 2020-04-09 07:09:03 UTC
Divya,

Are you absolutely certain your build included that and the other atop fixes from the previous bug?  For me, with all updates applied my one test case that intermittently tripped the issue has not triggered it since.  Our QE folk have also been trying without success to reproduce the problem.  Valgrind is reporting all memory accesses are safe, and after auditing the code again I cannot see a way we'd be able to incorrectly access memory there.

cheers.

Comment 14 Nathan Scott 2020-04-30 02:00:38 UTC
Hi Divya,

In the absence of valgrind output so far, I've audited the code paths in pcp-atop once more today.  I think I can see another set of code paths that could be causing the crashes we've seen: cases where the comparison routines (compcpu, compdsk, etc) are presented with elements from two differently sized arrays, where the smaller one has NULL'd task pointers.  In this case we'd see a NULL pointer passed into the comparison routine, and we'd crash with segv at the points described.

I've pushed an upstream commit to tackle this aspect (details below) - could you prepare a build for the customer with this and see if it resolves the issue?  Thanks!

commit 9e14d91e012fd2f5b395cb83ba2353a1ec4a7e3f
Author: Nathan Scott <nathans>
Date:   Thu Apr 30 11:56:42 2020 +1000

    pcp-atop: resolve other potential null task pointer dereferences
    
    Additional defensive counter measures in sort routines where we
    could potentially dereference null pointers.  Aiming to tackle a
    customer reported issue, which qa/1080 intermittently reproduces.
    
    Related to Red Hat BZ #1818710.
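
For illustration only, here is a minimal sketch of the kind of defensive check the commit message above describes. It is not the upstream change itself: `struct task_sketch` and `compcpu_sketch` are hypothetical stand-ins for pcp-atop's real task structure and `compcpu` routine, and the NULLs-sort-last ordering is an assumption made for the example.

```c
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for pcp-atop's per-task statistics. */
struct task_sketch {
    long long utime;   /* user-mode CPU time */
    long long stime;   /* system-mode CPU time */
};

/*
 * qsort() comparator over an array of `struct task_sketch *`.
 * Orders by total CPU time, largest first. NULL entries are never
 * dereferenced and are placed after all valid entries.
 */
static int
compcpu_sketch(const void *a, const void *b)
{
    const struct task_sketch *ta = *(const struct task_sketch * const *)a;
    const struct task_sketch *tb = *(const struct task_sketch * const *)b;
    long long acpu, bcpu;

    /* Defensive checks: handle NULL task pointers before any access. */
    if (ta == NULL && tb == NULL)
        return 0;
    if (ta == NULL)
        return 1;
    if (tb == NULL)
        return -1;

    acpu = ta->utime + ta->stime;
    bcpu = tb->utime + tb->stime;

    if (acpu > bcpu)
        return -1;
    if (acpu < bcpu)
        return 1;
    return 0;
}
```

A caller would pass this to `qsort(array, count, sizeof(array[0]), compcpu_sketch)`. Note that a check like this only guards against NULL pointers; it cannot detect a pointer that is uninitialized or otherwise invalid.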

Comment 17 Divya 2020-05-11 07:37:40 UTC
Hello Nathan

Bad news! Even with the recent set of patches included, it still seems to crash at the same point, with the backtrace below:

Program terminated with signal 11, Segmentation fault.
#0  compcpu (a=0x19ac838, b=0x19ac840) at showlinux.c:2045
2045		bcpu = (*(struct tstat **)b)->cpu.stime +
(gdb) bt
#0  compcpu (a=0x19ac838, b=0x19ac840) at showlinux.c:2045
#1  0x00007fb4868dde59 in msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac838, n=2) at msort.c:83
#2  0x00007fb4868ddbc8 in msort_with_tmp (n=2, b=0x19ac838, p=0x7ffd4b0d5730) at msort.c:45
#3  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac830, n=3) at msort.c:54
#4  0x00007fb4868ddbc8 in msort_with_tmp (n=3, b=0x19ac830, p=0x7ffd4b0d5730) at msort.c:45
#5  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac820, n=5) at msort.c:54
#6  0x00007fb4868ddbc8 in msort_with_tmp (n=5, b=0x19ac820, p=0x7ffd4b0d5730) at msort.c:45
#7  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac7f8, n=10) at msort.c:54
#8  0x00007fb4868ddbc8 in msort_with_tmp (n=10, b=0x19ac7f8, p=0x7ffd4b0d5730) at msort.c:45
#9  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac7a8, n=20) at msort.c:54
#10 0x00007fb4868ddbc8 in msort_with_tmp (n=20, b=0x19ac7a8, p=0x7ffd4b0d5730) at msort.c:45
#11 msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac708, n=40) at msort.c:54
#12 0x00007fb4868ddbc8 in msort_with_tmp (n=40, b=0x19ac708, p=0x7ffd4b0d5730) at msort.c:45
#13 msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac5d0, n=79) at msort.c:54
#14 0x00007fb4868de14c in msort_with_tmp (n=79, b=0x19ac5d0, p=0x7ffd4b0d5730) at msort.c:45
#15 __GI___qsort_r (b=b@entry=0x19ac5d0, n=n@entry=79, s=s@entry=8, cmp=0x419e30 <compcpu>, arg=arg@entry=0x0) at msort.c:297
#16 0x00007fb4868de1f8 in __GI_qsort (b=b@entry=0x19ac5d0, n=n@entry=79, s=s@entry=8, cmp=<optimized out>) at msort.c:308
#17 0x0000000000413dff in generic_samp (curtime=<optimized out>, nsecs=<optimized out>, devtstat=<optimized out>, sstat=<optimized out>, nexit=<optimized out>, noverflow=<optimized out>, flag=<optimized out>)
    at showgeneric.c:645
#18 0x000000000040396f in engine () at atop.c:684
#19 0x0000000000402fc3 in main (argc=4, argv=<optimized out>) at atop.c:477
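
The backtrace shows `qsort` being called on 79 elements of size 8 (an array of pointers), with the crash occurring when `compcpu` dereferences one of them. As a hedged sketch of one alternative to checking inside every comparator, the array could be compacted before sorting so that no NULL entry is ever handed to the comparison routine. The helper `compact_non_null` below is hypothetical and not part of pcp-atop, and it only helps if the bad entry is NULL rather than an uninitialized value.

```c
#include <stddef.h>

/*
 * Move all non-NULL pointers to the front of `items`, preserving their
 * relative order, and fill the remainder with NULL.
 * Returns the number of non-NULL pointers kept.
 */
static size_t
compact_non_null(void **items, size_t n)
{
    size_t kept = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (items[i] != NULL)
            items[kept++] = items[i];   /* kept <= i, so this is safe in place */
    }
    for (i = kept; i < n; i++)
        items[i] = NULL;

    return kept;
}
```

The returned count could then be used as the element count passed to `qsort`, so only valid entries take part in the sort.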

Comment 18 Nathan Scott 2020-05-11 07:40:58 UTC
Thanks Divya - I'll keep looking.  :(  Were they able to reproduce it with valgrind?

Comment 19 Divya 2020-05-11 07:44:22 UTC
(In reply to Nathan Scott from comment #18)
> Thanks Divya - I'll keep looking.  :(  Were they able to reproduce it with
> valgrind?

Unfortunately no

Comment 23 Jan Kurik 2020-05-17 07:44:22 UTC
All the regression tests have passed.
Switching to VERIFIED and setting flag SanityOnly as I am unable to reproduce this issue.

Comment 24 Nathan Scott 2020-06-02 02:10:30 UTC
It wasn't mentioned here, which led to some accidental confusion, but there is one other relevant commit (already in 7.9)...

commit c22151f463e3e2494850210444a69948dc0fbdd6
Author: Nathan Scott <nathans>
Date:   Tue May 19 14:50:37 2020 +1000

    pcp-atop: resolve a new task pointer segv qa/1080 has encountered
    
    Related to Red Hat BZ #1818710

Comment 25 Nathan Scott 2020-06-02 02:10:45 UTC
*** Bug 1842480 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2020-09-29 19:25:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: pcp security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3869

