Bug 1818710
Summary: pcp-atop is crashing due to an uninitialized value within a sort comparison routine
Product: Red Hat Enterprise Linux 7
Component: pcp
Version: 7.7
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Nitin Kumar Bansal <nbansal>
Assignee: Nathan Scott <nathans>
QA Contact: Jan Kurik <jkurik>
CC: agerstmayr, alanm, dbasant, jkurik, mgoodwin, nathans, patrickm
Keywords: Bugfix, Triaged, ZStream
Target Milestone: rc
Target Release: 7.9
Fixed In Version: pcp-4.3.2-12
Doc Type: No Doc Update
Type: Bug
Last Closed: 2020-09-29 19:25:12 UTC
Bug Blocks: 1851849
Comment 7
Nathan Scott
2020-04-03 04:34:28 UTC
Status: I've been unable to reproduce the problem locally since those earlier changes, which makes finding a fix much more difficult. Do they/you have insights as to what might trigger the problem?

All working fine here. :(

Divya,

Are you absolutely certain your build included that and the other atop fixes from the previous bug? For me, with all updates applied, my one test case that intermittently tripped the issue has not triggered it since.

Our QE folk have also been trying, without success, to reproduce the problem. Valgrind is reporting all memory accesses are safe, and after auditing the code again I cannot see a way we'd be able to incorrectly access memory there.

cheers.

Hi Divya,

In the absence of valgrind output so far, I've audited the code paths in pcp-atop once more today. I think I can see another set of code paths that could be causing the crashes we've seen: the comparison routines (compcpu, compdsk, etc) are presented with elements from two differently sized arrays, where the smaller one has NULL'd task pointers. In this case we'd see a NULL pointer passed into the comparison routine, and we'd crash with segv at the points described.

I've pushed an upstream commit to tackle this aspect (details below) - could you prepare a build for the customer with this and see if it resolves the issue?

Thanks!

commit 9e14d91e012fd2f5b395cb83ba2353a1ec4a7e3f
Author: Nathan Scott <nathans>
Date:   Thu Apr 30 11:56:42 2020 +1000

    pcp-atop: resolve other potential null task pointer dereferences

    Additional defensive counter measures in sort routines where we
    could potentially dereference null pointers.  Aiming to tackle a
    customer reported issue, which qa/1080 intermittently reproduces.

    Related to Red Hat BZ #1818710.

Hello Nathan,

Bad news! Even with the recent set of patches included, it seems to be crashing at the same point, with the backtrace below:

Program terminated with signal 11, Segmentation fault.
#0  compcpu (a=0x19ac838, b=0x19ac840) at showlinux.c:2045
2045            bcpu = (*(struct tstat **)b)->cpu.stime +
(gdb) bt
#0  compcpu (a=0x19ac838, b=0x19ac840) at showlinux.c:2045
#1  0x00007fb4868dde59 in msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac838, n=2) at msort.c:83
#2  0x00007fb4868ddbc8 in msort_with_tmp (n=2, b=0x19ac838, p=0x7ffd4b0d5730) at msort.c:45
#3  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac830, n=3) at msort.c:54
#4  0x00007fb4868ddbc8 in msort_with_tmp (n=3, b=0x19ac830, p=0x7ffd4b0d5730) at msort.c:45
#5  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac820, n=5) at msort.c:54
#6  0x00007fb4868ddbc8 in msort_with_tmp (n=5, b=0x19ac820, p=0x7ffd4b0d5730) at msort.c:45
#7  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac7f8, n=10) at msort.c:54
#8  0x00007fb4868ddbc8 in msort_with_tmp (n=10, b=0x19ac7f8, p=0x7ffd4b0d5730) at msort.c:45
#9  msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac7a8, n=20) at msort.c:54
#10 0x00007fb4868ddbc8 in msort_with_tmp (n=20, b=0x19ac7a8, p=0x7ffd4b0d5730) at msort.c:45
#11 msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac708, n=40) at msort.c:54
#12 0x00007fb4868ddbc8 in msort_with_tmp (n=40, b=0x19ac708, p=0x7ffd4b0d5730) at msort.c:45
#13 msort_with_tmp (p=0x7ffd4b0d5730, b=0x19ac5d0, n=79) at msort.c:54
#14 0x00007fb4868de14c in msort_with_tmp (n=79, b=0x19ac5d0, p=0x7ffd4b0d5730) at msort.c:45
#15 __GI___qsort_r (b=b@entry=0x19ac5d0, n=n@entry=79, s=s@entry=8, cmp=0x419e30 <compcpu>, arg=arg@entry=0x0) at msort.c:297
#16 0x00007fb4868de1f8 in __GI_qsort (b=b@entry=0x19ac5d0, n=n@entry=79, s=s@entry=8, cmp=<optimized out>) at msort.c:308
#17 0x0000000000413dff in generic_samp (curtime=<optimized out>, nsecs=<optimized out>, devtstat=<optimized out>, sstat=<optimized out>, nexit=<optimized out>, noverflow=<optimized out>, flag=<optimized out>) at showgeneric.c:645
#18 0x000000000040396f in engine () at atop.c:684
#19 0x0000000000402fc3 in main (argc=4, argv=<optimized out>) at atop.c:477

Thanks Divya - I'll keep looking. :( Were they able to reproduce it with valgrind?

(In reply to Nathan Scott from comment #18)
> Thanks Divya - I'll keep looking. :( Were they able to reproduce it with
> valgrind?

Unfortunately no.

All the regression tests have passed. Switching to VERIFIED and setting flag SanityOnly as I am unable to reproduce this issue.

It wasn't mentioned here, which led to some accidental confusion, but there is one other relevant commit (already in 7.9)...

commit c22151f463e3e2494850210444a69948dc0fbdd6
Author: Nathan Scott <nathans>
Date:   Tue May 19 14:50:37 2020 +1000

    pcp-atop: resolve a new task pointer segv qa/1080 has encountered

    Related to Red Hat BZ #1818710

*** Bug 1842480 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Low: pcp security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3869
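
For readers following the NULL task pointer theory discussed in the comments above, here is a minimal sketch of a qsort() comparison routine that tolerates NULL entries in an array of task pointers. It is not the upstream commit. The struct tstat below is a cut-down, hypothetical stand-in for the real pcp-atop structure, the names compcpu_safe and sort_tasks_by_cpu are invented for illustration, and the use of stime plus utime as the sort key is an assumption (the backtrace only shows the stime term).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for pcp-atop's struct tstat. */
struct tstat {
    struct {
        unsigned long long stime;   /* system-mode CPU time */
        unsigned long long utime;   /* user-mode CPU time (assumed field) */
    } cpu;
};

/*
 * qsort() comparator over an array of struct tstat pointers.
 * Orders tasks by descending total CPU time and places NULL
 * entries last, so a NULL task pointer is never dereferenced.
 */
static int
compcpu_safe(const void *a, const void *b)
{
    const struct tstat *ta = *(const struct tstat * const *)a;
    const struct tstat *tb = *(const struct tstat * const *)b;
    unsigned long long acpu, bcpu;

    /* Defensive checks: handle NULL pointers before any access. */
    if (ta == NULL && tb == NULL)
        return 0;
    if (ta == NULL)
        return 1;
    if (tb == NULL)
        return -1;

    acpu = ta->cpu.stime + ta->cpu.utime;
    bcpu = tb->cpu.stime + tb->cpu.utime;

    if (acpu > bcpu)
        return -1;
    if (acpu < bcpu)
        return 1;
    return 0;
}

/* Illustrative wrapper: sort n task pointers, busiest task first. */
static void
sort_tasks_by_cpu(struct tstat **list, size_t n)
{
    assert(list != NULL || n == 0);
    if (n > 1)
        qsort(list, n, sizeof(list[0]), compcpu_safe);
}
```

Calling sort_tasks_by_cpu() on an array that mixes valid and NULL pointers leaves the valid tasks at the front in descending CPU order and the NULL entries at the tail. Guarding inside the comparator only masks the symptom; if the element count passed to qsort() is larger than the number of populated entries, fixing that count is the more direct remedy.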