Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1846996

Summary: Possible false-positive while running pcp-testsuite
Product: Red Hat Enterprise Linux 8 Reporter: Alexandra Petlanová Hájková <ahajkova>
Component: valgrindAssignee: Mark Wielaard <mjw>
valgrind sub component: system-version QA Contact: qe-baseos-tools-bugs
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified CC: fweimer, jakub, mjw, ohudlick
Version: 8.1Flags: pm-rhel: mirror+
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-16 10:36:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
test output none

Description Alexandra Petlanová Hájková 2020-06-15 11:51:35 UTC
Description of problem:
Running test case #1080 of pcp-testsuite, valgrind detects possible memory leak in pdubuf.c

Version-Release number of selected component (if applicable):
* RHEL-8.3-Alpha release
* valgrind-3.16.0-1.el8
* pcp-5.1.1-2.el8

How reproducible:
Always with valgrind >= 3.16
Versions of valgrid < 3.16 do not detect this possible memory leak

Steps to Reproduce:
1. Install pcp-system-tools
# yum install -y pcp-zeroconf pcp-system-tools
2. Install valgrind >= 3.16
(i.e. https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1207405)
3. Run the test case #1080
# cd /var/lib/pcp/testsuite && ./check 1080

Actual results:
The testcase fails and valgrind reports possible memory leak.

The test case used to pass with the older versions of valgrind.

PCP bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1846705

Comment 1 Alexandra Petlanová Hájková 2020-06-15 11:58:35 UTC
Created attachment 1697411 [details]
test output

Comment 2 Mark Wielaard 2020-06-15 12:02:44 UTC
The test output doesn't have associated debuginfo installed, so it is hard to tell what the application is really doing.
How is valgrind invoked precisely? Are any (other) suppressions used?

Comment 3 Mark Wielaard 2020-06-15 12:32:12 UTC
Found the source code. These comments in src/libpcp/src/p_result.c are at least somewhat suspicious and imply that this might be a real memory leak because nobody "unpins" the memory.

 * Because __pmFindPDUBuf() returns with a pinned pdu buffer, the
 * buffer passed back from __pmEncodeResult() must also remain pinned
 * (otherwise another thread could clobber the buffer after returning
 * from __pmEncodeResult()) ... it is the caller of __pmEncodeResult()
 * who is responsible for (a) not pinning the buffer again, and (b)
 * ensuring _someone_ will unpin the buffer when it is safe to do so.
 *
 * Similarly, __pmDecodeResult() accepts a pinned buffer and returns
 * a pmResult that (on 64-bit pointer platforms) may contain pointers
 * into a second underlying pinned buffer.  The input buffer remains
 * pinned, the second buffer will be pinned if it is used.  The caller
 * will typically call pmFreeResult(), but also needs to call
 * __pmUnpinPDUBuf() for the input PDU buffer.  When the result contains
 * pointers back into the input PDU buffer, this will be pinned _twice_
 * so the pmFreeResult() and __pmUnpinPDUBuf() calls will still be
 * required.

So, I think we need an analysis by the pcp hackers first to show the buffers are properly unpinned before we can conclude this is a false positive.

Comment 4 Mark Wielaard 2020-06-16 10:36:44 UTC
Got some more feedback from the pcp hackers.
The issue is not a regression in valgrind.
The issue is that pcp needs/has valgrind version specific suppression files.

The suppression has actually been in pcp upstream since valgrind 3.13.0:

commit 046b58e8e97a74ab42c09eaa4e4ac9cb81060767
Author: Ken McDonell <kenj.au>
Date:   Wed Aug 30 14:32:58 2017 +1000

    qa/valgrind-suppress-3.13.0: new valgrind version
    
    First seen on vm12 (Fedora 26).

diff --git a/qa/valgrind-suppress-3.13.0 b/qa/valgrind-suppress-3.13.0
new file mode 100644
index 000000000..a5529b2e9
--- /dev/null
+++ b/qa/valgrind-suppress-3.13.0
@@ -0,0 +1,27 @@
+# qa/1080 and qa/490 and qa/386 and qa/459 on vm12 (Fedora 26)
+# at 0x483123E: malloc (vg_replace_malloc.c:299)
+# by 0x4A92343: tsearch (in /usr/lib/libc-2.25.so)
+# by 0x487F84E: __pmFindPDUBuf (pdubuf.c:130)
+# ...
+{
+   tsearch
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:malloc
+   fun:tsearch
+   fun:__pmFindPDUBuf
+   ...
+}
+
+# qa/1080 and qa/490 and qa/386 and qa/459 on vm12 (Fedora 26)
+# at 0x483123E: malloc (vg_replace_malloc.c:299)
+# by 0x487F7F4: __pmFindPDUBuf (pdubuf.c:119)
+# ...
+{
+   findpdubuf
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:malloc
+   fun:__pmFindPDUBuf
+   ...
+}

The upstream pcp commit that fixed it for 3.16.0 was:

commit f30aff90b7b46e99ce6619c73b436e5e77591022
Author: Nathan Scott <nathans>
Date:   Tue Jun 9 09:18:16 2020 +1000

    qa: add valgrind suppressions needed for valgrind 3.16 on f32

diff --git a/qa/valgrind-suppress-3.16.0 b/qa/valgrind-suppress-3.16.0
new file mode 100644
index 000000000..515591747
--- /dev/null
+++ b/qa/valgrind-suppress-3.16.0
@@ -0,0 +1,27 @@
+# qa/1080 and qa/490 and qa/386 and qa/459 on Fedora 32
+# at 0x483880B: malloc (vg_replace_malloc.c:299)
+# by 0x4A0D490: tsearch (in /usr/lib64/libc-2.28.so)
+# by 0x4871EA6: __pmFindPDUBuf (pdubuf.c:126)
+# ...
+{
+   tsearch
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:malloc
+   fun:tsearch
+   fun:__pmFindPDUBuf
+   ...
+}
+
+# qa/1080 and qa/490 and qa/386 and qa/459 on Fedora 32
+# at 0x483880B: malloc (vg_replace_malloc.c:299)
+# by 0x4871E5F: __pmFindPDUBuf (pdubuf.c:115)
+# ...
+{
+   findpdubuf
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:malloc
+   fun:__pmFindPDUBuf
+   ...
+}

Note that the suppressions are the same.
There are also similar suppressions in pcp qa/valgrind-suppress-3.14.0 and qa/valgrind-suppress-3.15.0

We might want to investigate why pcp needs to have valgrind version specific suppressions (we would like suppressions to be generic usable across valgrind versions).
But this isn't an valgrind issue/regression. Closing.