
Bug 1491360

Summary: Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
Product: Red Hat Enterprise Linux 7 Reporter: Sergey Kolosov <skolosov>
Component: valgrind Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA QA Contact: Miloš Prchlík <mprchlik>
Severity: unspecified Docs Contact: Vladimír Slávik <vslavik>
Priority: unspecified    
Version: 7.4 CC: arjun, codonell, dj, dodji, extras-qa, fweimer, jakub, law, lslebodn, mbenitez, mcermak, mfabian, mjw, mprchlik, ohudlick, pfrankli, siddhesh, vslavik
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: valgrind-3.13.0-9.el7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1462258 Environment:
Last Closed: 2018-04-10 13:14:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1462258    
Bug Blocks:    

Description Sergey Kolosov 2017-09-13 14:54:43 UTC
+++ This bug was initially created as a clone of Bug #1462258 +++

Description of problem:
The latest glibc upgrade introduced valgrind errors in epoll.

Version-Release number of selected component (if applicable):
sh$ rpm -q glibc
glibc-2.25.90-5.fc27.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. // compile attached file
   gcc -Wall -Wextra -g3 epoll-example.c
2. valgrind --track-origins=yes ./a.out 1111

Actual results:
==22132== Memcheck, a memory error detector
==22132== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22132== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22132== Command: ./a.out 1111
==22132== 
==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22132== 
^C==22132== 
==22132== Process terminating with default action of signal 2 (SIGINT)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132== 
==22132== HEAP SUMMARY:
==22132==     in use at exit: 768 bytes in 1 blocks
==22132==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22132== 
==22132== LEAK SUMMARY:
==22132==    definitely lost: 0 bytes in 0 blocks
==22132==    indirectly lost: 0 bytes in 0 blocks
==22132==      possibly lost: 0 bytes in 0 blocks
==22132==    still reachable: 768 bytes in 1 blocks
==22132==         suppressed: 0 bytes in 0 blocks
==22132== Rerun with --leak-check=full to see details of leaked memory
==22132== 
==22132== For counts of detected and suppressed errors, rerun with: -v
==22132== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Expected results:
No errors, the same as with the older version of glibc:

valgrind --track-origins=yes ./a.out 1111
==22050== Memcheck, a memory error detector
==22050== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22050== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22050== Command: ./a.out 1111
==22050== 
^C==22050== 
==22050== Process terminating with default action of signal 2 (SIGINT)
==22050==    at 0x4F4D253: __epoll_wait_nocancel (in /usr/lib64/libc-2.25.90.so)
==22050==    by 0x400D78: main (epoll-example2.c:131)
==22050== 
==22050== HEAP SUMMARY:
==22050==     in use at exit: 768 bytes in 1 blocks
==22050==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22050== 
==22050== LEAK SUMMARY:
==22050==    definitely lost: 0 bytes in 0 blocks
==22050==    indirectly lost: 0 bytes in 0 blocks
==22050==      possibly lost: 0 bytes in 0 blocks
==22050==    still reachable: 768 bytes in 1 blocks
==22050==         suppressed: 0 bytes in 0 blocks
==22050== Rerun with --leak-check=full to see details of leaked memory
==22050== 
==22050== For counts of detected and suppressed errors, rerun with: -v
==22050== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[build@929198206d7a ci-build-debug]$ rpm -q glibc
glibc-2.25.90-2.fc27.x86_64

Additional info:

--- Additional comment from Florian Weimer on 2017-06-16 10:33:01 EDT ---

(In reply to Lukas Slebodnik from comment #0)
> ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> ==22132==    by 0x400D78: main (epoll-example.c:131)
> ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

For some reason, calloc returns 0 when running under valgrind.  This looks more like a valgrind bug to me, so reassigning.

--- Additional comment from Lukas Slebodnik on 2017-06-16 10:55:19 EDT ---

(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

Could you explain why it is not a problem with glibc-2.25.90-2.fc27.x86_64?

--- Additional comment from Mark Wielaard on 2017-06-16 10:57:20 EDT ---

(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

I cannot easily reproduce this. Could you add a printf ("events: %p\n", events); just before the epoll_wait () call and run it normally and under valgrind to see if it really is NULL in one case and not in the other?

--- Additional comment from Lukas Slebodnik on 2017-06-16 11:02:10 EDT ---

(In reply to Mark Wielaard from comment #3)
> I cannot easily reproduce this.
But you should be able to:

sh# docker run -ti --rm fedora:rawhide bash
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0  gcc
[root@2b9059b6a9a9 /]# dnf update -y --setopt=debuglevel=0 --setopt=errorlevel=0 glibc
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0 valgrind
[root@2b9059b6a9a9 /]# rpm -q glibc gcc valgrind
glibc-2.25.90-6.fc27.x86_64
gcc-7.1.1-2.fc27.x86_64
valgrind-3.13.0-0.2.RC1.fc27.x86_64
[root@2b9059b6a9a9 /]# curl -o epoll-example.c 'https://bugzilla.redhat.com/attachment.cgi?id=1288381'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   2080      0  0:00:01  0:00:01 --:--:--  2082
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==74== Memcheck, a memory error detector
==74== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==74== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==74== Command: ./a.out 1111
==74== 
==74== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==74== 
^C==74== 
==74== Process terminating with default action of signal 2 (SIGINT)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74== 
==74== HEAP SUMMARY:
==74==     in use at exit: 768 bytes in 1 blocks
==74==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==74== 
==74== LEAK SUMMARY:
==74==    definitely lost: 0 bytes in 0 blocks
==74==    indirectly lost: 0 bytes in 0 blocks
==74==      possibly lost: 0 bytes in 0 blocks
==74==    still reachable: 768 bytes in 1 blocks
==74==         suppressed: 0 bytes in 0 blocks
==74== Rerun with --leak-check=full to see details of leaked memory
==74== 
==74== For counts of detected and suppressed errors, rerun with: -v
==74== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

--- Additional comment from Lukas Slebodnik on 2017-06-16 11:03:58 EDT ---

(In reply to Mark Wielaard from comment #3)
> (In reply to Florian Weimer from comment #1)
> > (In reply to Lukas Slebodnik from comment #0)
> > > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > 
> > For some reason, calloc returns 0 when running under valgrind.  This looks
> > more like a valgrind bug to me, so reassigning.
> 
> I cannot easily reproduce this. Could you add a printf ("events: %p\n",
> events); just before the epoll_wait () call and run it normally and under
> valgrind to see if it really is NULL in one case and not in the other?

[root@2b9059b6a9a9 /]# vi epoll-example.c 
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# grep -A3 " printf" epoll-example.c
  printf ("events: %p\n", events);

  epoll_wait (efd, events, MAXEVENTS, -1);

[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==88== Memcheck, a memory error detector
==88== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==88== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==88== Command: ./a.out 1111
==88== 
events: 0x520da20
==88== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==88==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==88==    by 0x400DDE: main (epoll-example.c:133)
==88==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==88==

--- Additional comment from Florian Weimer on 2017-06-16 11:04:35 EDT ---

I was confused:

calloc: 0x520da20
==24772== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==24772==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==24772==    by 0x400B54: main (in /tmp/a.out)
==24772==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

The events pointer is *not* zero.

The epoll_pwait system call wrapper in valgrind probably assumes that the sigmask parameter is always non-NULL.  But it can be NULL to get the epoll_wait behavior.
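
To illustrate the point about a NULL sigmask, here is a minimal sketch (not the attached epoll-example.c). The helper name count_ready_events and the pipe-based setup are assumptions made for this example only. It registers a readable pipe with epoll and waits either through epoll_wait() or through epoll_pwait() with a NULL sigmask; both calls should report the same single event.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers the read end of a pipe with a fresh epoll
 * instance, makes it readable, and returns how many events are reported.
 * When use_pwait is non-zero it calls epoll_pwait() with a NULL sigmask,
 * otherwise plain epoll_wait(). Returns -1 on setup failure. */
static int count_ready_events(int use_pwait)
{
    int fds[2];
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out[4];
    int efd, n = -1;

    if (pipe(fds) != 0)
        return -1;
    efd = epoll_create1(0);
    if (efd >= 0) {
        ev.data.fd = fds[0];
        if (epoll_ctl(efd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "x", 1) == 1) {
            /* Non-zero timeout (100 ms), like the -1 used in the report. */
            n = use_pwait ? epoll_pwait(efd, out, 4, 100, NULL)
                          : epoll_wait(efd, out, 4, 100);
            assert(n <= 4); /* never more events than the buffer holds */
        }
        close(efd);
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling count_ready_events(0) and count_ready_events(1) from a small main() and running the result under an affected valgrind should, if the analysis in this bug is right, produce the sigmask warning only for the epoll_pwait() variant on a glibc that still uses the plain epoll_wait system call.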

--- Additional comment from Florian Weimer on 2017-06-16 11:07:01 EDT ---

Looks like there is a missing NULL check in sys_epoll_pwait (comparing its implementation with sys_pselect6).

--- Additional comment from Lukas Slebodnik on 2017-06-16 11:12:37 EDT ---

(In reply to Florian Weimer from comment #7)
> Looks like there is a missing NULL check in sys_epoll_pwait (comparing its
> implementation with sys_pselect6).

Do you mean in glibc or in valgrind?

--- Additional comment from Mark Wielaard on 2017-06-16 11:12:55 EDT ---

(In reply to Florian Weimer from comment #6)
> The epoll_pwait system call wrapper in valgrind probably assumes that the
> sigmask parameter is always non-NULL.  But it can be NULL to get the
> epoll_wait behavior.

What the valgrind wrapper does is:

   if (ARG4)
      PRE_MEM_READ( "epoll_pwait(sigmask)", ARG5, sizeof(vki_sigset_t) );

So it checks ARG5 (sigmask) if ARG4 (timeout) is non-zero.

That is probably a typo.
Unless there is a reason to only care about sigmask if timeout != 0.
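
The effect of that guard can be modelled outside valgrind. The following is a simplified sketch, not valgrind source: the struct and all function names are made up for illustration, and the "when present" variant is only the presumed intent (check the sigmask when the pointer itself is non-NULL). It shows why a blocking call with timeout -1 and a NULL sigmask trips the check while a zero timeout does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the syscall argument slots, not valgrind's types. */
struct epoll_pwait_args {
    long timeout;       /* ARG4 in the wrapper */
    uintptr_t sigmask;  /* ARG5 in the wrapper; 0 means NULL */
};

/* Guard as quoted in the comment above: keyed on the timeout (ARG4). */
static bool checks_sigmask_as_quoted(const struct epoll_pwait_args *a)
{
    return a->timeout != 0;
}

/* Presumed intent (an assumption): inspect the sigmask only when non-NULL. */
static bool checks_sigmask_when_present(const struct epoll_pwait_args *a)
{
    return a->sigmask != 0;
}

/* True when the guard would make the tool read through a NULL sigmask,
 * which is what surfaces as "points to unaddressable byte(s)". */
static bool reports_null_sigmask(bool (*guard)(const struct epoll_pwait_args *),
                                 const struct epoll_pwait_args *a)
{
    assert(guard != NULL && a != NULL);
    return guard(a) && a->sigmask == 0;
}
```

With timeout = -1 and sigmask = 0, the quoted guard reports an error and the pointer-based guard does not, which matches the epoll_wait (efd, events, MAXEVENTS, -1) call in the reproducer.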

--- Additional comment from Mark Wielaard on 2017-06-16 11:45:43 EDT ---

Posted upstream including proposed fix and testcase update:
https://bugs.kde.org/show_bug.cgi?id=381289

--- Additional comment from Mark Wielaard on 2017-06-16 12:01:53 EDT ---

(In reply to Lukas Slebodnik from comment #4)
> (In reply to Mark Wielaard from comment #3)
> > I cannot easily reproduce this.
> But you should be
> 
> sh# docker run -ti --rm fedora:rawhide bash
[...]

Cute! That does indeed give an easy reproducer.

I am not completely clear on why this wasn't an issue with older glibc.
But it looks like newer glibc converts an epoll_wait into an epoll_pwait with a NULL sigmask argument. I don't know why it does that if there is a normal epoll_wait system call available.

--- Additional comment from Florian Weimer on 2017-06-16 12:06:03 EDT ---

(In reply to Mark Wielaard from comment #11)
> I am not completely clear on why this wasn't an issue with older glibc.
> But it looks like newer glibc converts an epoll_wait into an epoll_pwait
> with a NULL sigmask argument. I don't know why it does that if there is a
> normal epoll_wait system call available.

The reason for this behavior is that newer architectures only support an epoll_pwait system call, and this allows us to consolidate the epoll_wait implementation across all architectures.
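
As a rough sketch of that consolidation (not glibc's actual code, which differs in detail), epoll_wait can be expressed as an epoll_pwait system call with a NULL sigmask. The wrapper name my_epoll_wait is hypothetical, and the sigset size of 8 is an assumption that holds on x86_64 and aarch64; to my understanding the kernel validates that size only when the sigmask pointer is non-NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: epoll_wait expressed through the epoll_pwait
 * system call, which is the only variant newer architectures provide. */
static int my_epoll_wait(int epfd, struct epoll_event *events,
                         int maxevents, int timeout)
{
    /* NULL sigmask: leave the signal mask untouched, as epoll_wait does.
     * The final argument is the kernel sigset size (assumed to be 8). */
    return (int)syscall(SYS_epoll_pwait, epfd, events, maxevents, timeout,
                        NULL, (size_t)8);
}
```

A system call tracer or a tool like valgrind then sees only epoll_pwait with sigmask == NULL, even though the program called epoll_wait, which is why the wrapper's handling of a NULL sigmask started to matter.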

--- Additional comment from Lukas Slebodnik on 2017-06-16 12:59:47 EDT ---

(In reply to Mark Wielaard from comment #11)
> (In reply to Lukas Slebodnik from comment #4)
> > (In reply to Mark Wielaard from comment #3)
> > > I cannot easily reproduce this.
> > But you should be
> > 
> > sh# docker run -ti --rm fedora:rawhide bash
> [...]
> 
> Cute! That does indeed give an easy reproducer.
> 
I am glad I could help :-)

I hope the patch will be accepted upstream soon, so that our CI on rawhide will not be blocked for a long time.

--- Additional comment from Mark Wielaard on 2017-06-18 12:33:59 EDT ---

valgrind-3.13.0-2

- Add valgrind-3.13.0-ppc64-check-no-vsx.patch
- Add valgrind-3.13.0-epoll_pwait.patch (#1462258)
- Add valgrind-3.13.0-ppc64-diag.patch

https://koji.fedoraproject.org/koji/buildinfo?buildID=909403
https://copr.fedorainfracloud.org/coprs/mjw/valgrind-3.13.0/build/567497/

--- Additional comment from Fedora Update System on 2017-06-29 16:11:11 EDT ---

valgrind-3.13.0-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

--- Additional comment from Fedora Update System on 2017-06-30 16:25:34 EDT ---

valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

--- Additional comment from Fedora Update System on 2017-07-07 19:05:26 EDT ---

valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 2 Sergey Kolosov 2017-09-13 14:58:16 UTC
Reproduced on RHEL-7.4 aarch64, valgrind-3.12.0-8.el7

[user@host ~]# valgrind --track-origins=yes ./a.out 1111
==24396== Memcheck, a memory error detector
==24396== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24396== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==24396== Command: ./a.out 1111
==24396== 
==24396== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==24396==    at 0x498B7BC: epoll_pwait (in /usr/lib64/libc-2.17.so)
==24396==    by 0x400EF3: main (v.c:131)
==24396==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==24396== 
^C==24396== 
==24396== Process terminating with default action of signal 2 (SIGINT)
==24396==    at 0x498B7BC: epoll_pwait (in /usr/lib64/libc-2.17.so)
==24396==    by 0x400EF3: main (v.c:131)
==24396== 
==24396== HEAP SUMMARY:
==24396==     in use at exit: 1,024 bytes in 1 blocks
==24396==   total heap usage: 5 allocs, 4 frees, 1,876 bytes allocated
==24396== 
==24396== LEAK SUMMARY:
==24396==    definitely lost: 0 bytes in 0 blocks
==24396==    indirectly lost: 0 bytes in 0 blocks
==24396==      possibly lost: 0 bytes in 0 blocks
==24396==    still reachable: 1,024 bytes in 1 blocks
==24396==         suppressed: 0 bytes in 0 blocks
==24396== Rerun with --leak-check=full to see details of leaked memory
==24396== 
==24396== For counts of detected and suppressed errors, rerun with: -v
==24396== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Comment 3 Mark Wielaard 2017-09-13 15:06:13 UTC
Upstream already has a fix with an easy testcase reproducer.
And this fix has already been backported to the Fedora valgrind package.

# KDE#381289 epoll_pwait can have a NULL sigmask.
Patch5: valgrind-3.13.0-epoll_pwait.patch

Comment 8 Miloš Prchlík 2018-01-08 08:28:59 UTC
Verified with build valgrind-3.13.0-10.el7.

Comment 11 errata-xmlrpc 2018-04-10 13:14:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0773