Bug 1462258

Summary: Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
Product: [Fedora] Fedora
Reporter: Lukas Slebodnik <lslebodn>
Component: valgrind
Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: arjun, codonell, dj, dodji, fweimer, jakub, law, mfabian, mjw, pfrankli, siddhesh
Hardware: Unspecified
OS: Unspecified
Fixed In Version: valgrind-3.13.0-4.fc26
Last Closed: 2017-07-07 23:05:26 UTC
Type: Bug
Bug Blocks: 1491360
Attachments: reproducer for valgrind error (flags: none)

Description Lukas Slebodnik 2017-06-16 14:21:32 UTC
Created attachment 1288381: reproducer for valgrind error

Description of problem:
The latest glibc upgrade introduced valgrind errors in epoll.

Version-Release number of selected component (if applicable):
sh$ rpm -q glibc
glibc-2.25.90-5.fc27.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. // compile attached file
   gcc -Wall -Wextra -g3 epoll-example.c
2. valgrind --track-origins=yes ./a.out 1111

Actual results:
==22132== Memcheck, a memory error detector
==22132== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22132== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22132== Command: ./a.out 1111
==22132== 
==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22132== 
^C==22132== 
==22132== Process terminating with default action of signal 2 (SIGINT)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132== 
==22132== HEAP SUMMARY:
==22132==     in use at exit: 768 bytes in 1 blocks
==22132==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22132== 
==22132== LEAK SUMMARY:
==22132==    definitely lost: 0 bytes in 0 blocks
==22132==    indirectly lost: 0 bytes in 0 blocks
==22132==      possibly lost: 0 bytes in 0 blocks
==22132==    still reachable: 768 bytes in 1 blocks
==22132==         suppressed: 0 bytes in 0 blocks
==22132== Rerun with --leak-check=full to see details of leaked memory
==22132== 
==22132== For counts of detected and suppressed errors, rerun with: -v
==22132== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Expected results:
No errors, the same as with the older version of glibc:

valgrind --track-origins=yes ./a.out 1111
==22050== Memcheck, a memory error detector
==22050== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22050== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22050== Command: ./a.out 1111
==22050== 
^C==22050== 
==22050== Process terminating with default action of signal 2 (SIGINT)
==22050==    at 0x4F4D253: __epoll_wait_nocancel (in /usr/lib64/libc-2.25.90.so)
==22050==    by 0x400D78: main (epoll-example2.c:131)
==22050== 
==22050== HEAP SUMMARY:
==22050==     in use at exit: 768 bytes in 1 blocks
==22050==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22050== 
==22050== LEAK SUMMARY:
==22050==    definitely lost: 0 bytes in 0 blocks
==22050==    indirectly lost: 0 bytes in 0 blocks
==22050==      possibly lost: 0 bytes in 0 blocks
==22050==    still reachable: 768 bytes in 1 blocks
==22050==         suppressed: 0 bytes in 0 blocks
==22050== Rerun with --leak-check=full to see details of leaked memory
==22050== 
==22050== For counts of detected and suppressed errors, rerun with: -v
==22050== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[build@929198206d7a ci-build-debug]$ rpm -q glibc
glibc-2.25.90-2.fc27.x86_64

Additional info:

Comment 1 Florian Weimer 2017-06-16 14:33:01 UTC
(In reply to Lukas Slebodnik from comment #0)
> ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> ==22132==    by 0x400D78: main (epoll-example.c:131)
> ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

For some reason, calloc returns 0 when running under valgrind.  This looks more like a valgrind bug to me, so reassigning.

Comment 2 Lukas Slebodnik 2017-06-16 14:55:19 UTC
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

Could you explain why it is not a problem with glibc-2.25.90-2.fc27.x86_64?

Comment 3 Mark Wielaard 2017-06-16 14:57:20 UTC
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

I cannot easily reproduce this. Could you add a printf ("events: %p\n", events); just before the epoll_wait () call and run it normally and under valgrind to see if it really is NULL in one case and not in the other?

Comment 4 Lukas Slebodnik 2017-06-16 15:02:10 UTC
(In reply to Mark Wielaard from comment #3)
> I cannot easily reproduce this.
But you should be able to:

sh# docker run -ti --rm fedora:rawhide bash
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0  gcc
[root@2b9059b6a9a9 /]# dnf update -y --setopt=debuglevel=0 --setopt=errorlevel=0 glibc
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0 valgrind
[root@2b9059b6a9a9 /]# rpm -q glibc gcc valgrind
glibc-2.25.90-6.fc27.x86_64
gcc-7.1.1-2.fc27.x86_64
valgrind-3.13.0-0.2.RC1.fc27.x86_64
[root@2b9059b6a9a9 /]# curl -o epoll-example.c 'https://bugzilla.redhat.com/attachment.cgi?id=1288381'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   2080      0  0:00:01  0:00:01 --:--:--  2082
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==74== Memcheck, a memory error detector
==74== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==74== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==74== Command: ./a.out 1111
==74== 
==74== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==74== 
^C==74== 
==74== Process terminating with default action of signal 2 (SIGINT)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74== 
==74== HEAP SUMMARY:
==74==     in use at exit: 768 bytes in 1 blocks
==74==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==74== 
==74== LEAK SUMMARY:
==74==    definitely lost: 0 bytes in 0 blocks
==74==    indirectly lost: 0 bytes in 0 blocks
==74==      possibly lost: 0 bytes in 0 blocks
==74==    still reachable: 768 bytes in 1 blocks
==74==         suppressed: 0 bytes in 0 blocks
==74== Rerun with --leak-check=full to see details of leaked memory
==74== 
==74== For counts of detected and suppressed errors, rerun with: -v
==74== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Comment 5 Lukas Slebodnik 2017-06-16 15:03:58 UTC
(In reply to Mark Wielaard from comment #3)
> (In reply to Florian Weimer from comment #1)
> > (In reply to Lukas Slebodnik from comment #0)
> > > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > 
> > For some reason, calloc returns 0 when running under valgrind.  This looks
> > more like a valgrind bug to me, so reassigning.
> 
> I cannot easily reproduce this. Could you add a printf ("events: %p\n",
> events); just before the epoll_wait () call and run it normally and under
> valgrind to see if it really is NULL in one case and not in the other?

[root@2b9059b6a9a9 /]# vi epoll-example.c 
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# grep -A3 " printf" epoll-example.c
  printf ("events: %p\n", events);

  epoll_wait (efd, events, MAXEVENTS, -1);

[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==88== Memcheck, a memory error detector
==88== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==88== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==88== Command: ./a.out 1111
==88== 
events: 0x520da20
==88== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==88==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==88==    by 0x400DDE: main (epoll-example.c:133)
==88==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==88==

Comment 6 Florian Weimer 2017-06-16 15:04:35 UTC
I was confused:

calloc: 0x520da20
==24772== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==24772==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==24772==    by 0x400B54: main (in /tmp/a.out)
==24772==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

The events pointer is *not* zero.

The epoll_pwait system call wrapper in valgrind probably assumes that the sigmask parameter is always non-NULL.  But it can be NULL to get the epoll_wait behavior.

Comment 7 Florian Weimer 2017-06-16 15:07:01 UTC
Looks like there is a missing NULL check in sys_epoll_pwait (comparing its implementation with sys_pselect6).

Comment 8 Lukas Slebodnik 2017-06-16 15:12:37 UTC
(In reply to Florian Weimer from comment #7)
> Looks like there is a missing NULL check in sys_epoll_pwait (comparing its
> implementation with sys_pselect6).

Do you mean in glibc or in valgrind?

Comment 9 Mark Wielaard 2017-06-16 15:12:55 UTC
(In reply to Florian Weimer from comment #6)
> The epoll_pwait system call wrapper in valgrind probably assumes that the
> sigmask parameter is always non-NULL.  But it can be NULL to get the
> epoll_wait behavior.

What the valgrind wrapper does is:

   if (ARG4)
      PRE_MEM_READ( "epoll_pwait(sigmask)", ARG5, sizeof(vki_sigset_t) );

So it checks ARG5 (sigmask) if ARG4 (timeout) is non-zero.

That is probably a typo.
Unless there is a reason to only care about sigmask if timeout != 0.

Comment 10 Mark Wielaard 2017-06-16 15:45:43 UTC
Posted upstream including proposed fix and testcase update:
https://bugs.kde.org/show_bug.cgi?id=381289

Comment 11 Mark Wielaard 2017-06-16 16:01:53 UTC
(In reply to Lukas Slebodnik from comment #4)
> (In reply to Mark Wielaard from comment #3)
> > I cannot easily reproduce this.
> But you should be able to:
> 
> sh# docker run -ti --rm fedora:rawhide bash
[...]

Cute! That does indeed give an easy reproducer.

I am not completely clear on why this wasn't an issue with older glibc.
But it looks like newer glibc converts an epoll_wait into an epoll_pwait with a NULL sigmask argument. I don't know why it does that if there is a normal epoll_wait system call available.

Comment 12 Florian Weimer 2017-06-16 16:06:03 UTC
(In reply to Mark Wielaard from comment #11)
> I am not completely clear on why this wasn't an issue with older glibc.
> But it looks like newer glibc converts an epoll_wait into an epoll_pwait
> with a NULL sigmask argument. I don't know why it does that if there is a
> normal epoll_wait system call available.

The reason for this behavior is that newer architectures only support an epoll_pwait system call, and this allows us to consolidate the epoll_wait implementation across all architectures.

Comment 13 Lukas Slebodnik 2017-06-16 16:59:47 UTC
(In reply to Mark Wielaard from comment #11)
> (In reply to Lukas Slebodnik from comment #4)
> > (In reply to Mark Wielaard from comment #3)
> > > I cannot easily reproduce this.
> > But you should be able to:
> > 
> > sh# docker run -ti --rm fedora:rawhide bash
> [...]
> 
> Cute! That does indeed give an easy reproducer.
> 
I am glad I could help :-)

I hope the patch will be accepted upstream soon, so that our CI on rawhide will not be blocked for a long time.

Comment 14 Mark Wielaard 2017-06-18 16:33:59 UTC
valgrind-3.13.0-2

- Add valgrind-3.13.0-ppc64-check-no-vsx.patch
- Add valgrind-3.13.0-epoll_pwait.patch (#1462258)
- Add valgrind-3.13.0-ppc64-diag.patch

https://koji.fedoraproject.org/koji/buildinfo?buildID=909403
https://copr.fedorainfracloud.org/coprs/mjw/valgrind-3.13.0/build/567497/

Comment 15 Fedora Update System 2017-06-29 20:11:11 UTC
valgrind-3.13.0-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Comment 16 Fedora Update System 2017-06-30 20:25:34 UTC
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Comment 17 Fedora Update System 2017-07-07 23:05:26 UTC
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.