Bug 1462258 - Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: valgrind
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1491360
Reported: 2017-06-16 14:21 UTC by Lukas Slebodnik
Modified: 2017-09-13 14:54 UTC (History)

Fixed In Version: valgrind-3.13.0-4.fc26
Doc Text:
Clone Of:
: 1491360 (view as bug list)
Environment:
Last Closed: 2017-07-07 23:05:26 UTC
Type: Bug
Embargoed:


Attachments
reproducer for valgrind error (2.43 KB, text/x-csrc)
2017-06-16 14:21 UTC, Lukas Slebodnik


Links
System ID Private Priority Status Summary Last Updated
KDE Software Compilation 381289 0 None None None 2017-06-16 15:45:43 UTC

Description Lukas Slebodnik 2017-06-16 14:21:32 UTC
Created attachment 1288381
reproducer for valgrind error

Description of problem:
The latest glibc upgrade introduced valgrind errors in programs that use epoll.

Version-Release number of selected component (if applicable):
sh$ rpm -q glibc
glibc-2.25.90-5.fc27.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. // compile attached file
   gcc -Wall -Wextra -g3 epoll-example.c
2. valgrind --track-origins=yes ./a.out 1111

Actual results:
==22132== Memcheck, a memory error detector
==22132== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22132== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22132== Command: ./a.out 1111
==22132== 
==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22132== 
^C==22132== 
==22132== Process terminating with default action of signal 2 (SIGINT)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132== 
==22132== HEAP SUMMARY:
==22132==     in use at exit: 768 bytes in 1 blocks
==22132==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22132== 
==22132== LEAK SUMMARY:
==22132==    definitely lost: 0 bytes in 0 blocks
==22132==    indirectly lost: 0 bytes in 0 blocks
==22132==      possibly lost: 0 bytes in 0 blocks
==22132==    still reachable: 768 bytes in 1 blocks
==22132==         suppressed: 0 bytes in 0 blocks
==22132== Rerun with --leak-check=full to see details of leaked memory
==22132== 
==22132== For counts of detected and suppressed errors, rerun with: -v
==22132== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Expected results:
No errors, the same as with the older version of glibc:

valgrind --track-origins=yes ./a.out 1111
==22050== Memcheck, a memory error detector
==22050== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22050== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22050== Command: ./a.out 1111
==22050== 
^C==22050== 
==22050== Process terminating with default action of signal 2 (SIGINT)
==22050==    at 0x4F4D253: __epoll_wait_nocancel (in /usr/lib64/libc-2.25.90.so)
==22050==    by 0x400D78: main (epoll-example2.c:131)
==22050== 
==22050== HEAP SUMMARY:
==22050==     in use at exit: 768 bytes in 1 blocks
==22050==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22050== 
==22050== LEAK SUMMARY:
==22050==    definitely lost: 0 bytes in 0 blocks
==22050==    indirectly lost: 0 bytes in 0 blocks
==22050==      possibly lost: 0 bytes in 0 blocks
==22050==    still reachable: 768 bytes in 1 blocks
==22050==         suppressed: 0 bytes in 0 blocks
==22050== Rerun with --leak-check=full to see details of leaked memory
==22050== 
==22050== For counts of detected and suppressed errors, rerun with: -v
==22050== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[build@929198206d7a ci-build-debug]$ rpm -q glibc
glibc-2.25.90-2.fc27.x86_64

Additional info:

Comment 1 Florian Weimer 2017-06-16 14:33:01 UTC
(In reply to Lukas Slebodnik from comment #0)
> ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> ==22132==    by 0x400D78: main (epoll-example.c:131)
> ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

For some reason, calloc returns 0 when running under valgrind.  This looks more like a valgrind bug to me, so reassigning.

Comment 2 Lukas Slebodnik 2017-06-16 14:55:19 UTC
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

Could you explain why it is not a problem with glibc-2.25.90-2.fc27.x86_64?

Comment 3 Mark Wielaard 2017-06-16 14:57:20 UTC
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> For some reason, calloc returns 0 when running under valgrind.  This looks
> more like a valgrind bug to me, so reassigning.

I cannot easily reproduce this. Could you add a printf ("events: %p\n", events); just before the epoll_wait () call and run it normally and under valgrind to see if it really is NULL in one case and not in the other?

Comment 4 Lukas Slebodnik 2017-06-16 15:02:10 UTC
(In reply to Mark Wielaard from comment #3)
> I cannot easily reproduce this.
But you should be able to:

sh# docker run -ti --rm fedora:rawhide bash
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0  gcc
[root@2b9059b6a9a9 /]# dnf update -y --setopt=debuglevel=0 --setopt=errorlevel=0 glibc
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0 valgrind
[root@2b9059b6a9a9 /]# rpm -q glibc gcc valgrind
glibc-2.25.90-6.fc27.x86_64
gcc-7.1.1-2.fc27.x86_64
valgrind-3.13.0-0.2.RC1.fc27.x86_64
[root@2b9059b6a9a9 /]# curl -o epoll-example.c 'https://bugzilla.redhat.com/attachment.cgi?id=1288381'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   2080      0  0:00:01  0:00:01 --:--:--  2082
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==74== Memcheck, a memory error detector
==74== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==74== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==74== Command: ./a.out 1111
==74== 
==74== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==74== 
^C==74== 
==74== Process terminating with default action of signal 2 (SIGINT)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74== 
==74== HEAP SUMMARY:
==74==     in use at exit: 768 bytes in 1 blocks
==74==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==74== 
==74== LEAK SUMMARY:
==74==    definitely lost: 0 bytes in 0 blocks
==74==    indirectly lost: 0 bytes in 0 blocks
==74==      possibly lost: 0 bytes in 0 blocks
==74==    still reachable: 768 bytes in 1 blocks
==74==         suppressed: 0 bytes in 0 blocks
==74== Rerun with --leak-check=full to see details of leaked memory
==74== 
==74== For counts of detected and suppressed errors, rerun with: -v
==74== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Comment 5 Lukas Slebodnik 2017-06-16 15:03:58 UTC
(In reply to Mark Wielaard from comment #3)
> (In reply to Florian Weimer from comment #1)
> > (In reply to Lukas Slebodnik from comment #0)
> > > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > 
> > For some reason, calloc returns 0 when running under valgrind.  This looks
> > more like a valgrind bug to me, so reassigning.
> 
> I cannot easily reproduce this. Could you add a printf ("events: %p\n",
> events); just before the epoll_wait () call and run it normally and under
> valgrind to see if it really is NULL in one case and not in the other?

[root@2b9059b6a9a9 /]# vi epoll-example.c 
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# grep -A3 " printf" epoll-example.c
  printf ("events: %p\n", events);

  epoll_wait (efd, events, MAXEVENTS, -1);

[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==88== Memcheck, a memory error detector
==88== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==88== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==88== Command: ./a.out 1111
==88== 
events: 0x520da20
==88== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==88==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==88==    by 0x400DDE: main (epoll-example.c:133)
==88==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==88==

Comment 6 Florian Weimer 2017-06-16 15:04:35 UTC
I was confused:

calloc: 0x520da20
==24772== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==24772==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==24772==    by 0x400B54: main (in /tmp/a.out)
==24772==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

The events pointer is *not* zero.

The epoll_pwait system call wrapper in valgrind probably assumes that the sigmask parameter is always non-NULL.  But it can be NULL to get the epoll_wait behavior.
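
To illustrate the point about a NULL sigmask, here is a minimal sketch (not the attached reproducer). It assumes Linux with glibc; the helper names ready_events and null_sigmask_self_check are made up for this example. It makes one pipe readable and waits on it once through epoll_wait and once through epoll_pwait with a NULL sigmask, using a non-zero timeout as the reproducer's epoll_wait call does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: make one pipe readable, register its read end with a
   fresh epoll instance, and return how many events are reported. With
   use_pwait set, the wait goes through epoll_pwait with a NULL sigmask.
   Returns -1 on any setup failure. */
int ready_events(int use_pwait)
{
    int fds[2];
    int efd, n = -1;
    struct epoll_event ev, out;

    if (pipe(fds) != 0)
        return -1;
    efd = epoll_create1(0);
    if (efd < 0)
        goto close_pipe;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fds[0], &ev) != 0)
        goto close_all;
    if (write(fds[1], "x", 1) != 1)
        goto close_all;

    /* Non-zero timeout; the call returns at once because data is pending. */
    n = use_pwait ? epoll_pwait(efd, &out, 1, 100, NULL)
                  : epoll_wait(efd, &out, 1, 100);

close_all:
    close(efd);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}

/* Self-check: both entry points should report the single readable pipe. */
void null_sigmask_self_check(void)
{
    assert(ready_events(0) == 1);
    assert(ready_events(1) == 1);
}
```

Calling null_sigmask_self_check() from main and running the binary under valgrind should exercise the same epoll_pwait(sigmask) check that produces the report above.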

Comment 7 Florian Weimer 2017-06-16 15:07:01 UTC
Looks like there is a missing NULL check in sys_epoll_pwait (comparing its implementation with sys_pselect6).

Comment 8 Lukas Slebodnik 2017-06-16 15:12:37 UTC
(In reply to Florian Weimer from comment #7)
> Looks like there is a missing NULL check in sys_epoll_pwait (comparing its
> implementation with sys_pselect6).

Do you mean in glibc or in valgrind?

Comment 9 Mark Wielaard 2017-06-16 15:12:55 UTC
(In reply to Florian Weimer from comment #6)
> The epoll_pwait system call wrapper in valgrind probably assumes that the
> sigmask parameter is always non-NULL.  But it can be NULL to get the
> epoll_wait behavior.

What the valgrind wrapper does is:

   if (ARG4)
      PRE_MEM_READ( "epoll_pwait(sigmask)", ARG5, sizeof(vki_sigset_t) );

So it checks ARG5 (sigmask) if ARG4 (timeout) is non-zero.

That is probably a typo, unless there is a reason to only care about sigmask if timeout != 0.
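
The difference between the two conditions can be modelled outside valgrind. The following is only a sketch: the struct and function names are hypothetical stand-ins for the wrapper's ARG4/ARG5 macros, and the "when present" variant is the presumed intent, not a quote of any committed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two syscall arguments the wrapper looks at;
   in valgrind itself these are the ARG4 and ARG5 macros. */
struct epoll_pwait_args {
    long timeout;         /* ARG4 */
    const void *sigmask;  /* ARG5 */
};

/* Condition as quoted above: keyed on the timeout. */
bool reads_sigmask_as_quoted(const struct epoll_pwait_args *a)
{
    return a->timeout != 0;
}

/* Presumed intent: only read the mask when one was actually passed. */
bool reads_sigmask_when_present(const struct epoll_pwait_args *a)
{
    return a->sigmask != NULL;
}

/* Self-check using the reproducer's call shape: timeout -1, NULL sigmask. */
void sigmask_check_self_check(void)
{
    struct epoll_pwait_args a = { -1, NULL };
    assert(reads_sigmask_as_quoted(&a));      /* would flag address 0x0 */
    assert(!reads_sigmask_when_present(&a));  /* nothing to read */
}
```

With a timeout of -1 and a NULL sigmask, the quoted condition asks for a read at address 0x0, which matches the "Address 0x0 is not stack'd, malloc'd or (recently) free'd" line in the report.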

Comment 10 Mark Wielaard 2017-06-16 15:45:43 UTC
Posted upstream including proposed fix and testcase update:
https://bugs.kde.org/show_bug.cgi?id=381289

Comment 11 Mark Wielaard 2017-06-16 16:01:53 UTC
(In reply to Lukas Slebodnik from comment #4)
> (In reply to Mark Wielaard from comment #3)
> > I cannot easily reproduce this.
> But you should be
> 
> sh# docker run -ti --rm fedora:rawhide bash
[...]

Cute! That does indeed give an easy reproducer.

I am not completely clear on why this wasn't an issue with older glibc.
But it looks like newer glibc converts an epoll_wait into an epoll_pwait with a NULL sigmask argument. I don't know why it does that if there is a normal epoll_wait system call available.

Comment 12 Florian Weimer 2017-06-16 16:06:03 UTC
(In reply to Mark Wielaard from comment #11)
> I am not completely clear on why this wasn't an issue with older glibc.
> But it looks like newer glibc converts an epoll_wait into an epoll_pwait
> with a NULL sigmask argument. I don't know why it does that if there is a
> normal epoll_wait system call available.

The reason for this behavior is that newer architectures only support an epoll_pwait system call, and this allows us to consolidate the epoll_wait implementation across all architectures.
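
As a rough illustration of that consolidation, an epoll_wait built on top of the epoll_pwait system call might look like the sketch below. This is not glibc's actual source: the function names are hypothetical, and passing _NSIG / 8 as the kernel's signal-set size is an assumption that should hold on common Linux targets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: epoll_wait expressed as the raw epoll_pwait system
   call with a NULL sigmask, so the signal mask is left untouched. */
int epoll_wait_via_pwait(int epfd, struct epoll_event *events,
                         int maxevents, int timeout)
{
    return (int)syscall(SYS_epoll_pwait, epfd, events, maxevents, timeout,
                        NULL, (size_t)(_NSIG / 8));
}

/* Self-check: an empty epoll set with a zero timeout reports no events. */
void epoll_wait_via_pwait_self_check(void)
{
    struct epoll_event ev;
    int efd = epoll_create1(0);

    assert(efd >= 0);
    assert(epoll_wait_via_pwait(efd, &ev, 1, 0) == 0);
    close(efd);
}
```

A tool that intercepts system calls then only ever sees epoll_pwait, with a NULL fifth argument, even though the program called epoll_wait.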

Comment 13 Lukas Slebodnik 2017-06-16 16:59:47 UTC
(In reply to Mark Wielaard from comment #11)
> (In reply to Lukas Slebodnik from comment #4)
> > (In reply to Mark Wielaard from comment #3)
> > > I cannot easily reproduce this.
> > But you should be
> > 
> > sh# docker run -ti --rm fedora:rawhide bash
> [...]
> 
> Cute! That does indeed gave an easy reproducer.
> 
I am glad I could help :-)

I hope the patch will be accepted upstream soon, so that our CI on rawhide will not be blocked for long.

Comment 14 Mark Wielaard 2017-06-18 16:33:59 UTC
valgrind-3.13.0-2

- Add valgrind-3.13.0-ppc64-check-no-vsx.patch
- Add valgrind-3.13.0-epoll_pwait.patch (#1462258)
- Add valgrind-3.13.0-ppc64-diag.patch

https://koji.fedoraproject.org/koji/buildinfo?buildID=909403
https://copr.fedorainfracloud.org/coprs/mjw/valgrind-3.13.0/build/567497/

Comment 15 Fedora Update System 2017-06-29 20:11:11 UTC
valgrind-3.13.0-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Comment 16 Fedora Update System 2017-06-30 20:25:34 UTC
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd

Comment 17 Fedora Update System 2017-07-07 23:05:26 UTC
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

