Created attachment 1288381 [details]
reproducer for valgrind error

Description of problem:
Latest upgrade of glibc introduced valgrind errors in epoll

Version-Release number of selected component (if applicable):
sh$ rpm -q glibc
glibc-2.25.90-5.fc27.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. // compile attached file
   gcc -Wall -Wextra -g3 epoll-example.c
2. valgrind --track-origins=yes ./a.out 1111

Actual results:
==22132== Memcheck, a memory error detector
==22132== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22132== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22132== Command: ./a.out 1111
==22132==
==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22132==
^C==22132==
==22132== Process terminating with default action of signal 2 (SIGINT)
==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
==22132==    by 0x400D78: main (epoll-example.c:131)
==22132==
==22132== HEAP SUMMARY:
==22132==     in use at exit: 768 bytes in 1 blocks
==22132==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22132==
==22132== LEAK SUMMARY:
==22132==    definitely lost: 0 bytes in 0 blocks
==22132==    indirectly lost: 0 bytes in 0 blocks
==22132==      possibly lost: 0 bytes in 0 blocks
==22132==    still reachable: 768 bytes in 1 blocks
==22132==         suppressed: 0 bytes in 0 blocks
==22132== Rerun with --leak-check=full to see details of leaked memory
==22132==
==22132== For counts of detected and suppressed errors, rerun with: -v
==22132== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Expected results:
No errors. The same as with older version of glibc

valgrind --track-origins=yes ./a.out 1111
==22050== Memcheck, a memory error detector
==22050== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22050== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==22050== Command: ./a.out 1111
==22050==
^C==22050==
==22050== Process terminating with default action of signal 2 (SIGINT)
==22050==    at 0x4F4D253: __epoll_wait_nocancel (in /usr/lib64/libc-2.25.90.so)
==22050==    by 0x400D78: main (epoll-example2.c:131)
==22050==
==22050== HEAP SUMMARY:
==22050==     in use at exit: 768 bytes in 1 blocks
==22050==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==22050==
==22050== LEAK SUMMARY:
==22050==    definitely lost: 0 bytes in 0 blocks
==22050==    indirectly lost: 0 bytes in 0 blocks
==22050==      possibly lost: 0 bytes in 0 blocks
==22050==    still reachable: 768 bytes in 1 blocks
==22050==         suppressed: 0 bytes in 0 blocks
==22050== Rerun with --leak-check=full to see details of leaked memory
==22050==
==22050== For counts of detected and suppressed errors, rerun with: -v
==22050== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[build@929198206d7a ci-build-debug]$ rpm -q glibc
glibc-2.25.90-2.fc27.x86_64

Additional info:
(In reply to Lukas Slebodnik from comment #0)
> ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> ==22132==    by 0x400D78: main (epoll-example.c:131)
> ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

For some reason, calloc returns 0 when running under valgrind. This looks more like a valgrind bug to me, so reassigning.
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
>
> For some reason, calloc returns 0 when running under valgrind. This looks
> more like a valgrind bug to me, so reassigning.

Could you explain why it is not a problem with glibc-2.25.90-2.fc27.x86_64?
(In reply to Florian Weimer from comment #1)
> (In reply to Lukas Slebodnik from comment #0)
> > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
>
> For some reason, calloc returns 0 when running under valgrind. This looks
> more like a valgrind bug to me, so reassigning.

I cannot easily reproduce this. Could you add a

   printf ("events: %p\n", events);

just before the epoll_wait () call and run it normally and under valgrind to see if it really is NULL in one case and not in the other?
(In reply to Mark Wielaard from comment #3)
> I cannot easily reproduce this.

But you should be

sh# docker run -ti --rm fedora:rawhide bash
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0 gcc
[root@2b9059b6a9a9 /]# dnf update -y --setopt=debuglevel=0 --setopt=errorlevel=0 glibc
[root@2b9059b6a9a9 /]# dnf install -y --setopt=debuglevel=0 --setopt=errorlevel=0 valgrind
[root@2b9059b6a9a9 /]# rpm -q glibc gcc valgrind
glibc-2.25.90-6.fc27.x86_64
gcc-7.1.1-2.fc27.x86_64
valgrind-3.13.0-0.2.RC1.fc27.x86_64
[root@2b9059b6a9a9 /]# curl -o epoll-example.c 'https://bugzilla.redhat.com/attachment.cgi?id=1288381'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   2080      0  0:00:01  0:00:01 --:--:--  2082
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==74== Memcheck, a memory error detector
==74== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==74== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==74== Command: ./a.out 1111
==74==
==74== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==74==
^C==74==
==74== Process terminating with default action of signal 2 (SIGINT)
==74==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==74==    by 0x400D78: main (epoll-example.c:131)
==74==
==74== HEAP SUMMARY:
==74==     in use at exit: 768 bytes in 1 blocks
==74==   total heap usage: 5 allocs, 4 frees, 3,020 bytes allocated
==74==
==74== LEAK SUMMARY:
==74==    definitely lost: 0 bytes in 0 blocks
==74==    indirectly lost: 0 bytes in 0 blocks
==74==      possibly lost: 0 bytes in 0 blocks
==74==    still reachable: 768 bytes in 1 blocks
==74==         suppressed: 0 bytes in 0 blocks
==74== Rerun with --leak-check=full to see details of leaked memory
==74==
==74== For counts of detected and suppressed errors, rerun with: -v
==74== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
(In reply to Mark Wielaard from comment #3)
> (In reply to Florian Weimer from comment #1)
> > (In reply to Lukas Slebodnik from comment #0)
> > > ==22132== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
> > > ==22132==    at 0x4F4C7C0: epoll_pwait (epoll_pwait.c:42)
> > > ==22132==    by 0x400D78: main (epoll-example.c:131)
> > > ==22132==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> >
> > For some reason, calloc returns 0 when running under valgrind. This looks
> > more like a valgrind bug to me, so reassigning.
>
> I cannot easily reproduce this. Could you add a printf ("events: %p\n",
> events); just before the epoll_wait () call and run it normally and under
> valgrind to see if it really is NULL in one case and not in the other?

[root@2b9059b6a9a9 /]# vi epoll-example.c
[root@2b9059b6a9a9 /]# gcc -Wall -Wextra -g3 epoll-example.c
[root@2b9059b6a9a9 /]# grep -A3 " printf" epoll-example.c
    printf ("events: %p\n", events);
    epoll_wait (efd, events, MAXEVENTS, -1);
[root@2b9059b6a9a9 /]# valgrind --track-origins=yes ./a.out 1111
==88== Memcheck, a memory error detector
==88== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==88== Using Valgrind-3.13.0.RC1 and LibVEX; rerun with -h for copyright info
==88== Command: ./a.out 1111
==88==
events: 0x520da20
==88== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==88==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==88==    by 0x400DDE: main (epoll-example.c:133)
==88==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==88==
I was confused:

calloc: 0x520da20
==24772== Syscall param epoll_pwait(sigmask) points to unaddressable byte(s)
==24772==    at 0x4F4C7C0: epoll_pwait (in /usr/lib64/libc-2.25.90.so)
==24772==    by 0x400B54: main (in /tmp/a.out)
==24772==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

The events pointer is *not* zero.

The epoll_pwait system call wrapper in valgrind probably assumes that the sigmask parameter is always non-NULL. But it can be NULL to get the epoll_wait behavior.
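To illustrate the two ways the sigmask parameter can be used, here is a minimal sketch. The helper name ready_count_on_pipe is hypothetical and not part of the attached reproducer; it registers the read end of a pipe that already has one byte queued and then calls epoll_pwait with whatever mask the caller passes, including NULL.

```c
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of ready events reported for a
   pipe that already has one byte queued, or -1 if setup fails.
   Passing sigmask == NULL asks for plain epoll_wait behavior. */
int ready_count_on_pipe(const sigset_t *sigmask)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    int efd = epoll_create1(0);
    if (efd == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fds[0] } };
    struct epoll_event out;
    int n = -1;

    /* Watch the read end, queue one byte, then wait up to one second. */
    if (epoll_ctl(efd, EPOLL_CTL_ADD, fds[0], &ev) == 0
        && write(fds[1], "x", 1) == 1)
        n = epoll_pwait(efd, &out, 1, 1000, sigmask);

    close(efd);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling ready_count_on_pipe(NULL) exercises the same call shape that valgrind flags in this report; passing a pointer to a filled-in sigset_t instead temporarily replaces the signal mask for the duration of the wait.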
Looks like there is a missing NULL check in sys_epoll_pwait (comparing its implementation with sys_pselect6).
(In reply to Florian Weimer from comment #7)
> Looks like there is a missing NULL check in sys_epoll_pwait (comparing its
> implementation with sys_pselect6).

Do you mean in glibc or in valgrind?
(In reply to Florian Weimer from comment #6)
> The epoll_pwait system call wrapper in valgrind probably assumes that the
> sigmask parameter is always non-NULL. But it can be NULL to get the
> epoll_wait behavior.

What the valgrind wrapper does is:

   if (ARG4)
      PRE_MEM_READ( "epoll_pwait(sigmask)", ARG5, sizeof(vki_sigset_t) );

So it checks ARG5 (sigmask) if ARG4 (timeout) is non-zero. That is probably a typo, unless there is a reason to only care about sigmask if timeout != 0.
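A standalone model of that guard may make the difference easier to see. This is only a sketch: the struct and function names are hypothetical, it does not use valgrind's macros, and the second predicate is the presumed intent (test the pointer itself) rather than a confirmed fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two syscall arguments involved. */
struct epoll_pwait_args {
    long timeout;         /* ARG4 in the wrapper */
    const void *sigmask;  /* ARG5 in the wrapper */
};

/* Mirrors the quoted condition: the guard tests the timeout, so the
   sigmask memory is checked whenever timeout is non-zero. */
bool checks_sigmask_as_quoted(const struct epoll_pwait_args *a)
{
    return a->timeout != 0;
}

/* Presumed intent (an assumption): only check the sigmask memory when
   the pointer is non-NULL. */
bool checks_sigmask_when_present(const struct epoll_pwait_args *a)
{
    return a->sigmask != NULL;
}
```

With the reproducer's arguments (timeout -1, sigmask NULL) the first predicate is true, which matches the "Address 0x0" complaint, while the second is false.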
Posted upstream including proposed fix and testcase update: https://bugs.kde.org/show_bug.cgi?id=381289
(In reply to Lukas Slebodnik from comment #4)
> (In reply to Mark Wielaard from comment #3)
> > I cannot easily reproduce this.
> But you should be
>
> sh# docker run -ti --rm fedora:rawhide bash
[...]

Cute! That does indeed give an easy reproducer.

I am not completely clear on why this wasn't an issue with older glibc. But it looks like newer glibc converts an epoll_wait into an epoll_pwait with a NULL sigmask argument. I don't know why it does that if there is a normal epoll_wait system call available.
(In reply to Mark Wielaard from comment #11)
> I am not completely clear on why this wasn't an issue with older glibc.
> But it looks like newer glibc converts an epoll_wait into an epoll_pwait
> with a NULL sigmask argument. I don't know why it does that if there is a
> normal epoll_wait system call available.

The reason for this behavior is that newer architectures only support an epoll_pwait system call, and this allows us to consolidate the epoll_wait implementation across all architectures.
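As a rough sketch of what such a consolidation could look like (this is not glibc's actual source, and the function name epoll_wait_via_pwait is hypothetical), a single wrapper can forward to epoll_pwait with a NULL sigmask:

```c
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical illustration, not glibc's implementation: provide
   epoll_wait semantics on top of epoll_pwait alone. */
int epoll_wait_via_pwait(int epfd, struct epoll_event *events,
                         int maxevents, int timeout)
{
    /* A NULL sigmask leaves the caller's signal mask untouched. */
    return epoll_pwait(epfd, events, maxevents, timeout, NULL);
}
```

Any tool that intercepts the epoll_pwait system call therefore has to accept a NULL sigmask as a normal argument.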
(In reply to Mark Wielaard from comment #11)
> (In reply to Lukas Slebodnik from comment #4)
> > (In reply to Mark Wielaard from comment #3)
> > > I cannot easily reproduce this.
> > But you should be
> >
> > sh# docker run -ti --rm fedora:rawhide bash
> [...]
>
> Cute! That does indeed give an easy reproducer.
>

I am glad I could help :-)

I hope the patch will be accepted upstream soon, so that our CI on rawhide will not be blocked for long.
valgrind-3.13.0-2
- Add valgrind-3.13.0-ppc64-check-no-vsx.patch
- Add valgrind-3.13.0-epoll_pwait.patch (#1462258)
- Add valgrind-3.13.0-ppc64-diag.patch

https://koji.fedoraproject.org/koji/buildinfo?buildID=909403
https://copr.fedorainfracloud.org/coprs/mjw/valgrind-3.13.0/build/567497/
valgrind-3.13.0-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4315a2f0cd
valgrind-3.13.0-4.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.