678518 – valgrind false positives on gcc-generated string routines

Bug 678518 - valgrind false positives on gcc-generated string routines

Summary: valgrind false positives on gcc-generated string routines

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	libidn
Sub Component:
Version:	22
Hardware:	x86_64
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Mark Wielaard
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-02-18 10:41 UTC by Kamil Dudka
Modified:	2016-07-19 20:23 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-07-19 20:23:33 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
a reproducer (367 bytes, text/plain) 2011-02-18 10:46 UTC, Kamil Dudka	no flags	Details
View All

Description Kamil Dudka 2011-02-18 10:41:17 UTC

Version-Release number of selected component (if applicable):
libidn-1.19-2.fc15.x86_64


How reproducible:
100 %


Steps to Reproduce:
1. run the attached reproducer

  
Additional info:
The bug breaks builds of rawhide curl:

http://koji.fedoraproject.org/koji/getfile?taskID=2846952&name=build.log

Comment 1 Kamil Dudka 2011-02-18 10:46:46 UTC

Created attachment 479480 [details]
a reproducer

Comment 2 Kamil Dudka 2011-02-18 10:48:43 UTC

$ curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=479480'
$ sh bz678518.c

Invalid read of size 4
   at 0x4E33791: idna_to_ascii_4z (idna.c:514)
   by 0x4E33A34: idna_to_ascii_8z (idna.c:565)
   by 0x4E33A98: idna_to_ascii_lz (idna.c:597)
   by 0x4005A3: main (bz678518.c:15)
 Address 0x5405fcc is 12 bytes inside a block of size 15 alloc'd
   at 0x4C284F2: realloc (vg_replace_malloc.c:525)
   by 0x4E3380A: idna_to_ascii_4z (idna.c:514)
   by 0x4E33A34: idna_to_ascii_8z (idna.c:565)
   by 0x4E33A98: idna_to_ascii_lz (idna.c:597)
   by 0x4005A3: main (bz678518.c:15)

Comment 3 Kamil Dudka 2011-02-18 11:19:48 UTC

*** Bug 678521 has been marked as a duplicate of this bug. ***

Comment 4 Miroslav Lichvar 2011-02-18 13:30:56 UTC

Looks like a gcc bug, perhaps a wrong assumption about alignment of the buffer returned by realloc?

Reproducer:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
        char *x;

	x = malloc(4);
        x = realloc(x, 15);
        sprintf(x, "www.xn--4cab6c");
	return strlen(x);
}

Can this cause problems outside valgrind?

Comment 5 Jakub Jelinek 2011-02-18 16:37:12 UTC

No, this is yet another case where valgrind just don't understand common programming techniques.
Here, gcc knows it can assume that x from the realloc is 2 * sizeof (void *) bytes aligned, therefore it implements strlen with this tuning as an inlined loop which reads a word at a time and looks for zero bytes in it.
It knows it won't crash if there is a zero byte before end of cliff, so it doesn't matter if it reads 1-7 extra bytes after malloced block when the access is aligned.
See e.g. https://bugs.kde.org/show_bug.cgi?id=264936
but right now it is not anywhere near a fix upstream.

Comment 6 Kamil Dudka 2011-02-18 17:11:11 UTC

Is there any gcc flag to disable the optimization that causes problems to valgrind while keeping all the other -O2 optimization passes?

Comment 7 Jakub Jelinek 2011-02-18 17:23:36 UTC

-fno-builtin-strlen, but you really don't want to use that, it will penalize code way too much (then even strlen ("abc") isn't folded etc.).

Just live with valgrind false positives, any time it reports a read access
with size N as M bytes inside a block of size O alloc'd
where (M % N) == 0 && M == (O & ~(N - 1)) && O > M it is suspect you might be looking at a valgrind false positive.

Comment 8 Kamil Dudka 2011-02-18 17:43:13 UTC

(In reply to comment #7)
> -fno-builtin-strlen, but you really don't want to use that, it will penalize
> code way too much (then even strlen ("abc") isn't folded etc.).

Thanks.  Any chance to get back the 'repnz scasb'-based implementation that -O1 produces?

> Just live with valgrind false positives, any time it reports a read access
> with size N as M bytes inside a block of size O alloc'd
> where (M % N) == 0 && M == (O & ~(N - 1)) && O > M it is suspect you might be
> looking at a valgrind false positive.

That's something hard to distinguish from the regular off-by-one errors occurring in C code as long as the unrolled strlen() does not appear on the backtrace.

Comment 9 Jakub Jelinek 2011-02-18 17:47:01 UTC

Off by one errors typically would not be accessing in one read some bytes that are part of the allocation and some bytes after it, it would be a whole access to bytes after the allocation.

Comment 10 Kamil Dudka 2011-02-18 18:08:21 UTC

That's true.  I've disabled use of valgrind for the particular failing test-case for now, will look for a more generic solution if the problem spreads over more test-cases.

http://lists.fedoraproject.org/pipermail/scm-commits/2011-February/567665.html

Comment 11 Fedora End Of Life 2013-04-03 18:09:02 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 12 Stef Walter 2014-01-31 07:59:56 UTC

This also affects p11-kit on both Fedora 19 and Fedora 20.

Running 'make memcheck' target for valgrind on p11-kit exposes this bug with issues like this:

FAIL: test-token
==4047== Invalid read of size 4
==4047==    at 0x40EA07: on_pem_block (parser.c:616)
==4047==    by 0x41079D: p11_pem_parse (pem.c:228)
==4047==    by 0x40DE38: parse_pem_certificates (parser.c:665)
==4047==    by 0x40F120: p11_parse_memory (parser.c:753)
==4047==  Address 0x54afc20 is 16 bytes inside a block of size 18 alloc'd
==4047==    at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4047==    by 0x40E9D8: on_pem_block (parser.c:612)
==4047==    by 0x41079D: p11_pem_parse (pem.c:228)
==4047==    by 0x40DE38: parse_pem_certificates (parser.c:665)

Comment 13 Fedora End Of Life 2015-01-09 16:34:35 UTC

This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Fedora End Of Life 2015-02-17 13:38:54 UTC

Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 15 Kamil Dudka 2015-02-17 14:32:35 UTC

The bug still exists in Fedora 21:

$ rpm -q valgrind
valgrind-3.10.1-1.fc21.x86_64

$ curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=479480'
curl: Saved to filename 'bz678518.c'

$ sh bz678518.c
==543== Memcheck, a memory error detector
==543== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==543== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==543== Command: ./a.out
==543==
==543== Invalid read of size 4
==543==    at 0x3EB8A056EB: idna_to_ascii_4z (idna.c:529)
==543==    by 0x3EB8A05947: idna_to_ascii_8z (idna.c:582)
==543==    by 0x3EB8A059A9: idna_to_ascii_lz (idna.c:614)
==543==    by 0x400685: main (bz678518.c:15)
==543==  Address 0x4c45fcc is 12 bytes inside a block of size 15 alloc'd
==543==    at 0x4A08B1C: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==543==    by 0x3EB8A05764: idna_to_ascii_4z (idna.c:530)
==543==    by 0x3EB8A05947: idna_to_ascii_8z (idna.c:582)
==543==    by 0x3EB8A059A9: idna_to_ascii_lz (idna.c:614)
==543==    by 0x400685: main (bz678518.c:15)
==543==
www.xn--4cab6c.se
==543==
==543== HEAP SUMMARY:
==543==     in use at exit: 18 bytes in 1 blocks
==543==   total heap usage: 27 allocs, 26 frees, 35,129 bytes allocated
==543==
==543== LEAK SUMMARY:
==543==    definitely lost: 18 bytes in 1 blocks
==543==    indirectly lost: 0 bytes in 0 blocks
==543==      possibly lost: 0 bytes in 0 blocks
==543==    still reachable: 0 bytes in 0 blocks
==543==         suppressed: 0 bytes in 0 blocks
==543== Rerun with --leak-check=full to see details of leaked memory
==543==
==543== For counts of detected and suppressed errors, rerun with: -v
==543== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Comment 16 Mark Wielaard 2015-02-17 14:51:32 UTC

Note the issue is really in libidn and the reproducer needs libidn-devel (and libidn-debuginfo) installed.

The workaround is to run with --partial-loads-ok=yes

       --partial-loads-ok=<yes|no> [default: no]
           Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
           aligned loads from addresses for which some bytes are addressable
           and others are not. When yes, such loads do not produce an address
           error. Instead, loaded bytes originating from illegal addresses are
           marked as uninitialised, and those corresponding to legal addresses
           are handled in the normal way.

           When no, loads from partially invalid addresses are treated the
           same as loads from completely invalid addresses: an illegal-address
           error is issued, and the resulting bytes are marked as initialised.

           Note that code that behaves in this way is in violation of the ISO
           C/C++ standards, and should be considered broken. If at all
           possible, such code should be fixed. This option should be used
           only as a last resort.

Comment 17 Kamil Dudka 2015-02-17 15:09:05 UTC

(In reply to Mark Wielaard from comment #16)
> Note the issue is really in libidn and the reproducer needs libidn-devel
> (and libidn-debuginfo) installed.

Thanks for the reply!  I am moving the component back to libidn...

$ rpm -aq libidn\* | sort -V
libidn2-0.10-1.fc21.x86_64
libidn2-devel-0.10-1.fc21.x86_64
libidn-1.28-5.fc21.x86_64
libidn-debuginfo-1.28-5.fc21.x86_64
libidn-devel-1.28-5.fc21.x86_64

Comment 18 Miroslav Lichvar 2015-02-17 15:18:29 UTC

I don't see how is this a libidn problem, the code just calls strlen() on a properly allocated buffer. Either it's a bug in valgrind that it reports false positives or gcc generates bad code.

Comment 19 Mark Wielaard 2015-02-17 15:24:03 UTC

(In reply to Miroslav Lichvar from comment #18)
> I don't see how is this a libidn problem, the code just calls strlen() on a
> properly allocated buffer. Either it's a bug in valgrind that it reports
> false positives or gcc generates bad code.

As stated in comment #16 gcc generates code that valgrind can only "prove" correct when using --partial-loads-ok=yes. The code does a multi-byte loads from addresses which are partially valid and partially invalid. See that comment or the valgrind manual for more explanation.

Comment 20 Mark Wielaard 2015-09-23 19:51:05 UTC

Note that valgrind 3.11.0, which was just uploaded for f23 https://bodhi.fedoraproject.org/updates/FEDORA-2015-16525, changed the default for partial-loads-ok to yes. From the release notes:

  - The default value for --partial-loads-ok has been changed from "no" to 
    "yes", so as to avoid false positive errors resulting from some kinds
    of vectorised loops.

Comment 21 Jan Pokorný [poki] 2015-09-23 20:43:47 UTC

re [comment 20]:
This sounds reasonable, thanks for the update, Mark.

Btw. I hit this issue with F22 + strlen standard function in the code:

> valgrind-3.10.1-9.fc22.x86_64
> gcc-5.1.1-4.fc22.x86_64

Comment 22 Jan Pokorný [poki] 2015-09-23 20:45:50 UTC

Actually not sure whether libidn is affected there as well, please adjust
appropriately if needed (incl. the component?).

Comment 23 Fedora End Of Life 2016-07-19 20:23:33 UTC

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.