Bug 1909415

Summary: valgrind makes false complaint about strcmp() on aarch64
Product: [Fedora] Fedora Reporter: Tom Lane <tgl>
Component: valgrindAssignee: Mark Wielaard <mjw>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 33CC: dodji, jakub, jseward, mjw
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: valgrind-3.16.1-12.fc34 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-20 21:18:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
test program none

Description Tom Lane 2020-12-19 19:23:51 UTC
Created attachment 1740495 [details]
test program

Description of problem:
valgrind sometimes complains that strcmp() is accessing uninitialized bytes, even though there is surely nothing wrong with the code as given.

Version-Release number of selected component (if applicable):
valgrind-3.16.1-11.fc33.aarch64

How reproducible:
100%

Steps to Reproduce:
1. gcc -O2 -g strcmp_complaint.c
2. valgrind ./a.out ./global

Actual results:
==50792== Memcheck, a memory error detector
==50792== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==50792== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==50792== Command: ./a.out ./global
==50792== 
==50792== Conditional jump or move depends on uninitialised value(s)
==50792==    at 0x400578: main (strcmp_complaint.c:11)
==50792== 
is not pg_control
==50792== 

Expected results:
No complaint about uninitialised value(s)

Additional info:
Whether the complaint appears or not varies with the input given.

I don't see this with the same test case on x86_64.  (Now, I only tried it with RHEL8's valgrind 3.16.0; but this example is extracted from Postgres code, and we would have noticed if any widely-used valgrind version made the same complaint.)

Comment 1 Mark Wielaard 2020-12-20 13:54:31 UTC
Replicated. The assembly looks as follows:

Dump of assembler code for function main:
   0x0000000000400500 <+0>:	sub	sp, sp, #0x810
   0x0000000000400504 <+4>:	mov	x3, x1
   0x0000000000400508 <+8>:	add	x0, sp, #0x10
   0x000000000040050c <+12>:	mov	x1, #0x800                 	// #2048
   0x0000000000400510 <+16>:	adrp	x2, 0x400000
   0x0000000000400514 <+20>:	add	x2, x2, #0x7b0
   0x0000000000400518 <+24>:	stp	x29, x30, [sp]
   0x000000000040051c <+28>:	mov	x29, sp
   0x0000000000400520 <+32>:	ldr	x3, [x3, #8]
   0x0000000000400524 <+36>:	bl	0x400490 <snprintf@plt>
   0x0000000000400528 <+40>:	mov	x0, #0x2f2e                	// #12078
   0x000000000040052c <+44>:	ldr	x1, [sp, #16]
   0x0000000000400530 <+48>:	movk	x0, #0x6c67, lsl #16
   0x0000000000400534 <+52>:	movk	x0, #0x626f, lsl #32
   0x0000000000400538 <+56>:	movk	x0, #0x6c61, lsl #48
   0x000000000040053c <+60>:	cmp	x1, x0
   0x0000000000400540 <+64>:	b.eq	0x400560 <main+96>  // b.none
   0x0000000000400544 <+68>:	adrp	x0, 0x400000
   0x0000000000400548 <+72>:	add	x0, x0, #0x7e0
   0x000000000040054c <+76>:	bl	0x4004d0 <puts@plt>
   0x0000000000400550 <+80>:	mov	w0, #0x0                   	// #0
   0x0000000000400554 <+84>:	ldp	x29, x30, [sp]
   0x0000000000400558 <+88>:	add	sp, sp, #0x810
   0x000000000040055c <+92>:	ret
   0x0000000000400560 <+96>:	mov	x0, #0x702f                	// #28719
   0x0000000000400564 <+100>:	ldr	x1, [sp, #24]
   0x0000000000400568 <+104>:	movk	x0, #0x5f67, lsl #16
   0x000000000040056c <+108>:	movk	x0, #0x6f63, lsl #32
   0x0000000000400570 <+112>:	movk	x0, #0x746e, lsl #48
   0x0000000000400574 <+116>:	cmp	x1, x0
   0x0000000000400578 <+120>:	b.ne	0x400544 <main+68>  // b.any
   0x000000000040057c <+124>:	ldr	w1, [sp, #32]
   0x0000000000400580 <+128>:	mov	w0, #0x6f72                	// #28530
   0x0000000000400584 <+132>:	movk	w0, #0x6c, lsl #16
   0x0000000000400588 <+136>:	cmp	w1, w0
   0x000000000040058c <+140>:	b.ne	0x400544 <main+68>  // b.any
   0x0000000000400590 <+144>:	adrp	x0, 0x400000
   0x0000000000400594 <+148>:	add	x0, x0, #0x7d0
   0x0000000000400598 <+152>:	bl	0x4004d0 <puts@plt>
   0x000000000040059c <+156>:	b	0x400550 <main+80>
End of assembler dump.

This conditional branch is the issue:
0x0000000000400578 <+120>:	b.ne	0x400544 <main+68>  // b.any

Note that gcc is pretty clever here and has inlined the whole strcmp and tries to compare against the constant string by putting the character in registers and then comparing those agains the input. But the input is 9 chars (including the terminating zero) and so valgrind memcheck believes too much is being compared (some bytes after the input).

Note that the issue goes away when running with --expensive-definedness-checks=yes (which makes memcheck do more precise checks). And it looks like Julian already hit this issue and turned Cmp{EQ,NE}64 into expensive checks by default for arm64:

commit 359b98828ced9cfff8c1badfed75c7ef999cfee5
Author: Julian Seward <jseward>
Date:   Sun Nov 15 18:28:09 2020 +0100

    memcheck: on arm64, use expensive instrumentation for Cmp{EQ,NE}64 by default.

diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index b32c9c9c5..e91d51094 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -8485,6 +8485,8 @@ IRSB* MC_(instrument) ( VgCallbackClosure* closure,
 #     elif defined(VGA_ppc64le)
       // Needed by (at least) set_AV_CR6() in the front end.
       mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
+#     elif defined(VGA_arm64)
+      mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive;
 #     endif
 
       /* preInstrumentationAnalysis() will allocate &mce.tmpHowUsed and then

And indeed, upstream valgrind from git trunk does not show any issues.

I'll backport this commit to the fedora valgrind package.

Comment 2 Mark Wielaard 2020-12-20 21:18:23 UTC
Applied the upstream fix to the valgrind rawhide package.