Bug 1844910

Summary: valgrind misexecutes pow() on aarch64
Product: Fedora                Reporter: Tom Lane <tgl>
Component: valgrind            Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA          QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified          Priority: unspecified
Version: 32                    CC: dodji, jakub, jseward, mjw
Hardware: aarch64              OS: Unspecified
Fixed In Version: valgrind-3.16.1-11.fc33
Last Closed: 2020-12-22 01:29:02 UTC
Type: Bug

Attachments: test program, new test program

Description Tom Lane 2020-06-07 23:32:47 UTC
Created attachment 1695943 [details]
test program

Description of problem:
The trivial attached program should print
r = 1008618.4899999999
However, when run under valgrind on aarch64, it prints
r = 1008618.4900000000
Examining the pow() result at the bit level confirms that it's off by one in the last place.

I do not see this misbehavior on x86_64.

Version-Release number of selected component (if applicable):
valgrind-3.16.0-1.fc32.aarch64

How reproducible:
100%

Steps to Reproduce:
1. gcc -Wall tpower.c -lm
2. ./a.out
3. valgrind ./a.out

Actual results:
Different pow() answers.

Expected results:
Same pow() answer.

Additional info:
I can't guess how widespread the wrong answers might be.  This specific test case happens to occur in the Postgres regression tests, which is how I noticed it.

Comment 1 Tom Lane 2020-06-07 23:54:37 UTC
A bit later ... chasing a second failure in the Postgres tests, which I'd initially supposed had a different root cause, I find that I also get different answers from pow(2.0, 55.0).

Comment 2 Mark Wielaard 2020-06-08 11:05:04 UTC
This I cannot replicate, at least with a slightly older valgrind, gcc, and glibc:

[mark@scw-arm64-centos ~]$ gcc -Wall tpower.c -lm
[mark@scw-arm64-centos ~]$ ./a.out
r = 1008618.4899999999
[mark@scw-arm64-centos ~]$ valgrind ./a.out
==8213== Memcheck, a memory error detector
==8213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==8213== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==8213== Command: ./a.out
==8213== 
r = 1008618.4899999999
==8213== 
==8213== HEAP SUMMARY:
==8213==     in use at exit: 0 bytes in 0 blocks
==8213==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==8213== 
==8213== All heap blocks were freed -- no leaks are possible
==8213== 
==8213== For counts of detected and suppressed errors, rerun with: -v
==8213== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Note this is valgrind 3.14.0 (and gcc 4.8.5, glibc 2.17).

I'll try to get something newer, but it would be helpful if you could see if there is a difference between the gcc builtin (-O2) and using the glibc pow() function. They might be using different instructions to calculate the result.

Comment 3 Mark Wielaard 2020-06-08 11:10:20 UTC
I built valgrind upstream from git on the machine from comment #2.
The result is always r = 1008618.4899999999, using either gcc -Wall -O0 tpower.c -lm or gcc -Wall -O2 tpower.c, and with or without valgrind (either 3.14.0 or latest git).

Comment 4 Mark Wielaard 2020-06-08 11:15:42 UTC
Same for:

#include <math.h>
#include <stdio.h>

int main()
{
  double r = pow(2.0, 55.0);

  printf("r = %.10f\n", r);
  return 0;
}

In all cases it prints: r = 36028797018963968.0000000000

Comment 5 Tom Lane 2020-06-08 14:16:09 UTC
Hmph.  With the program as given, the error goes away for me at -O2 ... but looking at the generated assembly code with -S shows that that is because the compiler has constant-folded the pow() call.  If I change the test program to read x and y from the command line then it still fails the same way at -O2.

As far as I can see from the -S results, if it's not able to const-fold then the compiler generates a call to pow at either -O0 or -O2, so the compiler isn't a factor in this.

The libm I'm using is from F32, glibc-2.31-2.fc32.aarch64, so noticeably newer than yours.
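
For reference, a command-line-driven variant of the test program, as described above, might look like the following (a minimal sketch, not the actual attachment; the %.10f format matches the output shown in the description):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  if (argc != 3)
    return 1;

  /* Read x and y from the command line so the compiler cannot constant-fold pow(). */
  double x = strtod(argv[1], NULL);
  double y = strtod(argv[2], NULL);

  printf("r = %.10f\n", pow(x, y));
  return 0;
}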

Comment 6 Tom Lane 2020-06-08 14:20:49 UTC
BTW, at this point it might also be appropriate to start documenting the specific hardware ... I'm testing on a Raspberry Pi 3B+, which has a Broadcom BCM2837B0 quad-core A53 (ARMv8) 64-bit @ 1.4GHz SoC.

Comment 7 Tom Lane 2020-06-09 16:42:23 UTC
Created attachment 1696356 [details]
new test program

Comment 8 Tom Lane 2020-06-09 16:47:22 UTC
OK, after some digging in the glibc sources I can refine the complaint to "valgrind misexecutes __builtin_fma()".  See the new attached program and do:

[tgl@rpi3 ~]$ gcc -Wall  test_fma.c -o test_fma

[tgl@rpi3 ~]$ ./test_fma 0.98076171874999996 1.015625 1.0
x = 3fef626666666666 = 0.98076171874999996
y = 3ff0400000000000 = 1.015625
z = 3ff0000000000000 = 1
r = bf70080000000034 = -0.0039138793945312951

[tgl@rpi3 ~]$ valgrind ./test_fma 0.98076171874999996 1.015625 1.0 2>/dev/null
x = 3fef626666666666 = 0.98076171874999996
y = 3ff0400000000000 = 1.015625
z = 3ff0000000000000 = 1
r = bf70080000000000 = -0.00391387939453125

[tgl@rpi3 ~]$ ./test_fma 55 0.69314718055994529 38.123094930796988
x = 404b800000000000 = 55
y = 3fe62e42fefa39ef = 0.69314718055994529
z = 40430fc1931f09c9 = 38.123094930796988
r = 3ce9000000000000 = 2.7755575615628914e-15

[tgl@rpi3 ~]$ valgrind ./test_fma 55 0.69314718055994529 38.123094930796988 2>/dev/null
x = 404b800000000000 = 55
y = 3fe62e42fefa39ef = 0.69314718055994529
z = 40430fc1931f09c9 = 38.123094930796988
r = 0000000000000000 = 0

The first example seems to be just loss of low-order bits, but the second is a complete fail.

I also observed a case where valgrind produced -0 instead of 0, but I'm having a hard time
getting that to happen in this stripped-down test program, and it may not be a different
bug anyway.
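
The attached test_fma.c is not reproduced inline here.  Judging from the invocations and outputs above, it appears to compute __builtin_fma(x, y, -z), i.e. x*y - z rounded once; a hypothetical reconstruction could look roughly like this:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print a double both as its raw IEEE-754 bit pattern and as a decimal value. */
static void show(const char *name, double v)
{
  uint64_t bits;
  memcpy(&bits, &v, sizeof bits);
  printf("%s = %016" PRIx64 " = %.17g\n", name, bits, v);
}

int main(int argc, char **argv)
{
  if (argc != 4)
    return 1;

  double x = strtod(argv[1], NULL);
  double y = strtod(argv[2], NULL);
  double z = strtod(argv[3], NULL);
  double r = __builtin_fma(x, y, -z);   /* x*y - z, rounded once */

  show("x", x);
  show("y", y);
  show("z", z);
  show("r", r);
  return 0;
}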

Comment 9 Julian Seward 2020-06-10 06:18:38 UTC
Thanks for the small repro case.  This happens because V translates the FMA
instruction internally into its IR as a multiply followed by an add, so the
operation is split and you get double rounding, and hence a possible loss of
accuracy.  See dis_AdvSIMD_fp_data_proc_3_source() in VEX/priv/guest_arm64_toIR.c.

This is handled properly in the POWER port.  The IR does actually contain
all the relevant FMA-style IROps, so that FMAs that come in the front end
are eventually re-emitted as FMAs at the back end, and so the results are
always exactly correct.  Unfortunately I took the easier path and split up
incoming FMAs here.  With a certain amount of hassle (maybe a day's worth)
it should be possible to change the ARM64 port to use the FMA IROps.  Most
of the effort would be in the ARM64 instruction selector and adding an
ARM64Instr case for the FMA insns we want to emit.
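
The effect can be seen directly with the second example from comment 8, comparing a true fused multiply-add against the split multiply-then-subtract (a minimal sketch; -ffp-contract=off keeps gcc from contracting the split expression back into an FMA):

#include <math.h>
#include <stdio.h>

/* Build with: gcc -O0 -ffp-contract=off fma_vs_split.c -lm */
int main(void)
{
  double x = 55.0, y = 0.69314718055994529, z = 38.123094930796988;

  double fused = fma(x, y, -z);   /* one rounding: 2.7755575615628914e-15, the native result */
  double split = (x * y) - z;     /* product rounded first, then subtract: exactly 0, the result seen under valgrind */

  printf("fused = %.17g\n", fused);
  printf("split = %.17g\n", split);
  return 0;
}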

Comment 10 Julian Seward 2020-06-10 06:32:13 UTC
One other observation: this ..

> The first example seems to be just loss of low-order bits,
> but the second is a complete fail.

.. makes it sound like there's something qualitatively different in
the second example.  But I don't think that's the case: in both examples,
we're looking at an error consistent with the least significant one or perhaps
two mantissa bits being wrong.  I say this because, if we normalise the
error with respect to the 'z' value, we get values in the same ballpark:

  (gdb) p (-0.0039138793945312951 - -0.00391387939453125) / 1
  $1 = -4.5102810375396984e-17

  (gdb) p (2.7755575615628914e-15 - 0) / 38.123094930796988
  $2 = 7.2805147813975409e-17

Hence I'm pretty sure the explanation in comment 9 is correct.

I can indeed reproduce this (just to confirm) -- on a Pi3 B running F32.

Comment 11 Tom Lane 2020-06-10 06:46:01 UTC
(In reply to Julian Seward from comment #10)
> One other observation: this ..
> > The first example seems to be just loss of low-order bits,
> > but the second is a complete fail.
> 
> .. makes it sound like there's something qualitatively different in
> the second example.  But I don't think that's the case

Yeah, I agree --- the knowledge that the FMA is doing a premature rounding
of the product is sufficient to explain these cases.  The -0 vs. 0 case that
I couldn't pin down very likely comes from that as well.

I hope you can find the time to fix this soon.  It's been very instructive
trying to valgrind Postgres on ARM64 ... I hadn't realized that this was
cutting-edge stuff, but having found one Postgres and two Valgrind problems,
evidently it is.  I'd like to go further and run our entire test suite --- but
for the moment I'm blocked because this discrepancy makes our core regression
tests fail.

Comment 12 Julian Seward 2020-06-10 07:36:40 UTC
> I hadn't realized that this was cutting-edge stuff, 

Well, it's not cutting edge, but maybe it is the road less travelled.

---

If anyone wants to chase this before I get round to looking at it, I
suggest grepping the source for these 4:

  Iop_MAddF32  Iop_MSubF32  Iop_MAddF64  Iop_MSubF64

and possibly Iop_MAddF64r32 and Iop_MSubF64r32, although I think not --
they are I think a POWER-specific oddity.

Comment 13 Mark Wielaard 2020-08-31 10:52:34 UTC
I filed the following upstream bug: https://bugs.kde.org/show_bug.cgi?id=426014

Comment 14 Mark Wielaard 2020-10-30 13:15:41 UTC
(In reply to Mark Wielaard from comment #13)
> I filed the following upstream bug:
> https://bugs.kde.org/show_bug.cgi?id=426014

With all the patches from there combined, we get all the examples correct:

# ./pow 
r = 1008618.4899999999
# valgrind -q ./pow
r = 1008618.4900000000
# ./vg-in-place -q ./pow
r = 1008618.4899999999

# ./test_fma 0.98076171874999996 1.015625 1.0
x = 3fef626666666666 = 0.98076171874999996
y = 3ff0400000000000 = 1.015625
z = 3ff0000000000000 = 1
r = bf70080000000034 = -0.0039138793945312951
# valgrind -q ./test_fma 0.98076171874999996 1.015625 1.0
x = 3fef626666666666 = 0.98076171874999996
y = 3ff0400000000000 = 1.015625
z = 3ff0000000000000 = 1
r = bf70080000000000 = -0.00391387939453125
# ./vg-in-place -q ./test_fma 0.98076171874999996 1.015625 1.0
x = 3fef626666666666 = 0.98076171874999996
y = 3ff0400000000000 = 1.015625
z = 3ff0000000000000 = 1
r = bf70080000000034 = -0.0039138793945312951

# ./test_fma 55 0.69314718055994529 38.123094930796988
x = 404b800000000000 = 55
y = 3fe62e42fefa39ef = 0.69314718055994529
z = 40430fc1931f09c9 = 38.123094930796988
r = 3ce9000000000000 = 2.7755575615628914e-15
# valgrind -q ./test_fma 55 0.69314718055994529 38.123094930796988
x = 404b800000000000 = 55
y = 3fe62e42fefa39ef = 0.69314718055994529
z = 40430fc1931f09c9 = 38.123094930796988
r = 0000000000000000 = 0
# ./vg-in-place -q ./test_fma 55 0.69314718055994529 38.123094930796988
x = 404b800000000000 = 55
y = 3fe62e42fefa39ef = 0.69314718055994529
z = 40430fc1931f09c9 = 38.123094930796988
r = 3ce9000000000000 = 2.7755575615628914e-15

Here valgrind is valgrind-3.16.1-5.fc33.aarch64 and ./vg-in-place is the just-built valgrind from git plus those patches.

So all that is needed now is to clean up and combine those patches and to integrate some test cases.

Comment 15 Fedora Update System 2020-12-15 17:59:31 UTC
FEDORA-2020-de20e2cea8 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-de20e2cea8

Comment 16 Fedora Update System 2020-12-16 02:10:58 UTC
FEDORA-2020-de20e2cea8 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-de20e2cea8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-de20e2cea8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Tom Lane 2020-12-16 19:22:36 UTC
Hmm ... this update fixes my little test program, but the actual application (Postgres) gets somewhat further in its regression tests than it had before, and then valgrind chokes:

==00:00:08:37.506 32442== Memcheck, a memory error detector
==00:00:08:37.506 32442== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==00:00:08:37.506 32442== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==00:00:08:37.506 32442== Command: postmaster -F
==00:00:08:37.506 32442== Parent PID: 32166
==00:00:08:37.506 32442==

MSubF64(t15,t21,t22,t23)

vex: the `impossible' happened:
   expr_is_guardable: unhandled expr
vex storage: T total 3295105024 bytes allocated
vex storage: P total 0 bytes allocated

valgrind: the 'impossible' happened:
   LibVEX called failure_exit().

host stacktrace:
==00:00:10:05.976 32442==    at 0x58043394: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x580434E7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x58043743: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x58043767: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x5805989B: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x5813355F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x58152D5B: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x5813061F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x58130DF7: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x5805C1FB: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x58099733: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x5809C71F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0x580E679F: ??? (in /usr/libexec/valgrind/memcheck-arm64-linux)
==00:00:10:05.976 32442==    by 0xFFFFFFFFFFFFFFFF: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 32442)
==00:00:10:05.977 32442==    at 0x6E9EBC: get_parallel_divisor (costsize.c:5771)
==00:00:10:05.977 32442==    by 0x6E9EBC: cost_append (costsize.c:2166)
==00:00:10:05.978 32442==    by 0x72823B: create_append_path (pathnode.c:1318)
==00:00:10:05.979 32442==    by 0x6E3EE7: add_paths_to_append_rel (allpaths.c:1538)
==00:00:10:05.979 32442==    by 0x6E5637: set_append_rel_pathlist (allpaths.c:1273)
==00:00:10:05.979 32442==    by 0x6E5637: set_rel_pathlist (allpaths.c:485)
==00:00:10:05.979 32442==    by 0x6E5A1F: set_base_rel_pathlists (allpaths.c:357)
==00:00:10:05.979 32442==    by 0x6E5A1F: make_one_rel (allpaths.c:227)
==00:00:10:05.979 32442==    by 0x70793F: query_planner (planmain.c:269)
==00:00:10:05.980 32442==    by 0x70C1BB: grouping_planner (planner.c:2042)
==00:00:10:05.980 32442==    by 0x70E5F7: subquery_planner (planner.c:1009)
==00:00:10:05.980 32442==    by 0x70F8BF: standard_planner (planner.c:402)
==00:00:10:05.980 32442==    by 0x7E51FB: pg_plan_query (postgres.c:875)
==00:00:10:05.980 32442==    by 0x7E534B: pg_plan_queries (postgres.c:966)
==00:00:10:05.980 32442==    by 0x7E569F: exec_simple_query (postgres.c:1158)
==00:00:10:05.981 32442==    by 0x7E717F: PostgresMain (postgres.c:4313)
==00:00:10:05.981 32442==    by 0x755923: BackendRun (postmaster.c:4490)
==00:00:10:05.981 32442==    by 0x755923: BackendStartup (postmaster.c:4212)
==00:00:10:05.981 32442==    by 0x755923: ServerLoop (postmaster.c:1729)
==00:00:10:05.981 32442==    by 0x75685B: PostmasterMain (postmaster.c:1401)
==00:00:10:05.981 32442==    by 0x484153: main (main.c:209)
client stack range: [0x1FFEFF6000 0x1FFF000FFF] client SP: 0x1FFEFFF380
valgrind stack range: [0x10030BC000 0x10031BBFFF] top usage: 14448 of 1048576

Not sure if this is related or not.  get_parallel_divisor() does contain
one statement that looks like it could generate an FMA:

		double		leader_contribution;

		leader_contribution = 1.0 - (0.3 * path->parallel_workers);
		if (leader_contribution > 0)
			parallel_divisor += leader_contribution;

(path->parallel_workers is an int, I'm pretty sure)

Comment 18 Mark Wielaard 2020-12-16 20:10:53 UTC
Hi Tom,

(In reply to Tom Lane from comment #17)
> Hmm ... this update fixes my little test program, but the actual application
> (Postgres) gets somewhat further in its regression tests than it had before,
> and then valgrind chokes:
> 
> ==00:00:08:37.506 32442== Memcheck, a memory error detector
> ==00:00:08:37.506 32442== Copyright (C) 2002-2017, and GNU GPL'd, by Julian
> Seward et al.
> ==00:00:08:37.506 32442== Using Valgrind-3.16.1 and LibVEX; rerun with -h
> for copyright info
> ==00:00:08:37.506 32442== Command: postmaster -F
> ==00:00:08:37.506 32442== Parent PID: 32166
> ==00:00:08:37.506 32442==
> 
> MSubF64(t15,t21,t22,t23)

This is indeed a new Iop_MSubF64 (aka msub on doubles).

> vex: the `impossible' happened:
>    expr_is_guardable: unhandled expr
> vex storage: T total 3295105024 bytes allocated
> vex storage: P total 0 bytes allocated
> 
> valgrind: the 'impossible' happened:
>    LibVEX called failure_exit().

This is:

// Is it possible to guard |e|?  Meaning, is it safe (exception-free) to compute
// |e| and ignore the result?  Since |e| is by definition otherwise
// side-effect-free, we don't have to ask about any other effects caused by
// first computing |e| and then ignoring the result.
static Bool expr_is_guardable ( const IRExpr* e )
{
   switch (e->tag) {
      case Iex_Load:
         return False;
      case Iex_Unop:
         return !primopMightTrap(e->Iex.Unop.op);
      case Iex_Binop:
         return !primopMightTrap(e->Iex.Binop.op);
      case Iex_Triop:
         return !primopMightTrap(e->Iex.Triop.details->op);
      case Iex_ITE:
      case Iex_CCall:
      case Iex_Get:
      case Iex_GetI:
      case Iex_Const:
      case Iex_RdTmp:
         return True;
      default:
         vex_printf("\n"); ppIRExpr(e); vex_printf("\n");
         vpanic("expr_is_guardable: unhandled expr");
   }
}

This indeed does not handle Iex_Qop (the tag value of an IRExpr_Qop).

I think the correct clause would be:

      case Iex_Qop:
         return !primopMightTrap(e->Iex.Qop.details->op);

> host stacktrace:
> ==00:00:10:05.976 32442==    at 0x58043394: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x580434E7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x58043743: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x58043767: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x5805989B: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x5813355F: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x58152D5B: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x5813061F: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x58130DF7: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x5805C1FB: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x58099733: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x5809C71F: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0x580E679F: ??? (in
> /usr/libexec/valgrind/memcheck-arm64-linux)
> ==00:00:10:05.976 32442==    by 0xFFFFFFFFFFFFFFFF: ???

This would look somewhat more useful with valgrind-debuginfo installed, but that
isn't really necessary because we already seem to know what/where the issue is.

> sched status:
>   running_tid=1
> 
> Thread 1: status = VgTs_Runnable (lwpid 32442)
> ==00:00:10:05.977 32442==    at 0x6E9EBC: get_parallel_divisor
> (costsize.c:5771)
> ==00:00:10:05.977 32442==    by 0x6E9EBC: cost_append (costsize.c:2166)
> ==00:00:10:05.978 32442==    by 0x72823B: create_append_path
> (pathnode.c:1318)
> ==00:00:10:05.979 32442==    by 0x6E3EE7: add_paths_to_append_rel
> (allpaths.c:1538)
> ==00:00:10:05.979 32442==    by 0x6E5637: set_append_rel_pathlist
> (allpaths.c:1273)
> ==00:00:10:05.979 32442==    by 0x6E5637: set_rel_pathlist (allpaths.c:485)
> ==00:00:10:05.979 32442==    by 0x6E5A1F: set_base_rel_pathlists
> (allpaths.c:357)
> ==00:00:10:05.979 32442==    by 0x6E5A1F: make_one_rel (allpaths.c:227)
> ==00:00:10:05.979 32442==    by 0x70793F: query_planner (planmain.c:269)
> ==00:00:10:05.980 32442==    by 0x70C1BB: grouping_planner (planner.c:2042)
> ==00:00:10:05.980 32442==    by 0x70E5F7: subquery_planner (planner.c:1009)
> ==00:00:10:05.980 32442==    by 0x70F8BF: standard_planner (planner.c:402)
> ==00:00:10:05.980 32442==    by 0x7E51FB: pg_plan_query (postgres.c:875)
> ==00:00:10:05.980 32442==    by 0x7E534B: pg_plan_queries (postgres.c:966)
> ==00:00:10:05.980 32442==    by 0x7E569F: exec_simple_query (postgres.c:1158)
> ==00:00:10:05.981 32442==    by 0x7E717F: PostgresMain (postgres.c:4313)
> ==00:00:10:05.981 32442==    by 0x755923: BackendRun (postmaster.c:4490)
> ==00:00:10:05.981 32442==    by 0x755923: BackendStartup (postmaster.c:4212)
> ==00:00:10:05.981 32442==    by 0x755923: ServerLoop (postmaster.c:1729)
> ==00:00:10:05.981 32442==    by 0x75685B: PostmasterMain (postmaster.c:1401)
> ==00:00:10:05.981 32442==    by 0x484153: main (main.c:209)
> client stack range: [0x1FFEFF6000 0x1FFF000FFF] client SP: 0x1FFEFFF380
> valgrind stack range: [0x10030BC000 0x10031BBFFF] top usage: 14448 of 1048576
> 
> Not sure if this is related or not.  get_parallel_divisor() does contain
> one statement that looks like it could generate an FMA:
> 
> 		double		leader_contribution;
> 
> 		leader_contribution = 1.0 - (0.3 * path->parallel_workers);
> 		if (leader_contribution > 0)
> 			parallel_divisor += leader_contribution;
> 
> (path->parallel_workers is an int, I'm pretty sure)

Yes, this seems like something that the compiler would turn into a fmsub.

I am going to file a separate bug upstream. This isn't really my area of expertise, and I am slightly surprised this hasn't come up on other architectures which already handle scalar fma instructions.

Comment 19 Tom Lane 2020-12-16 20:32:11 UTC
(In reply to Mark Wielaard from comment #18)
> I am going to file a separate bug upstream. This isn't really my area of
> expertise, and I am slightly surprised this hasn't come up on other
> architectures which already handle scalar fma instructions.

Makes sense.  I agree it's not obvious why this wouldn't trigger on any
arch with FMA-type instructions.  (On the other hand, maybe it does?
I don't know how many people have tried to valgrind Postgres on PPC
either.)

Comment 20 Tom Lane 2020-12-16 20:50:16 UTC
Just for completeness, I did the debuginfo bit and here's a better stack trace:

host stacktrace:
==00:00:09:40.815 35392==    at 0x58043394: show_sched_status_wrk (m_libcassert.c:406)
==00:00:09:40.816 35392==    by 0x580434E7: report_and_quit (m_libcassert.c:477)
==00:00:09:40.816 35392==    by 0x58043743: UnknownInlinedFun (m_libcassert.c:553)
==00:00:09:40.816 35392==    by 0x58043743: vgPlain_core_panic_at (m_libcassert.c:558)
==00:00:09:40.816 35392==    by 0x58043767: vgPlain_core_panic (m_libcassert.c:563)
==00:00:09:40.816 35392==    by 0x5805989B: failure_exit (m_translate.c:761)
==00:00:09:40.816 35392==    by 0x5813355F: vpanic (main_util.c:253)
==00:00:09:40.817 35392==    by 0x58152D5B: UnknownInlinedFun (guest_generic_bb_to_IR.c:434)
==00:00:09:40.817 35392==    by 0x58152D5B: UnknownInlinedFun (guest_generic_bb_to_IR.c:473)
==00:00:09:40.817 35392==    by 0x58152D5B: block_is_guardable (guest_generic_bb_to_IR.c:490)
==00:00:09:40.817 35392==    by 0x58152D5B: bb_to_IR (guest_generic_bb_to_IR.c:1619)
==00:00:09:40.817 35392==    by 0x5813061F: LibVEX_FrontEnd (main_main.c:583)
==00:00:09:40.817 35392==    by 0x58130DF7: LibVEX_Translate (main_main.c:1235)
==00:00:09:40.817 35392==    by 0x5805C1FB: vgPlain_translate (m_translate.c:1828)
==00:00:09:40.818 35392==    by 0x58099733: handle_chain_me (scheduler.c:1166)
==00:00:09:40.818 35392==    by 0x5809C71F: vgPlain_scheduler (scheduler.c:1511)
==00:00:09:40.818 35392==    by 0x580E679F: thread_wrapper (syswrap-linux.c:101)
==00:00:09:40.818 35392==    by 0x580E679F: run_a_thread_NORETURN (syswrap-linux.c:154)
==00:00:09:40.820 35392==    by 0xFFFFFFFFFFFFFFFF: ???

Not sure if I can help any further --- I tried a little bit to make a self-contained
test case, but failed.

Comment 21 Mark Wielaard 2020-12-16 22:55:41 UTC
(In reply to Tom Lane from comment #20)
> Not sure if I can help any further --- I tried a little bit to make a
> self-contained test case, but failed.

I filed https://bugs.kde.org/show_bug.cgi?id=430485 upstream.

If you could try a scratch build with the proposed patch in it on your regression tests that would be helpful.
https://koji.fedoraproject.org/koji/taskinfo?taskID=57602404

Comment 22 Tom Lane 2020-12-17 05:11:26 UTC
OK, I rebuilt valgrind for Fedora 33 using the SRPM from that koji job, and it resolves the expr_is_guardable problem.

That gets me much further, but now I'm running into something that looks a heck of a lot like an infinite loop.
(Do you have any suggestions about how to narrow that down?  When I attach to the process with gdb, I get info
about what is happening in valgrind logic, and there isn't any obvious way to correlate that to the Postgres code
that it's executing.)  Anyway, that's likely unrelated to FMA, so I expect it should be a new bug whenever I can
isolate it enough to report.

Comment 23 Mark Wielaard 2020-12-17 11:27:36 UTC
(In reply to Tom Lane from comment #22)
> OK, I rebuilt valgrind for Fedora 33 using the SRPM from that koji job, and
> it resolves the expr_is_guardable problem.
> 
> That gets me much further

Nice. I'll cancel the f33 update and do a new one with this patch included.

> but now I'm running into something that looks a
> heck of a lot like an infinite loop.
> (Do you have any suggestions about how to narrow that down?  When I attach
> to the process with gdb, I get info
> about what is happening in valgrind logic, and there isn't any obvious way
> to correlate that to the Postgres code
> that it's executing.)  Anyway, that's likely unrelated to FMA, so I expect
> it should be a new bug whenever I can
> isolate it enough to report.

You are in luck: valgrind comes with its own gdbserver, so you can observe the process while it is running under valgrind:
https://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver

If things look odd you might want to use --vgdb=full instead of --vgdb=yes, but note:

" If you use the option --vgdb=full, then GDB "stop-at" commands will be obeyed precisely. The downside is that this requires each instruction to be instrumented with an additional call to a gdbserver helper function, which gives considerable overhead (+500% for memcheck) compared to --vgdb=no. Option --vgdb=yes has neglectible overhead compared to --vgdb=no. "

Comment 24 Mark Wielaard 2020-12-17 11:31:25 UTC
It would also be useful to know whether things behave differently without any instrumentation, e.g. by using valgrind --tool=none (the default tool is memcheck).

Comment 25 Fedora Update System 2020-12-17 14:50:32 UTC
FEDORA-2020-e54372ec4a has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e54372ec4a

Comment 26 Tom Lane 2020-12-17 20:31:41 UTC
Ah, sorry, it's now looking like that was a false alarm.  The particular test that I thought was in an infinite loop seems in reality to just be affected by memcheck's overhead a lot more than other test cases.

Let's just push forward with the FMA fix.

Comment 27 Fedora Update System 2020-12-18 01:14:30 UTC
FEDORA-2020-e54372ec4a has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e54372ec4a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e54372ec4a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 28 Fedora Update System 2020-12-22 01:29:02 UTC
FEDORA-2020-e54372ec4a has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.