2212579 – openexr: Test expectations do not match F16C intrinsics (x86-64-v3 build failure)

Bug 2212579 - openexr: Test expectations do not match F16C intrinsics (x86-64-v3 build failure)

Summary: openexr: Test expectations do not match F16C intrinsics (x86-64-v3 build fail...

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gcc
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://kojipkgs.fedoraproject.org//w...
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2023-06-05 23:04 UTC by Yaakov Selkowitz
Modified:	2023-08-10 02:20 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2023-08-10 02:20:42 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
.ii file that seems to be miscompiled (313.18 KB, application/x-xz) 2023-06-13 21:12 UTC, Marek Polacek	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Github	AcademySoftwareFoundation openexr issues 1456	None	open	Test suite fails with on x86-64 if F16C intrinsics used	2023-06-16 07:48:35 UTC
Gitlab	CentOS/isa sig-docs issues 5	None	opened	Fix openexr build failure	2023-06-13 16:47:51 UTC
Red Hat Issue Tracker	FC-867	None	None	None	2023-06-22 11:32:56 UTC

Description Yaakov Selkowitz 2023-06-05 23:04:05 UTC

The OpenEXR.testOptimizedInterleavePatterns test defined in openexr-3.1.7/src/test/OpenEXRTest/testOptimizedInterleavePatterns.cpp of the latest openexr package is failing when openexr is compiled with -march=x86-64-v3.  This is causing this package to FTBFS in ELN despite building successfully in rawhide.



Reproducible: Always

Steps to Reproduce:
On x86_64, build openexr from rawhide in ELN tag (e.g. fedpkg --release eln build|mockbuild) *or* build for rawhide with `%global __cflags_arch_x86_64 "-march=x86-64-v3"` defined in openexr.spec.

Actual Results:  
        Start  91: OpenEXR.testOptimizedInterleavePatterns
 80/112 Test  #91: OpenEXR.testOptimizedInterleavePatterns ...Subprocess aborted***Exception:   0.55 sec
tempDir = /var/tmp/OpenEXRTest_RIMFCORH

=======
Running testOptimizedInterleavePatterns
Testing SSE optimisation with different interleave patterns (large images) ... 
 0, 0: RGBHalf read as RGBHalf...                            OK OPTIMISED 
 0, 1: RGBHalf read as RGBAHalf...                           OK OPTIMISED 
 0, 2: RGBHalf read as ABGRHalf...                           OK 
 0, 3: RGBHalf read as RGBFloat...                          

error reading back channel B pixel 21,-76 got -nan expected -nan
OpenEXRTest: /builddir/build/BUILD/openexr-3.1.7/src/test/OpenEXRTest/testOptimizedInterleavePatterns.cpp:233: bool {anonymous}::compare(const Imf_3_1::FrameBuffer&, const Imf_3_1::FrameBuffer&, const Imath_3_1::Box2i&, bool): Assertion `writtenHalf.bits()==readHalf.bits()' failed.


Expected Results:  
Test passes, as in rawhide (which defaults to -mtune=generic).

Comment 1 Yaakov Selkowitz 2023-06-06 14:38:11 UTC

To narrow things down a bit, this test passes with `%global __cflags_arch_x86_64 "-march=x86-64-v2"`, so it is specific to a -v3 optimization.

Comment 2 Yaakov Selkowitz 2023-06-06 15:56:19 UTC

Further narrowing this down, looks like F16C is the culprit; `-march=x86-64-v2 -mf16c` (with or without AVX/AVX2/BMI/BMI2) fails, `-march=x86-64-v3 -mno-f16c` succeeds.

Comment 3 Marek Polacek 2023-06-13 17:57:15 UTC

Reproduced:

99% tests passed, 1 tests failed out of 112

Total Test time (real) =  62.74 sec

The following tests FAILED:
	 91 - OpenEXR.testOptimizedInterleavePatterns (Subprocess aborted)
Errors while running CTest
error: Bad exit status from /var/tmp/rpm-tmp.TSF4gE (%check)

Comment 4 Marek Polacek 2023-06-13 21:12:58 UTC

Created attachment 1970772 [details]
.ii file that seems to be miscompiled

Looks like this file is miscompiled.  I can't bisect it though.

Comment 5 Marek Polacek 2023-06-13 21:16:51 UTC

Compiled with

/usr/bin/g++ -DILM_IMF_TEST_IMAGEDIR=\"/builddir/build/BUILD/openexr-3.1.7/src/test/OpenEXRTest/\" -I/builddir/build/BUILD/openexr-3.1.7/redhat-linux-build/src/test/OpenEXRTest -I/builddir/build/BUILD/openexr-3.1.7/src/test/OpenEXRTest -I/builddir/build/BUILD/openexr-3.1.7/src/lib/OpenEXR -I/builddir/build/BUILD/openexr-3.1.7/redhat-linux-build/cmake -I/builddir/build/BUILD/openexr-3.1.7/src/lib/Iex -I/builddir/build/BUILD/openexr-3.1.7/src/lib/IlmThread -isystem /usr/include/Imath -O2 -fexceptions -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64-v3 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -DNDEBUG -MD -MT src/test/OpenEXRTest/CMakeFiles/OpenEXRTest.dir/testOptimizedInterleavePatterns.cpp.o -MF CMakeFiles/OpenEXRTest.dir/testOptimizedInterleavePatterns.cpp.o.d -o CMakeFiles/OpenEXRTest.dir/testOptimizedInterleavePatterns.cpp.o -c /builddir/build/BUILD/openexr-3.1.7/src/test/OpenEXRTest/testOptimizedInterleavePatterns.cpp

Adding -mno-f16c fixes the problem.

Comment 6 Jakub Jelinek 2023-06-13 21:29:29 UTC

The code is different when -mf16c is enabled, see src/lib/OpenEXRCore/unpack.c.
Or for this case more importantly
imath_float_to_half in /usr/include/Imath/half.h which uses _cvtss_sh if -mf16c but some hand-written bit-twiddling otherwise.
The failure is that readHalf is 0xff26 while writtenHalf is 0xfd26, the former one is a qNaN and the latter corresponding sNaN.
The corresponding word at ptr for readHalf is 0xffe4c000 which is again a qNaN, writtenHalf is read directly as half.

I'd say something is canonicalizing the sNaN somewhere on the way to qNaN, which I believe the F16C instructions
normally do but the hand-written code in half.h doesn't:
        // inf or nan
        if (IMATH_UNLIKELY (ui >= 0x7f800000))
        {
            ret |= 0x7c00;
            if (ui == 0x7f800000)
                return ret;
            m = (ui & 0x7fffff) >> 13;
            // make sure we have at least one bit after shift to preserve nan-ness
            return ret | (uint16_t)m | (uint16_t)(m == 0);
        }

Comment 7 Jakub Jelinek 2023-06-13 21:46:33 UTC

A small program which shows the differences between the f16c intrinsics and the pure C implementation of the <half.h> conversions (left in just the parts related to NaNs):
#include <x86intrin.h>

float
foo (unsigned short x)
{
  return _cvtsh_ss (x);
}

float
bar (unsigned short x)
{
  unsigned int hexpmant = ( (unsigned int)(x) << 17 ) >> 4;
  unsigned int vi = ((unsigned int)(x >> 15)) << 31;

  if (hexpmant >= 0x00800000)
    {
      vi |= hexpmant;
      if (hexpmant < 0x0f800000)
        vi += 0x38000000;
      else
        vi |= 0x7f800000;
    }
  else
    __builtin_abort ();
  float f;
  __builtin_memcpy (&f, &vi, sizeof (float));
  return f;
}

unsigned short
baz (float x)
{
  return _cvtss_sh (x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}

unsigned short
qux (float x)
{
  unsigned int vi;
  __builtin_memcpy (&vi, &x, sizeof (float));
  unsigned int ui  = (vi & ~0x80000000);
  unsigned int ret = ((vi >> 16) & 0x8000);
  if (ui >= 0x38800000)
    {
      if (ui >= 0x7f800000)
        {
	  ret |= 0x7c00;
	  if (ui == 0x7f800000)
	    return ret;
	  unsigned int m = (ui & 0x7fffff) >> 13;
	  return ret | (unsigned short)m | (unsigned short)(m == 0);
        }
      else
	__builtin_abort ();
    }
  else
    __builtin_abort ();
}

int
main ()
{
  union U { float f; unsigned int u; } x[6];
  x[0].f = foo (0xff26);
  x[1].f = bar (0xff26);
  x[2].f = foo (0xfd26);
  x[3].f = bar (0xfd26);
  __builtin_printf ("%x %x %x %x\n", x[0].u, x[1].u, x[2].u, x[3].u);
  x[4].u = 0xffe4c000;
  x[5].u = 0xffa4c000;
  __builtin_printf ("%x %x %x %x\n", baz (x[4].f), qux (x[4].f), baz (x[5].f), qux (x[5].f));
  return 0;
}
With -O2 -mf16c this prints:

ffe4c000 ffe4c000 ffe4c000 ffa4c000
ff26 ff26 ff26 fd26
for me, which shows that in both directions, the f16c intrinsics convert sNaNs to qNaNs, while the handwritten code does not.

Comment 8 Florian Weimer 2023-06-16 07:48:35 UTC

The half-to-float conversion already goes wrong. I re-ran the test with some instrumentation and reported this upstream: https://github.com/AcademySoftwareFoundation/openexr/issues/1456

At this point, I have no idea whether the test is incorrect, or whether OpenEXR should stop using imath because it does not do what it needs.

Comment 9 Florian Weimer 2023-06-19 07:48:37 UTC

I'm going to switch this package to -mno-f16c on x86-64 until upstream figures out whether the code (i.e., dependency on imath) is wrong or the test.

Comment 10 Florian Weimer 2023-06-19 12:00:46 UTC

(In reply to Florian Weimer from comment #9)
> I'm going to switch this package to -mno-f16c on x86-64 until upstream
> figures out whether the code (i.e., dependency on imath) is wrong or the
> test.

Well, the package is now FTBFS in rawhide, too. I filed two new upstream issues:

SSE2 test inconsistency on i686
<https://github.com/AcademySoftwareFoundation/openexr/issues/1459>

testDWAACompression and testDWABCompression fail on aarch64, i686
<https://github.com/AcademySoftwareFoundation/openexr/issues/1460>

I've pushed my F16C fix nevertheless, but it currently won't build on those other architectures. I don't want to disable the testsuite and push a potentially broken package into rawhide.

Note You need to log in before you can comment on or make changes to this bug.