We need to update the dynamic loader trampoline to optimize for SSE, AVX, and AVX512 usage:

f3dcae82d54e5097e18e1d6ef4ff55c2ea4e621e
fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604

The last commit is required to avoid the state transition penalties caused by the first commit, as described here:
https://sourceware.org/bugzilla/show_bug.cgi?id=20495

We need to do this for rhel-7.4 to avoid any performance issues with newer DTS, which can generate the instructions that cause performance problems.
It may make sense to backport this change as well, for additional test coverage:

commit 3403a17fea8ccef7dc5f99553a13231acf838744
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Feb 9 12:19:44 2017 -0800

    x86-64: Verify that _dl_runtime_resolve preserves vector registers

    On x86-64, _dl_runtime_resolve must preserve the first 8 vector
    registers.  Add 3 _dl_runtime_resolve tests to verify that SSE,
    AVX and AVX512 registers are preserved.

However, we would have to replace the intrinsics with inline assembly because our GCC is too old to support this test.
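Whether a given GCC is new enough for the AVX512 variant of the test can be probed before deciding between intrinsics and inline assembly. The following is only a minimal sketch: `cc_supports_flag` and `report_vector_flags` are hypothetical helper names (not part of glibc), and it assumes the compiler driver rejects unknown `-m` options when asked to preprocess empty input.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if compiler $1 accepts option $2.
# It only preprocesses empty stdin, so nothing is written to disk.
cc_supports_flag() {
    local cc="$1" flag="$2"
    "$cc" "$flag" -x c -E - </dev/null >/dev/null 2>&1
}

# Report which of the vector ISA options the given compiler understands.
report_vector_flags() {
    local cc="$1" flag
    for flag in -msse2 -mavx -mavx512f; do
        if cc_supports_flag "$cc" "$flag"; then
            echo "$flag: supported"
        else
            echo "$flag: not supported"
        fi
    done
}
```

For example, `report_vector_flags gcc` could be compared against `report_vector_flags /opt/rh/devtoolset-6/root/bin/gcc` on the same machine.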
We have a report that upstream commit fb0f7a6755c1bfaec38f490fbfcaa39a66ee3604 introduces a regression: https://sourceware.org/bugzilla/show_bug.cgi?id=21236 My feeling is that it is too risky to include this patch until we know what is going on.
The upstream conclusion is that ICC violates the x86_64 psABI and that there is no fault in glibc.
Used DTS 6 on a KNL box with AVX512 to rebuild and run tst-avx512 as a final verification that all the register saves/restores are in place as expected.

For a record of what I did:
(1) Installed rhel-7.4.
(2) Installed DTS 6.
(3) Compiled glibc.
(4) Recompiled tst-avx512 with AVX512 support (missing from the system compiler) and ran the test.

#!/bin/bash
set -e
set -x
# CC=gcc
CC=/opt/rh/devtoolset-6/root/bin/gcc
# GCC_INCLUDE=/usr/lib/gcc/x86_64-redhat-linux/4.8.5/include
GCC_INCLUDE=/opt/rh/devtoolset-6/root/usr/lib/gcc/x86_64-redhat-linux/6.2.1/include
AVX512_CFLAGS=-mavx512f

# Compile the shared object code with AVX512 support.
$CC ../sysdeps/x86_64/tst-avx512mod.c -c -std=gnu99 -fgnu89-inline -DNDEBUG $AVX512_CFLAGS -O3 \
  -Wall -Winline -Wwrite-strings -fasynchronous-unwind-tables -fmerge-all-constants \
  -fno-asynchronous-unwind-tables -frounding-math -g -mtune=generic -Wstrict-prototypes \
  -Werror=implicit-function-declaration -fPIC -fno-tree-loop-distribute-patterns \
  -I../include \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux \
  -I../sysdeps/unix/sysv/linux/x86_64/64/nptl -I../sysdeps/unix/sysv/linux/x86_64/64 \
  -I../nptl/sysdeps/unix/sysv/linux/x86_64 -I../nptl/sysdeps/unix/sysv/linux/x86 \
  -I../sysdeps/unix/sysv/linux/x86 -I../rtkaio/sysdeps/unix/sysv/linux/x86_64 \
  -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/wordsize-64 \
  -I../ports/sysdeps/unix/sysv/linux -I../nptl/sysdeps/unix/sysv/linux \
  -I../nptl/sysdeps/pthread -I../rtkaio/sysdeps/pthread -I../sysdeps/pthread \
  -I../rtkaio/sysdeps/unix/sysv/linux -I../sysdeps/unix/sysv/linux -I../sysdeps/gnu \
  -I../sysdeps/unix/inet -I../ports/sysdeps/unix/sysv -I../nptl/sysdeps/unix/sysv \
  -I../rtkaio/sysdeps/unix/sysv -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 \
  -I../ports/sysdeps/unix -I../nptl/sysdeps/unix -I../rtkaio/sysdeps/unix -I../sysdeps/unix \
  -I../sysdeps/posix -I../nptl/sysdeps/x86_64/64 -I../sysdeps/x86_64/64 \
  -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu \
  -I../sysdeps/x86_64/multiarch -I../nptl/sysdeps/x86_64 -I../sysdeps/x86_64 -I../sysdeps/x86 \
  -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64/wordsize-64 \
  -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 \
  -I../sysdeps/ieee754 -I../sysdeps/generic -I../ports -I../nptl -I../rtkaio -I.. -I../libio -I. \
  -nostdinc -isystem $GCC_INCLUDE -isystem /usr/include -D_LIBC_REENTRANT \
  -include /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc-modules.h \
  -DMODULE_NAME=nonlib -include ../include/libc-symbols.h -DPIC -DSHARED \
  -o /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.os \
  -MD -MP \
  -MF /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.os.dt \
  -MT /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.os

# Link the shared object used in the test.
$CC -shared -static-libgcc -Wl,-dynamic-linker=/lib64/ld-linux-x86-64.so.2 -Wl,-z,defs \
  -B/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/csu/ \
  -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/math \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/dlfcn \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nss \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nis \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/rt \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/resolv \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/crypt \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/support \
  -L/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nptl \
  -Wl,-rpath-link=/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/math:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/dlfcn:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nss:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nis:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/rt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/resolv:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/crypt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/support:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nptl \
  -o /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.so \
  -T /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/shlib.lds \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/csu/abi-note.o \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.os \
  -Wl,--start-group \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc.so \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc_nonshared.a \
  -Wl,--as-needed \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/ld.so \
  -Wl,--no-as-needed -Wl,--end-group

# Compile the test object with AVX512 support.
$CC ../sysdeps/x86_64/tst-avx512.c -c -std=gnu99 -fgnu89-inline -DNDEBUG $AVX512_CFLAGS -O3 \
  -Wall -Winline -Wwrite-strings -fasynchronous-unwind-tables -fmerge-all-constants \
  -fno-asynchronous-unwind-tables -frounding-math -g -mtune=generic -Wstrict-prototypes \
  -Werror=implicit-function-declaration -fno-tree-loop-distribute-patterns \
  -I../include \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux \
  -I../sysdeps/unix/sysv/linux/x86_64/64/nptl -I../sysdeps/unix/sysv/linux/x86_64/64 \
  -I../nptl/sysdeps/unix/sysv/linux/x86_64 -I../nptl/sysdeps/unix/sysv/linux/x86 \
  -I../sysdeps/unix/sysv/linux/x86 -I../rtkaio/sysdeps/unix/sysv/linux/x86_64 \
  -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/wordsize-64 \
  -I../ports/sysdeps/unix/sysv/linux -I../nptl/sysdeps/unix/sysv/linux \
  -I../nptl/sysdeps/pthread -I../rtkaio/sysdeps/pthread -I../sysdeps/pthread \
  -I../rtkaio/sysdeps/unix/sysv/linux -I../sysdeps/unix/sysv/linux -I../sysdeps/gnu \
  -I../sysdeps/unix/inet -I../ports/sysdeps/unix/sysv -I../nptl/sysdeps/unix/sysv \
  -I../rtkaio/sysdeps/unix/sysv -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 \
  -I../ports/sysdeps/unix -I../nptl/sysdeps/unix -I../rtkaio/sysdeps/unix -I../sysdeps/unix \
  -I../sysdeps/posix -I../nptl/sysdeps/x86_64/64 -I../sysdeps/x86_64/64 \
  -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu \
  -I../sysdeps/x86_64/multiarch -I../nptl/sysdeps/x86_64 -I../sysdeps/x86_64 -I../sysdeps/x86 \
  -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64/wordsize-64 \
  -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 \
  -I../sysdeps/ieee754 -I../sysdeps/generic -I../ports -I../nptl -I../rtkaio -I.. -I../libio -I. \
  -nostdinc -isystem $GCC_INCLUDE -isystem /usr/include -D_LIBC_REENTRANT \
  -include /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc-modules.h \
  -DMODULE_NAME=nonlib -include ../include/libc-symbols.h \
  -o /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512.o \
  -MD -MP \
  -MF /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512.o.dt \
  -MT /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512.o

# Build the auxiliary object with AVX512 support.
$CC ../sysdeps/x86_64/tst-avx512-aux.c -c -std=gnu99 -fgnu89-inline -DNDEBUG $AVX512_CFLAGS -O3 \
  -Wall -Winline -Wwrite-strings -fasynchronous-unwind-tables -fmerge-all-constants \
  -fno-asynchronous-unwind-tables -frounding-math -g -mtune=generic -Wstrict-prototypes \
  -Werror=implicit-function-declaration -fno-tree-loop-distribute-patterns \
  -I../include \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf \
  -I/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux \
  -I../sysdeps/unix/sysv/linux/x86_64/64/nptl -I../sysdeps/unix/sysv/linux/x86_64/64 \
  -I../nptl/sysdeps/unix/sysv/linux/x86_64 -I../nptl/sysdeps/unix/sysv/linux/x86 \
  -I../sysdeps/unix/sysv/linux/x86 -I../rtkaio/sysdeps/unix/sysv/linux/x86_64 \
  -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/wordsize-64 \
  -I../ports/sysdeps/unix/sysv/linux -I../nptl/sysdeps/unix/sysv/linux \
  -I../nptl/sysdeps/pthread -I../rtkaio/sysdeps/pthread -I../sysdeps/pthread \
  -I../rtkaio/sysdeps/unix/sysv/linux -I../sysdeps/unix/sysv/linux -I../sysdeps/gnu \
  -I../sysdeps/unix/inet -I../ports/sysdeps/unix/sysv -I../nptl/sysdeps/unix/sysv \
  -I../rtkaio/sysdeps/unix/sysv -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 \
  -I../ports/sysdeps/unix -I../nptl/sysdeps/unix -I../rtkaio/sysdeps/unix -I../sysdeps/unix \
  -I../sysdeps/posix -I../nptl/sysdeps/x86_64/64 -I../sysdeps/x86_64/64 \
  -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu \
  -I../sysdeps/x86_64/multiarch -I../nptl/sysdeps/x86_64 -I../sysdeps/x86_64 -I../sysdeps/x86 \
  -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64/wordsize-64 \
  -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 \
  -I../sysdeps/ieee754 -I../sysdeps/generic -I../ports -I../nptl -I../rtkaio -I.. -I../libio -I. \
  -nostdinc -isystem $GCC_INCLUDE -isystem /usr/include -D_LIBC_REENTRANT \
  -include /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc-modules.h \
  -DMODULE_NAME=nonlib -include ../include/libc-symbols.h \
  -o /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512-aux.o \
  -MD -MP \
  -MF /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512-aux.o.dt \
  -MT /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512-aux.o

# Link the final test binary.
$CC -nostdlib -nostartfiles \
  -o /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512 \
  -Wl,-dynamic-linker=/lib64/ld-linux-x86-64.so.2 -Wl,-z,combreloc -Wl,-z,relro \
  -Wl,--hash-style=both \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/csu/crt1.o \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/csu/crti.o \
  `$CC --print-file-name=crtbegin.o` \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512.o \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/support/libsupport_nonshared.a \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512-aux.o \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512mod.so \
  -Wl,-rpath-link=/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/math:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/dlfcn:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nss:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nis:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/rt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/resolv:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/crypt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/support:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nptl \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc.so.6 \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/libc_nonshared.a \
  -Wl,--as-needed \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/ld.so \
  -Wl,--no-as-needed -lgcc -Wl,--as-needed -lgcc_s -Wl,--no-as-needed \
  `$CC --print-file-name=crtend.o` \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/csu/crtn.o

# Re-run the test with avx512 support.
env GCONV_PATH=/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/iconvdata \
  LOCPATH=/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/localedata \
  LC_ALL=C \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/ld-linux-x86-64.so.2 \
  --library-path /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/math:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/dlfcn:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nss:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nis:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/rt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/resolv:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/crypt:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/support:/root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/nptl \
  /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512 \
  > /root/rpmbuild/BUILD/glibc-2.17-c758a686/build-x86_64-redhat-linux/elf/tst-avx512.out

QE should IMO take tst-avx512 out of the glibc test framework (easy to extract), compile it stand-alone with DTS 6, and run it under the newly installed glibc to verify the same thing I did above. Reach out to me if you need any help doing that.
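Before running a stand-alone build of the test, it may help to confirm that the machine actually advertises AVX512F, since otherwise the AVX512 code path cannot be exercised there. A minimal sketch, assuming a Linux `/proc/cpuinfo` with a `flags` line; `cpu_has_flag` is a hypothetical helper name, not something shipped with glibc or DTS.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if CPU flag $1 appears on the first
# "flags" line of the cpuinfo file $2 (default: /proc/cpuinfo).
cpu_has_flag() {
    local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
    # Put each word of the flags line on its own line, then require an
    # exact whole-line match so that "avx" does not match "avx512f".
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -Fx "$flag" >/dev/null
}

# Example use before running the rebuilt test:
#   cpu_has_flag avx512f || echo "no AVX512F on this box" >&2
```

The second argument exists only so the helper can be pointed at a saved copy of cpuinfo from another machine.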
Intel just added this commit to upstream:

commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Mar 21 10:59:31 2017 -0700

    x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]

    On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
    the first 8 vector registers.  The code layout is

      if only %xmm0 - %xmm7 registers are used
        preserve %xmm0 - %xmm7 registers
      if only %ymm0 - %ymm7 registers are used
        preserve %ymm0 - %ymm7 registers
      preserve %zmm0 - %zmm7 registers

    Branch predication always executes the fallthrough code path to preserve
    %zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7
    registers are used.  This leads to lower CPU frequency on Skylake
    server.  This patch changes the fallthrough code path to preserve
    %xmm0 - %xmm7 registers instead:

      if whole %zmm0 - %zmm7 registers are used
        preserve %zmm0 - %zmm7 registers
      if only %ymm0 - %ymm7 registers are used
        preserve %ymm0 - %ymm7 registers
      preserve %xmm0 - %xmm7 registers

    Tested on Skylake server.

        [BZ #21258]
        * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt): Define
        only if _dl_runtime_resolve is defined to
        _dl_runtime_resolve_sse_vex.
        * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
        Fallthrough to _dl_runtime_resolve_sse_vex.

This is a relatively minor change that would mean rhel-7.4 would be optimally placed for Skylake. Given that I'm going to respin for lock elision, we should consider this bug too. I'm going to flip this back to ASSIGNED to make sure we don't miss this.
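The reordered layout from the commit message can be modelled outside the assembly to make the branch order explicit. This is only an illustrative sketch: `pick_save_path` is a hypothetical function, and the in-use mask is assumed to follow the XCR0 state-component bit layout (bit 2 for the upper halves of the %ymm registers, bit 6 for the upper halves of %zmm0 - %zmm15). The real trampoline is hand-written assembly and its exact checks are not reproduced here.

```shell
#!/bin/bash
# Bit positions follow the XCR0 state-component layout (assumption of
# this sketch; the mask itself is passed in as an argument).
AVX_STATE=$((1 << 2))        # upper halves of the %ymm registers in use
ZMM_HI256_STATE=$((1 << 6))  # upper halves of %zmm0 - %zmm15 in use

# Hypothetical model of the branch order after the change: the widest
# state is tested first, and the %xmm save is the fallthrough path.
pick_save_path() {
    local in_use="$1"
    if [ $((in_use & ZMM_HI256_STATE)) -ne 0 ]; then
        echo zmm
    elif [ $((in_use & AVX_STATE)) -ne 0 ]; then
        echo ymm
    else
        echo xmm
    fi
}
```

With this ordering, a caller that only touches SSE state (for example a mask of `0x3`) reaches the %xmm path without any branch being taken, which is the property the commit message is after.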
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1916