When attempting to run any version of NodeJS from Fedora 40, it prints "Illegal Instruction (core dumped)". This happens because AVX instructions are executed and my CPU does not support AVX. This is very unexpected: the AVX instructions are not guarded by a cpuid check, and they were generated by GCC for a uint32_t copy.

THIS IS VERY DEPENDENT ON HAVING HARDWARE THAT DOES NOT SUPPORT AVX.

Reproducible: Always

Steps to Reproduce:
1. Install NodeJS using DNF
2. Run NodeJS

Actual Results: Illegal Instruction (core dumped)

Expected Results: NodeJS runs

Reported upstream at https://github.com/nodejs/node/issues/52371, but I closed that issue after determining that the problem must be with Fedora. Originally I assumed this was a NodeJS bug, but I now suspect a GCC bug, because it happens with the Fedora 40 package for NodeJS 18 but not with the Fedora 39 package for NodeJS 18.
Created attachment 2025313 [details] The bad instructions generated.
Created attachment 2025314 [details] The location of the crash. (NodeJS 18)
*** Bug 2271955 has been marked as a duplicate of this bug. ***
*** Bug 2269972 has been marked as a duplicate of this bug. ***
Specifically, it seems to be related to the built-in simdutf (https://github.com/simdutf/simdutf) dependency, which was added in 18.15+. That library uses processor extensions to accelerate Unicode routines, so I definitely suspect some sort of issue in the interaction between GCC and simdutf. I'm going to move this to GCC for now, as I don't really have any idea how to dig into this further.

From the F41 build logs of 20.12.1, here's the compilation line for simdutf:

[1095/2761] g++ -MMD -MF obj/deps/simdutf/simdutf.simdutf.o.d -D_GLIBCXX_USE_CXX11_ABI=1 -DNODE_OPENSSL_CONF_NAME=openssl_conf -DNODE_OPENSSL_CERT_STORE -DOPENSSL_FIPS -DICU_NO_USER_DATA_OVERRIDE -DNODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH=/usr/lib/node_modules/cjs-module-lexer/lexer.js -DNODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH=/usr/lib/node_modules/cjs-module-lexer/dist/lexer.js -DNODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH=/usr/lib/node_modules/undici/loader.js -D__STDC_FORMAT_MACROS -I../../deps/simdutf -pthread -Wall -Wextra -Wno-unused-parameter -fPIC -m64 -O3 -flto=4 -fuse-linker-plugin -ffat-lto-objects -fno-omit-frame-pointer -O2 -fexceptions -g1 -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DZLIB_CONST -fno-delete-null-pointer-checks -O3 -fno-ipa-icf -fno-rtti -fno-exceptions -std=gnu++17 -c ../../deps/simdutf/simdutf.cpp -o obj/deps/simdutf/simdutf.simdutf.o
../../deps/simdutf/simdutf.cpp:1745:38: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor]
1745 | simdutf_really_inline simd8<bool>() : base8() {} | ^
../../deps/simdutf/simdutf.cpp:1745:38: note: remove the ‘< >’ ../../deps/simdutf/simdutf.cpp:1746:38: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor] 1746 | simdutf_really_inline simd8<bool>(const __m256i _value) : base8<bool>(_value) {} | ^ ../../deps/simdutf/simdutf.cpp:1746:38: note: remove the ‘< >’ ../../deps/simdutf/simdutf.cpp:1748:38: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor] 1748 | simdutf_really_inline simd8<bool>(bool _value) : base8<bool>(splat(_value)) {} | ^ ../../deps/simdutf/simdutf.cpp:1748:38: note: remove the ‘< >’ ../../deps/simdutf/simdutf.cpp:2096:37: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor] 2096 | simdutf_really_inline simd16<bool>() : base16() {} | ^ ../../deps/simdutf/simdutf.cpp:2096:37: note: remove the ‘< >’ ../../deps/simdutf/simdutf.cpp:2097:37: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor] 2097 | simdutf_really_inline simd16<bool>(const __m256i _value) : base16<bool>(_value) {} | ^ ../../deps/simdutf/simdutf.cpp:2097:37: note: remove the ‘< >’ ../../deps/simdutf/simdutf.cpp:2099:37: warning: template-id not allowed for constructor in C++20 [-Wtemplate-id-cdtor] 2099 | simdutf_really_inline simd16<bool>(bool _value) : base16<bool>(splat(_value)) {} | ^ ../../deps/simdutf/simdutf.cpp:2099:37: note: remove the ‘< >’ And the linker: [2419/2761] g++ -MMD -MF obj/deps/googletest/src/gtest.gtest-assertion-result.o.d -D_GLIBCXX_USE_CXX11_ABI=1 -DNODE_OPENSSL_CONF_NAME=openssl_conf -DNODE_OPENSSL_CERT_STORE -DOPENSSL_FIPS -DICU_NO_USER_DATA_OVERRIDE -DNODE_SHARED_BUILTIN_CJS_MODULE_LEXER_LEXER_PATH=/usr/lib/node_modules/cjs-module-lexer/lexer.js -DNODE_SHARED_BUILTIN_CJS_MODULE_LEXER_DIST_LEXER_PATH=/usr/lib/node_modules/cjs-module-lexer/dist/lexer.js -DNODE_SHARED_BUILTIN_UNDICI_UNDICI_PATH=/usr/lib/node_modules/undici/loader.js -D__STDC_FORMAT_MACROS -DGTEST_HAS_POSIX_RE=0 -DGTEST_LANG_CXX11=1 
-I../../deps/googletest -I../../deps/googletest/include -pthread -Wall -Wextra -Wno-unused-parameter -fPIC -m64 -O3 -flto=4 -fuse-linker-plugin -ffat-lto-objects -fno-omit-frame-pointer -O2 -fexceptions -g1 -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DZLIB_CONST -fno-delete-null-pointer-checks -O3 -fno-ipa-icf -fno-rtti -fno-exceptions -std=gnu++17 -c ../../deps/googletest/src/gtest-assertion-result.cc -o obj/deps/googletest/src/gtest.gtest-assertion-result.o [2420/2761] g++ -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld-errors -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -pthread -rdynamic -m64 -flto=4 -fuse-linker-plugin -ffat-lto-objects -o node_js2c -Wl,--start-group obj/tools/node_js2c.js2c.o obj/src/node_js2c.embedded_data.o obj/deps/simdutf/libsimdutf.a -lz -luv -lbrotlidec -lbrotlienc -lcrypto -lssl -Wl,--end-group
Moving to GCC as my guess here is that this is a code-generation bug.
I think the critical question is going to be what https://github.com/simdutf/simdutf/blob/6224a24f9a00bced2be4119e40873ae71cc301b7/src/implementation.cpp#L861 is doing: is it selecting an implementation that is not appropriate for the processor, or is the implementation miscompiled?
I don't think it is that line. I think this is the line at fault: https://github.com/simdutf/simdutf/blob/6224a24f9a00bced2be4119e40873ae71cc301b7/src/implementation.cpp#L850

Here is a more complete stack trace (from NodeJS 20) to show why I think it is that line. I don't think we have reached main when node crashes. I also doubt that `node --version` would call into the function you posted, and it crashes too.

Program received signal SIGILL, Illegal instruction.
simdutf::implementation::implementation () at ../../deps/simdutf/simdutf.h:3296
3296 _required_instruction_sets(required_instruction_sets)
(gdb) bt
#0 simdutf::implementation::implementation () at ../../deps/simdutf/simdutf.h:3296
#1 simdutf::internal::unsupported_implementation::unsupported_implementation() [clone .constprop.0] () at ../../deps/simdutf/simdutf.cpp:5252
#2 0x00007fbe0a81c96e in _sub_I_65535_0.0 () from /lib64/libnode.so.115
#3 0x00007fbe0da72277 in call_init (env=0x7ffe8cd2fd58, argv=0x7ffe8cd2fd48, argc=1, l=<optimized out>) at dl-init.c:74
#4 call_init (l=<optimized out>, argc=1, argv=0x7ffe8cd2fd48, env=0x7ffe8cd2fd58) at dl-init.c:26
#5 0x00007fbe0da7236d in _dl_init (main_map=0x7fbe0daa22e0, argc=1, argv=0x7ffe8cd2fd48, env=0x7ffe8cd2fd58) at dl-init.c:121
#6 0x00007fbe0da893d0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000001 in ?? ()
#8 0x00007ffe8cd30782 in ?? ()
#9 0x0000000000000000 in ?? ()
I don't have any non-AVX machines around; can somebody reproduce it with, say, qemu or valgrind, if it is possible to disable some ISAs there?
Anyway,
#2 0x00007fbe0a81c96e in _sub_I_65535_0.0 () from /lib64/libnode.so.115
suggests that this is most likely the dynamic initializer of
const unsupported_implementation unsupported_singleton{};

Can you reproduce the problem in a non-LTO build (i.e. when rebuilt with
%global _lto_cflags %{nil}
early in the spec file)?
If yes, I guess we want to look at how the corresponding preprocessed simdutf.cpp (in the github repo it seems to be implementation.cpp) looks with respect to target pragmas/attributes and with what flags it is compiled, plus what the non-virtual backtrace looks like (whether simdutf::implementation::implementation() has been inlined into simdutf::internal::unsupported_implementation::unsupported_implementation() or not, and whether that has been inlined into the global ctors function for the TU or not).
And if it has not been inlined, whether there are e.g. multiple versions of the inline function, each with different ISA compilation options/target pragmas/target attributes.
If that is the case, that would be a bug in the library.
I wasn't saying that was the line where it was faulting, just that the code in that function is what chooses which simdutf implementation is appropriate for your processor and we need to know whether it is making the right decision. That said your new information is very interesting as it shows that we are actually very close to that code when it crashes, and the reason it happens at startup is that it runs from the constructor for a global object so that it runs before main is entered. Oddly that looks like it is crashing on https://github.com/simdutf/simdutf/blob/6224a24f9a00bced2be4119e40873ae71cc301b7/src/implementation.cpp#L850 when it constructs the unsupported_implementation singleton but that doesn't make much sense as that shouldn't be executing any odd instructions.
Maybe booting with the noxsave kernel option reproduces it?
(In reply to Jakub Jelinek from comment #9)
> I don't have any non-AVX machines around, can somebody reproduce it with say
> qemu or valgrind if it is possible to disable some ISAs for it?
> Anyway,
> #2 0x00007fbe0a81c96e in _sub_I_65535_0.0 () from /lib64/libnode.so.115
> suggests that this is most likely the dynamic initializer of
> const unsupported_implementation unsupported_singleton{};
>
> Can you reproduce the problem in non-LTO build (i.e. when rebuilt with
> %global _lto_cflags %{nil}
> early in the spec file?

It is already in the NodeJS spec file. So yes?

> If yes, guess we want to look at how the corresponding preprocessed
> simdutf.cpp (but in the github repo it is implementation.cpp it seems) looks
> like vs. target pragmas/attributes and with what flags it is compiled, plus
> what the non-virtual backtrace looks like (whether
> simdutf::implementation::implementation() has been inlined into
> simdutf::internal::unsupported_implementation::unsupported_implementation()
> or not, and whether that has been inlined into the global ctors function for
> the TU or not.

It has been inlined. In fact, unsupported_implementation is the only visible ctor, because the rest were inlined into simdutf::internal::get_available_implementation_pointers().

> And if it has not been inlined, if there are e.g. multiple versions of the
> inline function, each with different ISA compilation options/target
> pragmas/target attributes.
> If that is the case, that would be a bug in the library.

I will add a file with the GDB disassembly of simdutf::internal::get_available_implementation_pointers(). I can also see AVX instructions in it, so it appears that the same instructions are generated for each inlined copy of simdutf::implementation::implementation().
Created attachment 2025408 [details] Disassembly of simdutf::internal::get_available_implementation_pointers()
This looks like a bug in the source to me.
grep -C1 GCC.[tp] simdutf.simdutf.ii
after the intrinsic headers looks like:

# 1361 "../../deps/simdutf/simdutf.cpp"
#pragma GCC push_options
# 1361 "../../deps/simdutf/simdutf.cpp"
--
# 1361 "../../deps/simdutf/simdutf.cpp"
#pragma GCC target("avx512f,avx512dq,avx512cd,avx512bw,avx512vbmi,avx512vbmi2,avx512vl,avx2,bmi,bmi2,pclmul,lzcnt,popcnt,avx512vpopcntdq")
--
# 1398 "../../deps/simdutf/simdutf.cpp"
#pragma GCC pop_options
--
# 1632 "../../deps/simdutf/simdutf.cpp"
#pragma GCC push_options
# 1632 "../../deps/simdutf/simdutf.cpp"
--
# 1632 "../../deps/simdutf/simdutf.cpp"
#pragma GCC target("avx2,bmi,lzcnt,popcnt")
--
# 17777 "../../deps/simdutf/simdutf.cpp"
#pragma GCC push_options
# 17777 "../../deps/simdutf/simdutf.cpp"
--
# 17777 "../../deps/simdutf/simdutf.cpp"
#pragma GCC target("avx512f,avx512dq,avx512cd,avx512bw,avx512vbmi,avx512vbmi2,avx512vl,avx2,bmi,bmi2,pclmul,lzcnt,popcnt,avx512vpopcntdq")
--
# 21982 "../../deps/simdutf/simdutf.cpp"
#pragma GCC pop_options

So, on line 1361 it saved the options and on the next line switched to avx512f+...; then on line 1398 it restored the previous state (i.e. the -march=x86-64 passed on the command line). On line 1632 it saves the options again and on the next line switches to avx2,bmi,lzcnt,popcnt, but never restores the previous state. Finally, on line 17777 it again switches to avx512f+... temporarily, and on line 21982 it switches back.
The bug is clear:
grep CAN_ALWAYS.*__ simdutf/src/simdutf/*.h
simdutf/src/simdutf/haswell.h:#define SIMDUTF_CAN_ALWAYS_RUN_HASWELL ((SIMDUTF_IMPLEMENTATION_HASWELL) && (SIMDUTF_IS_X86_64) && (__AVX2__))
simdutf/src/simdutf/icelake.h:#define SIMDUTF_CAN_ALWAYS_RUN_ICELAKE ((SIMDUTF_IMPLEMENTATION_ICELAKE) && (SIMDUTF_IS_X86_64) && (__AVX2__) && (SIMDUTF_HAS_AVX512F && \
simdutf/src/simdutf/westmere.h:#define SIMDUTF_CAN_ALWAYS_RUN_WESTMERE (SIMDUTF_IMPLEMENTATION_WESTMERE && SIMDUTF_IS_X86_64 && __SSE4_2__)

Using __AVX2__ or __SSE4_2__ or other predefined macros in the SIMDUTF_CAN_ALWAYS_RUN_* macros is wrong. Consider compiling the following with -O2 -mno-avx:

#ifdef __AVX2__
int i = __AVX2__;
#endif
#pragma GCC push_options
#pragma GCC target ("avx2")
#ifdef __AVX2__
int j = __AVX2__;
#endif
#pragma GCC pop_options
#ifdef __AVX2__
int k = __AVX2__;
#endif

Ever since GCC 4.7, when #pragma GCC target was first implemented, GCC has compiled this as int j = 1; with no i/k variables, though before https://gcc.gnu.org/r14-4967 only when the integrated preprocessor was used together with the compilation (i.e. no -save-temps). Apparently clang implemented the GCC extension but never bothered to implement the changing of the predefined macros when switching ISAs.

Anyway, this means that if you have

#define SIMDUTF_CAN_ALWAYS_RUN_AVX2 __AVX2__
#if SIMDUTF_CAN_ALWAYS_RUN_AVX2
#else
#pragma GCC push_options
#pragma GCC target ("avx2")
#endif
...
#if SIMDUTF_CAN_ALWAYS_RUN_AVX2
#else
#pragma GCC pop_options
#endif

it will not behave correctly, because the first SIMDUTF_CAN_ALWAYS_RUN_AVX2 will evaluate to 0 but the second one to 1. The pop_options is therefore skipped, and the rest of the translation unit stays in the avx2 target.
I've filed https://github.com/simdutf/simdutf/issues/391
Hey, I believe this is still a gcc bug. If I understand everything correctly all the F40 packages are recompiled with gcc 14. And regarding node, there is a cli program called n (https://www.npmjs.com/package/n) that is a version manager, so that you can have many versions of node installed on one system at the same time. The program itself downloads the tarball from nodejs' official website, which works perfectly. I don't know what version is used to compile these tarballs, but I guess it is older than 14.
I accept that there is an upstream bug. But I think an explanation of why the upstream bug only affects Fedora 40 is required before we just say this isn't a GCC bug, because the upstream bug should affect Fedora 39 too, yet Fedora 39 is completely unaffected by it.
(In reply to seda18 from comment #16)
> Hey,
> I believe this is still a gcc bug. If I understand everything correctly all
> the F40 packages are recompiled with gcc 14. And regarding node, there is a
> cli program called n (https://www.npmjs.com/package/n) that is a version
> manager, so that you can have many versions of node installed on one system
> at the same time. The program itself downloads the tarball from nodejs'
> official website, which works perfectly.
> I don't know what version is used to compile these tarballs, but I guess it
> is older than 14.

Those appear to be built with GCC 10 on RHEL 8. There is a mix of GCC 8.5 and GCC 10 strings in the binary.
It's simple: newer gcc brings new optimisations which expose a bug that was there all along. That's why F39 (which uses gcc 13) doesn't see it, and why the upstream node builds don't either if they're using gcc 10.
Thank you for the detailed analysis, @jjelen ! I'll watch the upstream bug for updates.
jjelen isn't me.
Anyway, I've said exactly what changed in GCC 14: it is https://gcc.gnu.org/r14-4967, and the commit log contains an explanation of what changed and why.
I forgot to mention something that commit log also stated: the

#ifdef __AVX2__
int i = __AVX2__;
#endif
#pragma GCC push_options
#pragma GCC target ("avx2")
#ifdef __AVX2__
int j = __AVX2__;
#endif
#pragma GCC pop_options
#ifdef __AVX2__
int k = __AVX2__;
#endif

testcase with -O2 -mno-avx has provided the j (and not the i or k) definition since GCC 4.7 only in C (and only without -save-temps). In C++ it actually behaved like with -save-temps and didn't define any variables, because C parsing handled the pragmas during preprocessing, while C++ parsing first preprocessed everything and only then handled the pragmas.
GCC 14 behaves consistently in both C and C++, with integrated preprocessing as well as with -save-temps.
As mentioned by Jakub, removing bogus needinfo as I hope he already provided the needed information.
FEDORA-2024-91bb4ed803 (nodejs20-20.12.1-3.fc39) has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-91bb4ed803
FEDORA-2024-25b66392e2 (nodejs20-20.12.1-3.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-25b66392e2
FEDORA-2024-91bb4ed803 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-91bb4ed803` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-91bb4ed803 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-25b66392e2 has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-25b66392e2` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-25b66392e2 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
The update fixes the crashes for me. Thanks!
The fix is working. Is there any way the fix can also be applied to the NodeJS 18 package? It still has the same problem.
Yes, I'll be making a new release of 18.x this week that also incorporates this fix. I am waiting for https://nodejs.org/en/blog/vulnerability/april-2024-security-releases-2 to release.
Thanks, verified the fix as well. Nodejs no longer crashes on Fedora 40.

# node --version
v20.12.1
# npm --version
10.5.0
# cat /proc/cpuinfo | grep Intel
vendor_id : GenuineIntel
model name : Intel(R) Celeron(R) CPU 3865U @ 1.80GHz
vendor_id : GenuineIntel
model name : Intel(R) Celeron(R) CPU 3865U @ 1.80GHz
FEDORA-2024-2ffe03eaa6 has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-2ffe03eaa6` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-2ffe03eaa6 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-e28ccc9c17 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-e28ccc9c17` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-e28ccc9c17 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-2ffe03eaa6 (nodejs20-20.12.2-1.fc40) has been pushed to the Fedora 40 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-e28ccc9c17 (nodejs20-20.12.2-1.fc39) has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.