Bug 2280347
Summary: | Invalid opcode on Xeon Phi 7290 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jocelyn Falempe <jfalempe> |
Component: | zlib-ng | Assignee: | Tulio Magno Quites Machado Filho <tuliom> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 40 | CC: | aekoroglu, code, ljavorsk, tuliom |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | zlib-ng-2.1.6-3.fc41 zlib-ng-2.1.6-5.fc40 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2024-05-17 17:17:09 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Jocelyn Falempe
2024-05-14 12:42:16 UTC
I had a chat with Jocelyn and we managed to collect more data.

Stack trace of thread 28701:

    #0  0x00007f2ee88399f8 adler32_avx512 (libz.so.1 + 0x139f8)
    #1  0x00007f2ee88348a4 inflate (libz.so.1 + 0xe8a4)
    #2  0x0000562f3d0beb3f git_inflate (git + 0x1c6b3f)
    #3  0x0000562f3cf7730b parse_pack_objects.lto_priv.0 (git + 0x7f30b)
    #4  0x0000562f3cf80cb0 cmd_index_pack (git + 0x88cb0)
    #5  0x0000562f3cf04215 handle_builtin.lto_priv.0 (git + 0xc215)
    #6  0x0000562f3cf047d2 run_argv.lto_priv.0 (git + 0xc7d2)
    #7  0x0000562f3ceff6eb main (git + 0x76eb)
    #8  0x00007f2ee8663088 __libc_start_call_main (libc.so.6 + 0x2a088)
    #9  0x00007f2ee866314b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
    #10 0x0000562f3ceffb55 _start (git + 0x7b55)

GDB identified the correct source code line and invalid instruction:

    Program terminated with signal SIGILL, Illegal instruction.
    #0  0x00007f2ee88399f8 in _mm512_madd_epi16 (__B=..., __A=...) at /usr/lib/gcc/x86_64-redhat-linux/14/include/avx512bwintrin.h:1544
    warning: Source file is more recent than executable.
    1544      return (__m512i) __builtin_ia32_pmaddwd512_mask ((__v32hi) __A,
    (gdb) disas
    Dump of assembler code for function adler32_avx512:
       0x00007f2ee88399b0 <+0>:     endbr64
       0x00007f2ee88399b4 <+4>:     mov    %rdx,%rcx
       0x00007f2ee88399b7 <+7>:     test   %rsi,%rsi
       0x00007f2ee88399ba <+10>:    je     0x7f2ee8839b44 <adler32_avx512+404>
       0x00007f2ee88399c0 <+16>:    mov    %edi,%eax
       0x00007f2ee88399c2 <+18>:    test   %rdx,%rdx
       0x00007f2ee88399c5 <+21>:    je     0x7f2ee8839b3f <adler32_avx512+399>
       0x00007f2ee88399cb <+27>:    shr    $0x10,%eax
       0x00007f2ee88399ce <+30>:    movzwl %di,%edx
       0x00007f2ee88399d1 <+33>:    cmp    $0x3f,%rcx
       0x00007f2ee88399d5 <+37>:    jbe    0x7f2ee8839b4a <adler32_avx512+410>
       0x00007f2ee88399db <+43>:    mov    $0x1,%edi
       0x00007f2ee88399e0 <+48>:    vmovdqa64 0x8916(%rip),%zmm7        # 0x7f2ee8842300
       0x00007f2ee88399ea <+58>:    vmovdqa32 0x894c(%rip),%zmm8        # 0x7f2ee8842340
       0x00007f2ee88399f4 <+68>:    vpxor  %xmm6,%xmm6,%xmm6
    => 0x00007f2ee88399f8 <+72>:    vpbroadcastw %edi,%zmm5

I'm writing a bug report upstream.

Per https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX_512&text=madd_epi16&ig_expand=4203, the `_mm512_madd_epi16` intrinsic requires the AVX512BW CPUID flag. Upstream only checks for AVX512F here: https://github.com/zlib-ng/zlib-ng/blob/1007e7a9c74148fe915384d7cc44921559500241/arch/x86/x86_features.c#L102

Changing `ebx & 0x00010000` to `ebx & 0x40010000` should suffice, but someone should really audit the AVX-512 code to see whether any of the other intrinsics fall outside AVX512F. It makes sense that this showed up on Xeon Phi, because those processors have an unusual combination of AVX-512 features compared to the “normal” Xeon server chips that upstream probably tested with.

I've just created the following pull request upstream: https://github.com/zlib-ng/zlib-ng/pull/1723

A downstream pull request has just been created: https://src.fedoraproject.org/rpms/zlib-ng/pull-request/13

FEDORA-2024-1a7f4b98c3 (zlib-ng-2.1.6-3.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-1a7f4b98c3

I can confirm that installing zlib-ng-compat 2.1.6-3.fc41 (x86_64) and zlib-ng-compat-devel 2.1.6-3.fc41 from the Bodhi link in comment #5 on my F40 system fixes the issue, and `git clone` now works. Thanks for the fix.

FEDORA-2024-1a7f4b98c3 (zlib-ng-2.1.6-3.fc41) has been pushed to the Fedora 41 stable repository. If problem still persists, please make note of it in this bug report.

Jocelyn, upstream asked for some important changes to the fix I had implemented. Could you check whether it still fixes the issue you were seeing, please?

This has now landed upstream and has been backported to Rawhide in zlib-ng 2.1.6-4. I have a scratch build for F40 available here too: https://koji.fedoraproject.org/koji/taskinfo?taskID=118346414

Sure, I will test your scratch build on Monday and report back here. Thanks.

I have tested on the hpe-xl260 machine with F40, and installing zlib-ng-compat-2.1.6-5.fc40.x86_64.rpm does fix the issue. Thanks for your support.

FEDORA-2024-241a7a825b (zlib-ng-2.1.6-5.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-241a7a825b

FEDORA-2024-241a7a825b has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-241a7a825b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-241a7a825b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2024-241a7a825b (zlib-ng-2.1.6-5.fc40) has been pushed to the Fedora 40 stable repository. If problem still persists, please make note of it in this bug report.
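The following is a minimal sketch of the kind of feature test discussed in this report. It is not zlib-ng's actual code, and the macro and function names are hypothetical. It reads CPUID leaf 7 and accepts the AVX-512 path only when EBX reports both AVX512F (bit 16, `0x00010000`) and AVX512BW (bit 30, `0x40000000`).

Two details are worth noting:

- A bare `ebx & 0x40010000` is nonzero when *either* bit is set. Requiring both bits therefore needs a comparison against the full mask.
- The faulting instruction in the disassembly, `vpbroadcastw %edi,%zmm5`, broadcasts a word into a ZMM register. That is itself an AVX512BW instruction.

`__get_cpuid_count` comes from GCC/Clang's `<cpuid.h>`. A complete check would also confirm OS support for AVX-512 register state (CPUID.1:ECX.OSXSAVE plus XGETBV); this sketch omits that.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang: __get_cpuid_count() */
#endif

/* CPUID leaf 7, subleaf 0, EBX feature bits. */
#define CPUID7_EBX_AVX512F  (UINT32_C(1) << 16)  /* 0x00010000 */
#define CPUID7_EBX_AVX512BW (UINT32_C(1) << 30)  /* 0x40000000 */

/* Hypothetical helper: true only when EBX reports BOTH AVX512F and AVX512BW.
 * Comparing against the full mask matters: (ebx & need) alone would be
 * nonzero on a CPU with AVX512F but no AVX512BW. */
static bool ebx_has_avx512f_and_bw(uint32_t ebx)
{
    const uint32_t need = CPUID7_EBX_AVX512F | CPUID7_EBX_AVX512BW;
    return (ebx & need) == need;
}

/* Hypothetical runtime probe: queries CPUID leaf 7 on x86 and applies the
 * helper above. Does NOT check OS support via XGETBV (see lead-in). */
static bool cpu_has_avx512f_and_bw(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;  /* leaf 7 not supported by this CPU */
    return ebx_has_avx512f_and_bw(ebx);
#else
    return false;      /* not x86: no AVX-512 */
#endif
}
```

With this check, an EBX value that has bit 16 set but bit 30 clear, such as `0x00010000`, is rejected. A dispatcher built on it would fall back to a non-AVX-512 `adler32` instead of selecting `adler32_avx512`.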