Description of problem:
Applications (e.g. sshd) abort with SIGILL on Power8 machines after the recent openssl update. I suspect the sync from RHEL brought in a hardware-level expectation that is valid only in RHEL (e.g. RHEL-9 requires a Power9 or newer system).
Version-Release number of selected component (if applicable):
openssl-1:3.0.5-3.fc38
How reproducible:
100%
Steps to Reproduce:
1. ssh to sshd running on Power8, or
2. try "dnf update" from the local console, or
3. use some other app
Actual results:
[ 3705.137658] sshd[1703]: illegal instruction (4) at 7fff85526aac nip 7fff85526aac lr 7fff854828e0 code 1 in libcrypto.so.3.0.5[7fff85240000+300000]
[ 3705.137866] sshd[1703]: code: 7f4909ce 39290010 7f6909ce 39290010 7f8909ce 39290010 7fa909ce 39290010
[ 3705.137920] sshd[1703]: code: 7fc909ce 39290010 7fe909ce f8010210 <7c0046d9> 39400020 7c4a4699 39400030
Expected results:
no SIGILL
Additional info:
a downgrade to openssl-3.0.5-2.fc37 (via rpm) fixes the issue
Could you please check this scratch build? https://koji.fedoraproject.org/koji/taskinfo?taskID=91732215
The only thing we did in this area is https://bugzilla.redhat.com/show_bug.cgi?id=2051312. These are performance optimizations also applied upstream in https://github.com/openssl/openssl/commit/44a563dde1584cd9284e80b6e45ee5019be8d36c and https://github.com/openssl/openssl/commit/345c99b6654b8313c792d54f829943068911ddbd for AES-GCM and https://github.com/openssl/openssl/commit/f596bbe4da779b56eea34d96168b557d78e1149 and https://github.com/openssl/openssl/commit/7e1f3ffcc5bc15fb9a12b9e3bb202f544c6ed5aa for ChaCha20. If these don't work on Power8, we should also notify OpenSSL upstream. I'm Cc'ing the IBM people that worked on this.
(In reply to Dmitry Belyavskiy from comment #2)
> Could you please check this scratch build?
> https://koji.fedoraproject.org/koji/taskinfo?taskID=91732215
seems this build is OK for Power8
backtrace from sshd captured by coredumpctl
Stack trace of thread 1703:
#0  0x00007fff85526aac n/a (libcrypto.so.3 + 0x396aac)
#1  0x00007fff854828e0 aes_p10_gcm_crypt.lto_priv.0 (libcrypto.so.3 + 0x2f28e0)
#2  0x00007fff85485888 generic_aes_gcm_cipher_update (libcrypto.so.3 + 0x2f5888)
#3  0x00007fff854de720 gcm_cipher_internal (libcrypto.so.3 + 0x34e720)
#4  0x00007fff854debd8 ossl_gcm_cipher (libcrypto.so.3 + 0x34ebd8)
#5  0x00007fff85361cf0 EVP_Cipher (libcrypto.so.3 + 0x1d1cf0)
#6  0x0000000114961320 cipher_crypt (sshd + 0x71320)
#7  0x0000000114971890 ssh_packet_send2_wrapped (sshd + 0x81890)
#8  0x00000001149756f8 sshpkt_send (sshd + 0x856f8)
#9  0x00000001149837f0 kex_send_newkeys (sshd + 0x937f0)
#10 0x0000000114987c18 input_kex_gen_init (sshd + 0x97c18)
#11 0x0000000114979278 ssh_dispatch_run_fatal (sshd + 0x89278)
#12 0x000000011490b048 do_ssh2_kex (sshd + 0x1b048)
#13 0x0000000114907f8c main (sshd + 0x17f8c)
#14 0x00007fff84bd802c __libc_start_call_main (libc.so.6 + 0x3802c)
#15 0x00007fff84bd826c __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3826c)
ELF object binary architecture: PowerPC64
seems like it's using the p10 variant ...
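One way to narrow down which instruction the CPU rejected is to split the faulting word from the kernel log in the description (the one in angle brackets, 7c0046d9) into its opcode fields. The helpers below are a minimal sketch that assumes the usual X/XX1-form layout of a 32-bit Power instruction (primary opcode in the top 6 bits, a 10-bit extended opcode just above the lowest bit). The function names are hypothetical and not taken from any library.

```c
#include <stdint.h>

/* Primary opcode: the most significant 6 bits of a 32-bit Power instruction. */
unsigned int ppc_primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/*
 * Extended opcode, assuming an X/XX1-form instruction: the 10 bits that sit
 * directly above the least significant bit of the word.
 */
unsigned int ppc_extended_opcode(uint32_t insn)
{
    return (insn >> 1) & 0x3ffu;
}
```

For 0x7c0046d9 these return primary opcode 31 and extended opcode 876. If I read the Power ISA opcode tables correctly, that pair is lxvb16x, a VSX load introduced with ISA 3.0 (Power9); treat that mapping as an assumption to verify against the ISA document.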
Dan, could you also check against openssl upstream?
Despite the aes_p10_gcm_crypt name, there does not seem to be anything Power10-specific about this function: https://github.com/openssl/openssl/commit/345c99b6654b8313c792d54f829943068911ddbd#diff-603e722ab30f575238f8b4b59fd4a6c1f6120463db2165e0578975067ff900f4R50. It delegates to ppc_aes_gcm_encrypt, which is implemented in https://github.com/openssl/openssl/commit/44a563dde1584cd9284e80b6e45ee5019be8d36c#diff-4dc4358ced630b88de0636ecb930703c759366f83dcc73302d6ec31c25a14aa7R458, and the commit message claims it should work on Power9 and above. https://github.com/openssl/openssl/commit/44a563dde1584cd9284e80b6e45ee5019be8d36c#diff-603e722ab30f575238f8b4b59fd4a6c1f6120463db2165e0578975067ff900f4R37 should only select this algorithm if supported by the current architecture, and does that by checking OPENSSL_ppccap_P's PPC_MADD300 bit (1<<4), which according to https://github.com/openssl/openssl/blob/master/crypto/ppccap.c#L193-L194 should be set on POWER9 and later. Could you share the output of openssl version -c on your machine?
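The run-time gating described above can be sketched roughly as follows. This is an illustration only: cap_word stands in for the value of OPENSSL_ppccap_P, and the enum and function names are hypothetical, not OpenSSL identifiers. Only the PPC_MADD300 bit value (1<<4) comes from the discussion.

```c
/* Bit named in the discussion above: PPC_MADD300 is (1 << 4). */
#define PPC_MADD300 (1U << 4)

/* Hypothetical labels for the two code paths; these are not OpenSSL names. */
enum gcm_impl { GCM_IMPL_GENERIC = 0, GCM_IMPL_ACCELERATED = 1 };

/*
 * Choose the accelerated path only when the capability word has the
 * PPC_MADD300 bit set; otherwise fall back to the generic code.
 * cap_word is an assumption standing in for OPENSSL_ppccap_P.
 */
enum gcm_impl select_gcm_impl(unsigned int cap_word)
{
    return (cap_word & PPC_MADD300) ? GCM_IMPL_ACCELERATED : GCM_IMPL_GENERIC;
}
```

With a check of this shape, a Power8 machine should end up on the generic path as long as the bit is left clear during CPU detection and every entry point into the accelerated code goes through the check.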
openssl version -c returns "CPUINFO: N/A" for both my Power8 and Power9 systems
for the record, OPENSSL_cpuid_setup() is taking the GETAUXVAL codepath on Fedora
(In reply to Dmitry Belyavskiy from comment #6)
> Dan, could you also check against openssl upstream?
Upstream "make test" from the master branch shows a bunch of failures with the same symptom (illegal instruction) when run on my p8 machine. Let me verify on another p8 ...
Great! Could you please raise a bug report upstream then?
For the record:
[root@ibm-p9z-25-lp3 ~]# grep cpu /proc/cpuinfo | sort -u
cpu : POWER9 (architected), altivec supported
[root@ibm-p9z-25-lp3 ~]# ./test
getauxval(AT_HWCAP): 0xdc0065c2
getauxval(AT_HWCAP2): 0xfff00000
[root@ibm-p8-pvm-09-guest-11 ~]# grep cpu /proc/cpuinfo | sort -u
cpu : POWER8 (architected), altivec supported
[root@ibm-p8-pvm-09-guest-11 ~]# gcc -o test test.c && ./test
getauxval(AT_HWCAP): 0xdc0065c2
getauxval(AT_HWCAP2): 0xff000000
From what I can see, a getauxval(AT_HWCAP2) of 0xff000000 should correctly disable this code, since crypto/ppccap.c checks for 0xff000000 & HWCAP_ARCH_3_00, which is 0xff000000 & (1U << 23) = 0x0. Dan, could you compile and run
    #include <sys/auxv.h>
    #include <stdio.h>
    int main() {
        fprintf(stderr, "getauxval(AT_HWCAP): 0x%lx\n", getauxval(AT_HWCAP));
        fprintf(stderr, "getauxval(AT_HWCAP2): 0x%lx\n", getauxval(AT_HWCAP2));
    }
on your machine?
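The mask arithmetic above can be checked against the AT_HWCAP2 values reported in this bug. The helper below is a small sketch: the mask value (1U << 23) is the one quoted from crypto/ppccap.c, while the function name is hypothetical.

```c
/* Mask quoted above from crypto/ppccap.c: HWCAP_ARCH_3_00 is (1U << 23). */
#define HWCAP_ARCH_3_00 (1U << 23)

/* Returns 1 when an AT_HWCAP2 value has the ISA 3.0 bit set, 0 otherwise. */
int hwcap2_has_arch_3_00(unsigned long hwcap2)
{
    return (hwcap2 & HWCAP_ARCH_3_00) != 0;
}
```

For the values seen here, 0xff000000 (the POWER8 machines) yields 0 and 0xfff00000 (the POWER9 machine) yields 1, so the detection itself distinguishes the two CPUs as expected.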
For the record I have reserved a p8 machine (VM/LPAR) from beaker with F-36 and re-run the upstream build and tests. This setup reproduced the results from my development machine.
Test Summary Report
-------------------
30-test_evp.t (Wstat: 256 (exited 1) Tests: 74 Failed: 1)
  Failed test: 2
  Non-zero exit status: 1
70-test_asyncio.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
70-test_comp.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_key_share.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_recordlen.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
70-test_servername.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
70-test_sslextension.t (Wstat: 27392 (exited 107) Tests: 7 Failed: 0)
  Non-zero exit status: 107
  Parse errors: Bad plan. You planned 8 tests but ran 7.
70-test_sslrecords.t (Wstat: 27392 (exited 107) Tests: 12 Failed: 0)
  Non-zero exit status: 107
  Parse errors: Bad plan. You planned 21 tests but ran 12.
70-test_sslsigalgs.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_sslsignature.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_sslversions.t (Wstat: 27392 (exited 107) Tests: 4 Failed: 0)
  Non-zero exit status: 107
  Parse errors: Bad plan. You planned 8 tests but ran 4.
70-test_tls13alerts.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_tls13cookie.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_tls13downgrade.t (Wstat: 27392 (exited 107) Tests: 4 Failed: 0)
  Non-zero exit status: 107
  Parse errors: Bad plan. You planned 6 tests but ran 4.
70-test_tls13hrr.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_tls13kexmodes.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_tls13messages.t (Wstat: 28416 (exited 111) Tests: 0 Failed: 0)
  Non-zero exit status: 111
  Parse errors: No plan found in TAP output
70-test_tls13psk.t (Wstat: 27392 (exited 107) Tests: 0 Failed: 0)
  Non-zero exit status: 107
  Parse errors: No plan found in TAP output
70-test_tlsextms.t (Wstat: 27392 (exited 107) Tests: 9 Failed: 0)
  Non-zero exit status: 107
  Parse errors: Bad plan. You planned 10 tests but ran 9.
80-test_dtls_mtu.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
80-test_ssl_new.t (Wstat: 6144 (exited 24) Tests: 31 Failed: 24)
  Failed tests: 1-18, 20-21, 24, 26-28
  Non-zero exit status: 24
80-test_ssl_old.t (Wstat: 512 (exited 2) Tests: 7 Failed: 2)
  Failed tests: 2-3
  Non-zero exit status: 2
80-test_sslcorrupt.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
90-test_sslapi.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
90-test_sslbuffers.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
90-test_tls13ccs.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
90-test_tls13encryption.t (Wstat: 256 (exited 1) Tests: 1 Failed: 1)
  Failed test: 1
  Non-zero exit status: 1
99-test_fuzz_client.t (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test: 2
  Non-zero exit status: 1
99-test_fuzz_server.t (Wstat: 256 (exited 1) Tests: 2 Failed: 1)
  Failed test: 2
  Non-zero exit status: 1
Files=256, Tests=3316, 562 wallclock secs (10.93 usr 0.69 sys + 495.51 cusr 36.15 csys = 543.28 CPU)
Result: FAIL
the AUXVAL results
[sharkcz@tyan-openpower-01 ~]$ ./test
getauxval(AT_HWCAP): 0xdc0065c2
getauxval(AT_HWCAP2): 0xff000000
I am going to open an upstream bug now ...
*** Bug 2125295 has been marked as a duplicate of this bug. ***
Dmitry, could you build an update with the problematic AES patch disabled? It seems the upstream fix will take some time.
Dan, if you need it ASAP, I will do it. If not, could we delay it to, say, Tuesday?
I think Tuesday will be OK. The COPR guys have already applied a workaround for their build system to unblock rawhide builds, so it's not super urgent.
Could you also check that ChaCha is not affected?
(In reply to Dmitry Belyavskiy from comment #21)
> Could you also check that ChaCha is not affected?
It seems to be OK: the test suite passes in a local rpm build when only the AES patch is disabled. Also, the accelerated ChaCha is plugged in a different way, if I understand it right.
https://github.com/openssl/openssl/pull/19182 looks like a relevant fix.
FEDORA-2022-343ea0d960 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-343ea0d960
FEDORA-2022-343ea0d960 has been pushed to the Fedora 38 stable repository. If the problem still persists, please make a note of it in this bug report.