The libgcrypt build on rawhide currently fails due to gcc ICE: See: https://koji.fedoraproject.org/koji/taskinfo?taskID=40919865 libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -fvisibility=hidden -fno-delete-null-pointer-checks -Wall -c rijndael.c -fPIC -DPIC -o .libs/rijndael.o *** stack smashing detected ***: terminated *** WARNING *** there are active plugins, do not report this as a bug unless you can reproduce it without enabling any plugins. Event | Plugins PLUGIN_FINISH_UNIT | annobin: Generate final annotations PLUGIN_START_UNIT | annobin: Generate global annotations PLUGIN_ALL_PASSES_START | annobin: Generate per-function annotations PLUGIN_ALL_PASSES_END | annobin: Register per-function end symbol during GIMPLE pass: ccp rijndael.c: In function 'prepare_decryption': rijndael.c:2022:3: internal compiler error: Aborted 2022 | }; | ^ Please submit a full bug report, with preprocessed source if appropriate. See <http://bugzilla.redhat.com/bugzilla> for instructions.
Actually the situation on armv7hl is even worse - a different source is miscompiled so it produces wrong results. The poly1305 implementation fails the known answer test. basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch FAIL: basic
For the buffer overflow I've filed upstream PR93454 and will fix.
For the #c1 issue, I've so far bisected it to cipher/poly1305.c when compiled with -nostdinc poly1305.i -dumpbase poly1305.c -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -mtls-dialect=gnu -marm -march=armv7-a+fp -O2 -Werror=format-security -Wall -version -fexceptions -fstack-protector-strong -fvisibility=hidden -fno-delete-null-pointer-checks -fPIC -o poly1305.s by gcc before https://gcc.gnu.org/r271553 , the test works, when compiled by r271553 or later, it doesn't, but it actually seems to be the make -ftree-loop-distribute-patterns change that matters, with that additional option even GCC 8 fails the test.
And out of the 3 functions where -ftree-loop-distribute-patterns makes a difference it is _gcry_poly1305_init, adding __attribute__((optimize ("tree-loop-distribute-patterns"))) to that function makes it misbehave even with older gcc revisions and adding __attribute__((optimize ("no-tree-loop-distribute-patterns"))) makes the problem go away even with current revisions.
I believe this is just a bug in _gcry_poly1305_armv7_neon_init_ext in cipher/poly1305-armv7-neon.S: _gcry_poly1305_armv7_neon_init_ext: .Lpoly1305_init_ext_neon_local: stmfd sp!, {r4-r11, lr} sub sp, sp, #32 mov r14, r2 and r2, r2, r2 moveq r14, #-1 My ARM assembly knowledge is very limited, so perhaps I'm missing something important, but the function takes two arguments, void *state, const poly1305_key_t *key, and those are passed in r0 and r1 registers, r2-r3 are also used for parameter passing, but as this function has just two, their content is undefined. r4-r11 are call saved registers. I believe none of the above instructions modify condition flags (NZCV bits in cpsr register), they aren't cmp/cmn or tst and don't have s suffix. So, the above after storing r4-r11 and lr to stack and decrementing stack by further 32 bytes copies the r2 register with some random value from the caller to r14, clears r2 and if Z bit is set in cpsr, sets r14 to -1, otherwise keeps it unmodified. The problem is that that this depends on in what state happened the caller to leave the cpsr register in. In older gccs (with -fno-tree-loop-distribute-patterns) the code used to have a loop performing the buf_cpy (keytmp.b, key, POLY1305_KEYLEN); 4 bytes at a time, with bne .Lwhatever looping, and no further instructions modified the Z bit before the indirect function call. So, in that case the Z bit was always set (otherwise it would loop once more) and so moveq set r14 to -1. Now, with a recent gcc or with -ftree-loop-distribute-patterns, the copying in the caller isn't done using a loop, but a memcpy inline expansion, so like: ldr r0, [r4] @ unaligned ldr r1, [r4, #4] @ unaligned ldr r2, [r4, #8] @ unaligned ldr r3, [r4, #12] @ unaligned add ip, sp, #4 stmia ip!, {r0, r1, r2, r3} ldr r0, [r4, #16] @ unaligned ldr r1, [r4, #20] @ unaligned ldr r2, [r4, #24] @ unaligned ldr r3, [r4, #28] @ unaligned stmia ip!, {r0, r1, r2, r3} But this doesn't change Z flag in any way, and the last instruction before that that modifies the flags is: tst r0, #32768 bne .L62 ldr lr, .L70+8 add lr, pc, lr str lr, [r6, #176] .L63: ... .L62: ldr r3, .L70+16 add r3, pc, r3 mov lr, r3 str r3, [r6, #176] b .L63 the tst above, which tests if (features & HWF_ARM_NEON) and as features contains that flag, it is not equal, so Z bit is clear and it jumps to .L62, where no instruction modifies the flags, then goes to .L63 where it performs the memcpy and finally performs indirect function call to the buggy routine. So, instead of r14 being set to -1, it is kept equal to whatever random value r2 had (in my case it is 0 - the bytes 24-27 from key argument). Now, if I perform a hack and in the gcc emitted assembly add cmp r0, r0 instruction before the indirect function call blx r3, everything passes again. I'm afraid I have no idea what the cipher/poly1305-armv7-neon.S wanted to say, it makes absolutely no sense, neither the use of the r2 value nor the conditional move.
Jakub, thank you very much for this thorough analysis of the problem! I will handle this with libgcrypt upstream.
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle. Changing version to 32.