Bug 1794488
| Summary: | gcc ICE when building libgcrypt on arm 64 bit and miscompilation on arm 32 bit | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Tomas Mraz <tmraz> |
| Component: | libgcrypt | Assignee: | Tomas Mraz <tmraz> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 32 | CC: | aoliva, avi.kivity, crypto-team, dmalcolm, fweimer, jakub, jwakely, law, mpolacek, msebor, nickc, tmraz |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libgcrypt-1.8.5-3.fc32 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-01 08:50:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Tomas Mraz
2020-01-23 17:20:06 UTC
Actually the situation on armv7hl is even worse - a different source is miscompiled so it produces wrong results. The poly1305 implementation fails the known answer test. basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch basic: algo 501, mac gcry_mac_verify failed: Checksum error basic: algo 501, digest mismatch FAIL: basic For the buffer overflow I've filed upstream PR93454 and will fix. For the #c1 issue, I've so far bisected it to cipher/poly1305.c when compiled with -nostdinc poly1305.i -dumpbase poly1305.c -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -mtls-dialect=gnu -marm -march=armv7-a+fp -O2 -Werror=format-security -Wall -version -fexceptions -fstack-protector-strong -fvisibility=hidden -fno-delete-null-pointer-checks -fPIC -o poly1305.s by gcc before https://gcc.gnu.org/r271553 , the test works, when compiled by r271553 or later, it doesn't, but it actually seems to be the make -ftree-loop-distribute-patterns change that matters, with that additional option even GCC 8 fails the test. And out of the 3 functions where -ftree-loop-distribute-patterns makes a difference it is _gcry_poly1305_init, adding __attribute__((optimize ("tree-loop-distribute-patterns"))) to that function
makes it misbehave even with older gcc revisions and adding __attribute__((optimize ("no-tree-loop-distribute-patterns"))) makes the problem go away even with current revisions.
I believe this is just a bug in _gcry_poly1305_armv7_neon_init_ext in cipher/poly1305-armv7-neon.S:
_gcry_poly1305_armv7_neon_init_ext:
.Lpoly1305_init_ext_neon_local:
stmfd sp!, {r4-r11, lr}
sub sp, sp, #32
mov r14, r2
and r2, r2, r2
moveq r14, #-1
My ARM assembly knowledge is very limited, so perhaps I'm missing something important, but the function
takes two arguments, void *state, const poly1305_key_t *key, and those are passed in r0 and r1
registers, r2-r3 are also used for parameter passing, but as this function has just two, their content is undefined.
r4-r11 are call saved registers. I believe none of the above instructions modify condition flags (NZCV bits in
cpsr register), they aren't cmp/cmn or tst and don't have s suffix.
So, the above after storing r4-r11 and lr to stack and decrementing stack by further 32 bytes copies the r2
register with some random value from the caller to r14, clears r2 and if Z bit is set in cpsr, sets r14 to -1,
otherwise keeps it unmodified.
The problem is that that this depends on in what state happened the caller to leave the cpsr register in.
In older gccs (with -fno-tree-loop-distribute-patterns) the code used to have a loop performing the
buf_cpy (keytmp.b, key, POLY1305_KEYLEN);
4 bytes at a time, with bne .Lwhatever looping, and no further instructions modified the Z bit before the indirect function call.
So, in that case the Z bit was always set (otherwise it would loop once more) and so moveq set r14 to -1.
Now, with a recent gcc or with -ftree-loop-distribute-patterns, the copying in the caller isn't done using a loop,
but a memcpy inline expansion, so like:
ldr r0, [r4] @ unaligned
ldr r1, [r4, #4] @ unaligned
ldr r2, [r4, #8] @ unaligned
ldr r3, [r4, #12] @ unaligned
add ip, sp, #4
stmia ip!, {r0, r1, r2, r3}
ldr r0, [r4, #16] @ unaligned
ldr r1, [r4, #20] @ unaligned
ldr r2, [r4, #24] @ unaligned
ldr r3, [r4, #28] @ unaligned
stmia ip!, {r0, r1, r2, r3}
But this doesn't change Z flag in any way, and the last instruction before that that modifies the flags is:
tst r0, #32768
bne .L62
ldr lr, .L70+8
add lr, pc, lr
str lr, [r6, #176]
.L63:
...
.L62:
ldr r3, .L70+16
add r3, pc, r3
mov lr, r3
str r3, [r6, #176]
b .L63
the tst above, which tests if (features & HWF_ARM_NEON) and as features contains that flag, it is not equal,
so Z bit is clear and it jumps to .L62, where no instruction modifies the flags, then goes to .L63 where it performs the memcpy and finally performs indirect function call to the
buggy routine. So, instead of r14 being set to -1, it is kept equal to whatever random value r2 had (in my case it is 0 - the bytes 24-27 from key argument).
Now, if I perform a hack and in the gcc emitted assembly add cmp r0, r0 instruction before the indirect function call blx r3, everything passes again.
I'm afraid I have no idea what the cipher/poly1305-armv7-neon.S wanted to say, it makes absolutely no sense, neither the use of the r2 value nor the conditional move.
Jakub, thank you very much for this thorough analysis of the problem! I will handle this with libgcrypt upstream. This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle. Changing version to 32. |