Bug 1794488 - gcc ICE when building libgcrypt on arm 64 bit and miscompilation on arm 32 bit
Summary: gcc ICE when building libgcrypt on arm 64 bit and miscompilation on arm 32 bit
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libgcrypt
Version: 32
Hardware: aarch64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Tomas Mraz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-23 17:20 UTC by Tomas Mraz
Modified: 2020-04-01 08:50 UTC (History)
12 users (show)

Fixed In Version: libgcrypt-1.8.5-3.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-01 08:50:21 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
GNU Compiler Collection 93454 0 P1 RESOLVED [10 Regression] buffer overflow in fold_array_ctor_reference since r10-1882 2020-04-01 08:48:56 UTC

Description Tomas Mraz 2020-01-23 17:20:06 UTC
The libgcrypt build on rawhide currently fails due to gcc ICE:

See:

https://koji.fedoraproject.org/koji/taskinfo?taskID=40919865

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -I../src -I../src -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -fvisibility=hidden -fno-delete-null-pointer-checks -Wall -c rijndael.c  -fPIC -DPIC -o .libs/rijndael.o
*** stack smashing detected ***: terminated
*** WARNING *** there are active plugins, do not report this as a bug unless you can reproduce it without enabling any plugins.
Event                            | Plugins
PLUGIN_FINISH_UNIT               | annobin: Generate final annotations
PLUGIN_START_UNIT                | annobin: Generate global annotations
PLUGIN_ALL_PASSES_START          | annobin: Generate per-function annotations
PLUGIN_ALL_PASSES_END            | annobin: Register per-function end symbol
during GIMPLE pass: ccp
rijndael.c: In function 'prepare_decryption':
rijndael.c:2022:3: internal compiler error: Aborted
 2022 |   };
      |   ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

Comment 1 Tomas Mraz 2020-01-23 17:39:02 UTC
Actually the situation on armv7hl is even worse - a different source is miscompiled so it produces wrong results. The poly1305 implementation fails the known answer test.

basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 
expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 
computed: e6 79 60 ed 25 01 65 27 28 0f c2 cb 48 7c b6 67 
expected: f3 47 7e 7c d9 54 17 af 89 a6 b8 79 4c 31 0c f0 
computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
computed: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
expected: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 
expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
computed: 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 
expected: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 
expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 
computed: 01 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 
expected: 14 00 00 00 00 00 00 00 55 00 00 00 00 00 00 00 
computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 
expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
computed: 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 
expected: 13 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
basic: algo 501, mac gcry_mac_verify failed: Checksum error
basic: algo 501, digest mismatch
FAIL: basic

Comment 2 Jakub Jelinek 2020-01-27 14:34:13 UTC
For the buffer overflow I've filed upstream PR93454 and will fix.

Comment 3 Jakub Jelinek 2020-01-28 15:54:57 UTC
For the #c1 issue, I've so far bisected it to cipher/poly1305.c when compiled with
-nostdinc poly1305.i -dumpbase poly1305.c -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -mtls-dialect=gnu -marm -march=armv7-a+fp -O2 -Werror=format-security -Wall -version -fexceptions -fstack-protector-strong -fvisibility=hidden -fno-delete-null-pointer-checks -fPIC -o poly1305.s
by gcc before https://gcc.gnu.org/r271553 , the test works, when compiled by r271553 or later, it doesn't, but it actually seems to be the make -ftree-loop-distribute-patterns change that matters, with that additional option even GCC 8 fails the test.

Comment 4 Jakub Jelinek 2020-01-28 16:13:59 UTC
And out of the 3 functions where -ftree-loop-distribute-patterns makes a difference it is _gcry_poly1305_init, adding __attribute__((optimize ("tree-loop-distribute-patterns"))) to that function
makes it misbehave even with older gcc revisions and adding __attribute__((optimize ("no-tree-loop-distribute-patterns"))) makes the problem go away even with current revisions.

Comment 5 Jakub Jelinek 2020-01-28 18:54:41 UTC
I believe this is just a bug in _gcry_poly1305_armv7_neon_init_ext in cipher/poly1305-armv7-neon.S:
_gcry_poly1305_armv7_neon_init_ext:
.Lpoly1305_init_ext_neon_local:
        stmfd sp!, {r4-r11, lr}
        sub sp, sp, #32
        mov r14, r2
        and r2, r2, r2
        moveq r14, #-1
My ARM assembly knowledge is very limited, so perhaps I'm missing something important, but the function
takes two arguments, void *state, const poly1305_key_t *key, and those are passed in r0 and r1
registers, r2-r3 are also used for parameter passing, but as this function has just two, their content is undefined.
r4-r11 are call saved registers.  I believe none of the above instructions modify condition flags (NZCV bits in
cpsr register), they aren't cmp/cmn or tst and don't have s suffix.
So, the above after storing r4-r11 and lr to stack and decrementing stack by further 32 bytes copies the r2
register with some random value from the caller to r14, clears r2 and if Z bit is set in cpsr, sets r14 to -1,
otherwise keeps it unmodified.
The problem is that that this depends on in what state happened the caller to leave the cpsr register in.
In older gccs (with -fno-tree-loop-distribute-patterns) the code used to have a loop performing the
  buf_cpy (keytmp.b, key, POLY1305_KEYLEN);
4 bytes at a time, with bne .Lwhatever looping, and no further instructions modified the Z bit before the indirect function call.
So, in that case the Z bit was always set (otherwise it would loop once more) and so moveq set r14 to -1.
Now, with a recent gcc or with -ftree-loop-distribute-patterns, the copying in the caller isn't done using a loop,
but a memcpy inline expansion, so like:
        ldr     r0, [r4]        @ unaligned
        ldr     r1, [r4, #4]    @ unaligned
        ldr     r2, [r4, #8]    @ unaligned
        ldr     r3, [r4, #12]   @ unaligned
        add     ip, sp, #4
        stmia   ip!, {r0, r1, r2, r3}
        ldr     r0, [r4, #16]   @ unaligned
        ldr     r1, [r4, #20]   @ unaligned
        ldr     r2, [r4, #24]   @ unaligned
        ldr     r3, [r4, #28]   @ unaligned
        stmia   ip!, {r0, r1, r2, r3}
But this doesn't change Z flag in any way, and the last instruction before that that modifies the flags is:
        tst     r0, #32768
        bne     .L62
        ldr     lr, .L70+8
        add     lr, pc, lr
        str     lr, [r6, #176]
.L63:
...
.L62:
        ldr     r3, .L70+16
        add     r3, pc, r3
        mov     lr, r3
        str     r3, [r6, #176]
        b       .L63
the tst above, which tests if (features & HWF_ARM_NEON) and as features contains that flag, it is not equal,
so Z bit is clear and it jumps to .L62, where no instruction modifies the flags, then goes to .L63 where it performs the memcpy and finally performs indirect function call to the
buggy routine.  So, instead of r14 being set to -1, it is kept equal to whatever random value r2 had (in my case it is 0 - the bytes 24-27 from key argument).

Now, if I perform a hack and in the gcc emitted assembly add cmp r0, r0 instruction before the indirect function call blx r3, everything passes again.

I'm afraid I have no idea what the cipher/poly1305-armv7-neon.S wanted to say, it makes absolutely no sense, neither the use of the r2 value nor the conditional move.

Comment 6 Tomas Mraz 2020-01-29 09:39:06 UTC
Jakub, thank you very much for this thorough analysis of the problem! I will handle this with libgcrypt upstream.

Comment 7 Ben Cotton 2020-02-11 17:23:36 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.


Note You need to log in before you can comment on or make changes to this bug.