Bug 1652929
Summary: | Backport ppc64le str[n]cmp inlined code | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Mark Wielaard <mjw> |
Component: | gcc | Assignee: | Marek Polacek <mpolacek> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Petlan <mpetlan> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.0 | CC: | codonell, fweimer, jakub, law, mcermak, mpetlan, ohudlick |
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | gcc-8.2.1-3.4.el8 | Doc Type: | No Doc Update |
Last Closed: | 2019-06-13 22:56:54 UTC | Type: | Bug |
Bug Blocks: | 1532205, 1652932, 1734295 |
Description
Mark Wielaard
2018-11-23 15:38:18 UTC
Backport gcc patch has been posted upstream now:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02161.html
(It hasn't landed on the gcc-8-branch yet though.)

This is now on the upstream gcc-8-branch:

Author: acsawdey
Date: Wed Nov 28 19:33:04 2018
New Revision: 266578

URL: https://gcc.gnu.org/viewcvs?rev=266578&root=gcc&view=rev
Log:
2018-11-28  Aaron Sawdey  <acsawdey.com>

    Backport from mainline
    2018-10-25  Aaron Sawdey  <acsawdey.com>

    * config/rs6000/rs6000-string.c (expand_strncmp_gpr_sequence): Change to
    a shorter sequence with fewer branches.
    (emit_final_str_compare_gpr): Ditto.

    Backport from mainline to allow the above code to go in:
    2018-06-14  Aaron Sawdey  <acsawdey.com>

    * config/rs6000/rs6000-string.c (do_and3, do_and3_mask,
    do_cmpb3, do_rotl3): New functions.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/rs6000/rs6000-string.c

I think I have finally managed to see the difference in code generated by 8.2.1-3.3.el8 and -3.5.el8. I have tried it on expanded strcmp:

================================= OLD =================================
  int a = strcmp(s, u);
    100007d0:   00 00 00 39     li      r8,0
    100007d4:   f8 53 26 7d     cmpb    r6,r9,r10
    100007d8:   f8 43 28 7d     cmpb    r8,r9,r8
    100007dc:   38 33 06 7d     orc     r6,r8,r6
    100007e0:   74 00 c6 7c     cntlzd  r6,r6
    100007e4:   08 00 c6 38     addi    r6,r6,8
    100007e8:   30 36 29 79     rldcl   r9,r9,r6,56
    100007ec:   30 36 4a 79     rldcl   r10,r10,r6,56
    100007f0:   50 48 ca 7c     subf    r6,r10,r9
    100007f4:   84 ff ff 4b     b       10000778 <main+0xd8>
    100007f8:   08 00 3e 39     addi    r9,r30,8
    100007fc:   08 00 5f 39     addi    r10,r31,8
    10000800:   28 4c 20 7d     ldbrx   r9,0,r9
    10000804:   28 54 40 7d     ldbrx   r10,0,r10
    10000808:   51 48 ca 7c     subf.   r6,r10,r9
    1000080c:   c4 ff 82 40     bne     100007d0 <main+0x130>
    10000810:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000814:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000818:   60 ff 9e 40     bne     cr7,10000778 <main+0xd8>
    1000081c:   10 00 3e 39     addi    r9,r30,16
    10000820:   10 00 5f 39     addi    r10,r31,16
    10000824:   28 4c 20 7d     ldbrx   r9,0,r9
    10000828:   28 54 40 7d     ldbrx   r10,0,r10
    1000082c:   51 48 ca 7c     subf.   r6,r10,r9
    10000830:   a0 ff 82 40     bne     100007d0 <main+0x130>
    10000834:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000838:   00 00 aa 2f     cmpdi   cr7,r10,0
    1000083c:   3c ff 9e 40     bne     cr7,10000778 <main+0xd8>
    10000840:   18 00 3e 39     addi    r9,r30,24
    10000844:   18 00 5f 39     addi    r10,r31,24
    10000848:   28 4c 20 7d     ldbrx   r9,0,r9
    1000084c:   28 54 40 7d     ldbrx   r10,0,r10
    10000850:   51 48 ca 7c     subf.   r6,r10,r9
    10000854:   7c ff 82 40     bne     100007d0 <main+0x130>
    10000858:   f8 33 2a 7d     cmpb    r10,r9,r6
    1000085c:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000860:   18 ff 9e 40     bne     cr7,10000778 <main+0xd8>
    10000864:   20 00 3e 39     addi    r9,r30,32
    10000868:   20 00 5f 39     addi    r10,r31,32
    1000086c:   28 4c 20 7d     ldbrx   r9,0,r9
    10000870:   28 54 40 7d     ldbrx   r10,0,r10
    10000874:   51 48 ca 7c     subf.   r6,r10,r9
    10000878:   58 ff 82 40     bne     100007d0 <main+0x130>
    1000087c:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000880:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000884:   f4 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    10000888:   28 00 3e 39     addi    r9,r30,40
    1000088c:   28 00 5f 39     addi    r10,r31,40
    10000890:   28 4c 20 7d     ldbrx   r9,0,r9
    10000894:   28 54 40 7d     ldbrx   r10,0,r10
    10000898:   51 48 ca 7c     subf.   r6,r10,r9
    1000089c:   34 ff 82 40     bne     100007d0 <main+0x130>
    100008a0:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008a4:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008a8:   d0 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008ac:   30 00 3e 39     addi    r9,r30,48
    100008b0:   30 00 5f 39     addi    r10,r31,48
    100008b4:   28 4c 20 7d     ldbrx   r9,0,r9
    100008b8:   28 54 40 7d     ldbrx   r10,0,r10
    100008bc:   51 48 ca 7c     subf.   r6,r10,r9
    100008c0:   10 ff 82 40     bne     100007d0 <main+0x130>
    100008c4:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008c8:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008cc:   ac fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008d0:   38 00 3e 39     addi    r9,r30,56
    100008d4:   38 00 5f 39     addi    r10,r31,56
    100008d8:   28 4c 20 7d     ldbrx   r9,0,r9
    100008dc:   28 54 40 7d     ldbrx   r10,0,r10
    100008e0:   51 48 ca 7c     subf.   r6,r10,r9
    100008e4:   ec fe 82 40     bne     100007d0 <main+0x130>
    100008e8:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008ec:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008f0:   88 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008f4:   40 00 9f 38     addi    r4,r31,64
    100008f8:   40 00 7e 38     addi    r3,r30,64
    100008fc:   25 fd ff 4b     bl      10000620 <00000022.plt_call.strcmp@@GLIBC_2.17>
    10000900:   18 00 41 e8     ld      r2,24(r1)
    10000904:   78 1b 66 7c     mr      r6,r3
    10000908:   70 fe ff 4b     b       10000778 <main+0xd8>

================================= NEW =================================
    100007dc:   99 fe 20 7c     lxvd2x  vs33,0,r31
    100007e0:   99 f6 00 7c     lxvd2x  vs32,0,r30
    100007e4:   8c 03 a0 11     vspltisw v13,0
    100007e8:   00 00 40 39     li      r10,0
    100007ec:   06 00 81 11     vcmpequb v12,v1,v0
    100007f0:   06 68 01 10     vcmpequb v0,v1,v13
    100007f4:   57 65 00 f0     xxlorc  vs32,vs32,vs44
    100007f8:   06 6c 20 10     vcmpequb. v1,v0,v13
    100007fc:   78 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000800:   10 00 40 39     li      r10,16
    10000804:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000808:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000080c:   06 68 81 11     vcmpequb v12,v1,v13
    10000810:   06 00 01 10     vcmpequb v0,v1,v0
    10000814:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000818:   06 6c 20 10     vcmpequb. v1,v0,v13
    1000081c:   58 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000820:   20 00 40 39     li      r10,32
    10000824:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000828:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000082c:   06 68 81 11     vcmpequb v12,v1,v13
    10000830:   06 00 01 10     vcmpequb v0,v1,v0
    10000834:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000838:   06 6c 20 10     vcmpequb. v1,v0,v13
    1000083c:   38 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000840:   30 00 40 39     li      r10,48
    10000844:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000848:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000084c:   06 68 81 11     vcmpequb v12,v1,v13
    10000850:   06 00 01 10     vcmpequb v0,v1,v0
    10000854:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000858:   06 6c a0 11     vcmpequb. v13,v0,v13
    1000085c:   18 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000860:   40 00 9e 38     addi    r4,r30,64
    10000864:   40 00 7f 38     addi    r3,r31,64
    10000868:   b9 fd ff 4b     bl      10000620 <00000022.plt_call.strcmp@@GLIBC_2.17>
    1000086c:   18 00 41 e8     ld      r2,24(r1)
    10000870:   14 ff ff 4b     b       10000784 <main+0xe4>
    10000874:   0c 05 00 10     vgbbd   v0,v0
    10000878:   6c 02 00 10     vsldoi  v0,v0,v0,9
    1000087c:   67 00 09 7c     mfvsrd  r9,vs32
    10000880:   ff ff 09 39     addi    r8,r9,-1
    10000884:   78 48 09 7d     andc    r9,r8,r9
    10000888:   f4 03 29 7d     popcntd r9,r9
    1000088c:   14 4a 4a 7d     add     r10,r10,r9
    10000890:   ae 50 df 7c     lbzx    r6,r31,r10
    10000894:   ae 50 7e 7c     lbzx    r3,r30,r10
    10000898:   50 30 63 7c     subf    r3,r3,r6
    1000089c:   e8 fe ff 4b     b       10000784 <main+0xe4>

VERIFIED.
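For readers who do not read ppc64le assembly, the overall shape of the NEW listing can be sketched in portable C. This is only an illustration of what the sequence appears to do, not the code GCC emits: it checks the strings in 16-byte chunks for the first byte that differs or is NUL, returns the difference of the two bytes at that position, and after 64 bytes hands the rest to the library strcmp. The names sketch_strcmp, first_stop, CHUNK and INLINE_LIMIT are hypothetical, and the 16 and 64 byte values are read off the listing rather than taken from the patch. The chunk scan here is a plain byte loop, so it never reads past a terminating NUL; the real sequence uses vector loads instead.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, read off the NEW listing: 16-byte vector loads,
   and a library call once 64 bytes have compared equal. */
enum { CHUNK = 16, INLINE_LIMIT = 64 };

/* Return the index of the first byte in this chunk where a and b differ
   or where a holds NUL; return CHUNK if the whole chunk is equal and
   NUL-free. The loop stops at the first such byte, so it never reads
   beyond the end of either string. */
static size_t first_stop(const unsigned char *a, const unsigned char *b)
{
    for (size_t i = 0; i < CHUNK; i++) {
        if (a[i] != b[i] || a[i] == '\0')
            return i;
    }
    return CHUNK;
}

/* Hypothetical stand-in for the inlined expansion of strcmp(s1, s2). */
int sketch_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (size_t off = 0; off < INLINE_LIMIT; off += CHUNK) {
        size_t i = first_stop(a + off, b + off);
        if (i < CHUNK) {
            /* Mirrors the final lbzx/lbzx/subf: byte difference at the
               first mismatching or terminating position. */
            return (int)a[off + i] - (int)b[off + i];
        }
    }

    /* First INLINE_LIMIT bytes are equal and NUL-free: fall back to the
       library routine, as the listing does with its bl to strcmp. */
    return strcmp(s1 + INLINE_LIMIT, s2 + INLINE_LIMIT);
}
```

Calling sketch_strcmp(s, u) in place of strcmp(s, u) should give a result with the same sign for any pair of NUL-terminated strings; the exact value is only guaranteed to match the byte difference on the inline path.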