Bug 1652929

Summary: Backport ppc64le str[n]cmp inlined code
Product: Red Hat Enterprise Linux 8
Component: gcc
Version: 8.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Mark Wielaard <mjw>
Assignee: Marek Polacek <mpolacek>
QA Contact: Michael Petlan <mpetlan>
CC: codonell, fweimer, jakub, law, mcermak, mpetlan, ohudlick
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: gcc-8.2.1-3.4.el8
Doc Type: No Doc Update
Last Closed: 2019-06-13 22:56:54 UTC
Type: Bug
Bug Blocks: 1532205, 1652932, 1734295

Description Mark Wielaard 2018-11-23 15:38:18 UTC
On ppc64le, gcc8 generates inlined code for str[n]cmp that valgrind memcheck
cannot prove correct. gcc9/trunk generates slightly different code for the
inlined strncmp that valgrind memcheck can prove correct with some
patches.

The gcc patch https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01589.html needs to
be backported to solve this (and then at least glibc needs to be rebuilt with
the patched compiler).
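
A minimal sketch of the kind of code that can exercise the inlined str[n]cmp
path. The helper names (make_string, compare_heap_strings) are hypothetical and
not taken from any test case in this bug. It assumes that comparing short
strings held in larger heap buffers, where the bytes after the terminating NUL
stay uninitialized, is representative of the case memcheck has trouble with.
Whether gcc actually expands the strcmp call inline depends on the target, the
compiler version, and the optimization level.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy text into a heap buffer of the given capacity.
   Bytes after the terminating NUL are left uninitialized on purpose. */
static char *make_string(const char *text, size_t capacity)
{
    char *buf;

    if (strlen(text) >= capacity)
        return NULL;            /* text would not fit, refuse */
    buf = malloc(capacity);
    if (buf != NULL)
        strcpy(buf, text);
    return buf;
}

/* Hypothetical helper: compare two strings through heap copies so the
   compiler cannot fold the strcmp call at compile time. */
static int compare_heap_strings(const char *left, const char *right)
{
    char *s = make_string(left, 64);
    char *u = make_string(right, 64);
    int result = 0;

    if (s != NULL && u != NULL)
        result = strcmp(s, u);  /* candidate for inline expansion */
    free(s);
    free(u);
    return result;
}
```

Building something like this with optimization enabled and running it under
valgrind memcheck, once with the unpatched and once with the patched compiler,
is one way to compare the two behaviours.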

Comment 1 Mark Wielaard 2018-11-27 20:36:46 UTC
The gcc backport patch has now been posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02161.html
(It hasn't landed on the gcc-8-branch yet though.)

Comment 3 Mark Wielaard 2018-11-28 19:51:01 UTC
This is now on the upstream gcc-8-branch:

Author: acsawdey
Date: Wed Nov 28 19:33:04 2018
New Revision: 266578

URL: https://gcc.gnu.org/viewcvs?rev=266578&root=gcc&view=rev
Log:
2018-11-28  Aaron Sawdey  <acsawdey.com>

	Backport from mainline
	2018-10-25  Aaron Sawdey  <acsawdey.com>

	* config/rs6000/rs6000-string.c (expand_strncmp_gpr_sequence): Change to
	a shorter sequence with fewer branches.
	(emit_final_str_compare_gpr): Ditto.

	Backport from mainline to allow the above code to go in:
	2018-06-14  Aaron Sawdey  <acsawdey.com>

	* config/rs6000/rs6000-string.c (do_and3, do_and3_mask,
	do_cmpb3, do_rotl3): New functions.



Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/rs6000/rs6000-string.c

Comment 11 Michael Petlan 2019-02-11 14:31:40 UTC
I think I have finally managed to see the difference in the code generated by gcc-8.2.1-3.3.el8 and gcc-8.2.1-3.5.el8. I tried it on an inline-expanded strcmp call:

================================= OLD =================================
        int a = strcmp(s, u);
    100007d0:   00 00 00 39     li      r8,0
    100007d4:   f8 53 26 7d     cmpb    r6,r9,r10
    100007d8:   f8 43 28 7d     cmpb    r8,r9,r8
    100007dc:   38 33 06 7d     orc     r6,r8,r6
    100007e0:   74 00 c6 7c     cntlzd  r6,r6
    100007e4:   08 00 c6 38     addi    r6,r6,8
    100007e8:   30 36 29 79     rldcl   r9,r9,r6,56
    100007ec:   30 36 4a 79     rldcl   r10,r10,r6,56
    100007f0:   50 48 ca 7c     subf    r6,r10,r9
    100007f4:   84 ff ff 4b     b       10000778 <main+0xd8>
    100007f8:   08 00 3e 39     addi    r9,r30,8
    100007fc:   08 00 5f 39     addi    r10,r31,8
    10000800:   28 4c 20 7d     ldbrx   r9,0,r9
    10000804:   28 54 40 7d     ldbrx   r10,0,r10
    10000808:   51 48 ca 7c     subf.   r6,r10,r9
    1000080c:   c4 ff 82 40     bne     100007d0 <main+0x130>
    10000810:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000814:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000818:   60 ff 9e 40     bne     cr7,10000778 <main+0xd8>
    1000081c:   10 00 3e 39     addi    r9,r30,16
    10000820:   10 00 5f 39     addi    r10,r31,16
    10000824:   28 4c 20 7d     ldbrx   r9,0,r9
    10000828:   28 54 40 7d     ldbrx   r10,0,r10
    1000082c:   51 48 ca 7c     subf.   r6,r10,r9
    10000830:   a0 ff 82 40     bne     100007d0 <main+0x130>
    10000834:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000838:   00 00 aa 2f     cmpdi   cr7,r10,0
    1000083c:   3c ff 9e 40     bne     cr7,10000778 <main+0xd8>
    10000840:   18 00 3e 39     addi    r9,r30,24
    10000844:   18 00 5f 39     addi    r10,r31,24
    10000848:   28 4c 20 7d     ldbrx   r9,0,r9
    1000084c:   28 54 40 7d     ldbrx   r10,0,r10
    10000850:   51 48 ca 7c     subf.   r6,r10,r9
    10000854:   7c ff 82 40     bne     100007d0 <main+0x130>
    10000858:   f8 33 2a 7d     cmpb    r10,r9,r6
    1000085c:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000860:   18 ff 9e 40     bne     cr7,10000778 <main+0xd8>
    10000864:   20 00 3e 39     addi    r9,r30,32
    10000868:   20 00 5f 39     addi    r10,r31,32
    1000086c:   28 4c 20 7d     ldbrx   r9,0,r9
    10000870:   28 54 40 7d     ldbrx   r10,0,r10
    10000874:   51 48 ca 7c     subf.   r6,r10,r9
    10000878:   58 ff 82 40     bne     100007d0 <main+0x130>
    1000087c:   f8 33 2a 7d     cmpb    r10,r9,r6
    10000880:   00 00 aa 2f     cmpdi   cr7,r10,0
    10000884:   f4 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    10000888:   28 00 3e 39     addi    r9,r30,40
    1000088c:   28 00 5f 39     addi    r10,r31,40
    10000890:   28 4c 20 7d     ldbrx   r9,0,r9
    10000894:   28 54 40 7d     ldbrx   r10,0,r10
    10000898:   51 48 ca 7c     subf.   r6,r10,r9
    1000089c:   34 ff 82 40     bne     100007d0 <main+0x130>
    100008a0:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008a4:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008a8:   d0 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008ac:   30 00 3e 39     addi    r9,r30,48
    100008b0:   30 00 5f 39     addi    r10,r31,48
    100008b4:   28 4c 20 7d     ldbrx   r9,0,r9
    100008b8:   28 54 40 7d     ldbrx   r10,0,r10
    100008bc:   51 48 ca 7c     subf.   r6,r10,r9
    100008c0:   10 ff 82 40     bne     100007d0 <main+0x130>
    100008c4:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008c8:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008cc:   ac fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008d0:   38 00 3e 39     addi    r9,r30,56
    100008d4:   38 00 5f 39     addi    r10,r31,56
    100008d8:   28 4c 20 7d     ldbrx   r9,0,r9
    100008dc:   28 54 40 7d     ldbrx   r10,0,r10
    100008e0:   51 48 ca 7c     subf.   r6,r10,r9
    100008e4:   ec fe 82 40     bne     100007d0 <main+0x130>
    100008e8:   f8 33 2a 7d     cmpb    r10,r9,r6
    100008ec:   00 00 aa 2f     cmpdi   cr7,r10,0
    100008f0:   88 fe 9e 40     bne     cr7,10000778 <main+0xd8>
    100008f4:   40 00 9f 38     addi    r4,r31,64
    100008f8:   40 00 7e 38     addi    r3,r30,64
    100008fc:   25 fd ff 4b     bl      10000620 <00000022.plt_call.strcmp@@GLIBC_2.17>
    10000900:   18 00 41 e8     ld      r2,24(r1)
    10000904:   78 1b 66 7c     mr      r6,r3
    10000908:   70 fe ff 4b     b       10000778 <main+0xd8>

================================= NEW =================================
    100007dc:   99 fe 20 7c     lxvd2x  vs33,0,r31
    100007e0:   99 f6 00 7c     lxvd2x  vs32,0,r30
    100007e4:   8c 03 a0 11     vspltisw v13,0
    100007e8:   00 00 40 39     li      r10,0
    100007ec:   06 00 81 11     vcmpequb v12,v1,v0
    100007f0:   06 68 01 10     vcmpequb v0,v1,v13
    100007f4:   57 65 00 f0     xxlorc  vs32,vs32,vs44
    100007f8:   06 6c 20 10     vcmpequb. v1,v0,v13
    100007fc:   78 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000800:   10 00 40 39     li      r10,16
    10000804:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000808:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000080c:   06 68 81 11     vcmpequb v12,v1,v13
    10000810:   06 00 01 10     vcmpequb v0,v1,v0
    10000814:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000818:   06 6c 20 10     vcmpequb. v1,v0,v13
    1000081c:   58 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000820:   20 00 40 39     li      r10,32
    10000824:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000828:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000082c:   06 68 81 11     vcmpequb v12,v1,v13
    10000830:   06 00 01 10     vcmpequb v0,v1,v0
    10000834:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000838:   06 6c 20 10     vcmpequb. v1,v0,v13
    1000083c:   38 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000840:   30 00 40 39     li      r10,48
    10000844:   99 56 3f 7c     lxvd2x  vs33,r31,r10
    10000848:   99 56 1e 7c     lxvd2x  vs32,r30,r10
    1000084c:   06 68 81 11     vcmpequb v12,v1,v13
    10000850:   06 00 01 10     vcmpequb v0,v1,v0
    10000854:   57 05 0c f0     xxlorc  vs32,vs44,vs32
    10000858:   06 6c a0 11     vcmpequb. v13,v0,v13
    1000085c:   18 00 98 40     bge     cr6,10000874 <main+0x1d4>
    10000860:   40 00 9e 38     addi    r4,r30,64
    10000864:   40 00 7f 38     addi    r3,r31,64
    10000868:   b9 fd ff 4b     bl      10000620 <00000022.plt_call.strcmp@@GLIBC_2.17>
    1000086c:   18 00 41 e8     ld      r2,24(r1)
    10000870:   14 ff ff 4b     b       10000784 <main+0xe4>
    10000874:   0c 05 00 10     vgbbd   v0,v0
    10000878:   6c 02 00 10     vsldoi  v0,v0,v0,9
    1000087c:   67 00 09 7c     mfvsrd  r9,vs32
    10000880:   ff ff 09 39     addi    r8,r9,-1
    10000884:   78 48 09 7d     andc    r9,r8,r9
    10000888:   f4 03 29 7d     popcntd r9,r9
    1000088c:   14 4a 4a 7d     add     r10,r10,r9
    10000890:   ae 50 df 7c     lbzx    r6,r31,r10
    10000894:   ae 50 7e 7c     lbzx    r3,r30,r10
    10000898:   50 30 63 7c     subf    r3,r3,r6
    1000089c:   e8 fe ff 4b     b       10000784 <main+0xe4>

VERIFIED.
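
For readers of the NEW listing above: the three instructions at 10000880 to
10000888 (addi r8,r9,-1; andc r9,r8,r9; popcntd r9,r9) compute the number of
trailing zero bits in r9. A minimal C sketch of the same arithmetic follows.
The function name trailing_zeros is hypothetical, and __builtin_popcountll is
a GCC/Clang builtin. That the result is then used as a byte index (it is added
to r10 before the two lbzx loads) is my reading of the listing, not something
stated in this bug.

```c
#include <stdint.h>

/* Count the trailing zero bits of x; yields 64 when x is 0. */
static unsigned trailing_zeros(uint64_t x)
{
    /* addi r8,r9,-1 ; andc r9,r8,r9
       (x - 1) & ~x keeps exactly the bits below the lowest set bit of x. */
    uint64_t below_lowest_set = (x - 1) & ~x;

    /* popcntd r9,r9
       Counting those bits gives the position of the lowest set bit. */
    return (unsigned)__builtin_popcountll(below_lowest_set);
}
```

For example, trailing_zeros(8) yields 3, since 8 is binary 1000 and has three
zero bits below its lowest set bit.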