Bug 244575 - Problem with gcc i386 register allocation
Product: Fedora
Classification: Fedora
Component: gcc
Hardware: i386 Linux
Priority: low
Severity: low
Assigned To: Jakub Jelinek
Reported: 2007-06-17 11:27 EDT by Søren Sandmann Pedersen
Modified: 2014-06-18 05:09 EDT (History)
Doc Type: Bug Fix
Last Closed: 2007-06-20 05:24:48 EDT

Attachments
The test case (944 bytes, text/x-csrc)
2007-06-17 11:27 EDT, Søren Sandmann Pedersen
The generated assembly (1.17 KB, application/octet-stream)
2007-06-17 11:28 EDT, Søren Sandmann Pedersen

External Trackers
Tracker ID Priority Status Summary Last Updated
GNU Compiler Collection 32414 None None None Never

Description Søren Sandmann Pedersen 2007-06-17 11:27:03 EDT
The C file to be attached compiles to code with gratuitous memory
references in the inner loop, even though plenty of registers are
available. It even looks like gcc actually allocates the variables to
registers, but then forgets about them in the inner loop.

I am attaching both the source file and the generated assembly. The relevant
part of the assembly is this:

        movl    %ebx, -16(%ebp)        <- the variables are in registers here
        movl    %ecx, -36(%ebp)
        # begin inner
        movl    -36(%ebp), %edi        <- memory ref - should use %ecx instead.
        addl    $4, -36(%ebp)          <- same
        movl    (%edi), %eax
        movl    -16(%ebp), %edi        <- memory ref - should use %ebx instead
        orl     $-16777216, %eax
        movl    %eax, (%edi)
        addl    $4, %edi
        movl    %edi, -16(%ebp)        <- same
        # end inner
        subl    $1, %edx
        cmpw    $-1, %dx
        je      .L4
        jmp     .L6

When compiled with -O3, the problem goes away except for one apparently
gratuitous memory write in the loop, but I'd think that even -O2 should get
this right.

dhcp83-218:~% rpm -q gcc

gcc commandline:
gcc -O2 -S gcc-register.c
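
The attached test case is not reproduced on this page. As a rough guide to the loop shape under discussion, here is a minimal sketch, assuming a row-by-row copy that forces the alpha byte and counts pixels down with a 16-bit counter (which would match the `cmpw $-1, %dx` in the assembly above). The function name `fill_alpha_u16` and the parameter types are hypothetical; only the variable names come from this report.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the loop shape; NOT the attached
 * gcc-register.c.  Strides are counted in 32-bit pixels. */
void
fill_alpha_u16 (uint32_t *dstLine, const uint32_t *srcLine,
                int dstStride, int srcStride,
                uint16_t width, uint16_t height)
{
    uint32_t *dst;
    const uint32_t *src;
    uint16_t w;                 /* 16-bit counter, as in the report */

    while (height--)
    {
        dst = dstLine;
        dstLine += dstStride;
        src = srcLine;
        srcLine += srcStride;
        w = width;

        /* Inner loop: bump both pointers, force alpha to 0xff. */
        while (w--)
            *dst++ = *src++ | 0xff000000;
    }
}
```

Compiling a file like this with the command line above and reading the inner loop in the generated `.s` file is one way to check whether a given gcc version shows the same stack traffic; whether it does will depend on the compiler version and on how closely the sketch matches the real test case.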
Comment 1 Søren Sandmann Pedersen 2007-06-17 11:27:03 EDT
Created attachment 157224
The test case
Comment 2 Søren Sandmann Pedersen 2007-06-17 11:28:16 EDT
Created attachment 157225
The generated assembly
Comment 3 Jakub Jelinek 2007-06-18 12:40:43 EDT
Both gcc 4.1.x and 4.2.x behave this way. In the *.lreg dump this is
(insn:HI 60 58 61 5 (parallel [
            (set (reg:SI 90)
                (ior:SI (mem:SI (reg/v/f:SI 63 [ src ]) [3 S4 A32])
                    (const_int -16777216 [0xffffffffff000000])))
            (clobber (reg:CC 17 flags))
        ]) 318 {*iorsi_1} (nil)
    (expr_list:REG_EQUIV (mem:SI (reg/v/f:SI 65 [ dst ]) [3 S4 A32])
        (expr_list:REG_UNUSED (reg:CC 17 flags)

(insn:HI 61 60 62 5 (set (mem:SI (reg/v/f:SI 65 [ dst ]) [3 S4 A32])
        (reg:SI 90)) 40 {*movsi_1} (insn_list:REG_DEP_TRUE 60 (nil))
    (expr_list:REG_DEAD (reg:SI 90)
        (expr_list:REG_EQUAL (ior:SI (mem:SI (reg/v/f:SI 63 [ src ]) [3 S4 A32])
                (const_int -16777216 [0xffffffffff000000]))
(plus the src/dst bump and the w decrement), but after global alloc and reload
the code is terrible.  Both 3.4.x and the trunk happen to assign different hard
registers to src and dst, so the loop looks nicer, but I'm not sure that
isn't just a coincidence.  Anyway, the register allocator is a known painful
spot in gcc and Vlad is working on that area, but unless the fix turns out to be
very obvious, the chances of backporting this to 4.1.x-RH are close to nil; it
would be a terribly risky change.
Comment 4 Søren Sandmann Pedersen 2007-06-18 16:25:35 EDT
Yeah, I wasn't really expecting any backporting. Feel free to close this bug if
it isn't useful.

Note though that this issue is a real problem for the cairo and X server
rendering code.
Comment 5 Jakub Jelinek 2007-06-18 19:48:52 EDT
cairo or X can work around this:
s/uint16_t w;/uint32_t w;/
    while (height--)
    {
        dst = dstLine;
        dstLine += dstStride;
        src = srcLine;
        srcLine += srcStride;
        for (w = 0; w < width; w++)
          dst[w] = src[w] | 0xFF000000;
    }

        movl    -4(%edx,%ebx,4), %eax
        orl     $-16777216, %eax
        movl    %eax, -4(%ecx,%ebx,4)
        addl    $1, %ebx
        cmpl    -24(%ebp), %ebx
        je      .L4
        jmp     .L6

The question is whether this gives better code only on register-starved i?86
(which ought to die soon), or on other arches too.
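
For anyone wanting to try the workaround in isolation, here is a minimal compilable sketch of it, assuming the loop lives in a function that takes line pointers and strides counted in 32-bit pixels. The function name `fill_alpha_u32` and the parameter types are hypothetical; the loop body and the `uint32_t w` counter are taken from the snippet in this comment.

```c
#include <stdint.h>

/* Hypothetical wrapper around the workaround loop from this comment.
 * The only intended changes from the reported pattern are the 32-bit
 * counter and indexed addressing in place of pointer bumps. */
void
fill_alpha_u32 (uint32_t *dstLine, const uint32_t *srcLine,
                int dstStride, int srcStride,
                uint16_t width, uint16_t height)
{
    uint32_t *dst;
    const uint32_t *src;
    uint32_t w;                 /* was: uint16_t w; */

    while (height--)
    {
        dst = dstLine;
        dstLine += dstStride;
        src = srcLine;
        srcLine += srcStride;

        /* Index with w so that src and dst stay loop-invariant. */
        for (w = 0; w < width; w++)
            dst[w] = src[w] | 0xFF000000;
    }
}
```

With this shape the inner loop needs only the two base pointers and the counter, which fits the indexed `movl -4(%edx,%ebx,4), %eax` form shown above. Whether a particular compiler version keeps the loop bound in a register is not something this sketch guarantees.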
Comment 6 Jakub Jelinek 2007-06-20 05:24:48 EDT
Tracking upstream.
