Bug 244575
Summary: | Problem with gcc i386 register allocation
---|---
Product: | [Fedora] Fedora
Component: | gcc
Version: | rawhide
Hardware: | i386
OS: | Linux
Status: | CLOSED UPSTREAM
Severity: | low
Priority: | low
Reporter: | Søren Sandmann Pedersen <sandmann>
Assignee: | Jakub Jelinek <jakub>
CC: | kem, vmakarov
Doc Type: | Bug Fix
Last Closed: | 2007-06-20 09:24:48 UTC
Description
Søren Sandmann Pedersen
2007-06-17 15:27:03 UTC
Created attachment 157224: The test case
Created attachment 157225: The generated assembly
Both gcc 4.1.x and 4.2.x behave this way. In *.lreg this is

    (insn:HI 60 58 61 5 (parallel [
                (set (reg:SI 90)
                    (ior:SI (mem:SI (reg/v/f:SI 63 [ src ]) [3 S4 A32])
                        (const_int -16777216 [0xffffffffff000000])))
                (clobber (reg:CC 17 flags))
            ]) 318 {*iorsi_1} (nil)
        (expr_list:REG_EQUIV (mem:SI (reg/v/f:SI 65 [ dst ]) [3 S4 A32])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil))))

    (insn:HI 61 60 62 5 (set (mem:SI (reg/v/f:SI 65 [ dst ]) [3 S4 A32])
            (reg:SI 90)) 40 {*movsi_1} (insn_list:REG_DEP_TRUE 60 (nil))
        (expr_list:REG_DEAD (reg:SI 90)
            (expr_list:REG_EQUAL (ior:SI (mem:SI (reg/v/f:SI 63 [ src ]) [3 S4 A32])
                    (const_int -16777216 [0xffffffffff000000]))
                (nil))))

(plus the src/dst bump and the w decrement), but after global alloc and reload the code is terrible. Both 3.4.x and the trunk happen to assign different hard registers to src and dst, so the loop looks nicer, but I'm not sure that isn't just a coincidence.

Anyway, the register allocator is a known painful spot in gcc. Vlad is working on that area, but unless the fix turns out to be very obvious, the chances of backporting this to 4.1.x-RH are close to nil; it would be a terribly risky change.

Yeah, I wasn't really expecting any backporting. Feel free to close this bug if it isn't useful. Note though that this issue is a real problem for the cairo and X server rendering code.

cairo or X can work around this:

    s/uint16_t w;/uint32_t w;/

    while (height--)
      {
        dst = dstLine;
        dstLine += dstStride;
        src = srcLine;
        srcLine += srcStride;
        for (w = 0; w < width; w++)
          dst[w] = src[w] | 0xFF000000;
      }

    .L6:
            movl    -4(%edx,%ebx,4), %eax
            orl     $-16777216, %eax
            movl    %eax, -4(%ecx,%ebx,4)
            addl    $1, %ebx
            cmpl    -24(%ebp), %ebx
            je      .L4
            jmp     .L6

The question is whether this is better code only on register-starved i?86 (which ought to die soon), or on other arches too.

Tracking upstream.
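
For anyone who wants to try the workaround outside of cairo or the X server, here is a minimal, self-contained sketch of the indexed loop quoted in this thread. The function name force_opaque_rect, the parameter types, and the assumption that strides are counted in 32-bit pixels are hypothetical and not taken from the attached test case. The sketch only shows the source-level shape of the workaround (a uint32_t counter indexing both buffers); it makes no claim about which registers any particular gcc version will pick.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the attached test case: copy a rectangle of
 * 32-bit pixels from src to dst and force the top byte (alpha) to 0xFF.
 * Strides are assumed to be in units of uint32_t pixels, not bytes.
 */
static void force_opaque_rect(uint32_t *dstLine, int dstStride,
                              const uint32_t *srcLine, int srcStride,
                              uint32_t width, uint32_t height)
{
    uint32_t w; /* the workaround: a 32-bit counter instead of uint16_t */

    while (height--) {
        uint32_t *dst = dstLine;
        const uint32_t *src = srcLine;

        dstLine += dstStride;
        srcLine += srcStride;

        /* Index both buffers with the same counter instead of bumping
         * src and dst separately while decrementing w. */
        for (w = 0; w < width; w++)
            dst[w] = src[w] | 0xFF000000u;
    }
}
```

Compiling this with the affected compiler and -S and comparing the inner loop with the attached assembly should show whether the same effect is reproduced; that comparison has not been done here.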