Created attachment 480576 [details] gcc 4.5.1 -S output of st_mesa_to_tgsi.c Description of problem: Building mesa package with gcc 4.6.0 and option -fno-omit-frame-pointer lead to bogus code, mesa will always segfault at the same place because of bogus arguments. bt below. Same package build with same option with gcc 4.5.1 (f14) works properly. Program received signal SIGSEGV, Segmentation fault. src_register (t=0xbfffdb60, file=PROGRAM_OUTPUT, index=-2046) at state_tracker/st_mesa_to_tgsi.c:243 243 return ureg_src(t->outputs[t->outputMapping[index]]); /* not needed? */ (gdb) bt #0 src_register (t=0xbfffdb60, file=PROGRAM_OUTPUT, index=-2046) at state_tracker/st_mesa_to_tgsi.c:243 #1 0x0039f514 in translate_src (t=0xbfffdb60, SrcReg=0x83eaa94) at state_tracker/st_mesa_to_tgsi.c:316 #2 0x003a108c in compile_instruction (inst=0x83eaa78, t=0xbfffdb60) at state_tracker/st_mesa_to_tgsi.c:664 #3 st_translate_mesa_program (ctx=0x8374ee0, procType=0, ureg=0x83c1908, program=0x83b7f50, numInputs=1, inputMapping=0xbffff068, inputSemanticName=0xbffff10c "\001\261W", inputSemanticIndex=0xbffff12c "", interpMode=0xbfffefe8, numOutputs=1, outputMapping=0xbffff0e0, outputSemanticName=0xbffff14c "\001\361\377\277X\254:\b\350\354\071\b", outputSemanticIndex=0xbffff16c "", passthrough_edgeflags=0 '\000') at state_tracker/st_mesa_to_tgsi.c:1155 Attached output of : gcc -c -S -o state_tracker/st_mesa_to_tgsi.S state_tracker/st_mesa_to_tgsi.c -DFEATURE_GL=1 -DFEATURE_ES1=1 -DFEATURE_ES2=1 -D_GNU_SOURCE -DPTHREADS -DHAVE_POSIX_MEMALIGN -DMESA_SELINUX -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DGLX_DIRECT_RENDERING -DGLX_INDIRECT_RENDERING -DHAVE_ALIAS -DHAVE_XCB_DRI2 -DHAVE_LIBUDEV -I../../include -I../../src/glsl -I../../src/mesa -I../../src/mapi -I../../src/gallium/include -I../../src/gallium/auxiliary -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fvisibility=hidden -fno-strict-aliasing -fPIC with gcc 4.5.1 (working one) & 4.6.0 (problematic one) as well the original source from f15 mesa package.
Created attachment 480577 [details] gcc 4.6.0 -S output of st_mesa_to_tgsi.c
Created attachment 480578 [details] Original source file
Created attachment 480582 [details] gcc460-without-fno-omit-frame-pointer
This isn't sufficient, you haven't said how to reproduce the segfault. Better yet would be if you could try playing with options, see if say mesa works when built with gcc 4.6 without -fomit-frame-pointer switch, or say with -O0. If with -O0 it works, do a binary search to find out problematic source file (or if you suspect st_mesa_to_tgsi.c just try to rebuild that file with -O0 and vice versa, all files with -O0 and this file with -O2). Try -fno-strict-aliasing, perhaps the source is invalid. If you narrow it down to one particular source file, you can try also -fno-inline to see if the problem at -O2 -fno-omit-frame-pointer still remains without inlining, in that case you could do a binary search with __attribute__((optimize(0))) marked routines to narrow down to a problematic routine. Often such problems turn out to be just buggy app source codes, so if I don't have to debug all packages in the distro, I'll have more time for actually fixing gcc bugs. Also for the segfault it would be nice to know from gdb *t and whether t->outputMapping is really meant to be indexed by negative indexes.
When built without -fno-omit-frame-pointer & gcc 4.6.0 there is no segfault. I will try to iron out what's going wrong.
And index shouldn't be negative, it's invalid AFAIK
Rebuilding just st_mesa_to_tgsi.c without -fno-omit-frame-pointer avoid the issue Rebuilding just st_mesa_to_tgsi.c with -fno-omit-frame-pointer & -fno-inline avoid the issue I isolated the issue done to one call site of ureg_swizzle inline function (patch attached) with that patch mesa works fine with -fno-omit-frame-pointer Thus issue is when gcc inline ureg_swizzle inside the translate_src function To reproduce build mesa from f15 package with gcc 4.6 and use either supported AMD gpu (r300/r400/r600/r700/evergreen or newer aka from 9600 up to HD5xxx) or supported NVidia gpu (nv50 aka 8xxx/9xxx or nvc0 aka fermi aka gt4xx iirc) glxgears will segfault, all gl applications i tested gave the same bt and segfault in same place, they all works with the patch to avoid ureg_swizzle inlining inside translate_src Let me know if you need any more informations, my asm reading is rusty and i can't make much sense of what is going wrong.
Created attachment 480592 [details] Patch to st_mesa_to_tgsi.c that work around the inlining issue of gcc 4.6.0
Created attachment 480617 [details] post cpp source file since I know how much compiler people like preprocessed output, here is the file for you to play around with.
Thanks for the preprocessed source. I've eyeballed the assembly, but haven't spotted anything obvious. As I don't have a spare box with such card around right now, I'd appreciate some help into a self-contained runtime testcase. From what I see, you say that the problem is fixed by tweaking whether ureg_swizzle is inlined into translate_src or not, therefore I'd guess the problem is in translate_src. With the above mentioned gcc options, I see everything is inlined into translate_src, except for src_register. So, first step towards that would be add __attribute__((noinline, regparm(2))) on src_register and also translate_src, make translate_src no longer static - regparm 2 is apparently the calling convention gcc decides to use for the static functions in both cases (%eax is address of return struct and %edx is the first argument, the second (and third argument for src_register) go to stack. Then in a debugger see on which call to translate_src it crashes and perhaps run side by side gdb sessions on -fno-omit-frame-pointer and -fomit-frame-pointer code to see which arguments is translate_src called with and what it calls src_register with. In the easier case all that matters are input arguments of translate_src and what it calls src_register with (e.g. if the crash happens already on the first translate_src call or if it is independent on earlier translate_src calls). The runtime testcase would then be the preprocessed source, modified to only contain translate_src, src_register and inlines used in those and types, plus a main which sets up arguments to translate_src based on what you saw in gdb on the translate_src call which eventually crashed, and perhaps src_register stubbed just that it checks for the bad parameters. If I know enough details from your gdb session I might try to write it myself. translate_src (including everything that is inlined into it) seems to access just t->procType and t->Index and say assume whole *srcReg. So main could look like: int main () { struct st_translate t = { .procType = A, .Index = B }; struct prog_src_register p = { C, D, E, F, G, H, I, J, K }; struct ureg_src u; u = translate_src (&t, &p); return 0; } for whatever A-K values you see from the debugger.
Oh, and perhaps try __attribute__((noinline, regparm (2), optimize (0))) on src_register early on to confirm the issue is in translate_src and not src_register.
Created attachment 480843 [details] Self contained test case Here is bt i get with self contained test case, no compiling with -fno-omit-frame-pointer or with -O0 doesn't fix the issue, will it does with the full mesa build. So it takes the wrong case in the switc of src_register Program received signal SIGSEGV, Segmentation fault. ureg_src_undef () at rh679924.c:158 158 return src; (gdb) bt #0 ureg_src_undef () at rh679924.c:158 #1 src_register (t=0xbfffe324, file=PROGRAM_TEMPORARY, index=1091) at rh679924.c:290 #2 0x08048760 in translate_src (t=0xbfffe324, SrcReg=0xbffff548) at rh679924.c:333 #3 0x0804838a in main () at rh679924.c:394 Mesa bt i got today, dunno why it's different from yesterday but it's same issue as self contained case. Here if i don't build with -fno-omit-frame-pointer or if i remove static to translate_src it fix the issue. Program received signal SIGSEGV, Segmentation fault. ureg_src_undef () at ../../src/gallium/auxiliary/tgsi/tgsi_ureg.h:996 996 return src; (gdb) bt #0 ureg_src_undef () at ../../src/gallium/auxiliary/tgsi/tgsi_ureg.h:996 #1 src_register (t=0xbfffdb60, file=PROGRAM_TEMPORARY, index=1091) at state_tracker/st_mesa_to_tgsi.c:237 #2 0x003a4454 in translate_src (t=0xbfffdb60, SrcReg=0x83dfe04) at state_tracker/st_mesa_to_tgsi.c:338 #3 0x003a65ac in compile_instruction (inst=0x83dfd28, t=0xbfffdb60) at state_tracker/st_mesa_to_tgsi.c:686 #4 st_translate_mesa_program (ctx=0x836a0a8, procType=0, ureg=0x83b6b88, program=0x83ad1d0, numInputs=1, inputMapping=0xbffff068, inputSemanticName=0xbffff10c "\001\361U", inputSemanticIndex=0xbffff12c "", interpMode=0xbfffefe8, numOutputs=1, outputMapping=0xbffff0e0, outputSemanticName=0xbffff14c "\001\361\377\277\270>9\b\030\023:\b", outputSemanticIndex=0xbffff16c "", passthrough_edgeflags=0 '\000') at state_tracker/st_mesa_to_tgsi.c:1177 #5 0x003abd02 in st_translate_fragment_program (st=0x83a8370, stfp=0x83ad1d0) at state_tracker/st_program.c:445 #6 0x003946d2 in translate_fp (stfp=0x83ad1d0, st=0x83a8370) at state_tracker/st_atom_shader.c:64 #7 update_fp (st=0x83a8370) at state_tracker/st_atom_shader.c:175 #8 0x00391c12 in st_validate_state (st=0x83a8370) at state_tracker/st_atom.c:172 #9 0x00398899 in st_Clear (ctx=0x836a0a8, mask=18) at state_tracker/st_cb_clear.c:464 #10 0x0035c002 in _mesa_Clear (mask=<optimized out>) at main/clear.c:241 #11 0x001c600b in glClear (mask=16640) at ../../../src/mapi/glapi/glapitemp.h:1102 #12 0x0804a099 in ?? () #13 0x08049852 in ?? () #14 0x47372413 in __libc_start_main (main=0x8049110, argc=1, ubp_av=0xbffff644, init=0x804afb0, fini=0x804b020, rtld_fini=0x47345900 <_dl_fini>, stack_end=0xbffff63c) at libc-start.c:226 #15 0x08049ead in ?? ()
So bug is that num_src in compile_instruction get overwritten with ridiculously big value and that after at one point in the loop &inst->SrcReg[i] point to data that doesn't make sense and lead to segfault. translate_src is corrupting the stack or at least the num_src value of compile_instruction Again if translate_src is not static or if build without -fno-omit-frame-pointer, or build with the patch previously attached here, things are fine.
Corruption happen when returning src from translate_src
Created attachment 480866 [details] stripped down translate_src that stills exhibits the bug I am failing at creating a standalone reproducer here is a stripped version of translate_src when build with optimization of for translate_src things works (modulo rendering bug because we don't properly translate shader) but when keeping translate_src as static function with -O2 -fno-omit-frame-pointer num_src of compile_instruction is corrupted here is stack trace : Breakpoint 1, translate_src (t=0xbfffd6e0, SrcReg=0x83dce14) at state_tracker/st_mesa_to_tgsi.c:345 345 { (gdb) frame 1 #1 0x003dcecc in compile_instruction (inst=0x83dce10, t=0xbfffd6e0) at state_tracker/st_mesa_to_tgsi.c:702 702 src[i] = translate_src( t, &inst->SrcReg[i] ); (gdb) p num_src $2 = 1 (gdb) watch num_src Hardware watchpoint 2: num_src (gdb) c Continuing. Hardware watchpoint 2: num_src Old value = 1 New value = 3145728 0x003dc467 in translate_src (t=<optimized out>, SrcReg=0x83dce14) at state_tracker/st_mesa_to_tgsi.c:390 390 return src; (gdb) bt #0 0x003dc467 in translate_src (t=<optimized out>, SrcReg=0x83dce14) at state_tracker/st_mesa_to_tgsi.c:390 #1 0x003dcecc in compile_instruction (inst=0x83dce10, t=0xbfffd6e0) at state_tracker/st_mesa_to_tgsi.c:702 #2 st_translate_mesa_program (ctx=0x83684a0, procType=0, ureg=0x83b34a0, program=0x83aad80, numInputs=1, inputMapping=0xbfffebe8, inputSemanticName=0xbfffec8c "\001QY", inputSemanticIndex=0xbfffecac "", interpMode=0xbfffeb68, numOutputs=1, outputMapping=0xbfffec60, outputSemanticName=0xbfffeccc "\001\354\377\277\210\361\071\b\210\"9\b", outputSemanticIndex=0xbfffecec "", passthrough_edgeflags=0 '\000') at state_tracker/st_mesa_to_tgsi.c:1193 #3 0x003e2622 in st_translate_fragment_program (st=0x83a6740, stfp=0x83aad80) at state_tracker/st_program.c:445 #4 0x003cc6d2 in translate_fp (stfp=0x83aad80, st=0x83a6740) at state_tracker/st_atom_shader.c:64 #5 update_fp (st=0x83a6740) at state_tracker/st_atom_shader.c:175 #6 0x003c9c12 in st_validate_state (st=0x83a6740) at state_tracker/st_atom.c:172 #7 0x003d0899 in st_Clear (ctx=0x83684a0, mask=1) at state_tracker/st_cb_clear.c:464 #8 0x00394002 in _mesa_Clear (mask=<optimized out>) at main/clear.c:241 #9 0x0015600b in glClear (mask=16384) at ../../../src/mapi/glapi/glapitemp.h:1102 #10 0x08048c7f in Draw () at tri-flat.c:73 #11 0x00215c7e in processWindowWorkList (window=<optimized out>) at glut_event.c:1307 #12 0x00216dd1 in __glutProcessWindowWorkLists () at glut_event.c:1358 #13 glutMainLoop () at glut_event.c:1379 #14 0x08048b6f in main (argc=1, argv=0xbffff1b4) at tri-flat.c:136
Created attachment 480867 [details] Asm of buggy build of previous striped down translate_src
Created attachment 480868 [details] Asm of working build of previous striped down translate_src (O0 for translate_src)
Ok, good to know that. Now, could you find out which of the translate_src calls it was that overwrote num_src? Is it in the for (i = 0; i < num_src; i++) src[i] = translate_src( t, &inst->SrcReg[i] ); loop? I guess if you put a breakpoint before this loop (but after num_src has been written), then p &num_src and if it lives on the stack, add a watchpoint: watch *(int *)$N (where N was the last $N gdb printed for p &num_src), add another breakpoint somewhere after this loop and on that second breakpoint always delete the watchpoint and on the first one reinsert it? I'll inspect the copying of the return value tomorrow if there is no overrun there.
Created attachment 480870 [details] Preprocessed output of modified st_mesa_to_tgsi.c
Ah, you've just done most of that. Can you disas 0x003dc467-32,0x003dc467+32 so that I know how to map that address back to assembly?
Created attachment 480873 [details] gdb session when watching num_src on stack It happen in the first call of translate_src which is call from for (i = 0; i < num_src; i++) src[i] = translate_src( t, &inst->SrcReg[i] ); in compile_instruction
0x003dc467 in translate_src (t=<optimized out>, SrcReg=0x83dce14) at state_tracker/st_mesa_to_tgsi.c:390 390 return src; (gdb) Continuing. Breakpoint 1, translate_src (t=0xbfffd6e0, SrcReg=0x83dce1c) at state_tracker/st_mesa_to_tgsi.c:345 345 { (gdb) disas 0x003dc467-32,0x003dc467+32 Dump of assembler code from 0x3dc447 to 0x3dc487: 0x003dc447 <translate_src+39>: std 0x003dc448 <translate_src+40>: (bad) 0x003dc449 <translate_src+41>: decl 0x458be855(%ebx) 0x003dc44f <translate_src+47>: in $0x89,%al 0x003dc451 <translate_src+49>: push %esi 0x003dc452 <translate_src+50>: add $0x8b,%al 0x003dc454 <translate_src+52>: push %ebp 0x003dc455 <translate_src+53>: in (%dx),%al 0x003dc456 <translate_src+54>: mov %edx,0x8(%esi) 0x003dc459 <translate_src+57>: mov -0x10(%ebp),%edx 0x003dc45c <translate_src+60>: mov %edx,0xc(%esi) 0x003dc45f <translate_src+63>: mov -0xc(%ebp),%edx 0x003dc462 <translate_src+66>: mov %eax,(%esi) 0x003dc464 <translate_src+68>: mov %edx,0x10(%esi) 0x003dc467 <translate_src+71>: mov %eax,%edx 0x003dc469 <translate_src+73>: shr $0x8,%ax 0x003dc46d <translate_src+77>: and $0xf,%edx 0x003dc470 <translate_src+80>: and $0xfffffffc,%eax 0x003dc473 <translate_src+83>: or $0x40,%edx 0x003dc476 <translate_src+86>: or $0xe,%eax 0x003dc479 <translate_src+89>: mov %dl,(%esi) 0x003dc47b <translate_src+91>: mov %al,0x1(%esi) 0x003dc47e <translate_src+94>: lea 0x34(%esp),%esp 0x003dc482 <translate_src+98>: mov %esi,%eax 0x003dc484 <translate_src+100>: pop %esi 0x003dc485 <translate_src+101>: pop %ebp 0x003dc486 <translate_src+102>: ret End of assembler dump. (gdb)
So far this to me smells like a bug in the caller of translate_src. sizeof (struct ureg_src) == 20 (note, if the layout of this structure isn't part of mesa ABI nor mandated by hw, I'd once this is resolved strongly advise to reorder the fields for much better packing, e.g. by putting all the 4 16 bit bitfields last, the size would be 12 bytes instead of 20), %esi above is a copy of %eax from the beginning of the function which holds address of caller reserved return value slot, which for the 20 byte struct should be 20 byte large. But apparently in the caller the return slot lives at offset %ebp-5080, while num_src lives at offset %ebp-5064, i.e. there are just 16 bytes reserved for that slot.
I believe you're right, adding temporary to compile_instruction avoid the num_src corruption. Also working build (either not static translate_src or without fno-omit-frame-pointer) have the num_src & src stored way appart. Let me know if you want me to test a gcc patch.
Tracking this upstream (http://gcc.gnu.org/PR47893 ). For structures that aren't part of public Mesa ABI and aren't used for mmapped HW IO, I'd strongly suggest playing around with pahole from dwarves packages. You could certainly decrease size of many structures that way, this problem would surely go away for you too and code which abuses returning of aggregates and passing them by value so often could be certainly significantly improved.
Why bug closed when update with fix not released?
*** Bug 677842 has been marked as a duplicate of this bug. ***
re-opening as this is an f15 alpha nth issue, we need to track downstream progress. jakub, is there a chance we can get a build of gcc with the proposed fix?
transferring flags from dupe
Today? Very unlikely, after I bootstrap/regtest the patch (currently ongoing) it needs to be reviewed upstream, then several hours of building gcc, then several days until it actually hits the buildroots, then mesa would need to be rebuilt. If you want workaround, build that source file in mesa with -fomit-frame-pointer, or -fstrict-aliasing instead of -fno-strict-aliasing, or with -O0 instead of -O2, or fix the ordering of the fields in ureg_src struct so that it doesn't occupy 20 bytes but just 12. All of those would fix this on the mesa side, and the last one is IMNSHO very desirable no matter what.
Thanks. Jerome / Ajax, can we make that change to mesa, or is it too much?
Package mesa-7.10-0.27.fc15: * should fix your issue, * was pushed to the Fedora 15 updates-testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing mesa-7.10-0.27.fc15' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/mesa-7.10-0.27.fc15 then log in and leave karma (feedback).
mesa-7.10-0.27.fc15 has been pushed to the Fedora 15 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mesa'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/mesa-7.10-0.27.fc15
mesa-7.10-0.27.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 679404 has been marked as a duplicate of this bug. ***
*** Bug 679528 has been marked as a duplicate of this bug. ***