Description of problem: The location expressions produced for local variables and parameters are incorrect before the prologue completes, but there is no indication in the form of a location list. Version-Release number of selected component (if applicable): gcc-4.0.1-6 How reproducible: 100% Steps to Reproduce: 1. gcc -g -O2 -o foobar foobar.c 2. 3. Actual results: Expected results: Additional info: Trivial test case attached. This demonstrates the problem for stack locals and stack-passed arguments. In the function "foo", we get: [ d0] subprogram external name "foo" decl_file 1 decl_line 3 prototyped type [ 3b] low_pc 0x080483d4 high_pc 0x080483ee frame_base 1 byte block [ 0] reg5 [ e7] formal_parameter name "x" decl_file 1 decl_line 2 type [ 3b] location 2 byte block [ 0] breg5 8 [ f3] formal_parameter name "y" decl_file 1 decl_line 2 type [ 3b] location 1 byte block [ 0] reg0 i.e., no location lists, so each expression implicitly covers the range from low_pc to high_pc. For the prologue instructions, the expressions are incorrect. Note that for stack locations, it would suffice to use fbreg expressions and have the frame_base give a location list that uses sp instead of fp during the prologue code. This may be more compact, and perhaps easier to generate, than a location list for each variable that computes differently during the prologue code. I recall seeing similar bugginess when location lists are generated rather than single expressions. I haven't stumbled across the trivial test case for that.
Created attachment 117591 [details] test case
I think 2 changes are needed: 1) generate DW_AT_frame_base location list even for -fno-omit-frame-pointer code, if the frame pointer is not constant through the entire function. With -fomit-frame-pointer, correct DW_AT_frame_base location list is generated which covers the whole function 2) for NOTE_INSN_VAR_LOCATION's involving stack pointer or frame pointer, if they are live accross prologues or epilogues, transform them into DW_OP_fbreg rather than DW_OP_breg*. Now, based_loc_descr attempts to do that: if (reg == fp_reg && can_use_fbreg) loc_result = new_loc_descr (DW_OP_fbreg, offset, 0); so the question is just why this is not working.
Created attachment 117598 [details] gcc41-dwarf2-usefbreg.patch Seems loc_descriptor_from_tree_1 calls loc_descriptor with can_use_fbreg set to false unconditionally, which would explain 2).
For 1) I think var-tracking.c needs to be enhanced to track setting of frame pointer in the prologue if frame_pointer_needed and perhaps reuse the stack pointer tracking code to find out the ___frame_base_decl locations in the prologue and eventually in the epilogue as well.
Created attachment 117605 [details] gcc41-vartrack-fp-prologue.patch Patch I have been playing with for 1). With both patches I get following debug info for the foo routine (compiled with -m32 -O2 -g -dA on x86-64): <1><d6>: Abbrev Number: 13 (DW_TAG_subprogram) DW_AT_external : 1 DW_AT_name : foo DW_AT_decl_file : 1 DW_AT_decl_line : 3 DW_AT_prototyped : 1 DW_AT_type : <3d> DW_AT_low_pc : 0x70 DW_AT_high_pc : 0x8c DW_AT_frame_base : 0xc0 (location list) <2><ef>: Abbrev Number: 5 (DW_TAG_formal_parameter) DW_AT_name : x DW_AT_decl_file : 1 DW_AT_decl_line : 2 DW_AT_type : <3d> DW_AT_location : 2 byte block: 91 8 (DW_OP_fbreg: 8) <2><fb>: Abbrev Number: 5 (DW_TAG_formal_parameter) DW_AT_name : y DW_AT_decl_file : 1 DW_AT_decl_line : 2 DW_AT_type : <3d> DW_AT_location : 1 byte block: 50 (DW_OP_reg0) <2><106>: Abbrev Number: 10 (DW_TAG_variable) DW_AT_name : s DW_AT_decl_file : 1 DW_AT_decl_line : 4 DW_AT_type : <bf> DW_AT_location : 3 byte block: 91 80 7f (DW_OP_fbreg: -128) Contents of the .debug_loc section: ... 000000c0 00000070 00000071 (DW_OP_breg4: -4) 000000c0 00000071 00000073 (DW_OP_breg4: 0) 000000c0 00000073 0000008c (DW_OP_breg5: 0) 000000c0 <End of list> DW_OP_breg{4,5} + offset matches what is -fvar-tracking emitting for -m32 -O2 -fomit-frame-pointer code, is that correct? Couldn't e.g. DW_OP_breg{4,5}: 0 be replaced with DW_OP_reg{4,5}? Now, above DW_OP_reg0 for y sounds definitely wrong, no idea where it came up from yet, but this comes up with both stock 4.0.1-RH, HEAD and HEAD patched with these 2 patches. That must be something in dwarf2out, since e.g. with unpatched HEAD I see only 2 var location notes in foo (both at the very beginning of prologue): (note 42 1 43 0 ( x (expr_list:REG_DEP_TRUE (mem/i:SI (plus:SI (reg/f:SI 6 bp) (const_int 8 [0x8])) [3 x+0 S4 A32]) (const_int 0 [0x0]))) NOTE_INSN_VAR_LOCATION) (note 43 42 7 0 ( y (expr_list:REG_DEP_TRUE (mem/i:SI (plus:SI (reg/f:SI 6 bp) (const_int 12 [0xc])) [3 y+0 S4 A32]) (const_int 0 [0x0]))) NOTE_INSN_VAR_LOCATION) so I'd guess it ought to be DW_OP_fbreg: 12.
The problem with "y" variable above is that (unlike e.g. "x") it gets assigned a hard register for 2 instructions, so the PARM_DECL's DECL_RTL becomes (reg/v:SI 0 ax [orig:60 y ] [60]). add_location_or_const_value_attribute only uses the -fvar-tracking gathered info if it has more than one entry: /* If it truly has multiple locations, the first and last node will differ. */ if (loc_list && loc_list->first != loc_list->last) ... but in this case it has only one. I wonder if we shouldn't after: /* Try to get some constant RTL for this decl, and use that as the value of the location. */ rtl = rtl_for_decl_location (decl); if (rtl && (CONSTANT_P (rtl) || GET_CODE (rtl) == CONST_STRING)) { add_const_value_attribute (die, rtl); return; } attempt to use loc_list->first->var_loc_note if loc_list != NULL.
Created attachment 117612 [details] gcc41-dwarf2out-prefer-1elt-vartracking.patch Turns out such code is there, just loc_descriptor_from_tree is preferred over it. If we change the order as done in this patch, we get DW_OP_fbreg + 12, which is correct. Anyway, I'll stop here and let Richard look at the problem.
Patches 1 and 3 look good. I'll approve those now. As for patch 2, I agree with the goal, but the implementation will pretty much only work for i386 and similar cisc machines. In general we'd need something as complicated as the CFA tracking code, and I think it would be a mistake to replicate this. According to the dwarf3 draft, it looks like the Right Way for us to represent this is to use DW_OP_call_frame_cfa in DW_AT_frame_base. Naturally this will require changes elsewhere in the toolchain, but it seems like something that we'd have wanted to do anyway. It'll be mildly tricky to get back to the CFA from the register-eliminated rtl form that we have, but I think we can push virtual_cfa_rtx through the whole virtual-register instantiation and register-elimination path and come up with a steady-state offset from either stack_pointer_rtx or hard_frame_pointer_rtx, and subtract out the offset when emitting the frame-base offset.
Ok, I'll bootstrap/regtest patches 1 and 3 now, will you work on a patch for 2? I admit I was testing patch 2 just on i386/x86_64, but have been looking at -dV dumps also on ia64/ppc/ppc64/s390/s390x and they all have a set fp sp+const insn.
DW_OP_call_frame_cfa is an ultra-new draft addition to DWARF3. I was not aware of it before today. Probably there are no DWARF consumers anywhere that handle it. There is a lot of work on the consumers that would be required to cope with this. It is probably feasible in the gcc 4.1 timeframe since the building blocks will be needed for other purposes anyway. Right now, the consumers based on elfutils are far from ready for that; gdb already has the pieces, so supporting it there is probably not a whole lot of work (and in some months elfutils will be at that stage as well). But right now, it's a DWARF feature that does not even exist in any published spec. It probably has to be a -f or -g2 or something to enable emitting this, given that there are other consumers out there too. Conversely, everything is already prepared to cope with location lists. So if that solution is possible in the interim, that would be best for now. If in the long run we need to abandon that and use DW_OP_call_frame_cfa so as not to hamper the optimizer or complicate compiler maintenance, then that will be fine later on.
(In reply to comment #9) > ia64/ppc/ppc64/s390/s390x ... they all have a set fp sp+const insn. You only tested small frames. With larger frames you'll wind up with the addition split across 2 to 3 instructions.
(In reply to comment #10) > Conversely, everything is already prepared to cope with location lists. > So if that solution is possible in the interim, that would be best for now. Ok. I'll see if I can come up with a way to translate the frame-unwind info for the CFA into a location list.
Created attachment 117666 [details] First attempt at location list creation from cfi data This patch creates a frame_base location list from the CFI data that we've already collected. I've started testing on it, but I won't complete it before leaving for a long weekend in a dozen hours or so. I'm posting it here for folks to experiment with and tell me about problems encountered.
Created attachment 117791 [details] Fix for "First attempt at location list creation from cfi data" As noted by Roland, 117666 patch creates .debug_loc duplicates and can result in other problems. I wrote 2 alternative patches on top of that patch that result in the same output for Roland's testcase.
Created attachment 117792 [details] Alternative fix for "First attempt at location list creation from cfi data" The patch I attached a few seconds ago remembers labels and emits the loc list entries late, when were are really sure that's what we want to output. This alternative patch creates the loc list entries early and keeps updating their ->end labels.
There is another problem in var-tracking.c I think. The var-tracking.c change removes frame_base_decl setup and computing, which is surely ok. But it also removes frame_stack_adjust = -prologue_stack_adjust (); in vt_initialize, but frame_stack_adjust has not been removed and is still used in vt_stack_adjustments. I haven't studied the code in detail yet, but I'd say either we need it, in which case we shouldn't remove prologue_stack_adjust function and just re-add if (!frame_pointer_needed) frame_stack_adjust = -prologue_stack_adjust (); to vt_initialize. Or it is not needed, in which case vt_stack_adjustments and a bunch of other routines would need to go as well.
You're right. There's a lot more code that can be deleted. One could argue that we're losing functionality here, because the CFI description isn't as exact as it ought to be. But I don't see that keeping the var-tracking code is really the solution to that. I do recall that Zdenek had some big CFI rewrite, once upon a time. I don't recall what if anything was wrong with it, or if it was just such a big change that no one wanted to review it. Perhaps I'll dig that up and have a look. The thing we have to think about now is what we want to do with these patches. Given we're in stage3, we'd have to show that there are bugs being fixed in order to apply it to mainline. Or we could just keep it for gcc-4.?-rhl and apply it when 4.2 opens.
Oh, I forgot to mention -- I prefer the first of Jakub's two follow-on fixes. Plus, I did finally get finished testing vs gdb, and didn't see any new failures.
From what I can see, var-tracking.c doesn't use the stack tracking code just for frame_base (which would mean it is unnecessary now with Richard's patch), but also to adjust memory locations involving stack_pointer_rtx, see bb_stack_adjust_offset and how it calls adjust_stack_reference. If we delete the stack tracking code, either they would stay unadjusted (which would mean incorrect debug info), or we would need to find a way how to map back frame base location info (converted from gathered CFA) back to basic block's microoperations var-tracking.c uses. Once the patch is finalized, I guess we should mail it to gcc-patches in any case and probably apply to gcc-4_0-rhl-branch and later on rhl-4_1-branch as well and queue it up for 4.2.
I can see that it does that, but not why. It's not like eliminate_regs knows about complete stack variability, so why would we wind up with DECL_RTL with variability in it? This *looks* way over-engineered. I guess I'll see if I can trigger some ugly cases with values live in memory and registers over different regions, with -fomit-frame-pointer and -fno-accumulate-outgoing-args...
Ug. Of course the existing var-tracking code never bothers with DECL_RTL, merely examines the rtl stream and takes whatever's there. I appears to create its own CFA-like thing just below the dynamic extent of the stack pointer, so that all offsets are positive from that point. And since it's not using the value from DECL_RTL, and the fact that reload did take the variable stack pointer into account, that leaves us with bogus debug data -- regardless of whether the var-tracking stack offset code is removed or not. The assumption made by based_loc_descr that we receive only values relative to the stable-state frame pointer has been violated. I'll have to give this some thought.
Created attachment 117890 [details] Second try for cfa-based location list A rearrangement wherein we're sure we're dealing with canonical frame addresses and offsets. We delay register elimination until the very last moment; var-tracking rewrites addresses based on arg_pointer_rtx. AFAICS so far, this creates valid data for everything except for the epilogue. The next task is to truncate location lists referring to memory at the beginning of the epilogue.
A little bit off-topic, the following testcase on x86_64 -O2 -g -dA shows a bunch of missing DW_AT_location, where at least in some cases GCC perhaps could supply them. Not sure how hard would that be. /* { dg-do compile } */ /* { dg-options "-O2 -g -dA" } */ struct A { int a; }; int vari; /* fn1arg2 here doesn't have DW_AT_location, although it becomes alias to fn2arg2. */ static int fn1 (struct A *fn1arg1, int fn1arg2, int fn1arg3) { if (fn1arg1->a) return fn1arg2; switch (fn1arg2) { case 19: case 22: if (fn1arg3) return 23; return 22; case 20: return 23; } return fn1arg2; } unsigned int fn2 (struct A *fn2arg1, unsigned int fn2arg2, unsigned long fn2arg3, void *fn2arg4) { fn2arg2 = fn1 (fn2arg1, fn2arg2, fn2arg4 == ((void *) 0)); if (fn2arg2 == (fn2arg3 & 0xffffffff)) fn2arg2 = 64; return fn2arg2; } int fn3 (void) { extern void fn4 (int); /* varj has no DW_AT_location, although at least for a few insns it it could be said to live in vari. */ int varj = vari; fn4 (varj); return 0; } int fn5 (void) { extern int fn6 (void); int vark = fn6 (); /* varl has no DW_AT_location, although it is all the time an alias to varl. */ int varl = vark; fn3 (); return varl; }
The patch is in gcc-4.0.1-10, closing.