Bug 743343

Summary: gawk + byacc = YACC Stack Overflow regression
Product: Red Hat Enterprise Linux 6 Reporter: Vojtech Vitek <vvitek>
Component: byaccAssignee: Petr Machata <pmachata>
Status: CLOSED ERRATA QA Contact: qe-baseos-tools-bugs
Severity: high Docs Contact:
Priority: high    
Version: 6.1CC: dickey, fche, hripps, konrad, mnewsome, mnowak, mpolacek, vvitek
Target Milestone: rcKeywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 743242 Environment:
Last Closed: 2012-06-20 11:57:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743242    
Attachments:
Description Flags
Le fix none

Comment 2 RHEL Program Management 2011-10-04 16:48:48 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 3 Vojtech Vitek 2011-10-05 00:34:30 UTC
Version-Release number of selected component:
byacc-1.9.20070509-6.1.el6

Comment 4 Petr Machata 2011-10-05 13:26:00 UTC
The problem arises because of deep structure of else-ifs.  The grammar rule is as follows:

if_statement : /* ... */
	| LEX_IF '(' exp r_paren opt_nls statement
	     LEX_ELSE opt_nls statement /* ... */

This is right-recursive, and costs stack space.  Maximum stack size can be adjusted by putting the following line into awkgram.y:

#define YYMAXDEPTH 5000 /* default is 500 in yacc */

But since right recursion spends stack, there will always be a hard limit on number of else-ifs that you can parse.  That's an inherent shortcoming of the LR algorithm.

If that ever worked, it means that either the rule used left recursion back then (in principle it should be possible to rewrite the rule using left recursion, but in practice one may need to restructure the grammar to resolve lookahead conflicts); or it was compiled against bison, which sets much larger maximum stack depth, 10000.

Comment 5 Petr Machata 2011-10-05 13:32:34 UTC
Oh, but yacc actually used to set YYMAXDEPHT to 10K _too_.  Interesting, I'm looking into why would they decrease the value.

Comment 6 Konrad J Hambrick 2011-10-05 13:40:21 UTC
Petr --

Did RH change the build to link against byacc instead of bison on 2010-06-25 ?

From gawk.spec:

* Fri Jun 25 2010 Jan Zeleny <jzeleny> - 3.1.7-5
- fixed two syntax issues (#607915, #607916), added byacc to BuildRequires

Am I reading the CHANGELOG from the SPEC file correctly ?

gawk defaults to link bison by default.  From gawk-3.1.7 configure --help:

Some influential environment variables:
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if
              you have headers in a nonstandard directory <include dir>
  CPP         C preprocessor
  YACC        The `Yet Another C Compiler' implementation to use. Defaults to
              the first program found out of: `bison -y', `byacc', `yacc'.
  YFLAGS      The list of arguments that will be passed by default to $YACC.
              This script will default YFLAGS to the empty string to avoid a
              default value of `-d' given by some make applications.

Again, the same machine-generated ProcessParms.awk script has been running
without issue on RH machines all the way back to RedHat 4.2 ...

Thanks.

-- kjh

Comment 7 Petr Machata 2011-10-05 13:48:16 UTC
Created attachment 526494 [details]
Le fix

Can't find anything relevant in the documentation or otherwise.  The upstream, naturally, doesn't use a VCS, so we are out of luck here, too.  Dunno.  I'll just bump it back to 10K, it's a maximum value, yacc still allocates it dynamically.

Note that the above about the recursion still holds.  Even for max of 10K, you can still construct a source file that overflows.

Comment 8 Petr Machata 2011-10-05 13:53:41 UTC
(In reply to comment #6)
> Did RH change the build to link against byacc instead of bison on 2010-06-25 ?
> 
> From gawk.spec:
> 
> * Fri Jun 25 2010 Jan Zeleny <jzeleny> - 3.1.7-5
> - fixed two syntax issues (#607915, #607916), added byacc to BuildRequires

Maybe.  It's hard to be sure, this might simply be a formalization of already existing rule.  The build roots contain a lot of packages by default, and sometimes you just don't need to spell out all the dependencies.  For example, nobody lists gcc as a BuildRequires, because it's simply always there, by default.  This might be one of these things.  Jan Zeleny would know better, I'll have him chime in.

But yacc upstream really changed the max from 10K to 500 for some reason, and this actually is a change that, other things being equal, would result in awk working before and failing now on that test file.

Comment 9 Petr Machata 2011-10-05 14:05:27 UTC
Actually, you are right!  We used bison for builds in RHEL 5 throughout.  Then in RHEL 6, first several builds are with neither bison nor byacc, which means it would presumably use the pre-distributed parser, so the resulting binary would contain whatever limit corresponds to the tool that upstream used to generate that parser.  In gawk-3.1.7-5.el6 we started pulling in byacc instead, which has a limit of 500.  That's the change documented in the spec file.

Comment 10 Konrad J Hambrick 2011-10-05 14:18:48 UTC
Petr --

I agree ... I've been lucky up to now with the number of  } else if {
constructs.

I've changed my awk script-generator to replace the insanely long series of 

   if()
   { 
      ... 
   }
   else if ()
   { 
      ... 
   } 
   ...

constructs with 

   if ()
   { 
      ... ; 
      continue ; 
   } 
   if () 
   { 
      ... ; 
      continue ; 
   } 
   ...

This approach works for ProcessParms.awk ( my original attachment ) but now some of my 'main' scripts hit the same wall :(

-- kjh

Comment 11 Konrad J Hambrick 2011-10-06 11:33:10 UTC
Yikes YACC !

From 10,000 to 500 on the stack depth.

I suppose I am the only RHEL 6 user to misuse gawk so badly :)

-- kjh

Comment 17 errata-xmlrpc 2012-06-20 11:57:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0749.html

Comment 18 Thomas E. Dickey 2013-09-25 08:22:36 UTC
Note on comment #8: byacc did not "change" the limit; probably what
is referred to is a patch by Red Hat which was overlooked when switching
to the version which I maintain.  Ironically, the answer to that would
be in Red Hat's vcs, though I made a to-do item to check for errata of
this type (see also http://invisible-island.net/personal/cm-tools.html).