Bug 743343 - gawk + byacc = YACC Stack Overflow regression
gawk + byacc = YACC Stack Overflow regression
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: byacc (Show other bugs)
Unspecified Unspecified
high Severity high
: rc
: ---
Assigned To: Petr Machata
: Reopened
Depends On:
Blocks: 743242
  Show dependency treegraph
Reported: 2011-10-04 12:19 EDT by Vojtech Vitek
Modified: 2015-05-04 21:36 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 743242
Last Closed: 2012-06-20 07:57:10 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Le fix (523 bytes, patch)
2011-10-05 09:48 EDT, Petr Machata
no flags Details | Diff

  None (edit)
Comment 2 RHEL Product and Program Management 2011-10-04 12:48:48 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
Comment 3 Vojtech Vitek 2011-10-04 20:34:30 EDT
Version-Release number of selected component:
Comment 4 Petr Machata 2011-10-05 09:26:00 EDT
The problem arises because of deep structure of else-ifs.  The grammar rule is as follows:

if_statement : /* ... */
	| LEX_IF '(' exp r_paren opt_nls statement
	     LEX_ELSE opt_nls statement /* ... */

This is right-recursive, and costs stack space.  Maximum stack size can be adjusted by putting the following line into awkgram.y:

#define YYMAXDEPTH 5000 /* default is 500 in yacc */

But since right recursion spends stack, there will always be a hard limit on number of else-ifs that you can parse.  That's an inherent shortcoming of the LR algorithm.

If that ever worked, it means that either the rule used left recursion back then (in principle it should be possible to rewrite the rule using left recursion, but in practice one may need to restructure the grammar to resolve lookahead conflicts); or it was compiled against bison, which sets much larger maximum stack depth, 10000.
Comment 5 Petr Machata 2011-10-05 09:32:34 EDT
Oh, but yacc actually used to set YYMAXDEPHT to 10K _too_.  Interesting, I'm looking into why would they decrease the value.
Comment 6 Konrad J Hambrick 2011-10-05 09:40:21 EDT
Petr --

Did RH change the build to link against byacc instead of bison on 2010-06-25 ?

From gawk.spec:

* Fri Jun 25 2010 Jan Zeleny <jzeleny@redhat.com> - 3.1.7-5
- fixed two syntax issues (#607915, #607916), added byacc to BuildRequires

Am I reading the CHANGELOG from the SPEC file correctly ?

gawk defaults to link bison by default.  From gawk-3.1.7 configure --help:

Some influential environment variables:
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir> if
              you have headers in a nonstandard directory <include dir>
  CPP         C preprocessor
  YACC        The `Yet Another C Compiler' implementation to use. Defaults to
              the first program found out of: `bison -y', `byacc', `yacc'.
  YFLAGS      The list of arguments that will be passed by default to $YACC.
              This script will default YFLAGS to the empty string to avoid a
              default value of `-d' given by some make applications.

Again, the same machine-generated ProcessParms.awk script has been running
without issue on RH machines all the way back to RedHat 4.2 ...


-- kjh
Comment 7 Petr Machata 2011-10-05 09:48:16 EDT
Created attachment 526494 [details]
Le fix

Can't find anything relevant in the documentation or otherwise.  The upstream, naturally, doesn't use a VCS, so we are out of luck here, too.  Dunno.  I'll just bump it back to 10K, it's a maximum value, yacc still allocates it dynamically.

Note that the above about the recursion still holds.  Even for max of 10K, you can still construct a source file that overflows.
Comment 8 Petr Machata 2011-10-05 09:53:41 EDT
(In reply to comment #6)
> Did RH change the build to link against byacc instead of bison on 2010-06-25 ?
> From gawk.spec:
> * Fri Jun 25 2010 Jan Zeleny <jzeleny@redhat.com> - 3.1.7-5
> - fixed two syntax issues (#607915, #607916), added byacc to BuildRequires

Maybe.  It's hard to be sure, this might simply be a formalization of already existing rule.  The build roots contain a lot of packages by default, and sometimes you just don't need to spell out all the dependencies.  For example, nobody lists gcc as a BuildRequires, because it's simply always there, by default.  This might be one of these things.  Jan Zeleny would know better, I'll have him chime in.

But yacc upstream really changed the max from 10K to 500 for some reason, and this actually is a change that, other things being equal, would result in awk working before and failing now on that test file.
Comment 9 Petr Machata 2011-10-05 10:05:27 EDT
Actually, you are right!  We used bison for builds in RHEL 5 throughout.  Then in RHEL 6, first several builds are with neither bison nor byacc, which means it would presumably use the pre-distributed parser, so the resulting binary would contain whatever limit corresponds to the tool that upstream used to generate that parser.  In gawk-3.1.7-5.el6 we started pulling in byacc instead, which has a limit of 500.  That's the change documented in the spec file.
Comment 10 Konrad J Hambrick 2011-10-05 10:18:48 EDT
Petr --

I agree ... I've been lucky up to now with the number of  } else if {

I've changed my awk script-generator to replace the insanely long series of 

   else if ()

constructs with 

   if ()
      ... ; 
      continue ; 
   if () 
      ... ; 
      continue ; 

This approach works for ProcessParms.awk ( my original attachment ) but now some of my 'main' scripts hit the same wall :(

-- kjh
Comment 11 Konrad J Hambrick 2011-10-06 07:33:10 EDT
Yikes YACC !

From 10,000 to 500 on the stack depth.

I suppose I am the only RHEL 6 user to misuse gawk so badly :)

-- kjh
Comment 17 errata-xmlrpc 2012-06-20 07:57:10 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

Comment 18 Thomas E. Dickey 2013-09-25 04:22:36 EDT
Note on comment #8: byacc did not "change" the limit; probably what
is referred to is a patch by Red Hat which was overlooked when switching
to the version which I maintain.  Ironically, the answer to that would
be in Red Hat's vcs, though I made a to-do item to check for errata of
this type (see also http://invisible-island.net/personal/cm-tools.html).

Note You need to log in before you can comment on or make changes to this bug.