69254 – long regexp match causes perl coredump

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	perl
Sub Component:
Version:	rawhide
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Warren Togami
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	67218 79579 CambridgeTarget

Reported:	2002-07-19 17:43 UTC by Jonathan Kamens
Modified:	2007-11-30 22:10 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-09-12 04:33:43 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
perl strace output (12.58 KB, text/plain) 2003-01-15 14:56 UTC, Jay Turner

Description Jonathan Kamens 2002-07-19 17:43:27 UTC

With perl-5.8.0-42, this perl script coredumps:


#!/usr/bin/perl

$lines = 906;

$_ = "foo\n" x $lines . "\n\n*** EOOH ***\n" . "foo\n" x $lines;

/^((?:.+\n)*)\n*\*\*\* EOOH \*\*\*\n((?:.+\n)*)\n+/;

If this is related to a stack overflow or something like that, you may have to
increase the setting of $lines on your system to see the coredump.

Decreasing $lines makes the coredump go away, thus confirming that this is some
sort of memory overrun.
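
Since the crashing value of $lines differs from system to system, a binary search can locate it. The following is only a sketch: it assumes that once the script crashes at some value it also crashes at every larger value, and the names find_threshold, perl_crashes, fake_crashes and repro.pl are hypothetical (repro.pl stands for the script above, changed to take the line count from $ARGV[0]).

```shell
#!/bin/sh
# Binary-search the smallest value in [LO, HI] for which a predicate succeeds.
# Assumption: the predicate is monotonic (false ... false true ... true).
# Usage: find_threshold PREDICATE LO HI
#   PREDICATE is a command taking one integer; exit status 0 means "crashes".
find_threshold() {
    pred=$1 lo=$2 hi=$3
    # If it does not crash even at the upper bound, give up.
    "$pred" "$hi" || return 1
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$pred" "$mid"; then
            hi=$mid              # crashes here: threshold is mid or lower
        else
            lo=$(( mid + 1 ))    # survives here: threshold is above mid
        fi
    done
    echo "$lo"
}

# Hypothetical predicate for the real test. repro.pl is NOT part of this
# report; it is assumed to read the line count from its first argument.
perl_crashes() {
    perl repro.pl "$1" >/dev/null 2>&1
    [ $? -gt 128 ]    # status above 128 means the process died from a signal
}

# Stand-in predicate so the sketch runs without perl: "crashes" from 3929 up.
fake_crashes() { [ "$1" -ge 3929 ]; }

find_threshold fake_crashes 1 65536    # prints 3929
```

With a real repro.pl in place, `find_threshold perl_crashes 1 65536` would report the smallest crashing value on the local system.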

Comment 1 Chip Turner 2002-08-21 17:13:23 UTC

I can't reproduce this with perl-5.8.0-48, even with $lines = 65536.  Can you
confirm whether it still happens?

Comment 2 Jonathan Kamens 2002-08-22 02:45:27 UTC

It still happens for me with perl-5.8.0-49.

Comment 3 Tim Waugh 2002-08-28 16:01:42 UTC

I can't reproduce this problem either (with perl-5.8.0-50).  What locale are 
you using, and what does the stack trace look like when it crashes?

Comment 4 Jonathan Kamens 2002-08-28 16:05:59 UTC

It's still happening for me with -50.  I have LC_COLLATE set to C and LANG set
to en_US.  No other locale environment variables are set.  It looks like the
stack is trashed in the core dump:

(gdb) where
#0  0x400e6508 in S_regmatch ()
   from /usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE/libperl.so
Cannot access memory at address 0xbfe00fd0

Comment 5 Anthony 2002-09-05 08:24:23 UTC

See also http://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no&bug=158300
which appears related.

There are some additional test cases in that bug which might help.

Comment 6 Chip Turner 2002-12-15 23:27:30 UTC

The latest rawhide perl, perl-5.8.0-70, passes all of the test cases above as
well as those in the Debian bug report.

Comment 7 Jay Turner 2003-01-15 14:55:52 UTC

I'm still seeing the segfault running the above code with perl-5.8.0-83. 
Attaching strace output.

Comment 8 Jay Turner 2003-01-15 14:56:49 UTC

Created attachment 89378
perl strace output

Comment 9 Elliot Lee 2003-02-24 17:33:03 UTC

Updating from glibc-2.3.1-46 to glibc-2.3.1-51 fixed the segfault for me.

Comment 10 Elliot Lee 2003-02-24 17:35:15 UTC

Umm, actually this appears to be related to NPTL vs. LinuxThreads: the glibc
upgrade turned NPTL back on, and once I turned NPTL off, the segfault came back.

Since NPTL is the default, this is low priority.

Comment 11 Bill Nottingham 2003-07-30 23:08:35 UTC

Closing out some bugs that have been in MODIFIED state. Please reopen if they
persist.

Comment 12 Jonathan Kamens 2003-07-31 02:31:12 UTC

This bug persists for me with perl-5.8.1-90.rc2.1, glibc-2.3.2-57, and a
non-NPTL kernel, but only if I increase $lines in the test script to 3447.  With
$lines set to 3446, Perl doesn't segfault.

Comment 13 Leonard den Ottolander 2004-04-22 16:26:03 UTC

Reproducible on a 256 MB machine, Fedora Core 1, perl-5.8.3-16, $lines = 10000.

Not attaching core dump (> 10 MB) ;)

Comment 14 Martin Ward 2005-05-23 13:41:54 UTC

This bug can be reproduced with $lines = 4366 on perl v5.8.5 built for
i386-linux-thread-multi running under Mandrake 10.1 (Linux version 2.6.8.1-10mdk)

Comment 15 Warren Togami 2005-09-11 11:09:52 UTC

Closing because reports are from old Fedora, and a Mandrake report is
meaningless. Open a NEW report if this is still an issue with a supported Fedora
version.

Comment 16 Eric Hopper 2005-09-11 16:33:49 UTC

Well, this coredumps for me on FC-4.

With lines set to 3929 it crashes, and with lines set to 3928 it doesn't.  I'm
not sure whether or not this actually represents a problem.  After all, I can
make practically anything coredump by allocating too much memory or whatever and
it doesn't represent a security problem.  Changing my stacksize ulimit will
change the number of lines at which the problem happens.

Here is my system data.

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 16383
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16383
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
$ rpm -q perl
perl-5.8.6-15
$ uname -rvm
2.6.12-1.1447_FC4smp #1 SMP Fri Aug 26 21:03:12 EDT 2005 x86_64
$ cat /etc/redhat-release
Fedora Core release 4 (Stentz)
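
To see the effect of the stack size limit without changing the login shell's own limit, a command can be run in a subshell with a lowered soft limit. This is a minimal sketch: run_with_stack is a hypothetical helper name, and it assumes the shell's ulimit builtin accepts -S and -s, as bash and dash do.

```shell
#!/bin/sh
# Run a command in a subshell with a given soft stack limit (in kbytes) and
# report whether it exited normally or was killed by a signal.
# Usage: run_with_stack STACK_KB COMMAND [ARGS...]
run_with_stack() {
    stack_kb=$1; shift
    (
        ulimit -S -s "$stack_kb" || exit 125   # 125: could not set the limit
        exec "$@"
    ) >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 128 ]; then
        # Shells report death by signal N as exit status 128 + N.
        echo "killed by signal $(( status - 128 ))"
    else
        echo "exited with status $status"
    fi
}

run_with_stack 1024 true                       # a command that exits normally
run_with_stack 1024 sh -c 'kill -SEGV $$'      # a stand-in for a crashing command
```

Replacing the stand-in with the perl test script and varying the first argument should move the crashing value of $lines up or down accordingly.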

Comment 17 Jonathan Kamens 2005-09-11 17:05:35 UTC

"After all, I can make practically anything coredump by allocating too much
memory or whatever" -- I disagree completely.  A program is not supposed to
coredump if you ask it to do something which takes too much memory.  It's
supposed to tell you that it couldn't do it.  Any program which coredumps is by
definition buggy.

Comment 18 Warren Togami 2005-09-11 21:54:47 UTC

It would be helpful if someone could install the debuginfo package and get a gdb
backtrace to attach here, along with the perl code used to reproduce that backtrace.

Comment 19 Jonathan Kamens 2005-09-12 04:13:43 UTC

I am confused.  Why do you need us to provide a gdb backtrace when I've already
provided (over three years ago) a 3-line perl script with which you can
trivially duplicate the problem?  Wouldn't it be more useful for debugging for
you to reproduce the problem yourself in a running perl instance inside gdb
rather than having someone provide a static gdb backtrace which it seems that
you could just as easily produce yourself?

Comment 20 Eric Hopper 2005-09-12 04:22:09 UTC

Not to mention that the backtrace is over 15000 frames long.
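
As a rough worked example using only figures from this thread (a 10240 kbyte stack limit, a crash at 3929 lines, and a backtrace of over 15000 frames), the stack cost per line and per frame can be estimated. It assumes the backtrace was taken at the crash threshold under that limit and that essentially the whole stack went to the regexp recursion; neither is confirmed here.

```shell
#!/bin/sh
# Integer estimates (rounded down) from the figures reported in this thread.
stack_kb=10240   # "stack size" from the ulimit -a output in comment 16
lines=3929       # smallest crashing value of $lines in comment 16
frames=15000     # lower bound on the backtrace length given above

bytes_per_line=$(( stack_kb * 1024 / lines ))
bytes_per_frame=$(( stack_kb * 1024 / frames ))
frames_per_line=$(( frames / lines ))

echo "about $bytes_per_line bytes of stack per input line"    # 2668
echo "at most about $bytes_per_frame bytes per frame"         # 699
echo "at least $frames_per_line frames per input line"        # 3
```

The stack use grows roughly in proportion to the number of lines matched, which fits the observation that changing the stack limit moves the crashing value of $lines.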

Comment 21 Warren Togami 2005-09-12 04:33:43 UTC

Good point, a stack trace isn't useful.

I talked with a perl expert, and he said that you can only trigger this behavior
with poor regexp coding.  Nesting expressions in this way is inherently
exponential in complexity, which is discussed in man perlretut.

What do you expect to happen when the stack grows to an unmanageable size due to
a problematic regexp?  Perhaps perl could be improved to fail with an error
message, but that wouldn't be any different from perl stopping in this
situation.  perl is doing exactly as it is designed to do, and the stack is of
finite size.

This is not a Fedora specific problem, so you should file this upstream. 
However I don't think upstream can really do anything about this.
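
For anyone who only needs the text on either side of the marker, one way to avoid the deep recursion is to process the input a line at a time instead of matching it with a single regexp. The following is a sketch, not an exact equivalent of the regexp in the description: it drops every blank line and does not check that the non-blank lines are contiguous. The name split_at_marker is hypothetical.

```shell
#!/bin/sh
# Print the non-blank lines before (PART=1) or after (PART=2) the
# "*** EOOH ***" marker line, reading standard input one line at a time.
# Usage: split_at_marker PART < input
split_at_marker() {
    awk -v part="$1" '
        $0 == "*** EOOH ***" { seen = 1; next }   # the marker itself is dropped
        $0 == ""             { next }              # blank separator lines are dropped
        (part == 1 && !seen) || (part == 2 && seen) { print }
    '
}

printf 'foo\nbar\n\n\n*** EOOH ***\nbaz\n\n' | split_at_marker 1   # prints foo and bar
printf 'foo\nbar\n\n\n*** EOOH ***\nbaz\n\n' | split_at_marker 2   # prints baz
```

Memory use here does not depend on the number of lines, so the stack limit no longer matters for this particular task.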

Comment 22 Eric Hopper 2005-09-12 05:23:21 UTC

Python had a similar sort of problem with isinstance.  I found it, and I made
the same argument as jik.ma.us.  The Python people agreed with
me, and in thinking about it carefully, they were right to.  Language
interpreters especially should not just core dump because of a stack overflow. 
But upstream at the perl maintainers is the right place to deal with it.

Since I don't actually use perl if I can help it, I'll leave it to someone else.
I just tested because I'm watching all the 'make Cambridge better' dependencies
and I didn't like the reason it was being closed.  :-)
