Bug 844375 - Endless loop inside pcre_exec with PCRE_MULTILINE set
Summary: Endless loop inside pcre_exec with PCRE_MULTILINE set
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: pcre
Version: 17
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Assignee: Petr Pisar
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-07-30 12:51 UTC by Daniel Kopeček
Modified: 2013-07-04 06:56 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-04 06:56:26 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
testing program (900 bytes, text/plain)
2012-07-30 12:51 UTC, Daniel Kopeček
no flags Details
input (2.86 MB, text/plain)
2012-07-30 12:52 UTC, Daniel Kopeček
no flags Details

Description Daniel Kopeček 2012-07-30 12:51:00 UTC
Created attachment 601256 [details]
testing program

Description of problem:
Certain inputs cause an endless loop inside pcre_exec executed with PCRE_MULTILINE set.

Version-Release number of selected component (if applicable):
pcre-8.21-5.fc17.x86_64

How reproducible:
Always with the attached input & testing source.

Steps to Reproduce:
1. gcc -o pcre_bug pcre_bug.c
2. cat pcre_bug_input.txt | ./pcre_bug '^[^#]*(umask[     ]+[0-7][0-7][0-7])' 'm'
  
Actual results:
Endless loop, 100% cpu usage

Expected results:
A correct result or error in case of invalid parameters/input.

Comment 1 Daniel Kopeček 2012-07-30 12:52:36 UTC
Created attachment 601258 [details]
input

Comment 2 Daniel Kopeček 2012-07-30 13:24:18 UTC
Well, it might just be a very time consuming expression. The '[^#]*' is causing problems.

Comment 3 Petr Pisar 2012-07-30 16:06:19 UTC
Halting problem is undecidable :) There will be always patterns and inputs when the automaton performs badly.

If you could `reproduce' the issue with pcretest, or to simplify the test case, it would help me.

What are the white characters between `umask[' and `]+'? Five spaces (U+0020)?

Comment 4 Daniel Kopeček 2012-07-30 16:26:47 UTC
I think it is the "performs badly" case. The white characters are spaces. It doesn't really matter what is between those brackets. How do I feed an arbitrary and large file to the pcretest tool?

"An empty line signals the end of the data lines, at which point  a  new
 regular  expression is read. The regular expressions are given enclosed
 in any non-alphanumeric delimiters other than backslash, for example:"

That's a problem, because the file contains many empty lines. However, I've tested different sizes of the input files and it seems that the time needed to processes the file is growing exponentially. 100KiB is ok, ..., 3MiB takes forever...

Comment 5 Fedora End Of Life 2013-07-04 02:42:58 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Petr Pisar 2013-07-04 06:56:26 UTC
The same effect can be triggered with expression '^[^#]*abcd'.

Your 8-MB file consists of 50394 lines. None of the characters in the file is '#'.

Thus the matching engine has to check each character that it's not a '#' and if the character is followed by 'abcd'.

The file does not contain 'abcd' (in your case 'umask'). So it has to prove the file does not contain the search string. I.e. it has to do 8 millions string comparisons. And that takes a long time.

A lot of time consumes stack nesting because '^[^#]*abc' is significantly faster because these short literals are in-lined by pcre. Also replacing '#' with more frequent character helps. (On the other hand rare character ';' suffers too.)

You can ask upstream <pcre-dev> for some optimization hints for your specific query. But I have to close this issue without a fix.


Note You need to log in before you can comment on or make changes to this bug.