Bug 86107

Summary: Subexpression and newline problem in regex (glibc)
Product: [Retired] Red Hat Linux
Reporter: Ben Kao <bkao5>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 8.0
CC: bkao5, fweimer
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version: 9.0
Doc Type: Bug Fix
Last Closed: 2003-04-23 18:16:05 UTC
Attachments:
Std C code illustrating a bug in regex of RH8.0 (flags: none)

Description Ben Kao 2003-03-14 06:47:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

Description of problem:
Regex behavior changed between RH7.3 (libc-2.2.5) and RH8.0 (libc-2.2.93),
where it is broken.

The bug is related to newlines in the searched text and to subexpression
quantifiers (*, {2}, etc.).

When I use the character class [:space:] to match spaces and newlines, a
newline in the search text causes matching to stop immediately after the
newline: in the text "100"\ncellpadding, my regex matches only up to "100"\n.
When I substitute a space for the newline ("100" cellpadding), matching
continues as it should.
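
The premise above, that both a space and a newline belong to the space class, can be checked in isolation. The following is a minimal sketch, not part of the attached program; the helper name in_space_class is hypothetical. It assumes the default "C" locale and relies on two POSIX details: the class name must appear inside a bracket expression, written [[:space:]], and without REG_NEWLINE a newline is treated as an ordinary character.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the one-character string made of c
 * matches ^[[:space:]]$, 0 if it does not, and -1 if the pattern fails
 * to compile. */
static int in_space_class(char c)
{
    regex_t re;
    char text[2] = { c, '\0' };
    int rc;

    /* REG_NEWLINE is deliberately not set, so '\n' is an ordinary
     * character and the anchors apply to the whole string only. */
    if (regcomp(&re, "^[[:space:]]$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

On a conforming implementation, in_space_class(' ') and in_space_class('\n') should both return 1, which is why the two inputs in the report would be expected to behave the same.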

To complicate matters further, if the quantifier specifies the exact number of
times the subexpression occurs in the search text, matching continues and
produces the expected result.

The attached program compiles two patterns.  Pattern1 shows that the regex is
broken because the '*' quantifier does not work as expected.  Pattern2 shows a
curious condition that, surprisingly, produces the proper output.

Version-Release number of selected component (if applicable):
libc-2.2.93

How reproducible:
Always

Steps to Reproduce:
Compile and run the attached program.  Its output shows (1) the broken output
and (2) a curious condition that produces the proper output (though it is not
a workaround).
    

Additional info:

Comment 1 Ben Kao 2003-03-14 06:51:16 UTC
Created attachment 90596 [details]
Std C code illustrating a bug in regex of RH8.0
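
The attachment itself is not reproduced in this report. As a rough illustration of the kind of test described, here is a minimal sketch using the POSIX regcomp/regexec interface. Everything named in it is an assumption: the two patterns, the sample text, and the helpers whole_match and report are hypothetical stand-ins, not the contents of attachment 90596. One pattern uses the '*' quantifier on a subexpression, the other an exact count of {2}.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical sample text and patterns; the real attachment may differ. */
static const char SAMPLE_TEXT[] =
    "<table width=\"100\"\ncellpadding=\"2\">";
static const char PATTERN_STAR[] =
    "<table([[:space:]]+[a-z]+=\"[^\"]*\")*>";
static const char PATTERN_EXACT[] =
    "<table([[:space:]]+[a-z]+=\"[^\"]*\"){2}>";

/* Compiles pattern as an extended regex and searches text.
 * Returns 0 and stores the offsets of the overall match on success;
 * returns a nonzero regcomp/regexec code otherwise. */
static int whole_match(const char *pattern, const char *text,
                       int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;

    rc = regexec(&re, text, 1, m, 0);
    if (rc == 0) {
        *start = (int)m[0].rm_so;
        *end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return rc;
}

/* Prints the offsets and the matched text for one pattern. */
static void report(const char *label, const char *pattern, const char *text)
{
    int s, e;

    printf("Testing %s:\n", label);
    if (whole_match(pattern, text, &s, &e) == 0)
        printf("(%d, %d): %.*s\n", s, e, e - s, text + s);
    else
        printf("no match\n");
}
```

Calling report("pattern1", PATTERN_STAR, SAMPLE_TEXT) and report("pattern2", PATTERN_EXACT, SAMPLE_TEXT) from main should print the same span for both patterns on a correct implementation, since the sample text is 35 characters long and each pattern can only end at the closing '>'. Whether these particular patterns trigger the libc-2.2.93 behavior has not been verified.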

Comment 2 Ulrich Drepper 2003-04-22 08:14:33 UTC
With RHL9 I get this output:

Testing pattern1:
(0, 35): <table width="100"
cellpadding="2">
Testing pattern2:
(0, 35): <table width="100"
cellpadding="2">

Both patterns produce the same output, and it looks OK.  Can you install RHL9
and try it?

Comment 3 Ben Kao 2003-04-23 18:16:05 UTC
Confirmed: the problem appears to be fixed in Red Hat 9.0.