326151 – Regular expression that matches start of paragraph and removes content gets compared against remaining content again

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	openoffice.org
Sub Component:
Version:	7
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Caolan McNamara
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-10 11:59 UTC by Jan Pazdziora (Red Hat)
Modified:	2007-11-30 22:12 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-10-16 08:37:29 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenOffice.org	82473	0	None	None	None	Never

Description Jan Pazdziora (Red Hat) 2007-10-10 11:59:59 UTC

Description of problem:

Suppose you have the string "ABC DEF GHI" and want to remove the first word. You
press Ctrl+F, search for "^[^ ]+", leave the "Replace with" field empty, and
check "Regular expressions" under More Options.

The result of "Replace All" is " DEF GHI", with a space at the beginning of the
string.

If you would like to remove the first word _and_ the space that follows it,
you'd use the regular expression "^[^ ]+ " (note the space character appended
to the pattern). However, the result of that search and replace is "GHI". So
appending a space to the regular expression somehow changed the meaning of the
"^[^ ]+" that preceded it.
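For comparison, here is a minimal sketch of the result I would expect from a
single-pass replace. It uses Python's `re` module purely as a stand-in (an
assumption for illustration; OpenOffice.org has its own regular expression
engine, and its behavior is what this report is about):

```python
import re

text = "ABC DEF GHI"

# Pattern without the trailing space: only the first word is removed,
# so the result keeps the space that followed it.
without_space = re.sub(r"^[^ ]+", "", text)

# Pattern with the trailing space: the first word and the space after it
# are removed, and nothing else, because "^" only matches at the start.
with_space = re.sub(r"^[^ ]+ ", "", text)

print(repr(without_space))
print(repr(with_space))
```

The second result is what "Replace All" would be expected to produce for the
string above.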

Version-Release number of selected component (if applicable):

openoffice.org-calc-2.2.1-18.2.fc7.x86_64
openoffice.org-writer-2.2.1-18.2.fc7.x86_64

How reproducible:

Deterministic.

Steps to Reproduce:
1. Create new text document.
2. Write string "ABC DEF GHI".
3. Try to remove the first word including the space after it (ABC ) with regular
expression "^[^ ]+ ".
  
Actual results:

"ABC DEF " gets removed.

Expected results:

Only "ABC " should have been removed.

Additional info:

The problem happens in both Writer and Calc.

Comment 1 Caolan McNamara 2007-10-10 12:55:52 UTC

It seems that "Replace All" gets re-run after the first replace: the remaining
string now matches the pattern again, and so a second replace takes place.

i.e. for replace all, instead of
echo ABCZDEFZGHI | sed -r -e 's/^[^Z]+Z//'
we effectively have
echo ABCZDEFZGHI | sed -r -e 's/^[^Z]+Z//' | sed -r -e 's/^[^Z]+Z//'
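The same comparison as the two sed pipelines, sketched in Python. The helper
name `strip_first_field` is made up for this illustration, and the sketch only
models the observed symptom; it is not taken from the OpenOffice.org sources:

```python
import re


def strip_first_field(text):
    # Roughly what one run of  sed -r -e 's/^[^Z]+Z//'  does to a line:
    # remove the leading run of non-Z characters plus the Z that ends it.
    return re.sub(r"^[^Z]+Z", "", text, count=1)


line = "ABCZDEFZGHI"

# One application: what "Replace All" would be expected to do.
single_pass = strip_first_field(line)

# Two chained applications: what "Replace All" effectively does.
double_pass = strip_first_field(strip_first_field(line))

print(single_pass)
print(double_pass)
```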

Here's a more concise example: use a search string of
^.{3}
and an empty replace string to remove the first 3 characters. With the above
example, "Replace All" keeps removing blocks of 3 characters until only 2 are
left.
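A rough model of that re-run behavior, assuming the replace is simply repeated
against whatever text remains until the pattern stops matching. The function
`rerun_trace` is hypothetical and written only to show the intermediate
strings; the actual implementation has not been inspected here:

```python
import re


def rerun_trace(pattern, text):
    """Re-apply an anchored removal to the remaining text and record each step.

    Hypothetical model of the observed "Replace All" behavior.
    """
    regex = re.compile(pattern)
    steps = [text]
    while regex.search(text):
        shorter = regex.sub("", text, count=1)
        if shorter == text:
            # Guard against patterns that match without consuming anything.
            break
        text = shorter
        steps.append(text)
    return steps


for step in rerun_trace(r"^.{3}", "ABC DEF GHI"):
    print(repr(step))
```

With the 11-character string from the description, the trace ends with the
last 2 characters, which matches what is seen in the application.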

Comment 2 Caolan McNamara 2007-10-16 08:37:29 UTC

Moving upstream, as this affects us all:
http://www.openoffice.org/issues/show_bug.cgi?id=82473
