Red Hat Bugzilla – Bug 179698
grep -w broken in multibyte locales
Last modified: 2010-03-17 06:32:36 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) Safari/417.8
Description of problem:
grep -w appears not to work for any non-UTF-8 multibyte locale; matches are often not reported when
they should be.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
echo za a | LANG=ja_JP.eucjp grep -w a
Actual Results: No output.
Expected Results: Output: "za a".
The command gives the expected output on FC3.
Thanks for the report.
* This affects only non-UTF-8 multibyte locales. UTF-8 input works correctly.
* FC-3 gives the same output for me (grep-2.5.1-31.4).
Created attachment 124869 [details]
Potential fix for grep -w problem
This is a potential fix for the grep -w problem in non-UTF-8 multibyte locales.
It attempts to correctly locate the multibyte character preceding the matched
string by (slowly) scanning forward from the start of the buffer until it
reaches the match position. Using "last_char" instead of performing this search
seems to be incorrect.
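For readers without the attachment, the forward scan described above can be sketched roughly as follows. This is not the attached patch and not grep's code. It is a small Python illustration that assumes EUC-JP input and hard-codes that encoding's lead-byte rules. The names euc_jp_char_len and char_before are hypothetical. The scan starts at the beginning of the buffer because EUC-JP lead and trail bytes share the range 0xA1-0xFE, so a single byte examined while stepping backwards cannot be classified reliably.

```python
from typing import Optional


def euc_jp_char_len(lead: int) -> int:
    """Byte length of the EUC-JP character whose first byte is `lead`."""
    if lead == 0x8F:
        # SS3 prefix: three-byte JIS X 0212 character.
        return 3
    if lead == 0x8E or 0xA1 <= lead <= 0xFE:
        # SS2 prefix (half-width katakana) or JIS X 0208: two bytes.
        return 2
    # ASCII; any other byte is treated as a single unit in this sketch.
    return 1


def char_before(buf: bytes, match_start: int) -> Optional[bytes]:
    """Return the complete character ending right before `match_start`.

    Walks forward from the start of the buffer one character at a time.
    Returns b"" when the match is at the start of the buffer, and None
    when `match_start` falls in the middle of a multibyte character.
    """
    pos = 0
    prev = b""
    while pos < match_start:
        n = euc_jp_char_len(buf[pos])
        prev = buf[pos:pos + n]
        pos += n
    if pos != match_start:
        return None
    return prev
```

With the reproducer input, char_before("za a".encode("euc_jp"), 3) returns b" ", so the final "a" is preceded by a non-word character and should count as a whole-word match. The cost is linear in the match offset for every lookup, which is the slowness the comment above mentions.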
Thanks! Fixed in grep-2.5.1-53.