From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6a) Gecko/20031014 Firebird/0.7+

Description of problem:
In at least some UTF-8 locales, the open bracket character ('[') is included in the set of alphabetic characters. This breaks matching on word boundaries, for example. The closing bracket is *not* included in the set of alphabetic characters, nor are parentheses or braces.

Version-Release number of selected component (if applicable):
glibc-2.3.2-27.9

How reproducible:
Always

Steps to Reproduce:
1. echo [ | LANG=en_AU.UTF-8 grep -E "[[:alpha:]]" -

Actual Results:
The echoed string is matched (so '[' is returned).

Expected Results:
Nothing should have been matched.

Additional info:
Replace en_AU.UTF-8 with just en_AU and nothing is matched.
Replace en_AU.UTF-8 with de_DE.UTF-8 and the match happens.
Replace en_AU.UTF-8 with de_DE and nothing is matched.
Replace the '[' with ']' in all cases and nothing is matched.

We originally discovered this when running a search like

echo "offset" | grep -w "a[offset]"

which would only work in some locales.
Hmm ... the last example was overly simplified and does not work in any locale. But put something like "a[offset] = 6;" into a file called foo and run

grep -w offset foo

and the line is not matched in en_AU.UTF-8 (my default locale), but is matched in C, en_AU, etc.
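For anyone who wants to run the whole locale matrix from the two comments above in one go, here is a rough sketch. The helper names alpha_matches and word_matches are made up for this example, and it sets LC_ALL rather than LANG (as the report did) so that any LC_* variables already in the environment cannot interfere. Note that a locale which is not installed silently behaves like C, so check `locale -a` before trusting a "no-match" line.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of grep or glibc.

# Does grep's [[:alpha:]] class match the given character in this locale?
# $1 = locale name, $2 = character to test. Exit status 0 means "matched".
alpha_matches() {
    printf '%s\n' "$2" | LC_ALL="$1" grep -E '[[:alpha:]]' >/dev/null 2>&1
}

# Does "grep -w offset" find the word inside "a[offset] = 6;" in this locale?
# $1 = locale name. Exit status 0 means "matched".
word_matches() {
    printf '%s\n' 'a[offset] = 6;' | LC_ALL="$1" grep -w offset >/dev/null 2>&1
}

# Locales taken from the report, plus C as a baseline.
for loc in C en_AU en_AU.UTF-8 de_DE de_DE.UTF-8; do
    for ch in '[' ']'; do
        if alpha_matches "$loc" "$ch"; then r=match; else r=no-match; fi
        printf '%-12s [[:alpha:]] vs %s: %s\n' "$loc" "$ch" "$r"
    done
    if word_matches "$loc"; then r=match; else r=no-match; fi
    printf '%-12s grep -w offset: %s\n' "$loc" "$r"
done
```

On an affected grep, the report predicts "match" for '[' and "no-match" for grep -w offset in the two UTF-8 locales only; a fixed grep should give the same answers in every locale.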
Given that:

#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

int
main (void)
{
  regex_t re;
  regmatch_t rm[2];

  setlocale (LC_ALL, "");
  if (isalpha ('['))
    abort ();
  if (regcomp (&re, "[[:alpha:]]", 0)
      || !regexec (&re, "[", 2, rm, 0))
    abort ();
  return 0;
}

doesn't abort in LC_ALL=en_AU.UTF-8 nor any other locale that I've tried, I'd say this has nothing to do with glibc but grep.

echo [ | LANG=en_AU.UTF-8 sed -n "/[[:alpha:]]/p"

doesn't print anything either.
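The grep-versus-sed comparison above can be wrapped in a small script so both tools see the same input under the same locale. This is only a sketch: compare_tools is a made-up helper name, and it uses LC_ALL instead of LANG so that other LC_* settings cannot override the locale under test. If the two tools disagree, the difference lies in the tools' own matching code rather than in the locale data they share.

```shell
#!/bin/sh
# Sketch only: compare_tools is a hypothetical helper for this thread.

# Run the same [[:alpha:]] test through grep and sed under one locale.
# $1 = locale name, $2 = character to test.
# Prints one line of the form "grep=<yes|no> sed=<yes|no>".
compare_tools() {
    if printf '%s\n' "$2" | LC_ALL="$1" grep -E '[[:alpha:]]' >/dev/null 2>&1
    then g=yes; else g=no; fi

    # sed -n with /.../p prints the line only when the pattern matches.
    if [ -n "$(printf '%s\n' "$2" | LC_ALL="$1" sed -n '/[[:alpha:]]/p' 2>/dev/null)" ]
    then s=yes; else s=no; fi

    printf 'grep=%s sed=%s\n' "$g" "$s"
}

# Default to the locale from the report; pass another name as the first argument.
compare_tools "${1:-en_AU.UTF-8}" '['
```

With the behaviour described in this report, an affected system would show grep=yes sed=no for '[' in en_AU.UTF-8.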
dfa bug.
Created attachment 95604
grep-2.5.1-bracket.patch

Here is a potential fix.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-079.html