Bug 635748
| Summary: | regex: [A-z] detected as empty range with en_US.UTF-8 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Škarvada <jskarvad> |
| Component: | grep | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | CLOSED WORKSFORME | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.0 | CC: | hongjiu.lu, jakub, jskarvad, lkundrak, pbonzini, schwab, syeghiay |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 583011 | Environment: | |
| Last Closed: | 2010-11-24 11:25:50 UTC | Type: | --- |
| Bug Depends On: | 583011 | ||
| Bug Blocks: | |||
Description
Jaroslav Škarvada
2010-09-20 15:55:56 UTC
Not a bug. Nothing to fix here.

Upon further analysis, there is a bug in grep too:

    $ sed '/[A-Z]/p'
    z
    z
    $ grep '/[A-Z]/p'
    z
    $

The problem here is that grep's DFA matcher is trying to use strcoll for single-byte matches, instead of glibc's own rules (whatever they are). At the same time, grep relies on glibc to ascertain the validity of regular expressions, thus giving the inconsistent behavior.

I'm reassigning this to grep. Any changes in glibc regex, such as the ones suggested in bug 583011 comment #24, are anyway too wide in scope for RHEL6.

Re comment 8: sed '/[A-Z]/p' .. Print the current pattern space (not match!)

Observing same regexp matching in sed and grep:

    $ echo z | sed 's/[A-Z]/1/'
    a
    $ echo Z | sed 's/[A-Z]/1/'
    1

Sorry, what I meant is:

    $ echo z | sed -n '/[A-Z]/p'
    $ echo z | grep '[A-Z]'    # (2)
    z

It is a bug even considering what is documented in the Migration Guide; see this:

    $ echo 00z | egrep '(.)\1[A-Z]'
    $

which is inconsistent with (2) above.

Fixed by upstream commit 99d3c7e1308beb1ce9a3c535ca4b6581ebd653ee.

Re comment 11: I am unable to reproduce with grep-2.6.3-2.el6 (currently in RHEL-6):

    $ echo z | sed -n '/[A-Z]/p'
    $ echo z | grep '[A-Z]'
    $ echo 00z | egrep '(.)\1[A-Z]'
    $

Are you trying both en_US.UTF-8 and cs_CZ.UTF-8?

Yes, both variants:

    $ echo z | LC_ALL=en_US.UTF-8 sed -n '/[A-Z]/p'
    $ echo z | LC_ALL=en_US.UTF-8 grep '[A-Z]'
    $ echo 00z | LC_ALL=en_US.UTF-8 egrep '(.)\1[A-Z]'
    $

    $ echo z | LC_ALL=cs_CZ.UTF-8 sed -n '/[A-Z]/p'
    $ echo z | LC_ALL=cs_CZ.UTF-8 grep '[A-Z]'
    $ echo 00z | LC_ALL=cs_CZ.UTF-8 egrep '(.)\1[A-Z]'
    $
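
The checks in this thread are easy to mistype, so here is a minimal sketch that runs the plain and the back-reference form of the same bracket expression under several locales. The helper names `probe` and `probe_backref` are hypothetical and are not part of grep or sed. `grep -E` stands in for `egrep`. The script assumes a grep that accepts back-references in extended patterns (GNU grep does), and that the named locales are installed; a missing locale normally makes grep fall back to the C locale without an error.

```shell
#!/bin/sh
# Sketch only: probe and probe_backref are illustrative helpers.

probe() {
    # $1 = locale, $2 = bracket expression, $3 = input text
    # Plain bracket expression, as in "echo z | grep '[A-Z]'".
    if printf '%s\n' "$3" | env LC_ALL="$1" grep -q -- "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

probe_backref() {
    # $1 = locale, $2 = bracket expression, $3 = input text
    # Same expression behind a back-reference, as in
    # "echo 00z | egrep '(.)\1[A-Z]'"; the input gets a "00" prefix.
    if printf '00%s\n' "$3" | env LC_ALL="$1" grep -E -q -- "(.)\\1$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

# The two columns should agree for every locale; a difference is the
# inconsistency described in the thread.
for loc in C en_US.UTF-8 cs_CZ.UTF-8; do
    printf '%s: plain=%s backref=%s\n' "$loc" \
        "$(probe "$loc" '[A-Z]' z)" \
        "$(probe_backref "$loc" '[A-Z]' z)"
done
```

Results for the UTF-8 locales depend on the grep and glibc versions in use, so no expected output is given for them. Under `LC_ALL=C` the range follows byte values, and `z` should not match in either column.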