| Summary: | gawk regular expressions when using '-' in alpha range will match out of the given range | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mel <mel> |
| Component: | gawk | Assignee: | Vojtech Vitek <vvitek> |
| Status: | CLOSED NOTABUG | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 5.5 | CC: | hripps |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Last Closed: | 2011-04-28 08:21:16 UTC | Type: | --- |
Fixed with export LANG=C

This is not a bug, but the expected behaviour of non-C locales, specifically their different collation order.

(In reply to comment #0)
> Additional info:
> '[A-Z]' acts as if it is '[A-Zb-z]'
> '[a-z]' acts as if it is '[A-Ya-z]'
> '[BC]' works, but '[B-C]' acts as if it is '[B-Cc]'

Note that the locale collation order can be something like "aAbBcCdD..", so the range [A-Z] can expand to [AbBcC..zZ] and [B-C] can expand to [BcC], which matches the results quoted above.

This behaviour was coincidentally discussed upstream a few days ago, see:
http://lists.gnu.org/archive/html/bug-gnu-utils/2011-04/msg00021.html

Citation of Aharon Robbins (Wed, 27 Apr 2011 21:48:41 +0300):
> I do agree that the behavior is surprising, disconcerting, undesirable,
> and so on. For this reason, the upcoming version of gawk translates
> ranges of the form [d-h] into '[defgh]' before compiling the regular
> expression.

Advice: To match only upper-case or only lower-case letters regardless of the locale's collation order, you can use either [[:upper:]] or [[:lower:]] for entire ranges, or you can list the characters explicitly, such as [CDEFG], or finally you can use LC_ALL=C as a quick workaround.

Closing as NOTABUG.
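The three workarounds from the advice above can be sketched as follows. This is only an illustration, assuming a POSIX shell and an awk that supports POSIX character classes (gawk does). The function names are made up for this example and are not part of gawk.

```shell
#!/bin/sh
# Sample input taken from the original report.
input='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

# Workaround 1: force the C locale for this one awk invocation only,
# so the range [A-Z] follows plain byte order.
mask_upper_c_locale() {
    printf '%s\n' "$1" | LC_ALL=C awk '{ gsub(/[A-Z]/, "."); print }'
}

# Workaround 2: use a POSIX character class instead of a range.
# Membership comes from the locale's character classification,
# not from its collation order.
mask_upper_class() {
    printf '%s\n' "$1" | awk '{ gsub(/[[:upper:]]/, "."); print }'
}

# Workaround 3: list every character explicitly; no range is involved,
# so collation order does not matter.
mask_upper_explicit() {
    printf '%s\n' "$1" | awk '{ gsub(/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/, "."); print }'
}

mask_upper_c_locale "$input"
mask_upper_class "$input"
mask_upper_explicit "$input"
```

Setting LC_ALL=C on the command itself keeps the change local to that awk call, whereas export LANG=C affects every later command in the session.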
Description of problem:
gawk regular expressions when using '-' in alpha range match out of the given range, for example [A-Z] will match [A-Z] and [b-z]

Version-Release number of selected component (if applicable):
gawk-3.1.5-14
bug also exists in gawk-3.1.1-9

How reproducible:

    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[A-Z]/,".");print $0}'
    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[a-z]/,".");print $0}'
    cat << MEL >foo
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    abcdefghijklmnopqrstuvwxyz
    MEL
    awk '/[A-Z]/{print "UPPER: " $0}' foo
    awk '/[a-z]/{print "lower: " $0}' foo

Steps to Reproduce:
1. use a regular expression range using '-' in gawk to match just upper or lower case (i.e. '[A-Z]' or '[a-z]' or '[b-d]', etc.)

Actual results:

    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[A-Z]/,".");print $0}'
    ..........................a.........................
    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[a-z]/,".");print $0}'
    .........................Z..........................

Expected results:

    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[A-Z]/,".");print $0}'
    ..........................abcdefghijklmnopqrstuvwxyz
    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'| awk '{gsub(/[a-z]/,".");print $0}'
    ABCDEFGHIJKLMNOPQRSTUVWXYZ..........................

Additional info:
'[A-Z]' acts as if it is '[A-Zb-z]'
'[a-z]' acts as if it is '[A-Ya-z]'
'[BC]' works, but '[B-C]' acts as if it is '[B-Cc]'

This has been broken for a while. I tested gawk-3.1.1-9, and it has the same problem.
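One way to check whether the locale is behind results like these is to run the same substitution once under the inherited locale and once under the C locale, then compare the two lines. The following is a minimal sketch, assuming a POSIX shell. The helper name mask_range is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Sample input taken from the report.
input='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

# mask_range LOCALE REGEX TEXT (hypothetical helper for this sketch):
# replaces every match of REGEX in TEXT with "." and prints the result.
# An empty LOCALE means "inherit whatever locale the environment has".
mask_range() {
    if [ -n "$1" ]; then
        printf '%s\n' "$3" | LC_ALL="$1" awk -v re="$2" '{ gsub(re, "."); print }'
    else
        printf '%s\n' "$3" | awk -v re="$2" '{ gsub(re, "."); print }'
    fi
}

# Show which locale settings the shell currently has.
echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"

# Same regex, same input, two locales. If the lines differ,
# the locale's collation order is affecting the range.
echo "current locale: $(mask_range '' '[A-Z]' "$input")"
echo "C locale:       $(mask_range C '[A-Z]' "$input")"
```

Under the C locale only the upper-case letters are replaced. The line for the current locale depends on the system's locale data and on the awk version, so on some systems the two lines will be identical.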