Bug 187574
| Summary: | rhel3: sed y command will not accept the newline escape sequence | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Michael Gilbert <michael.s.gilbert> |
| Component: | sed | Assignee: | Petr Machata <pmachata> |
| Status: | CLOSED ERRATA | QA Contact: | Ben Levenson <benl> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | mnewsome |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2006-0362 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2006-07-20 15:06:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 181405 | | |
Description
Michael Gilbert
2006-04-01 03:02:04 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0362.html

escape sequences are not yet properly parsed in the replacement text when running in a non-multibyte locale. in fact, the translation at sed/compile.c:1276 is performed only if (MB_CUR_MAX > 1), because of the check at sed/compile.c:1246. as a consequence, sed behaviour depends on the particular locale chosen:

    [root@green-ALT BUILD]# echo aka | env LANG=en_US.UTF-8 ./sed-4.0.7/sed/sed 'y,k,\t,'
    a a
    [root@green-ALT BUILD]# echo aka | env LANG=en_US.ISO-8859-1 ./sed-4.0.7/sed/sed 'y,k,\t,'
    ata
    [root@green-ALT BUILD]# echo aka | env LANG=C ./sed-4.0.7/sed/sed 'y,k,\t,'
    ata

does that make sense?

regards,
-- g.b.

Ignoring escape sequences in the C locale is intentional, because of POSIX compatibility. The ISO-8859-1 locale puzzles me. It may be for a similar reason, but I can't say right away. I'll take a look at it.

as far as I can tell, MB_CUR_MAX (in stdlib.h) is the maximum number of bytes composing a character in the current locale's charset. so its value is 1 in all single-byte character sets (including ASCII for C/POSIX and ISO-8859-1 for en_US) and > 1 in UTF-8 for en_US.UTF-8. the following code

    #include <stdio.h>
    #include <stdlib.h>
    #include <locale.h>

    int main(void)
    {
        /* pick up the locale from the environment (LANG, LC_*) */
        setlocale(LC_ALL, "");
        /* MB_CUR_MAX has type size_t, hence %zu */
        printf("%zu\n", MB_CUR_MAX);
        return 0;
    }

outputs the following with my glibc (2.3.5) implementation:

    $ for i in C en_US en_US.UTF-8 ; do env LANG=$i ./a.out ; done
    1
    1
    6

as far as I know, translation of UTF-8 multibyte sequences can be done independently from escape-sequence parsing, because no ASCII char can appear in a valid multibyte UTF-8 sequence. I really don't know whether a POSIX locale should imply POSIX compliance. however, relying on MB_CUR_MAX might have unseen consequences, e.g. disrupting the test script in your last sed rpm source. ;-)

regards,
-- giuseppe

Please direct all further comments to #200663.