Created attachment 363816
Patch to fix this issue

Description of problem:
Using a multibyte character whose last byte is 0x5c (backslash) can cause sed to report syntax errors.

Version-Release number of selected component (if applicable):
sed-4.1.5-5

How reproducible:
Always

Steps to Reproduce:
1. Start with your shell in a UTF-8 locale, e.g. en_US.UTF-8 (this can probably be done from a different locale, but it definitely works from a UTF-8 locale).
2. Run the following commands to construct a test script:
       U2010=$(echo -ne '\x20\x10' | iconv -f ucs-2be)
       echo "echo '$U2010' | sed 's/$U2010/hyphen/g'" | iconv -t gbk > /tmp/script
3. Run the script in a locale that uses the GBK character set:
       LC_ALL=zh_CN.gbk sh /tmp/script 2>&1 | iconv -f gbk

Actual results:
The script reports an error:
    sed: -e expression #1, char 13: unterminated `s' command

Expected results:
The single word "hyphen".

Additional info:
The error arises because the character U+2010 (HYPHEN) is encoded as \xa9\x5c in GBK. Sed treats the trailing \x5c byte as a backslash escaping the following character, which here is the "/" that should terminate the pattern. That delimiter is therefore lost, sed miscounts the delimiters, and it reports the `s' command as unterminated.

This is only one character in one encoding; there are likely to be many others. I have another example for SJIS (U+8868), but SJIS isn't a good encoding to use for reporting bugs :-).

This bug is alluded to in bug 524837, but fixing it doesn't require a rebase to 4.2. The attached patch is taken from sed 4.2, does fix this problem, and can be applied without a rebase.
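The failure mode described above can be sketched in C. This is a minimal illustration, not sed's actual parser: `find_closing_delim`, `s_command_terminated` and `gbk_char_len` are hypothetical names, and `gbk_char_len` is a hand-written stand-in for `mbrlen()` in a GBK locale (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F) so the example runs without zh_CN.gbk installed. With `multibyte` set to 0 every byte is examined on its own, so the trailing 0x5c of U+2010 is taken as a backslash; with `multibyte` set to 1 the two-byte character is skipped as a unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mbrlen() in a GBK locale: a lead byte in
   0x81-0xFE followed by a trail byte in 0x40-0xFE (excluding 0x7F)
   forms one two-byte character; anything else is one byte. */
static size_t gbk_char_len(const unsigned char *p, size_t left)
{
    if (left >= 2 && p[0] >= 0x81 && p[0] <= 0xFE
        && p[1] >= 0x40 && p[1] <= 0xFE && p[1] != 0x7F)
        return 2;
    return 1;
}

/* Return the offset of the next unescaped delimiter in s[0..len), or -1
   if there is none.  With multibyte == 0 each byte is inspected on its
   own, which is the behaviour described in this bug report. */
static long find_closing_delim(const char *s, size_t len, char delim,
                               int multibyte)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        size_t n = multibyte ? gbk_char_len(p + i, len - i) : 1;
        if (n == 1) {
            if (p[i] == '\\') {
                /* Backslash escapes the next character: skip both. */
                i++;
                if (i < len)
                    i += multibyte ? gbk_char_len(p + i, len - i) : 1;
                continue;
            }
            if (p[i] == (unsigned char) delim)
                return (long) i;
        }
        i += n;  /* skip the whole (possibly two-byte) character */
    }
    return -1;
}

/* Check that an s command such as "s/PAT/REP/g" has both its closing
   delimiters (after the pattern and after the replacement). */
static int s_command_terminated(const char *cmd, size_t len, int multibyte)
{
    size_t pos = 2;
    int field;

    if (len < 2 || cmd[0] != 's')
        return 0;
    for (field = 0; field < 2; field++) {
        long end = find_closing_delim(cmd + pos, len - pos, cmd[1],
                                      multibyte);
        if (end < 0)
            return 0;  /* corresponds to "unterminated `s' command" */
        pos += (size_t) end + 1;
    }
    return 1;
}
```

For the 13-byte command `s/\xa9\x5c/hyphen/g`, the byte-wise scan ends the pattern at the "/" after "hyphen" and then finds no delimiter after "g", matching the "unterminated `s' command" error at char 13; the multibyte-aware scan finds both delimiters.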
I confirm that this is the same bug that is fixed by the attached upstream patch (i.e. not a coincidence :-).
As bug 524837 has been declined for the current release, can we have this one fixed please?
Upstream patch 20f68fb1abe862a98bc0378e5bb54d94bb98b8fe is also related to multibyte characters in sed. If it applies to RHEL5 sed, it should be brought in as well, because the bug it fixes can cause an infinite loop.
Created attachment 483233
Fix multibyte characters syntax error and possible infinite loop
Does that attachment work? I haven't tried to compile sed with it, but it contains this:

    +#define MBSINIT(s) \
    +  (mb_cur_max == 1 ? 1 : mbsinit ((s))
    +

There is a missing ")" there.
It might be that the other branch of the #if is being compiled, so the broken macro is never expanded. That would be a configuration problem for sed.
I noticed the missing ")" when I was comparing my original patch to the updated one. Even if it's not causing actual compilation problems it would be nice if the patch _looked_ right :-)
(In reply to comment #8)
> Does that attachment work? I haven't tried to compile sed with it, but it
> contains this:
>
> +#define MBSINIT(s) \
> +  (mb_cur_max == 1 ? 1 : mbsinit ((s))
> +
>
> There is a missing ")" there

Yes, you are right. It was just a typo in the original commit. I have actually applied a fixed version of the patch to the RHEL CVS tree, because running `make' failed with a syntax error. Anyway, thanks for the notification.
Created attachment 483248
Fix multibyte characters syntax error and possible infinite loop
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0397.html