From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13

Description of problem:
sed apparently fails to apply search/replace with regular expressions when an input line contains a non-ASCII character: on such a line the substitution is not applied at all.

Version-Release number of selected component (if applicable):
sed-4.1.5-9.fc8

How reproducible:
Always

Steps to Reproduce:
1. Download the attached text file. The third line contains a non-ASCII character with value 0xe8.
2. Feed the text file to sed with the following command:
   $ sed -e "s/\".*$//g" text-with-non-ASCII-char.txt

Actual Results:
define(
define(
define("CDEF","Questo ? un bug
define(

NOTE: the '?' is actually the 0xe8 character.

Expected Results:
define(
define(
define(
define(

Additional info:
I verified the sed package with "rpm -V sed" and found it OK. sed appears to behave correctly on pure ASCII and UTF-8 input. I tried the same file with Cygwin sed (the same 4.1.5 version) and it worked properly.
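Since the attachment is not reproduced here, the following is a minimal, self-contained reproduction sketch in POSIX sh. Only the third line mirrors the report (`define("CDEF","Questo <0xe8> un bug`, written with the octal escape `\350` because POSIX `printf` does not guarantee `\x` escapes). The other three lines are hypothetical placeholders. The script then runs the command from step 2 under whatever locale is currently active.

```shell
#!/bin/sh
# Lines 1, 2 and 4 are hypothetical placeholders; line 3 mirrors the report
# and contains the raw byte 0xe8 (octal 350).
printf '%s\n' 'define("A","first");' 'define("B","second");' > text-with-non-ASCII-char.txt
printf 'define("CDEF","Questo \350 un bug\n' >> text-with-non-ASCII-char.txt
printf '%s\n' 'define("D","fourth");' >> text-with-non-ASCII-char.txt

# Run the command from the report under the current locale settings.
sed -e "s/\".*$//g" text-with-non-ASCII-char.txt
```

In a UTF-8 locale, the third line should come back unchanged, as in Actual Results. Prefixing the `sed` command with `LC_ALL=C` should instead print four `define(` lines.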
Created attachment 300253
This file contains a non-ASCII character on the third line. You can use this file to reproduce the bug.
sed uses re_search to find the next matching sequence that should be replaced. re_search is a libc function and respects your current locale. I guess you are using a UTF-8 locale on your system, as is the default in Fedora. In UTF-8, a lone 0xe8 byte is not a valid character (it can only start a three-byte sequence), so "." does not match it and ".*" cannot reach the end of that line. When processing such files, you have to either choose the right ISO-8859-x locale or simply drop down to the C locale:

$ echo -e 'a\xe8a' | LANG=en_US.utf-8 ./sed/sed 's/./!/g'
!è!
$ echo -e 'a\xe8a' | LANG=en_US.iso-8859-1 ./sed/sed 's/./!/g'
!!!
$ echo -e 'a\xe8a' | LANG=C ./sed/sed 's/./!/g'
!!!

(In the first case the 0xe8 byte is not matched by "." and passes through unchanged; it is shown above as è.)
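Applied to the original report, the two suggested fixes might look like the sketch below. It is written in POSIX sh; the one-line sample file is hypothetical, and the second workaround assumes the data is ISO-8859-1 (Latin-1), where 0xe8 is è. It sets `LC_ALL` rather than `LANG` because `LC_ALL` overrides both `LANG` and `LC_CTYPE`, so the override takes effect even when those variables are already set in the user's environment.

```shell
#!/bin/sh
# Hypothetical one-line input containing the raw byte 0xe8 (octal 350).
printf 'define("CDEF","Questo \350 un bug\n' > sample.txt

# Workaround 1: force byte-oriented matching for this one command.
# In the C locale every byte is a character, so "." also matches 0xe8.
LC_ALL=C sed -e "s/\".*$//g" sample.txt

# Workaround 2 (assumes the file is ISO-8859-1): convert the data to UTF-8,
# after which sed can run unchanged in the default UTF-8 locale.
iconv -f ISO-8859-1 -t UTF-8 sample.txt > sample-utf8.txt
sed -e "s/\".*$//g" sample-utf8.txt
```

The first approach leaves the file as-is and treats it as raw bytes. The second fixes the data itself so that it matches the system locale, which is usually preferable if other tools will also process the file.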