From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.10 (X11; Linux i686; U;) Gecko/20030314

Description of problem:
The ${var%...}, ${var#...}, ${var:offset} (and maybe other) parameter expansions are broken for UTF-8. Even with a UTF-8 locale set, when a variable contains a UTF-8 string (e.g. a filename) these expansions seem to operate on bytes instead of characters, producing invalid UTF-8 strings and other surprises.

Version-Release number of selected component (if applicable):
bash-2.05b-34

How reproducible:
Always

Steps to Reproduce:
1. Set up a UTF-8 locale (this is the default for many languages); I'll use LC_ALL=cs_CZ.UTF-8
2. Run bash, type a='áé' (two characters: a-with-acute, e-with-acute; they don't seem to survive the posting process)
3. echo -n $a | xxd
4a. echo -n ${a:3} | xxd
4b. echo -n ${a%?} | xxd
4c. echo -n ${a#?} | xxd

Actual Results:
3. 0000000: c3a1 c3a9 ....
4a. 0000000: a9 . (the fourth byte)
4b. 0000000: c3a1 c3 ... (one byte removed from the end)
4c. 0000000: a1c3 a9 ... (one byte removed from the beginning)

Expected Results:
3. the same (just a check)
4a. nothing (because there are only two characters in $a)
4b. 0000000: c3a1 .. (one character removed from the end)
4c. 0000000: c3a9 .. (one character removed from the beginning)

Additional info:
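For anyone who wants to see the byte-wise results without relying on the posting process preserving the accented characters, here is a minimal sketch. It is an illustration only, not part of the original report: it deliberately forces LC_ALL=C so that any shell treats `?` as one byte, which should reproduce the 4b and 4c "Actual Results" above regardless of which locales are installed. The helper name `hex` and the variable names are made up for this example, and `od` is used in place of `xxd` only because it is more widely available.

```shell
# Force byte-wise semantics so the result does not depend on installed locales.
LC_ALL=C
export LC_ALL

# hex (hypothetical helper): print the bytes of its argument as contiguous hex.
hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# a-with-acute (c3 a1) followed by e-with-acute (c3 a9), written as octal
# escapes so the script itself stays pure ASCII.
a=$(printf '\303\241\303\251')

whole=$(hex "$a")            # all four bytes of the string
chop_end=$(hex "${a%?}")     # one byte removed from the end
chop_start=$(hex "${a#?}")   # one byte removed from the beginning

echo "whole:      $whole"
echo "chop_end:   $chop_end"
echo "chop_start: $chop_start"
```

In the C locale this prints c3a1c3a9, c3a1c3 and a1c3a9, i.e. the truncated sequences from steps 4b and 4c. In a UTF-8 locale a character-aware bash would be expected to print c3a1 and c3a9 for the last two lines instead.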
Created attachment 96743
bash-2.05b-subst.patch

Here is a start at fixing it. It only addresses ${var:off[:len]} so far. I've reported this upstream.
Created attachment 96767
bash-2.05b-subst.patch

Here is a more complete patch.