Bug 112657 - ${var%...} parameter expansion broken for UTF-8
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: bash
Version: 1
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
QA Contact: Ben Levenson
Reported: 2003-12-26 11:07 EST by David Nečas
Modified: 2007-11-30 17:10 EST
Fixed In Version: 2.05b-35
Doc Type: Bug Fix
Last Closed: 2004-01-05 10:50:33 EST

Attachments
bash-2.05b-subst.patch (891 bytes, patch)
2004-01-02 12:37 EST, Tim Waugh
bash-2.05b-subst.patch (3.16 KB, patch)
2004-01-05 05:51 EST, Tim Waugh

Description David Nečas 2003-12-26 11:07:32 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.10 (X11; Linux i686; U;) Gecko/20030314

Description of problem:
The ${var%...}, ${var#...}, ${var:offset} (and possibly other) parameter
expansions are broken for UTF-8.  Even in a UTF-8 locale, when a
variable contains a UTF-8 string (e.g. a filename) these expansions
operate on bytes instead of characters, producing invalid UTF-8
strings and other surprises.
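
To make the byte/character distinction concrete, here is a minimal sketch
that counts both for the two-character string used in the steps below. The
helper name count_bytes_and_chars is hypothetical, and the string is written
as explicit octal escapes so it does not depend on the terminal encoding. It
relies only on the fact that UTF-8 continuation bytes lie in the range
0x80-0xBF.

```shell
#!/bin/bash
# "áé" spelled out as UTF-8 bytes: c3 a1 c3 a9
a=$(printf '\303\241\303\251')

# Hypothetical helper: count the bytes and the UTF-8 characters in $1.
# Every byte that is not a continuation byte (0x80-0xBF, i.e. first hex
# digit 8, 9, a or b) starts a new character.
count_bytes_and_chars() {
    local nbytes=0 nchars=0 b
    for b in $(printf '%s' "$1" | od -An -v -tx1); do
        nbytes=$((nbytes + 1))
        case $b in
            [89ab]?) ;;                      # continuation byte
            *) nchars=$((nchars + 1)) ;;     # ASCII or lead byte
        esac
    done
    echo "$nbytes bytes, $nchars characters"
}

count_bytes_and_chars "$a"    # prints: 4 bytes, 2 characters
```

An expansion that works on bytes sees four units here, while one that works
on characters sees two, which is the difference the results below show.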

Version-Release number of selected component (if applicable):
bash-2.05b-34

How reproducible:
Always

Steps to Reproduce:
1. Set up a UTF-8 locale (the default for many languages); I'll use
LC_ALL=cs_CZ.UTF-8
2. Run bash and type a='áé' (two characters: a-with-acute, e-with-acute,
in case they don't survive the posting process)
3. echo -n $a | xxd
4a. echo -n ${a:3} | xxd
4b. echo -n ${a%?} | xxd
4c. echo -n ${a#?} | xxd
    

Actual Results:
3. 0000000: c3a1 c3a9                                ....
4a. 0000000: a9                                       .
    (the fourth byte)
4b. 0000000: c3a1 c3                                  ...
    (one byte removed from the end)
4c. 0000000: a1c3 a9                                  ...
    (one byte removed from the beginning)

Expected Results:
3. the same (just a check)
4a. nothing
    (because there are only two characters in $a)
4b. 0000000: c3a1                                     ..
    (one character removed from the end)
4c. 0000000: c3a9                                     ..
    (one character removed from the beginning)
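
The byte-wise results above can be reproduced on any bash by forcing the C
locale, where one character is always one byte. The following is a sketch
under that assumption, not the original test case: the function names hex
and expand_under are hypothetical, and od is used in place of xxd only
because it is more widely installed.

```shell
#!/bin/bash
# "áé" as explicit UTF-8 bytes: c3 a1 c3 a9
a=$(printf '\303\241\303\251')

# Hex dump of $1 without offsets or spaces, e.g. "c3a1c3a9".
hex() { printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'; }

# Run the three expansions from the report under the locale named in $1.
# The function body is a subshell, so the locale change does not leak
# into the calling shell.
expand_under() (
    LC_ALL=$1
    export LC_ALL
    echo "offset: $(hex "${a:3}")"
    echo "suffix: $(hex "${a%?}")"
    echo "prefix: $(hex "${a#?}")"
)

expand_under C
# prints:
# offset: a9
# suffix: c3a1c3
# prefix: a1c3a9
```

On a bash that handles multibyte characters correctly, and with the locale
installed, calling expand_under cs_CZ.UTF-8 should instead give the expected
results listed above (nothing, c3a1 and c3a9).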

Additional info:
Comment 1 Tim Waugh 2004-01-02 12:37:49 EST
Created attachment 96743
bash-2.05b-subst.patch

Here is a start at fixing it.  It only addresses ${var:off[:len]} so far.  I've
reported this upstream.
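
As an illustration only (this is not the attached patch, which changes
bash's C source), the sketch below shows what a character-aware
${var:off} has to do for UTF-8: skip whole characters, not bytes, before
it starts copying. The helper substr_chars_hex is hypothetical and prints
its result as hex so the output does not depend on the terminal.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the patch: print the hex bytes of $1
# starting at character offset $2, treating $1 as UTF-8.
substr_chars_hex() {
    local skip=$2 seen=0 out='' b
    for b in $(printf '%s' "$1" | od -An -v -tx1); do
        case $b in
            [89ab]?) ;;                 # continuation byte: same character
            *) seen=$((seen + 1)) ;;    # lead or ASCII byte: next character
        esac
        # Keep the byte once more than $skip characters have started.
        if [ "$seen" -gt "$skip" ]; then
            out="$out$b"
        fi
    done
    echo "$out"
}

a=$(printf '\303\241\303\251')   # "áé" as UTF-8 bytes
substr_chars_hex "$a" 0   # prints: c3a1c3a9
substr_chars_hex "$a" 1   # prints: c3a9
substr_chars_hex "$a" 3   # prints an empty line
```

With offset 3 the result is empty because the string has only two
characters, which matches the expected result for step 4a.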
Comment 2 Tim Waugh 2004-01-05 05:51:07 EST
Created attachment 96767
bash-2.05b-subst.patch

Here is a more complete patch.
