Bug 112657 - ${var%...} parameter expansion broken for UTF-8
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: bash
Version: 1
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-12-26 16:07 UTC by David Nečas
Modified: 2007-11-30 22:10 UTC

Fixed In Version: 2.05b-35
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-01-05 15:50:33 UTC
Type: ---
Embargoed:


Attachments
bash-2.05b-subst.patch (891 bytes, patch)
2004-01-02 17:37 UTC, Tim Waugh
bash-2.05b-subst.patch (3.16 KB, patch)
2004-01-05 10:51 UTC, Tim Waugh

Description David Nečas 2003-12-26 16:07:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.10 (X11; Linux i686; U;) Gecko/20030314

Description of problem:
The ${var%...}, ${var#...}, ${var:offset} (and possibly other) parameter
expansions are broken for UTF-8.  Even in a UTF-8 locale, when a
variable contains a UTF-8 string (e.g. a filename), these expansions
operate on bytes instead of characters, producing invalid UTF-8
strings and other surprises.

Version-Release number of selected component (if applicable):
bash-2.05b-34

How reproducible:
Always

Steps to Reproduce:
1. Set up a UTF-8 locale (the default for many languages); I use
LC_ALL=cs_CZ.UTF-8.
2. Run bash and type a='áé' (two characters: a-with-acute and
e-with-acute, in case they do not survive the posting process).
3. echo -n $a | xxd
4a. echo -n ${a:3} | xxd
4b. echo -n ${a%?} | xxd
4c. echo -n ${a#?} | xxd
    

Actual Results:
3. 0000000: c3a1 c3a9                                ....
4a. 0000000: a9                                       .
    (the fourth byte)
4b. 0000000: c3a1 c3                                  ...
    (one byte removed from the end)
4c. 0000000: a1c3 a9                                  ...
    (one byte removed from the beginning)

Expected Results:
3. the same (just a check)
4a. nothing
    (because there are only two characters in $a)
4b. 0000000: c3a1                                     ..
    (one character removed from the end)
4c. 0000000: c3a9                                     ..
    (one character removed from the beginning)
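
The steps above can be condensed into a small script. This is only a
sketch, not part of the original report: it assumes some UTF-8 locale
is installed (it picks the first one listed by `locale -a` and falls
back to C.UTF-8), builds the two-character string from explicit bytes
so the script does not depend on the file's encoding, and uses `od` in
place of `xxd`. The helper name `hex` is made up for illustration.

```shell
#!/bin/bash
# Pick an installed UTF-8 locale (assumption: at least one exists;
# C.UTF-8 is used as a fallback name).
utf8_locale=$(locale -a 2>/dev/null | grep -i -m1 -E 'utf-?8$')
export LC_ALL="${utf8_locale:-C.UTF-8}"

# a-with-acute followed by e-with-acute, written as raw UTF-8 bytes.
a=$'\xc3\xa1\xc3\xa9'

# Hypothetical helper: print the bytes of its argument as a hex string.
hex() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; }

echo "\$a      -> $(hex "$a")"       # c3a1c3a9 in every case
echo "\${a:3}  -> $(hex "${a:3}")"   # report expects: empty (only 2 characters)
echo "\${a%?}  -> $(hex "${a%?}")"   # report expects: c3a1
echo "\${a#?}  -> $(hex "${a#?}")"   # report expects: c3a9
```

On an affected bash (2.05b-34 according to this report) the last three
lines would instead show a9, c3a1c3 and a1c3a9, i.e. byte-wise results.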

Additional info:

Comment 1 Tim Waugh 2004-01-02 17:37:49 UTC
Created attachment 96743
bash-2.05b-subst.patch

Here is a start at fixing it.  It only addresses ${var:off[:len]} so far.  I've
reported this upstream.

Comment 2 Tim Waugh 2004-01-05 10:51:07 UTC
Created attachment 96767
bash-2.05b-subst.patch

Here is a more complete patch.

