Bug 1413716 - RFE: bug on input buffer boundary and/or temporary composing buffer of multibyte characters
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ksh
Version: 7.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 7.4
Assignee: Siteshwar Vashisht
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks: 1420851 1417886
 
Reported: 2017-01-16 19:03 UTC by Paulo Andrade
Modified: 2020-06-11 13:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1417886 (view as bug list)
Environment:
Last Closed: 2017-04-25 12:35:30 UTC
Target Upstream Version:


Attachments
ksh-20120801-mbchar.patch (1.53 KB, patch)
2017-01-16 19:03 UTC, Paulo Andrade
iso.sh (78 bytes, text/plain)
2017-01-16 19:09 UTC, Paulo Andrade
utf.sh (80 bytes, application/x-shellscript)
2017-01-16 19:13 UTC, Paulo Andrade
iso_and_utf_sh.tar (10.00 KB, application/x-tar)
2017-01-17 11:38 UTC, Paulo Andrade

Description Paulo Andrade 2017-01-16 19:03:51 UTC
Created attachment 1241368 [details]
ksh-20120801-mbchar.patch

If iso8859-x characters are found in certain positions of
input, ksh parsing may get confused.

The parser frequently reads a byte with fcmbget() to "peek"
at the next input character, and then calls fcseek(-LEN),
where LEN is the number of bytes read, to reset the input.

The problem is that _fcmbget() has a local static buffer to
compose multibyte characters, and fcseek() does not know about
it.

The code is too complex for the fix to be as simple as
making the "compose buffer" in _fcmbget() file static, making
fcseek() a function, etc.

The proposed patch works for the test case that triggers the
reported problem, as well as for utf8 input encoding the same
latin-n characters.

Test cases follow with explanations, as attachments.

*Note* that this patch right now is an RFE, as more tests
might be required to validate it.

Comment 1 Paulo Andrade 2017-01-16 19:09:17 UTC
Created attachment 1241369 [details]
iso.sh

  Test example:

# Need to use bash, or ksh with proposed patch
$ bash iso.sh > a.sh
$ ksh -x a.sh 2>&1 | tail -5
+ VAR8593=$'/a/\xe7foo/ab\xe3c'
+ VAR8594=$'/a/\xe7foo/ab\xe3c'
+ VAR8595=$'/a/\xe7foo/ab\xe3c'
+ VAR8596=$'/a/\xe7foo/ab\xe3c\nVAR8597=/a/\xe7foo/ab\xe3c'
a.sh: line 8596: ": invalid variable name

With the proposed patch applied, ksh runs the script correctly.

Comment 2 Paulo Andrade 2017-01-16 19:13:06 UTC
Created attachment 1241371 [details]
utf.sh

  This example is just to validate that with or without the
proposed patch, the output is the same, for example:

$ bash /tmp/utf.sh > a.sh
$ tail -3 a.sh 
VAR65534="/a/çfoo/abãc"
VAR65535="/a/çfoo/abãc"
VAR65536="/a/çfoo/abãc"
$ ksh -x a.sh > old.sh 2>&1
$ arch/linux.i386-64/src/cmd/ksh93/ksh -x a.sh > new.sh 2>&1
$ tail -3 old.sh
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
$ diff -u old.sh new.sh
<<empty>>

Comment 3 Siteshwar Vashisht 2017-01-17 08:31:35 UTC
Paulo,

I tried to reproduce this bug on a vanilla rhel-7.3 system, but I did not get the error 'a.sh: line 8596: ": invalid variable name'. 

Here is the output from my system :

[0 root@qeos-82 bug1413716]# rpm -q ksh
ksh-20120801-26.el7.x86_64
[0 root@qeos-82 bug1413716]# cat iso.sh 
#!/bin/sh

for i in $(seq 1 65536); do
    echo "VAR$i="'"/a/çfoo/abãc"'
done
[0 root@qeos-82 bug1413716]# bash iso.sh > a.sh
[0 root@qeos-82 bug1413716]# ksh -x a.sh 2>&1 | tail -5
+ VAR65532=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65533=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
[0 root@qeos-82 bug1413716]# 


Am I missing something in the reproducer steps ?

Comment 4 Paulo Andrade 2017-01-17 11:38:02 UTC
  Hi Sitesh,

  Somehow it got converted to utf8. Make sure
the iso.sh file has iso8859-1 characters.

  For example, in utf8 ksh shows the "ç" character
embedded in strings as "\u[e7]", but in iso8859-1
it shows as "\xe7".

  The original problem is due to a system that
creates ksh scripts dynamically, but uses iso
encoding...
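
  One way to regenerate the reproducer without depending on
how the attachment was transferred is to write the two bytes
as octal escapes. This is only a sketch: gen_iso is a made-up
helper name, not the attached iso.sh, and it assumes a printf
that accepts \ddd octal escapes in its format string, as POSIX
specifies.

```shell
#!/bin/sh
# Sketch only: gen_iso is a hypothetical helper, not the attached iso.sh.
# \347 is octal for byte 0xe7 and \343 for byte 0xe3, the ISO-8859-1
# encodings of the two accented characters in the test string.
gen_iso() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'VAR%d="/a/\347foo/ab\343c"\n' "$i"
        i=$((i + 1))
    done
}

# Small demonstration: dump two generated lines as hex.
gen_iso 2 | od -An -v -tx1
# For the full reproducer: gen_iso 65536 > a.sh
```

  Because the bytes are spelled out as escapes, the script
itself is plain ASCII and cannot be re-encoded on upload.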

Comment 5 Paulo Andrade 2017-01-17 11:38:56 UTC
Created attachment 1241680 [details]
iso_and_utf_sh.tar

I think it was bugzilla that converted it because I selected text mode...
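
A quick byte count can tell which encoding a downloaded copy
actually has. The sketch below is an assumption-laden helper
(count_byte is a made-up name): it counts raw 0xe7 bytes, which
only the iso8859-1 form contains, and 0xc3 bytes, which is the
utf8 lead byte for both accented characters. The two sample
files are created only for the demonstration.

```shell
#!/bin/sh
# count_byte OCTAL_ESCAPE FILE: hypothetical helper that prints how many
# times the given byte occurs in FILE. LC_ALL=C makes tr work on bytes.
count_byte() {
    LC_ALL=C tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
# Same text in both encodings: ISO-8859-1 uses e7 / e3,
# UTF-8 uses c3 a7 / c3 a3.
printf 'VAR1="/a/\347foo/ab\343c"\n' > "$tmp/iso.txt"
printf 'VAR1="/a/\303\247foo/ab\303\243c"\n' > "$tmp/utf.txt"

for f in "$tmp/iso.txt" "$tmp/utf.txt"; do
    echo "$f: 0xe7=$(count_byte '\347' "$f") 0xc3=$(count_byte '\303' "$f")"
done
rm -rf "$tmp"
```

A file that reports zero 0xe7 bytes and a non-zero 0xc3 count
has been converted to utf8 and will not trigger the problem.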

Comment 7 Siteshwar Vashisht 2017-01-30 16:19:27 UTC
Paulo,

Thanks! I am able to reproduce it with attachment from comment 5.

Comment 10 Siteshwar Vashisht 2017-01-31 13:36:45 UTC
Upstream discussion http://lists.research.att.com/pipermail/ast-users/2017q1/004806.html

Comment 13 Siteshwar Vashisht 2017-04-25 11:05:06 UTC
The patch mentioned in comment 12 breaks if we increase the size of the input file to ksh by increasing the length of the loop in iso.sh.

