Bug 1413716

Summary: RFE: bug on input buffer boundary and/or temporary composing buffer of multibyte characters
Product: Red Hat Enterprise Linux 7 Reporter: Paulo Andrade <pandrade>
Component: ksh Assignee: Siteshwar Vashisht <svashisht>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: low Docs Contact:
Priority: low    
Version: 7.2 CC: jkejda, pandrade, srandhaw, zpytela
Target Milestone: rc Keywords: FutureFeature
Target Release: 7.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1417886 (view as bug list) Environment:
Last Closed: 2017-04-25 12:35:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1417886, 1420851    
Attachments:
Description Flags
ksh-20120801-mbchar.patch none
iso.sh none
utf.sh none
iso_and_utf_sh.tar none

Description Paulo Andrade 2017-01-16 19:03:51 UTC
Created attachment 1241368 [details]
ksh-20120801-mbchar.patch

If iso8859-x characters are found in certain positions of
input, ksh parsing may get confused.

The problem is that the parser frequently reads a byte
with fcmbget() to "peek" at the next input character, and
then calls fcseek(-LEN), where LEN is the number of bytes read,
to reset the input.

However, _fcmbget() has a local static buffer to
compose multibyte characters, and fcseek() does not know about
it.

The fix is far more complex than just making the
"compose buffer" in _fcmbget() file static, making fcseek() a
function, etc.

The proposed patch works for the test case that triggers the
reported problem, as well as for utf8 characters encoding
latin-n characters.

Test cases follow with explanations, as attachments.

*Note* that this patch is an RFE for now, as more tests
might be required to validate it.

Comment 1 Paulo Andrade 2017-01-16 19:09:17 UTC
Created attachment 1241369 [details]
iso.sh

  Test example:

# Need to use bash, or ksh with proposed patch
$ bash iso.sh > a.sh
$ ksh -x a.sh 2>&1 | tail -5
+ VAR8593=$'/a/\xe7foo/ab\xe3c'
+ VAR8594=$'/a/\xe7foo/ab\xe3c'
+ VAR8595=$'/a/\xe7foo/ab\xe3c'
+ VAR8596=$'/a/\xe7foo/ab\xe3c\nVAR8597=/a/\xe7foo/ab\xe3c'
a.sh: line 8596: ": invalid variable name

With ksh built with the proposed patch, it works.
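
Since the summary suspects an input buffer boundary, it may help to see at which byte offset the failing line starts. A minimal sketch, assuming a.sh is the generated script; line_offset is a hypothetical helper, not part of ksh or of the proposed patch:

```shell
# Print the byte offset at which line N of FILE starts (0 for line 1).
# LC_ALL=C makes awk count bytes rather than characters, so each raw
# iso8859-1 byte is counted as exactly one byte.
line_offset() {
    LC_ALL=C awk -v n="$2" '
        NR >= n { exit }                      # stop once line N is reached
        { total += length($0) + 1 }           # +1 for the newline
        END { print total + 0 }
    ' "$1"
}
```

For example, `line_offset a.sh 8596` prints where line 8596 begins. The result can then be compared against multiples of the input buffer size, which this report does not state.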

Comment 2 Paulo Andrade 2017-01-16 19:13:06 UTC
Created attachment 1241371 [details]
utf.sh

  This example is just to validate that with or without the
proposed patch, the output is the same, for example:

$ bash /tmp/utf.sh > a.sh
$ tail -3 a.sh 
VAR65534="/a/çfoo/abãc"
VAR65535="/a/çfoo/abãc"
VAR65536="/a/çfoo/abãc"
$ ksh -x a.sh > old.sh 2>&1
$ arch/linux.i386-64/src/cmd/ksh93/ksh -x a.sh > new.sh 2>&1
$ tail -3 old.sh
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
$ diff -u old.sh new.sh
<<empty>>

Comment 3 Siteshwar Vashisht 2017-01-17 08:31:35 UTC
Paulo,

I tried to reproduce this bug on a vanilla rhel-7.3 system, but I did not get the error 'a.sh: line 8596: ": invalid variable name'. 

Here is the output from my system:

[0 root@qeos-82 bug1413716]# rpm -q ksh
ksh-20120801-26.el7.x86_64
[0 root@qeos-82 bug1413716]# cat iso.sh 
#!/bin/sh

for i in $(seq 1 65536); do
    echo "VAR$i="'"/a/çfoo/abãc"'
done
[0 root@qeos-82 bug1413716]# bash iso.sh > a.sh
[0 root@qeos-82 bug1413716]# ksh -x a.sh 2>&1 | tail -5
+ VAR65532=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65533=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
[0 root@qeos-82 bug1413716]# 


Am I missing something in the reproducer steps?

Comment 4 Paulo Andrade 2017-01-17 11:38:02 UTC
  Hi Sitesh,

  Somehow it got converted to utf8. Make sure
the iso.sh file has iso8859-1 characters.

  For example, for the "ç" character, ksh shows it
embedded in strings as "\u[e7]" in utf8, but as
"\xe7" in iso8859-1.

  The original problem is due to a system that
creates ksh scripts dynamically, but uses iso
encoding...
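
One way to avoid depending on how an editor or a web form encodes the test script is to emit the iso8859-1 bytes explicitly. A minimal sketch, assuming the same line format as iso.sh; gen_iso is a hypothetical helper, and the octal escapes \347 and \343 are the bytes 0xe7 ("ç") and 0xe3 ("ã"):

```shell
# Write COUNT assignment lines with raw iso8859-1 bytes to stdout.
# COUNT defaults to 65536, the loop length used in iso.sh.
gen_iso() {
    i=1
    while [ "$i" -le "${1:-65536}" ]; do
        # \347 = 0xe7 and \343 = 0xe3, emitted as single bytes
        printf 'VAR%d="/a/\347foo/ab\343c"\n' "$i"
        i=$((i + 1))
    done
}
```

Running `gen_iso > a.sh` and then `ksh -x a.sh 2>&1 | tail -5` should exercise the same input as the attached script. A larger count can be passed to generate a bigger input file.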

Comment 5 Paulo Andrade 2017-01-17 11:38:56 UTC
Created attachment 1241680 [details]
iso_and_utf_sh.tar

I think it was bugzilla that converted it because I selected text mode...
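
Because an attachment can be re-encoded in transit, it may be worth checking a downloaded script before running it. A rough sketch that only looks for the two encodings of "ç" used in this test case (bytes c3 a7 in utf8, byte e7 in iso8859-1); encoding_of is a hypothetical helper, not a general encoding detector:

```shell
# Report which encoding of "ç" appears in FILE.
# LC_ALL=C makes grep match raw bytes instead of decoded characters.
encoding_of() {
    if LC_ALL=C grep -q "$(printf '\303\247')" "$1"; then
        echo utf8            # found c3 a7
    elif LC_ALL=C grep -q "$(printf '\347')" "$1"; then
        echo iso8859-1       # found a lone e7
    else
        echo unknown
    fi
}
```

If `encoding_of a.sh` prints utf8, the reproducer has been converted and is not expected to trigger the error described above.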

Comment 7 Siteshwar Vashisht 2017-01-30 16:19:27 UTC
Paulo,

Thanks! I am able to reproduce it with attachment from comment 5.

Comment 10 Siteshwar Vashisht 2017-01-31 13:36:45 UTC
Upstream discussion http://lists.research.att.com/pipermail/ast-users/2017q1/004806.html

Comment 13 Siteshwar Vashisht 2017-04-25 11:05:06 UTC
The patch mentioned in comment 12 breaks if we increase the size of the input file to ksh by increasing the length of the loop in iso.sh.