Bug 1413716

Summary:

RFE: bug on input buffer boundary and/or temporary composing buffer of multibyte characters

Product:

Red Hat Enterprise Linux 7

Reporter:

Paulo Andrade <pandrade>

Component:

ksh

Assignee:

Siteshwar Vashisht <svashisht>

Status:

CLOSED WONTFIX

QA Contact:

BaseOS QE - Apps <qe-baseos-apps>

Severity:

low

Priority:

low

Version:

7.2

CC:

jkejda, pandrade, srandhaw, zpytela

Keywords:

FutureFeature

Target Release:

7.4

Hardware:

All

OS:

Linux

Clones:

1417886

Last Closed:

2017-04-25 12:35:30 UTC

Type:

Bug

Bug Blocks:

1417886, 1420851

Attachments:

Description	Flags
ksh-20120801-mbchar.patch	none
iso.sh	none
utf.sh	none
iso_and_utf_sh.tar	none

Description Paulo Andrade 2017-01-16 19:03:51 UTC

Created attachment 1241368
ksh-20120801-mbchar.patch

If iso8859-x characters are found in certain positions of
input, ksh parsing may get confused.

The parser frequently reads ahead with fcmbget() to "peek" at
the next input character, and then calls fcseek(-LEN), where LEN
is the number of bytes read, to reset the input.

The problem is that _fcmbget() has a local static buffer to
compose multibyte characters, and fcseek() does not know about
it.

The fix is far more involved than simply making the
"compose buffer" in _fcmbget() file static, turning fcseek() into
a function, etc.
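
To make the interaction concrete, here is a toy model in C. It is only a
sketch under stated assumptions: every name in it (toy_in, toy_mbget,
toy_seek_back, toy_demo, ...) is hypothetical, it is not ksh code, and it
does not reproduce the attached patch. It only shows how a peek that
stashes a partial multibyte sequence in hidden static state is not undone
by a seek that only moves the read position. The "reset" variant is the
naive approach that, as noted above, is not sufficient in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy input window; NOT ksh's real input structure. */
struct toy_in {
    const unsigned char *buf; /* bytes currently buffered */
    size_t len;               /* number of buffered bytes */
    size_t pos;               /* read position within buf */
};

/* Hidden compose state, standing in for the static buffer in _fcmbget(). */
static unsigned char toy_pending[4];
static size_t toy_npending;

/* Sequence length a UTF-8 decoder expects from a lead byte (1 otherwise). */
static size_t toy_expect(unsigned char c)
{
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 1;
}

/*
 * Consume one character's worth of bytes and return how many were taken.
 * If the sequence runs past the end of the buffer, the partial bytes are
 * stashed in toy_pending (refill and resume are omitted in this sketch).
 */
static size_t toy_mbget(struct toy_in *in)
{
    size_t avail = in->len - in->pos;
    size_t need, take, i;

    if (avail == 0)
        return 0;
    need = toy_expect(in->buf[in->pos]);
    take = need < avail ? need : avail;
    if (take < need) {
        for (i = 0; i < take; i++)
            if (toy_npending < sizeof toy_pending)
                toy_pending[toy_npending++] = in->buf[in->pos + i];
    }
    in->pos += take;
    return take;
}

/* Seek back n bytes without touching the compose state (the reported problem). */
static void toy_seek_back(struct toy_in *in, size_t n)
{
    assert(n <= in->pos);
    in->pos -= n;
}

/* Variant that also discards the bytes the peek left in the stash. */
static void toy_seek_back_reset(struct toy_in *in, size_t n)
{
    toy_seek_back(in, n);
    toy_npending = 0;
}

/* Peek, undo the peek, then read for real; returns the stashed byte count. */
size_t toy_demo(int reset)
{
    /* ISO-8859-1 "a" + 0xE7 (c-cedilla) sitting at the end of the buffer. */
    static const unsigned char data[] = { 'a', 0xE7 };
    struct toy_in in = { data, sizeof data, 1 };
    size_t n;

    toy_npending = 0;
    n = toy_mbget(&in);              /* peek: stashes 0xE7 */
    if (reset)
        toy_seek_back_reset(&in, n);
    else
        toy_seek_back(&in, n);
    toy_mbget(&in);                  /* real read: stashes 0xE7 again */
    return toy_npending;
}
```

In this sketch toy_demo(0) ends with two copies of 0xE7 in the stash,
because the peeked byte was never discarded, while toy_demo(1) ends with
one.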

The proposed patch fixes the test case that triggers the
reported problem, and it also works for utf8 input encoding
latin-n characters.

Test cases follow with explanations, as attachments.

*Note* that this patch is filed as an RFE for now, as more
tests might be required to validate it.

Comment 1 Paulo Andrade 2017-01-16 19:09:17 UTC

Created attachment 1241369
iso.sh

  Test example:

# Need to use bash, or ksh with proposed patch
$ bash iso.sh > a.sh
$ ksh -x a.sh 2>&1 | tail -5
+ VAR8593=$'/a/\xe7foo/ab\xe3c'
+ VAR8594=$'/a/\xe7foo/ab\xe3c'
+ VAR8595=$'/a/\xe7foo/ab\xe3c'
+ VAR8596=$'/a/\xe7foo/ab\xe3c\nVAR8597=/a/\xe7foo/ab\xe3c'
a.sh: line 8596: ": invalid variable name

With ksh built with the proposed patch, it works.

Comment 2 Paulo Andrade 2017-01-16 19:13:06 UTC

Created attachment 1241371
utf.sh

  This example is just to validate that with or without the
proposed patch, the output is the same, for example:

$ bash /tmp/utf.sh > a.sh
$ tail -3 a.sh 
VAR65534="/a/çfoo/abãc"
VAR65535="/a/çfoo/abãc"
VAR65536="/a/çfoo/abãc"
$ ksh -x a.sh > old.sh 2>&1
$ arch/linux.i386-64/src/cmd/ksh93/ksh -x a.sh > new.sh 2>&1
$ tail -3 old.sh
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
$ diff -u old.sh new.sh
<<empty>>

Comment 3 Siteshwar Vashisht 2017-01-17 08:31:35 UTC

Paulo,

I tried to reproduce this bug on a vanilla rhel-7.3 system, but I did not get the error 'a.sh: line 8596: ": invalid variable name'. 

Here is the output from my system :

[0 root@qeos-82 bug1413716]# rpm -q ksh
ksh-20120801-26.el7.x86_64
[0 root@qeos-82 bug1413716]# cat iso.sh 
#!/bin/sh

for i in $(seq 1 65536); do
    echo "VAR$i="'"/a/çfoo/abãc"'
done
[0 root@qeos-82 bug1413716]# bash iso.sh > a.sh
[0 root@qeos-82 bug1413716]# ksh -x a.sh 2>&1 | tail -5
+ VAR65532=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65533=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
[0 root@qeos-82 bug1413716]# 


Am I missing something in the reproducer steps ?

Comment 4 Paulo Andrade 2017-01-17 11:38:02 UTC

  Hi Sitesh,

  Somehow it got converted to utf8. Make sure
the iso.sh file has iso8859-1 characters.

  For example, the "ç" character: in utf8, ksh
shows it embedded in strings as "\u[e7]", but
in iso8859-1 it shows as "\xe7".

  The original problem is due to a system that
creates ksh scripts dynamically, but uses iso
encoding...
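
  One way to check which encoding a generated script really uses is to
test whether its bytes form valid UTF-8. The C sketch below is an
assumption-laden illustration, not part of ksh or of the attachments: the
function name looks_like_utf8 is hypothetical, and the check is
simplified (it verifies lead and continuation bytes only and does not
reject overlong three- or four-byte forms or surrogates). A lone 0xE7
followed by ASCII, as in the iso8859-1 file, fails the check, while the
two-byte sequence 0xC3 0xA7 passes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the n bytes at buf are structurally
 * valid UTF-8 (each lead byte is followed by the expected number of
 * continuation bytes), 0 otherwise. Simplified check, see the text above.
 */
int looks_like_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    assert(buf != NULL || n == 0);
    while (i < n) {
        unsigned char c = buf[i];
        size_t extra, k;

        if (c < 0x80)
            extra = 0;                      /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                      /* two-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                      /* three-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                      /* four-byte sequence */
        else
            return 0;                       /* stray continuation or invalid lead */

        if (extra > n - i - 1)
            return 0;                       /* sequence truncated at end of input */
        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                   /* expected a continuation byte */
        i += extra + 1;
    }
    return 1;
}
```

  A file that contains bytes above 0x7f and fails this check is a
candidate for iso8859-x input; the check cannot tell which single-byte
encoding is actually in use.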

Comment 5 Paulo Andrade 2017-01-17 11:38:56 UTC

Created attachment 1241680
iso_and_utf_sh.tar

I think it was bugzilla that converted it because I selected text mode...

Comment 7 Siteshwar Vashisht 2017-01-30 16:19:27 UTC

Paulo,

Thanks! I am able to reproduce it with attachment from comment 5.

Comment 10 Siteshwar Vashisht 2017-01-31 13:36:45 UTC

Upstream discussion http://lists.research.att.com/pipermail/ast-users/2017q1/004806.html

Comment 12 Siteshwar Vashisht 2017-04-24 22:33:07 UTC

https://www.mail-archive.com/ast-developers@lists.research.att.com/msg01953.html

Comment 13 Siteshwar Vashisht 2017-04-25 11:05:06 UTC

The patch mentioned in comment 12 breaks if we increase the size of the input file passed to ksh by increasing the length of the loop in iso.sh.