Bug 1413716
Summary: | RFE: bug on input buffer boundary and/or temporary composing buffer of multibyte characters
---|---
Product: | Red Hat Enterprise Linux 7
Reporter: | Paulo Andrade <pandrade>
Component: | ksh
Assignee: | Siteshwar Vashisht <svashisht>
Status: | CLOSED WONTFIX
QA Contact: | BaseOS QE - Apps <qe-baseos-apps>
Severity: | low
Docs Contact: |
Priority: | low
Version: | 7.2
CC: | jkejda, pandrade, srandhaw, zpytela
Target Milestone: | rc
Keywords: | FutureFeature
Target Release: | 7.4
Hardware: | All
OS: | Linux
Whiteboard: |
Fixed In Version: |
Doc Type: | If docs needed, set a value
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1417886
Environment: |
Last Closed: | 2017-04-25 12:35:30 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1417886, 1420851
Attachments: |

Created attachment 1241369 [details]
iso.sh
Test example:
# Need to use bash, or ksh with proposed patch
$ bash iso.sh > a.sh
$ ksh -x a.sh 2>&1 | tail -5
+ VAR8593=$'/a/\xe7foo/ab\xe3c'
+ VAR8594=$'/a/\xe7foo/ab\xe3c'
+ VAR8595=$'/a/\xe7foo/ab\xe3c'
+ VAR8596=$'/a/\xe7foo/ab\xe3c\nVAR8597=/a/\xe7foo/ab\xe3c'
a.sh: line 8596: ": invalid variable name
With ksh built with the proposed patch, it works.
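Because the summary points at an input buffer boundary, it can help to know where the first failing line starts in a.sh. The following is a minimal sketch, assuming a POSIX shell with head and wc. line_offset is a hypothetical helper, not part of the attachments, and this report does not state ksh's buffer size, so the result is only a number to compare against whichever size is suspected.

```shell
#!/bin/sh
# line_offset FILE N: print the 0-based byte offset at which line N starts.
# Hypothetical helper for illustration; it counts bytes, not characters,
# so it behaves the same for ISO-8859-1 and UTF-8 input.
line_offset() {
    file=$1
    n=$2
    if [ "$n" -le 1 ]; then
        echo 0
        return
    fi
    # Everything before line N is exactly the first N-1 lines.
    head -n "$((n - 1))" "$file" | wc -c | tr -d ' '
}
```

For the failure above, it could be called as: line_offset a.sh 8596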
Created attachment 1241371 [details]
utf.sh
This example is just to validate that with or without the
proposed patch, the output is the same, for example:
$ bash /tmp/utf.sh > a.sh
$ tail -3 a.sh
VAR65534="/a/çfoo/abãc"
VAR65535="/a/çfoo/abãc"
VAR65536="/a/çfoo/abãc"
$ ksh -x a.sh > old.sh 2>&1
$ arch/linux.i386-64/src/cmd/ksh93/ksh -x a.sh > new.sh 2>&1
$ tail -3 old.sh
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
$ diff -u old.sh new.sh
<<empty>>
Paulo, I tried to reproduce this bug on a vanilla rhel-7.3 system, but I did not get the error 'a.sh: line 8596: ": invalid variable name'. Here is the output from my system:
[0 root@qeos-82 bug1413716]# rpm -q ksh
ksh-20120801-26.el7.x86_64
[0 root@qeos-82 bug1413716]# cat iso.sh
#!/bin/sh
for i in $(seq 1 65536); do
    echo "VAR$i="'"/a/çfoo/abãc"'
done
[0 root@qeos-82 bug1413716]# bash iso.sh > a.sh
[0 root@qeos-82 bug1413716]# ksh -x a.sh 2>&1 | tail -5
+ VAR65532=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65533=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65534=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65535=$'/a/\u[e7]foo/ab\u[e3]c'
+ VAR65536=$'/a/\u[e7]foo/ab\u[e3]c'
[0 root@qeos-82 bug1413716]#
Am I missing something in the reproducer steps?

Hi Sitesh,
Somehow it got converted to utf8. Make sure the iso.sh file has iso8859-1 characters. For example, when the "ç" character is in utf8, ksh shows it embedded in strings as "\u[e7]", but when it is in iso8859-1, ksh shows it as "\xe7".
The original problem comes from a system that creates ksh scripts dynamically, but uses iso encoding...

Created attachment 1241680 [details]
iso_and_utf_sh.tar
I think it was bugzilla that converted it because I selected text mode...
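Since copy and paste can silently convert the reproducer to utf8, one way around it is to emit the two iso8859-1 bytes with printf octal escapes instead of typing the characters. This is a sketch under that assumption, not the attached iso.sh: gen_iso_script and its count argument are illustrative names.

```shell
#!/bin/sh
# gen_iso_script [COUNT]: print COUNT assignments whose values contain raw
# ISO-8859-1 bytes, independent of the encoding this file is saved in.
# Hypothetical helper for illustration, not the attached iso.sh.
gen_iso_script() {
    count=${1:-65536}
    i=1
    while [ "$i" -le "$count" ]; do
        # \347 is byte 0xE7 (ç) and \343 is byte 0xE3 (ã) in ISO-8859-1.
        printf 'VAR%d="/a/\347foo/ab\343c"\n' "$i"
        i=$((i + 1))
    done
}
```

Running gen_iso_script 65536 > a.sh produces the input file directly, and od -c a.sh | head should then show the octal bytes 347 and 343 rather than two-byte utf8 sequences. A larger count gives a longer input file.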
Paulo, thanks! I am able to reproduce it with the attachment from comment 5.

Upstream discussion: http://lists.research.att.com/pipermail/ast-users/2017q1/004806.html

The patch mentioned in comment 12 breaks if we increase the size of the input file given to ksh by increasing the loop length in iso.sh.

Created attachment 1241368 [details]
ksh-20120801-mbchar.patch

If iso8859-x characters are found in certain positions of the input, ksh parsing may get confused.

The parser frequently reads a byte with fcmbget() to "peek" at the next input character, and then calls fcseek(-LEN), where LEN is the number of bytes read, to reset the input. The problem is that _fcmbget() has a local static buffer to compose multibyte characters, and fcseek() does not know about it.

The code change is far more complex than just making the "compose buffer" in _fcmbget() file static, making fcseek() a function, etc.

The proposed patch works for the test case that triggers the reported problem, as well as for utf8 input encoding latin-n characters. Test cases follow with explanations, as attachments.

*Note* that this patch is an RFE for now, as more tests may be required to validate it.