Bug 195972 - tcsh script error with multibyte-char.
Summary: tcsh script error with multibyte-char.
Alias: None
Product: Fedora
Classification: Fedora
Component: tcsh   
Version: 5
Hardware: i686
OS: Linux
Assignee: Miloslav Trmač
QA Contact: Bill Huang
Reported: 2006-06-20 01:35 UTC by Nobuhiro Kuhara
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version: 6.14-6.fc5.2
Doc Type: Bug Fix
Last Closed: 2006-07-24 00:59:17 UTC

Attachments
tcsh script (EUC_jp character coexists) (5.96 KB, text/plain)
2006-06-20 01:35 UTC, Nobuhiro Kuhara
Patch to fix the problem. (1.94 KB, patch)
2006-06-29 06:34 UTC, s_h_o_
Document and fix buffer offset counting (773 bytes, patch)
2006-07-03 01:26 UTC, Miloslav Trmač
Corrected patch (1.77 KB, patch)
2006-07-03 01:40 UTC, Miloslav Trmač
Updated patch, to handle encoding mismatches better (2.13 KB, patch)
2006-07-05 14:52 UTC, Miloslav Trmač

Description Nobuhiro Kuhara 2006-06-20 01:35:49 UTC
Description of problem:
 tcsh reports a spurious error when running a script that contains multibyte characters.

Version-Release number of selected component (if applicable):

How reproducible:
 Always

Steps to Reproduce:
1. Create a shell script that contains multibyte characters.
   I am attaching an example. (It is for an EUC-JP environment.)

2. Execute the script.
3. An error occurs.
Actual results:
 "end: while/foreachの中ではありません。"
 Sorry, I do not know how this message reads in English.

Expected results:
 The script finishes normally.

Additional info:

Comment 1 Nobuhiro Kuhara 2006-06-20 01:35:50 UTC
Created attachment 131163
tcsh script (EUC_jp character coexists)

Comment 2 Mamoru TASAKA 2006-06-20 03:21:16 UTC
The same error occurs with the development version, tcsh-6.14-8
(and with the FC5 package, tcsh-6.14-6.fc5.1).

Note: if I convert attachment 131163 from EUC-JP to UTF-8 and execute it,
an additional error message appears.

[tasaka1@localhost Linux]$ iconv -f eucjp -t utf8 attachment.cgi > TEMP.csh
[tasaka1@localhost Linux]$ tcsh ./TEMP.csh
(corrupted characters): コマンド
end: while/foreachの中ではありません.

("コマンドが見つかりません" means "command not found" in English, and
"while/foreachの中ではありません." means "Not in while/foreach").

Comment 3 s_h_o_ 2006-06-29 06:32:13 UTC
The cause of this problem seems to be a wrong calculation of l->f_seek in the
function btell() in sh.lex.c.

btell(l) {
        if (cantell && fseekp >= fbobp && fseekp < feobp) {
            size_t i;

            l->f_seek = fbobp;
            for (i = 0; i < fseekp - fbobp; i++)
                l->f_seek += fclens[i];
        } else

l->f_seek represents a byte position. To handle characters that occupy more
than one byte (multibyte characters), the code uses the array fclens, which
holds the number of bytes in each character.

In this code segment, l->f_seek is initialized to fbobp. This is correct at
first, while fbobp is zero. However, once the first buffer (of size 4096) has
been used up, the second buffer is used and the pointers are updated as
follows in the function bgetc():

bgetc() {
        if (fseekp == feobp) {
            fbobp = feobp;

This update of fbobp is necessary for other parts of the program. However,
the updated fbobp cannot be used as the initial value when calculating
l->f_seek in btell(), because it no longer represents a byte position; it
represents a character position.
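
To illustrate the mismatch, here is a minimal C sketch. It is not tcsh code: the function name byte_offset and the sample lengths are made up for illustration. It adds per-character byte lengths to a caller-supplied base, much as the quoted btell() loop does, so the result is only a valid file position if the base itself is a byte offset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: byte offset of the read position, given the base
 * offset of the current buffer and the byte length of each character
 * already consumed from it (the role fclens plays in the excerpt).
 */
size_t byte_offset(size_t base, const unsigned char *clens, size_t nconsumed)
{
    size_t off = base;
    size_t i;

    assert(clens != NULL || nconsumed == 0);
    for (i = 0; i < nconsumed; i++)
        off += clens[i];    /* each character may span several bytes */
    return off;
}
```

Suppose the first buffer held four 2-byte characters, so the second buffer begins at character 4 but at byte 8. After three more 2-byte characters have been consumed, byte_offset(8, lens, 3) gives 14, the real file position, while byte_offset(4, lens, 3) gives 10, which is four bytes short.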

I made a patch to correct the problem. It adds a new pointer, fbobp2, which
holds the byte position, and replaces some uses of fbobp in the program with
fbobp2.
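
The idea of keeping a separate byte-position base can be sketched as follows. This is a hedged illustration and does not reproduce the attached patch: struct reader, its field names, and MAXCHARS are hypothetical, chosen only to mirror the roles of fbobp (character base), fbobp2 (byte base) and fclens (per-character byte lengths).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXCHARS 16  /* arbitrary small buffer size for this sketch */

/* Hypothetical reader state; the names are illustrative, not tcsh's. */
struct reader {
    size_t start_char;  /* character index of the buffer's first character */
    size_t start_byte;  /* byte offset of the buffer's first character */
    size_t nchars;      /* characters currently in the buffer */
    size_t consumed;    /* characters already read from this buffer */
    unsigned char clens[MAXCHARS];  /* byte length of each buffered character */
};

/* Replace the buffer contents, advancing both bases past the old buffer. */
void reader_refill(struct reader *r, const unsigned char *clens, size_t nchars)
{
    size_t i;

    assert(nchars <= MAXCHARS);
    for (i = 0; i < r->nchars; i++)
        r->start_byte += r->clens[i];   /* counted in bytes, for seeking */
    r->start_char += r->nchars;         /* counted in characters, for indexing */
    memcpy(r->clens, clens, nchars);
    r->nchars = nchars;
    r->consumed = 0;
}

/* Byte offset of the next unread character. */
size_t reader_tell(const struct reader *r)
{
    size_t off = r->start_byte;
    size_t i;

    assert(r->consumed <= r->nchars);
    for (i = 0; i < r->consumed; i++)
        off += r->clens[i];
    return off;
}
```

Because start_byte and start_char are advanced separately on every refill, reader_tell() stays correct after the first buffer is exhausted, even when the two counts have diverged.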

Comment 4 s_h_o_ 2006-06-29 06:34:08 UTC
Created attachment 131717
Patch to fix the problem.

Comment 5 Miloslav Trmač 2006-07-03 01:26:34 UTC
Created attachment 131869
Document and fix buffer offset counting

Thanks for the patch, I'm testing the attached one - the idea is a bit more
complicated, but the resulting code is simpler.

If you have other scripts affected by the bug, can you please test them with
this patch?

Also, can we add the test case in attachment 131163 to the tcsh test suite, to
be distributed with tcsh under a BSD license, please?

Comment 6 Miloslav Trmač 2006-07-03 01:40:44 UTC
Created attachment 131870
Corrected patch

I'm sorry, attachment 131869 is an incomplete working version; please test this
one instead.

Comment 7 Nobuhiro Kuhara 2006-07-04 05:36:43 UTC
I am sorry for the late reply.
I tested it, and the problem did not occur.
Thank you.

Comment 8 Mamoru TASAKA 2006-07-04 09:54:18 UTC

Sorry for the late response. Even with the patch (attachment 131870) applied,
the problem persists.

If I convert the test script (attachment 131870) from EUC-JP to UTF-8,
the problem seems to disappear (without the patch, the problem appears even
with the UTF-8 script). Is this only for me?

Comment 9 Mamoru TASAKA 2006-07-04 09:55:34 UTC

In comment 8, I meant that the test script is attachment 131163.

Comment 10 Nobuhiro Kuhara 2006-07-05 05:48:41 UTC
>Also, can we add the test case in attachment 131163 to the tcsh test
>suite, to
>be distributed with tcsh under a BSD license, please?

Yes, of course.

Comment 11 Mamoru TASAKA 2006-07-05 13:44:05 UTC
More precisely, with the patch (attachment 131870) applied, I get the following:

[tasaka2@localhost tmp]$ echo $LANG
[tasaka2@localhost tmp]$ ./test-EUCJP.sh
end: while/foreachの中ではありません.
[tasaka2@localhost tmp]$ ./test-ISO2022JP.sh
[tasaka2@localhost tmp]$ ./test-UTF8.sh
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-EUCJP.sh
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-ISO2022JP.sh
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-UTF8.sh
(character corrupted here): コマンドが見つかりません.
end: while/foreachの中ではありません.

(Note: test-EUCJP.sh is the same as attachment 131163; test-ISO2022JP.sh and
test-UTF8.sh are the same csh script with its character encoding converted to
ISO-2022-JP and UTF-8, respectively.)

Comment 12 Miloslav Trmač 2006-07-05 14:52:48 UTC
Created attachment 131932
Updated patch, to handle encoding mismatches better

Thanks for testing.

Technically, using files in an encoding different from the one specified by
LC_CTYPE is purely a user error. Nevertheless, it is easy enough to fix for
scripts; please test the updated patch.

This won't help if the script is not in a seekable file, e.g. in
(cat my_script | tcsh), though.
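
As a generic POSIX illustration of the seekable versus non-seekable distinction (not tcsh's own check; fd_is_seekable and seek_back are hypothetical names), lseek() can be used to probe a descriptor. On a pipe, such as the one created by (cat my_script | tcsh), it fails with ESPIPE, so a reader cannot jump back to a recorded byte offset.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd supports lseek(), 0 otherwise (e.g. pipes, FIFOs, sockets). */
int fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/*
 * Hypothetical helper: jump back to a previously recorded byte offset.
 * Returns 0 on success, -1 if the descriptor cannot seek.
 */
int seek_back(int fd, off_t pos)
{
    assert(pos >= 0);
    return lseek(fd, pos, SEEK_SET) == (off_t)-1 ? -1 : 0;
}
```

A caller would typically test fd_is_seekable() once when opening its input and fall back to another strategy, such as keeping the already-read input in memory, when it returns 0.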

Comment 13 Nobuhiro Kuhara 2006-07-06 04:11:38 UTC

I was not able to apply the patch to the file sh.lex.c:


|diff -u tcsh-6.14.00/sh.lex.c tcsh-6.14.00/sh.lex.c
|-- tcsh-6.14.00/sh.lex.c     2006-07-03 03:46:11.000000000 +0200 <-???miss??? 
|++ tcsh-6.14.00/sh.lex.c     2006-07-05 03:46:11.000000000 +0200

Hunk #1 FAILED at 1736.
1 out of 3 hunks FAILED -- saving rejects to file tcsh-6.14.00/sh.lex.c.rej

Thank you.

Comment 14 Mamoru TASAKA 2006-07-06 05:06:10 UTC

I applied attachment 131932 against the rawhide tcsh-6.14-8, and it seems to
work well for all the cases I described in comment 11.


Comment 15 Miloslav Trmač 2006-07-06 15:53:54 UTC
Nobuhiro-san, the patch applies cleanly for me to the result of (rpmbuild -bp),
after tcsh-6.14.00-wide-crash.patch (which also touches wide_read()), among

Have you perhaps tried to patch the original tcsh-6.14.00 tarball?  

Comment 16 Nobuhiro Kuhara 2006-07-10 01:04:55 UTC
>Have you perhaps tried to patch the original tcsh-6.14.00 tarball?

Once the patch applied successfully, I checked it and it worked normally.

Thank you so much!!

Comment 17 Miloslav Trmač 2006-07-10 21:46:01 UTC
Thanks for all the testing, tcsh-6.14-6.fc5.2 is now in the updates-testing

Comment 18 Miloslav Trmač 2006-07-24 00:59:17 UTC
... and published as a final update.
