Bug 195972 - tcsh script error with multibyte characters.

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	tcsh
Version:	5
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Assignee:	Miloslav Trmač
QA Contact:	Bill Huang

Reported:	2006-06-20 01:35 UTC by Nobuhiro Kuhara
Modified:	2007-11-30 22:11 UTC
CC List:	1 user
Fixed In Version:	6.14-6.fc5.2
Last Closed:	2006-07-24 00:59:17 UTC

Attachments
tcsh script (EUC_jp character coexists) (5.96 KB, text/plain) 2006-06-20 01:35 UTC, Nobuhiro Kuhara
Patch to fix the problem. (1.94 KB, patch) 2006-06-29 06:34 UTC, s_h_o_
Document and fix buffer offset counting (773 bytes, patch) 2006-07-03 01:26 UTC, Miloslav Trmač
Corrected patch (1.77 KB, patch) 2006-07-03 01:40 UTC, Miloslav Trmač
Updated patch, to handle encoding mismatches better (2.13 KB, patch) 2006-07-05 14:52 UTC, Miloslav Trmač

Description Nobuhiro Kuhara 2006-06-20 01:35:49 UTC

Description of problem:
 A spurious error occurs in tcsh.

Version-Release number of selected component (if applicable):
 tcsh-6.14.05

How reproducible:
 Always

Steps to Reproduce:
1.Make a shell script with multibyte character.
  I send an example together.(It's EUC_jp environment.)

2.Excecute this script.
3.An error occurs.
  
Actual results:
 "end: while/foreachã®ä¸ã§ã¯ããã¾ããã"
 I do not understand how it is displayed in English,sorry.

Expected results:
 The script ends normally.

Additional info:

Comment 1 Nobuhiro Kuhara 2006-06-20 01:35:50 UTC

Created attachment 131163
tcsh script (EUC_jp character coexists)

Comment 2 Mamoru TASAKA 2006-06-20 03:21:16 UTC

The same error occurs with the development version tcsh-6.14-8
(and with the FC5 package tcsh-6.14-6.fc5.1).

Note: if I convert attachment 131163 from EUC-JP to UTF-8 and execute it,
an additional error message appears.

[tasaka1@localhost Linux]$ iconv -f eucjp -t utf8 attachment.cgi > TEMP.csh
[tasaka1@localhost Linux]$ tcsh ./TEMP.csh
ï¿½ãããããããããããããããããããããããããããããããããã: ã³ãã³ã
ãè¦ã¤ããã¾ãã.
end: while/foreachã®ä¸ã§ã¯ããã¾ãã.

("ã³ãã³ããè¦ã¤ããã¾ãã" means in English "command not found", and
"while/foreachã®ä¸ã§ã¯ããã¾ãã."  means "Not in while/foreach").

Comment 3 s_h_o_ 2006-06-29 06:32:13 UTC

The cause of this problem seems to be a wrong calculation of l->f_seek in the 
function btell in sh.lex.c.

btell(l) {
 ...
#ifdef WIDE_STRINGS
        if (cantell && fseekp >= fbobp && fseekp < feobp) {
            size_t i;

            l->f_seek = fbobp;
            for (i = 0; i < fseekp - fbobp; i++)
                l->f_seek += fclens[i];
        } else
#endif

l->f_seek represents a byte position. To handle the case where one character 
occupies more than one byte (a multi-byte character), the code uses the array 
fclens, which holds the number of bytes in each character.
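
As a rough illustration of this bookkeeping (not code from tcsh), the sketch below records a byte length per character and sums those lengths to turn a character index into a byte offset. The helper names utf8_char_len, fill_char_lens and byte_offset_of_char are hypothetical, and the lengths are derived from UTF-8 lead bytes only to keep the example independent of the locale; an EUC-JP script would need a different length rule.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of a UTF-8 sequence, judged from its lead byte.
 * Hypothetical helper for illustration only. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1; /* invalid lead byte: count it as a single byte */
}

/* Fill lens[] with per-character byte lengths (the role fclens plays
 * in the discussion) and return the number of characters found. */
static size_t fill_char_lens(const char *bytes, size_t nbytes,
                             size_t *lens, size_t max_chars)
{
    size_t pos = 0, n = 0;
    while (pos < nbytes && n < max_chars) {
        size_t len = utf8_char_len((unsigned char)bytes[pos]);
        if (len > nbytes - pos)
            len = nbytes - pos; /* truncated sequence at the end */
        lens[n++] = len;
        pos += len;
    }
    return n;
}

/* Byte offset of character number char_index: the sum of the byte
 * lengths of all characters that precede it. */
static size_t byte_offset_of_char(const size_t *lens, size_t nchars,
                                  size_t char_index)
{
    size_t off = 0;
    assert(char_index <= nchars);
    for (size_t i = 0; i < char_index; i++)
        off += lens[i];
    return off;
}
```

For the five bytes of "a", one three-byte character, and "b", the lengths are 1, 3 and 1, so the third character starts at byte offset 4 even though its character index is 2.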

In this code segment, the initial value of l->f_seek is set to fbobp. This is 
fine at first, when fbobp is zero. However, once the first buffer (size 4096) 
has been used up, the second buffer is used and the pointers are updated as 
follows in the function bgetc.

bgetc() {
 ...
        if (fseekp == feobp) {
            fbobp = feobp;

This update of fbobp is necessary for the other parts of the program. However, 
the updated fbobp cannot be used as an initial value for the calculation of 
l->f_seek in btell, because the updated fbobp does not represent a 'byte 
position' anymore. It represents a 'character position'.

I made a patch to correct the problem. The patch adds a new pointer, fbobp2, 
which holds the byte position. Some uses of fbobp in the program are replaced 
by fbobp2 in this patch.

Comment 4 s_h_o_ 2006-06-29 06:34:08 UTC

Created attachment 131717
Patch to fix the problem.

Comment 5 Miloslav Trmač 2006-07-03 01:26:34 UTC

Created attachment 131869
Document and fix buffer offset counting

Thanks for the patch. I'm testing the attached one; the idea is a bit more
complicated, but the resulting code is simpler.

If you have other scripts affected by the bug, can you please test them with
this patch?

Also, can we add the test case in attachment 131163 to the tcsh test suite, to
be distributed with tcsh under a BSD license, please?

Comment 6 Miloslav Trmač 2006-07-03 01:40:44 UTC

Created attachment 131870
Corrected patch

I'm sorry, attachment 131869 is an incomplete working version; please test this
one.

Comment 7 Nobuhiro Kuhara 2006-07-04 05:36:43 UTC

I am sorry for the late reply. 
I tested it, and the problem did not occur. 
Thank you.

Comment 8 Mamoru TASAKA 2006-07-04 09:54:18 UTC

Umm...

Sorry for the late response. However, even with the patch 
(attachment 131870) applied, the problem persists.

If I convert the test script (attachment 131870) from EUC-JP to UTF-8,
the problem seems to disappear (without the patch, the problem
appears even with the UTF-8 script). Is this happening only for me?

Comment 9 Mamoru TASAKA 2006-07-04 09:55:34 UTC

Sorry:

In comment 8, I meant that the test script is attachment 131163.

Comment 10 Nobuhiro Kuhara 2006-07-05 05:48:41 UTC

>Also, can we add the test case in attachment 131163 to the tcsh test suite, to
>be distributed with tcsh under a BSD license, please?

Yes, of course.

Comment 11 Mamoru TASAKA 2006-07-05 13:44:05 UTC

More precisely, with the patch (attachment 131870) applied, I get the following:

[tasaka2@localhost tmp]$ echo $LANG
ja_JP.UTF-8
[tasaka2@localhost tmp]$ ./test-EUCJP.sh
end: while/foreachã®ä¸ã§ã¯ããã¾ãã.
[tasaka2@localhost tmp]$ ./test-ISO2022JP.sh
AAA
[tasaka2@localhost tmp]$ ./test-UTF8.sh
AAA
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-EUCJP.sh
AAA
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-ISO2022JP.sh
AAA
[tasaka2@localhost tmp]$ env LC_ALL=ja_JP.eucJP ./test-UTF8.sh
(character corrupted here): ã³ãã³ããè¦ã¤ããã¾ãã.
end: while/foreachã®ä¸ã§ã¯ããã¾ãã.

(Note: test-EUCJP.sh is the same as attachment 131163, and test-ISO2022JP.sh
and test-UTF8.sh are the same csh script with its character encoding converted 
to ISO-2022-JP and UTF-8, respectively.)

Comment 12 Miloslav Trmač 2006-07-05 14:52:48 UTC

Created attachment 131932
Updated patch, to handle encoding mismatches better

Thanks for testing.

Technically, using files in a different encoding than the one specified by
LC_CTYPE is purely a user error. Nevertheless, it is easy enough to fix for
scripts; please test the updated patch.

This won't help if the script is not in a seekable file, e.g. in
(cat my_script | tcsh), though.
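
To illustrate the distinction, here is a minimal sketch, assuming POSIX lseek; it is not tcsh's actual check, and the function name fd_is_seekable is hypothetical. A regular file accepts a no-op seek, while a pipe rejects it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the descriptor supports seeking, 0 otherwise.
 * On a pipe, FIFO or socket, lseek() fails (errno is set to ESPIPE),
 * so a reader cannot jump back to an earlier byte offset. */
static int fd_is_seekable(int fd)
{
    assert(fd >= 0);
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

With a script passed as a file argument the descriptor is seekable; with the pipe example above, standard input is not.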

Comment 13 Nobuhiro Kuhara 2006-07-06 04:11:38 UTC

Umm...,

I was not able to patch the file "sh.lex.c".

Message:

|diff -u tcsh-6.14.00/sh.lex.c tcsh-6.14.00/sh.lex.c
|-- tcsh-6.14.00/sh.lex.c     2006-07-03 03:46:11.000000000 +0200 <-???miss??? 
|++ tcsh-6.14.00/sh.lex.c     2006-07-05 03:46:11.000000000 +0200

Hunk #1 FAILED at 1736.
1 out of 3 hunks FAILED -- saving rejects to file tcsh-6.14.00/sh.lex.c.rej

Thank you.

Comment 14 Mamoru TASAKA 2006-07-06 05:06:10 UTC

Hi:

I applied attachment 131932 against rawhide tcsh-6.14-8, and it seems to work
WELL for all the cases I described in comment 11.

Thanks!!

Comment 15 Miloslav Trmač 2006-07-06 15:53:54 UTC

Nobuhiro-san, the patch applies cleanly for me to the result of (rpmbuild -bp),
after tcsh-6.14.00-wide-crash.patch (which also touches wide_read()), among
others.

Have you perhaps tried to patch the original tcsh-6.14.00 tarball?

Comment 16 Nobuhiro Kuhara 2006-07-10 01:04:55 UTC

>Have you perhaps tried to patch the original tcsh-6.14.00 tarball?

Once the patch applied successfully, I checked it and it worked normally.

Thank you so much!!

Comment 17 Miloslav Trmač 2006-07-10 21:46:01 UTC

Thanks for all the testing; tcsh-6.14-6.fc5.2 is now in the updates-testing
repository.

Comment 18 Miloslav Trmač 2006-07-24 00:59:17 UTC

... and published as a final update.
