Bug 614000

Summary:	Bad code for Chinese character "料" causes ENOENT when using Samba
Product:	Red Hat Enterprise Linux 4	Reporter:	Mark Wu <dwu>
Component:	kernel	Assignee:	Red Hat Kernel Manager <kernel-mgr>
Status:	CLOSED WONTFIX	QA Contact:	Jian Li <jiali>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.7	CC:	jiali, jwest, nmurray, tao, yqu
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-06-14 19:54:30 UTC	Type:	---

Description Mark Wu 2010-07-13 13:07:56 UTC

Description of problem:
When a Samba share on Windows is mounted with the option "iocharset=gb2312", copying a file whose name contains the Chinese character "料" fails with the error "No such file". The copy succeeds without the option "iocharset=gb2312".

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. mount -t cifs -o username=deployop,password=Cnsz030471,uid=deployop,gid=deploy,iocharset=gb2312 //10.25.14.185/pa_rel /home/yanbo/testdir
2. cp icss_cccs_report移交材料清单_1.17.0.4.xls /tmp
3.
  
Actual results:
Failed with ENOENT

Expected results:


Additional info:
It also succeeds when smbfs is used as the filesystem type instead of cifs, even with iocharset=gb2312.

Comment 1 Mark Wu 2010-07-13 13:09:39 UTC

From the tcpdump with "iocharset":

2290 2010-07-12 14:14:05.245026 10.11.77.19 10.25.14.185 SMB Trans2 Request, QUERY_PATH_INFO, Query File All Info, Path: \icss\icss_cccs_report_rel\1.17.0\1.17.0.4\icss_cccs_report??????_1.17.0.4.xls

[mark@dhcp-129-118 cifs]$ hexdump -C filename_with_iocharset
00000000  5c 00 69 00 63 00 73 00  73 00 5c 00 69 00 63 00  |\.i.c.s.s.\.i.c.|
00000010  73 00 73 00 5f 00 63 00  63 00 63 00 73 00 5f 00  |s.s._.c.c.c.s._.|
00000020  72 00 65 00 70 00 6f 00  72 00 74 00 5f 00 72 00  |r.e.p.o.r.t._.r.|
00000030  65 00 6c 00 5c 00 31 00  2e 00 31 00 37 00 2e 00  |e.l.\.1...1.7...|
00000040  30 00 5c 00 31 00 2e 00  31 00 37 00 2e 00 30 00  |0.\.1...1.7...0.|
00000050  2e 00 34 00 5c 00 69 00  63 00 73 00 73 00 5f 00  |..4.\.i.c.s.s._.|
00000060  63 00 63 00 63 00 73 00  5f 00 72 00 65 00 70 00  |c.c.c.s._.r.e.p.|
00000070  6f 00 72 00 74 00 fb 79  a4 4e 50 67 be f9 05 6e  |o.r.t..y.NPg...n|
00000080  55 53 5f 00 31 00 2e 00  31 00 37 00 2e 00 30 00  |US_.1...1.7...0.|
00000090  2e 00 34 00 2e 00 78 00  6c 00 73 00 00 00        |..4...x.l.s...|

From the tcpdump without "iocharset":

No.     Time                       Source                Destination           Protocol Info
  5016 2010-07-12 14:19:20.862838 10.11.77.19           10.25.14.185          SMB      Trans2 Request, QUERY_PATH_INFO, Query File All Info, Path: \icss\icss_cccs_report_rel\1.17.0\1.17.0.4\icss_cccs_report??????_1.17.0.4.xls
[mark@dhcp-129-118 cifs]$ hexdump -C filename_without_iocharset
00000000  5c 00 69 00 63 00 73 00  73 00 5c 00 69 00 63 00  |\.i.c.s.s.\.i.c.|
00000010  73 00 73 00 5f 00 63 00  63 00 63 00 73 00 5f 00  |s.s._.c.c.c.s._.|
00000020  72 00 65 00 70 00 6f 00  72 00 74 00 5f 00 72 00  |r.e.p.o.r.t._.r.|
00000030  65 00 6c 00 5c 00 31 00  2e 00 31 00 37 00 2e 00  |e.l.\.1...1.7...|
00000040  30 00 5c 00 31 00 2e 00  31 00 37 00 2e 00 30 00  |0.\.1...1.7...0.|
00000050  2e 00 34 00 5c 00 69 00  63 00 73 00 73 00 5f 00  |..4.\.i.c.s.s._.|
00000060  63 00 63 00 63 00 73 00  5f 00 72 00 65 00 70 00  |c.c.c.s._.r.e.p.|
00000070  6f 00 72 00 74 00 fb 79  a4 4e 50 67 99 65 05 6e  |o.r.t..y.NPg.e.n|
00000080  55 53 5f 00 31 00 2e 00  31 00 37 00 2e 00 30 00  |US_.1...1.7...0.|
00000090  2e 00 34 00 2e 00 78 00  6c 00 73 00 00 00        |..4...x.l.s...|

Comparing the "filename" field in the two QUERY_PATH_INFO packets above, we can see that the codes for the character "料" differ. In the bad case it is translated to "be f9", while in the good case it is "99 65". The following test confirms that "料" should be translated to "99 65", not "be f9":
#echo "移交材料清单"|iconv -t unicode|hexdump -C
00000000  ff fe fb 79 a4 4e 50 67  99 65 05 6e 55 53 0a 00  |...y.NPg.e.nUS..|
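
The same comparison can be scripted. The following is only a minimal sketch: the byte strings are the Chinese part of the filename copied by hand from the two hexdumps above, and the helper names (utf16le_units, mismatches) are made up for illustration.

```python
# Chinese part of the filename, copied from the two hexdumps above.
with_iocharset = bytes.fromhex("fb79 a44e 5067 bef9 056e 5553")
without_iocharset = bytes.fromhex("fb79 a44e 5067 9965 056e 5553")

# What the wire bytes should be: the name encoded as little-endian UTF-16.
expected = "移交材料清单".encode("utf-16-le")


def utf16le_units(data):
    """Split a UTF-16LE byte string into 16-bit code units."""
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def mismatches(actual, reference):
    """Return (character index, actual unit, reference unit) for each differing unit."""
    pairs = zip(utf16le_units(actual), utf16le_units(reference))
    return [(i, a, r) for i, (a, r) in enumerate(pairs) if a != r]


print("without iocharset matches:", without_iocharset == expected)
for index, got, want in mismatches(with_iocharset, expected):
    print(f"char {index} ({chr(want)}): got U+{got:04X}, expected U+{want:04X}")
```

With the bytes above, only the fourth character ("料") should be reported as different.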


Next, check the cp936 translation table in the kernel. Before looking into it, we need the gb2312 code of "料":
#echo "移交材料清单"|iconv -t gb2312|hexdump -C
00000000  d2 c6 bd bb b2 c4 c1 cf c7 e5 b5 a5 0a           |.............|

So the gb2312 code of "料" is "c1 cf". The high byte "c1" selects the table, and the low byte "cf" is the index into that table used to translate the character from gb2312 to Unicode.
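
As an illustration of this high-byte/low-byte split, here is a small sketch that uses Python's built-in gb2312 codec in place of the kernel table. The function name split_gb2312 is hypothetical; only the c2u_XX naming is taken from the kernel source quoted in this report.

```python
def split_gb2312(char):
    """Return (high byte, low byte) of a character's two-byte GB2312 code."""
    encoded = char.encode("gb2312")
    if len(encoded) != 2:
        raise ValueError(f"{char!r} is not a two-byte GB2312 character")
    return encoded[0], encoded[1]


high, low = split_gb2312("料")

# The high byte picks the per-page table, the low byte is the index into it.
table_name = f"c2u_{high:02X}"
print(f"table {table_name}, index 0x{low:02X}")

# Python's own decoder serves as an independent reference for the result.
code_point = ord(bytes([high, low]).decode("gb2312"))
print(f"expected table entry: 0x{code_point:04X}")
```

This should point at entry 0xCF of c2u_C1 and give 0x6599 as the expected value.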

In the 2.6.9-78 kernel:
static wchar_t c2u_C1[256] = {
     ...
       0x5BE5,0x8FBD,0x6F66,0xF9BA,0x6482,0x9563,0x5ED6,0xF9BE,/* 0xC8-0xCF */
     ...
};
The entry is 0xF9BE, which matches the bytes "be f9" of the filename in the dump once the little-endian byte order is taken into account. In the RHEL5 kernel, the entry has been fixed to the correct value 0x6599:

static wchar_t c2u_C1[256] = {
     ...
       0x5BE5,0x8FBD,0x6F66,0x4E86,0x6482,0x9563,0x5ED6,0x6599,/* 0xC8-0xCF */
     ...
};
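
To make the difference between the two excerpts explicit, the rows can be compared entry by entry. This is a rough sketch: the values are copied from the two excerpts above, the helper names are invented for illustration, and Python's gb2312 codec is assumed to be a valid reference for the correct mapping.

```python
# Entries 0xC8-0xCF of c2u_C1, copied from the two kernel excerpts above.
ROW_START = 0xC8
kernel_2_6_9_78_row = [0x5BE5, 0x8FBD, 0x6F66, 0xF9BA, 0x6482, 0x9563, 0x5ED6, 0xF9BE]
rhel5_row = [0x5BE5, 0x8FBD, 0x6F66, 0x4E86, 0x6482, 0x9563, 0x5ED6, 0x6599]


def wire_bytes(code_unit):
    """Format a 16-bit value as little-endian bytes, as shown by hexdump."""
    return " ".join(f"{b:02x}" for b in code_unit.to_bytes(2, "little"))


def differing_entries(old_row, new_row, start=ROW_START):
    """Return (low byte, old value, new value) for each entry that changed."""
    return [(start + i, old, new)
            for i, (old, new) in enumerate(zip(old_row, new_row)) if old != new]


for low_byte, old, new in differing_entries(kernel_2_6_9_78_row, rhel5_row):
    print(f"c2u_C1[0x{low_byte:02X}]: 0x{old:04X} ({wire_bytes(old)})"
          f" -> 0x{new:04X} ({wire_bytes(new)})")

# Cross-check the RHEL5 row against Python's gb2312 decoder.
reference_row = [ord(bytes([0xC1, ROW_START + i]).decode("gb2312")) for i in range(8)]
print("RHEL5 row matches reference:", rhel5_row == reference_row)
```

Besides the entry at 0xCF, the two excerpts also differ at 0xCB (0xF9BA versus 0x4E86), so other characters in this table row may be affected in the same way.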