Bug 614000
Summary: | Bad code for the Chinese character "料" causes the error ENOENT when using Samba | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Mark Wu <dwu> |
Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
Status: | CLOSED WONTFIX | QA Contact: | Jian Li <jiali> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.7 | CC: | jiali, jwest, nmurray, tao, yqu |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2012-06-14 19:54:30 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Mark Wu
2010-07-13 13:07:56 UTC
From the tcpdump with "iocharset":

2290 2010-07-12 14:14:05.245026 10.11.77.19 10.25.14.185 SMB Trans2 Request, QUERY_PATH_INFO, Query File All Info, Path: \icss\icss_cccs_report_rel\1.17.0\1.17.0.4\icss_cccs_report??????_1.17.0.4.xls

[mark@dhcp-129-118 cifs]$ hexdump -C filename_with_iocharset
00000000 5c 00 69 00 63 00 73 00 73 00 5c 00 69 00 63 00 |\.i.c.s.s.\.i.c.|
00000010 73 00 73 00 5f 00 63 00 63 00 63 00 73 00 5f 00 |s.s._.c.c.c.s._.|
00000020 72 00 65 00 70 00 6f 00 72 00 74 00 5f 00 72 00 |r.e.p.o.r.t._.r.|
00000030 65 00 6c 00 5c 00 31 00 2e 00 31 00 37 00 2e 00 |e.l.\.1...1.7...|
00000040 30 00 5c 00 31 00 2e 00 31 00 37 00 2e 00 30 00 |0.\.1...1.7...0.|
00000050 2e 00 34 00 5c 00 69 00 63 00 73 00 73 00 5f 00 |..4.\.i.c.s.s._.|
00000060 63 00 63 00 63 00 73 00 5f 00 72 00 65 00 70 00 |c.c.c.s._.r.e.p.|
00000070 6f 00 72 00 74 00 fb 79 a4 4e 50 67 be f9 05 6e |o.r.t..y.NPg...n|
00000080 55 53 5f 00 31 00 2e 00 31 00 37 00 2e 00 30 00 |US_.1...1.7...0.|
00000090 2e 00 34 00 2e 00 78 00 6c 00 73 00 00 00 |..4...x.l.s...|

From the tcpdump without "iocharset":

No. Time Source Destination Protocol Info
5016 2010-07-12 14:19:20.862838 10.11.77.19 10.25.14.185 SMB Trans2 Request, QUERY_PATH_INFO, Query File All Info, Path: \icss\icss_cccs_report_rel\1.17.0\1.17.0.4\icss_cccs_report??????_1.17.0.4.xls

[mark@dhcp-129-118 cifs]$ hexdump -C filename_without_iocharset
00000000 5c 00 69 00 63 00 73 00 73 00 5c 00 69 00 63 00 |\.i.c.s.s.\.i.c.|
00000010 73 00 73 00 5f 00 63 00 63 00 63 00 73 00 5f 00 |s.s._.c.c.c.s._.|
00000020 72 00 65 00 70 00 6f 00 72 00 74 00 5f 00 72 00 |r.e.p.o.r.t._.r.|
00000030 65 00 6c 00 5c 00 31 00 2e 00 31 00 37 00 2e 00 |e.l.\.1...1.7...|
00000040 30 00 5c 00 31 00 2e 00 31 00 37 00 2e 00 30 00 |0.\.1...1.7...0.|
00000050 2e 00 34 00 5c 00 69 00 63 00 73 00 73 00 5f 00 |..4.\.i.c.s.s._.|
00000060 63 00 63 00 63 00 73 00 5f 00 72 00 65 00 70 00 |c.c.c.s._.r.e.p.|
00000070 6f 00 72 00 74 00 fb 79 a4 4e 50 67 99 65 05 6e |o.r.t..y.NPg.e.n|
00000080 55 53 5f 00 31 00 2e 00 31 00 37 00 2e 00 30 00 |US_.1...1.7...0.|
00000090 2e 00 34 00 2e 00 78 00 6c 00 73 00 00 00 |..4...x.l.s...|

Comparing the filename field in the two QUERY_PATH_INFO packets above, we can see that the codes for the character "料" differ. In the bad case (with "iocharset") it is translated to "be f9", while in the good case (without "iocharset") it is "99 65".

The following test confirms that "料" should be translated to "99 65", not "be f9":

#echo "移交材料清单"|iconv -t unicode|hexdump -C
00000000 ff fe fb 79 a4 4e 50 67 99 65 05 6e 55 53 0a 00 |...y.NPg.e.nUS..|

Next, check the cp936 translation table in the kernel. Before looking into it, we need the gb2312 code of "料":

#echo "移交材料清单"|iconv -t gb2312|hexdump -C
00000000 d2 c6 bd bb b2 c4 c1 cf c7 e5 b5 a5 0a |.............|

So the gb2312 code of "料" is "c1 cf". We use the high byte "c1" to locate the table, and the low byte "cf" as an index into that table, to translate the character from gb2312 to unicode.

In the 2.6.9-78 kernel:

static wchar_t c2u_C1[256] = {
...
0x5BE5,0x8FBD,0x6F66,0xF9BA,0x6482,0x9563,0x5ED6,0xF9BE,/* 0xC8-0xCF */
...
};

The entry is 0xF9BE, which matches the bytes "be f9" in the filename dump once the little-endian byte order on the wire is taken into account.

In the RHEL5 kernel, however, the entry has been fixed to the good value 0x6599:

static wchar_t c2u_C1[256] = {
...
0x5BE5,0x8FBD,0x6F66,0x4E86,0x6482,0x9563,0x5ED6,0x6599,/* 0xC8-0xCF */
...
};
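
The table lookup described in this report can be replayed outside the kernel. The following is a minimal Python sketch, not the kernel code: it copies only the 0xC8-0xCF slice of c2u_C1 quoted above (the rest of the table is elided in the report and is not reconstructed here). The names `C2U_C1_RHEL4`, `C2U_C1_RHEL5`, `lookup`, and `wire_bytes` are hypothetical helpers introduced for illustration, and it assumes Python's built-in `gb2312` codec agrees with the iconv output shown in the report.

```python
# Sketch: replay the gb2312 -> unicode lookup from the report.
# Only the 0xC8-0xCF entries quoted in the report are included;
# the dictionary names below are illustrative, not kernel symbols.

# Slice of c2u_C1 as quoted for the 2.6.9-78 (RHEL4) kernel.
C2U_C1_RHEL4 = {
    0xC8: 0x5BE5, 0xC9: 0x8FBD, 0xCA: 0x6F66, 0xCB: 0xF9BA,
    0xCC: 0x6482, 0xCD: 0x9563, 0xCE: 0x5ED6, 0xCF: 0xF9BE,
}

# Same slice as quoted for the RHEL5 kernel.
C2U_C1_RHEL5 = {
    0xC8: 0x5BE5, 0xC9: 0x8FBD, 0xCA: 0x6F66, 0xCB: 0x4E86,
    0xCC: 0x6482, 0xCD: 0x9563, 0xCE: 0x5ED6, 0xCF: 0x6599,
}

# Byte pairs observed in the two filename dumps.
BAD_WIRE = bytes.fromhex("bef9")   # with iocharset
GOOD_WIRE = bytes.fromhex("9965")  # without iocharset


def lookup(table, gb_bytes):
    """High byte selects the table (0xC1 here), low byte is the index."""
    high, low = gb_bytes
    if high != 0xC1:
        raise ValueError("this sketch only carries the c2u_C1 table")
    return table[low]


def wire_bytes(code_unit):
    """Serialize a 16-bit code unit in little-endian order, as in the dump."""
    return code_unit.to_bytes(2, "little")


char = "\u6599"                    # the character in question
gb = char.encode("gb2312")         # two-byte gb2312 code

rhel4 = wire_bytes(lookup(C2U_C1_RHEL4, gb))
rhel5 = wire_bytes(lookup(C2U_C1_RHEL5, gb))

print("gb2312 code:  ", gb.hex(" "))
print("RHEL4 lookup: ", rhel4.hex(" "), "(matches bad dump)" if rhel4 == BAD_WIRE else "")
print("RHEL5 lookup: ", rhel5.hex(" "), "(matches good dump)" if rhel5 == GOOD_WIRE else "")
print("Python codec: ", char.encode("utf-16-le").hex(" "))
```

Keeping the two table slices side by side shows that the only differences in this range are the entries at index 0xCB and 0xCF, so any other character whose gb2312 code is "c1 cb" would presumably be affected in the same way on the older kernel.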