Bug 247489 - CIFS support for multibyte characters : euc-jp for this case.
Summary: CIFS support for multibyte characters : euc-jp for this case.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Jeff Layton
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 425461 445799
Reported: 2007-07-09 16:03 UTC by Jose Plans
Modified: 2018-10-20 00:44 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-17 18:07:27 UTC
Target Upstream Version:
Embargoed:


Attachments
proof-of-concept patch 1 (7.56 KB, patch)
2008-07-14 19:11 UTC, Jeff Layton
patch: add codepage option to CIFS (104.50 KB, patch)
2008-09-11 13:33 UTC, Jeff Layton

Description Jose Plans 2007-07-09 16:03:57 UTC
Description of problem:

When a CIFS/SMB server is configured with Extended Unix Code (EUC) for Japanese
and does not support Unicode, the CIFS client cannot convert the filenames it
lists at all. Instead, ls shows:

total 1024
-rwxrwSrwt 1 root root 24 May 16 19:18
<82><A8><82><AF><82><A2><82><B1><82>\u0302<A8><96><U+179C2><BF><8F><EE><95><F1>

At the console, this is shown as:
total 1024
-rwxrwSrwt 1 root root 24 May 16 19:18 ???????????????????????

Using 'iocharset' does not help, as that option is reserved for the case where
the server knows about Unicode. Maybe CIFS needs a second option, as smbfs has,
to specify the server's codepage as well? (codepage=<codepage> // convert_cp() ).

In our case CIFS just dumps the bytes as ASCII; I confirmed this by running iconv on the output too.
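
For illustration, here is a small Python sketch of what the iconv check shows. It is not part of the original report. It assumes the file name is the one recovered with iconv -f CP932 further down in this bug, and that the server sends it as CP932 bytes. Decoding those bytes as UTF-8 and escaping whatever fails to decode gives a pattern that looks consistent with the ls output above, including the stray U+0302 and U+179C2.

```python
# Assumption: this is the file name on the share, sent by the server as CP932.
raw = "おけいこのお役立ち情報".encode("cp932")

# Interpreting the bytes in the server's code page recovers the name.
as_cp932 = raw.decode("cp932")

# Interpreting the same bytes as UTF-8 fails for most of them. Escaping the
# undecodable bytes mimics the <82><A8>... escapes seen in the listing.
as_utf8 = raw.decode("utf-8", errors="backslashreplace")

# A 7-bit ASCII view has nothing printable to show: every byte is >= 0x80.
as_ascii = raw.decode("ascii", errors="replace")

print(" ".join(f"{b:02x}" for b in raw))
print(as_cp932)
print(as_utf8)
print(as_ascii)
```

Two byte runs happen to form valid UTF-8 sequences (cc 82 and f0 97 a7 82), which would explain why two unrelated characters appear in the middle of the escaped bytes.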

Tests made:

   LANG=ja_JP.UTF-8 mount...
   LANG=ja_JP.EUC-JP mount...
   
   CIFS 1.49 (rawhide) 2.6.22-0.21.rc7.git5.fc8
   srcversion: 30616BA7D30E1F22CF9B850
   
   and
   CIFS 1.45 (RHEL5) 2.6.18-8.1.6.el5
   srcversion:     C1BB8ABDC4FF6849DC27E08
   
How reproducible:
Always.

Steps to Reproduce:
* These steps reproduce the problem at your site.
(1) set up RHEL2.1 and samba
(2) set up smb.conf as follows
 ---
 [global]
       coding system = euc
       client code page = 932  
 ---
(3) create a shared folder on RHEL2.1's samba
(4) create a file with Japanese characters in its filename
(5) mount the folder via cifs from RHEL5
 # mount -t cifs -o guest,iocharset=utf8 //xxx.xxx.xxx.xxx/share /mnt/share
 (iocharset has no effect here and can be omitted: it is not used when the
server's capabilities do not include Unicode, and CIFS falls back to ASCII)
 
(6) run ls -l in a gnome-terminal whose locale is Japanese UTF-8
 # ls -l /mnt/share

Actual results:
The name of the file containing Japanese characters on RHEL2.1's samba 2.2.x
share is not displayed correctly.

Expected results:
The name of the file containing Japanese characters on RHEL2.1's samba 2.2.x
share is displayed correctly.

Business impact:
There are still a lot of samba 2.2.x servers at our customers' sites.
Some NAS systems are still using samba 2.2.x.
Japanese RHEL5 users cannot access those samba 2.2.x resources correctly.
Japanese customers may run into this issue after buying RHEL5, which could
reduce customer satisfaction.

Additional info:
 # mount -t smbfs -o guest,codepage=932,iocharset=utf8 //xxx.xxx.xxx.xxx/share /mnt/share

This issue is not present when using smbfs and using codepage=932 and
iocharset=utf8.

Comment 14 Jeff Layton 2008-05-05 20:39:07 UTC
Finally got around to looking at this again. I seem to get the *exact* same
behavior from cifs and smbfs here:

...first smbfs (this is from a RHEL4 machine):

# mount -t smbfs -o guest,codepage=932,iocharset=utf8 //cherry.nrt.redhat.com/share /mnt/rhel21

# ls -l /mnt/rhel21
total 1
-rwxr-xr-x  1 root root 24 May 16  2007 ?????????̂????????

# ls -l /mnt/rhel21 | iconv -f CP932 -t UTF8
total 1
-rwxr-xr-x  1 root root 24 May 16  2007 おけいこのお役立ち情報

...now cifs (from same RHEL4 machine):

[root@dhcp231-224 ~]# mount -t cifs -o guest,iocharset=utf8 //cherry.nrt.redhat.com/share /mnt/rhel21

[root@dhcp231-224 ~]# ls -l /mnt/rhel21
total 1024
-rwxrwSrwt  1 root root 24 May 16  2007 ?????????̂????????

[root@dhcp231-224 ~]# ls -l /mnt/rhel21 | iconv -f CP932 -t UTF8
total 1024
-rwxrwSrwt  1 root root 24 May 16  2007 おけいこのお役立ち情報

...the original problem description says "This issue is not present when using
smbfs and using codepage=932 and iocharset=utf8." The behavior seems to be the
same to me.

Do I need to do some sort of special installation on the client to make this
work correctly with smbfs?


Comment 18 Jeff Layton 2008-05-08 15:40:14 UTC
Ok, I suspect that what we need to do here is the following:

1) add a codepage= option to cifs, and have it set remote_nls, similar to how
local_nls gets set.

2) turn cifs_strtoUCS and cifs_strfromUCS_le into more generic functions that
take the extra step of converting strings to and from the remote_nls. This seems
to be how smbfs handles this (see the convert_cp function). 
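
A rough Python sketch of the extra conversion step described in 2). The names from_server and to_server are hypothetical and only mirror the idea; they are not the kernel functions, and Python codecs stand in for the kernel NLS tables. The byte string is the CP932 form of the file name from the listing in comment 14.

```python
def from_server(raw: bytes, remote_cs: str, local_cs: str) -> bytes:
    """Convert a name received from the server into the local charset."""
    text = raw.decode(remote_cs)   # server code page -> Unicode
    return text.encode(local_cs)   # Unicode -> local charset (iocharset)


def to_server(name: bytes, local_cs: str, remote_cs: str) -> bytes:
    """Convert a locally typed name into the server's code page."""
    text = name.decode(local_cs)   # local charset -> Unicode
    return text.encode(remote_cs)  # Unicode -> server code page


# CP932 bytes as they would arrive from the non-Unicode server.
wire = bytes.fromhex("82a882af82a282b182cc82a896f097a782bf8fee95f1")

local = from_server(wire, "cp932", "utf-8")
print(local.decode("utf-8"))

# The reverse direction must give back exactly what the server expects.
round_trip = to_server(local, "utf-8", "cp932")
print(round_trip == wire)
```

Going through Unicode in the middle means each side only needs a table to and from Unicode, rather than a direct table for every pair of charsets.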



Comment 20 Jeff Layton 2008-07-09 17:10:05 UTC
Started looking over this in earnest today...

Gunter Kukkukk posted some patches upstream as a start for this:

-----[snip]-----
In a first step move all repeating code sequences, which use 
  - cifsConvertToUCS() 
to a new function
  - setup_ucs_nls_name()

Then move all repeating code sequences, which use
  - cifs_strtoUCS()
to the new function
  - setup_ucs_nls_name()
passing the "remap" parameter as "0" (false).
-----[snip]-----

I think we need to do some underlying cleanup first. The problem is that
_a_lot_ of CIFS functions take a nls_codepage arg and a "remap" arg. These are
pretty universally passed from the local_nls parameter and the
CIFS_MOUNT_MAP_SPECIAL_CHR flag, respectively.

If we want to do this, then we'll also need a "remote_nls" parameter, but I
don't want to just add this to the argument list of all these functions.
Instead, we need to change them all to take a cifs_sb arg. Then we can have a
new function that does this conversion (and gets the info out of the cifs_sb
instead of passing these parameters individually).
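
To make the shape of that cleanup concrete, here is a minimal Python sketch. MountContext is a hypothetical stand-in for the per-mount state in cifs_sb, and both functions are invented for illustration; none of this is the actual CIFS code. The point is only that a function taking the context does not need a new argument when remote_nls is introduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MountContext:
    """Hypothetical stand-in for the per-mount state held in cifs_sb."""
    local_nls: str                    # client-side charset (iocharset=)
    remote_nls: Optional[str] = None  # server charset (proposed codepage=)
    remap: bool = False               # carried along, not interpreted here


def name_from_server_old(raw: bytes, nls_codepage: str, remap: bool) -> bytes:
    """Old style: every caller passes the individual settings."""
    # Without a remote charset there is nothing to convert from.
    return raw


def name_from_server(ctx: MountContext, raw: bytes) -> bytes:
    """New style: the settings are read from the context."""
    if ctx.remote_nls is None:
        return raw
    return raw.decode(ctx.remote_nls).encode(ctx.local_nls)


ctx = MountContext(local_nls="utf-8", remote_nls="cp932")
converted = name_from_server(ctx, bytes.fromhex("82a882af"))
print(converted.decode("utf-8"))
```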

It's a big cleanup job and I'm still looking at the best way to tackle it that
won't break things...



Comment 21 Jeff Layton 2008-07-09 23:21:22 UTC
While I hate to bump this to the next update again, I'm going to do so. I just
don't see this happening for 5.3. The changes involved are likely to be
substantial and it will need some upstream soak time. Moving to 5.4...


Comment 22 Jeff Layton 2008-07-14 19:11:51 UTC
Created attachment 311754 [details]
proof-of-concept patch 1

Here's a first, very rough, proof-of-concept upstream patchset. This patch:

1) adds a codepage= option, similar to smbfs', that fills out a remote_nls field
   in the cifs_sb
2) adds a cifs_convert_nls() function that can convert from one codepage to
   another
3) changes the function that extracts filenames from the cifs dir search
   response to convert from one codepage to another

Tested this by mounting an internal samba server with "-o codepage=cp932". With
that, the files in that share with cp932 names showed up correctly.
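
As a rough illustration of item 1), the Python sketch below turns an option string like the one used in the test into a local and a remote charset. The helper names are hypothetical, the utf8 default for iocharset is an assumption of this sketch, and accepting a bare number such as 932 is only there because both spellings (codepage=932 and codepage=cp932) appear in this bug. Python codec names stand in for the kernel NLS modules.

```python
import codecs


def parse_mount_options(opts):
    """Split a comma-separated -o string; bare flags map to True."""
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def resolve_charsets(parsed):
    """Return (local, remote) codec names; remote is None without codepage=."""
    # Assumption: fall back to utf8 when iocharset= is not given.
    local = codecs.lookup(parsed.get("iocharset", "utf8")).name
    remote = parsed.get("codepage")
    if remote is not None:
        if remote.isdigit():
            remote = "cp" + remote  # treat "932" like "cp932"
        remote = codecs.lookup(remote).name  # raises LookupError if unknown
    return local, remote


options = parse_mount_options("guest,codepage=cp932,iocharset=utf8")
print(resolve_charsets(options))
```

Failing at mount time on an unknown charset name, rather than later during a directory listing, seems like the friendlier behavior for the user.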

This is still going to require a lot of upstream work before it is anywhere
near upstream or RHEL ready. The patch does basically work for readdir, however,
so I think the approach I'm considering is sound.

The challenge at this point is how best to convert all of the places that do
conversions to and from unicode so that they use a more generic function that
can handle other codepages. The existing code is a bit of a mess with a lot of
duplicate cut-and-paste code, so we really need to clean that up at the same time.

Comment 23 Jeff Layton 2008-09-11 13:33:56 UTC
Created attachment 316439 [details]
patch: add codepage option to CIFS

This is a more comprehensive patch against current upstream CIFS code. It seems to work for readdir and stat calls against the share on cherry. Unfortunately, I don't have a good place to test this more comprehensively.

The patch is still not complete. Unix symlinks on shares may not work correctly (we still need to determine how to handle the translation here), but it's probably 90% done or so.

Please test this patch and have the customer do so and report back the results. They'll probably need to test this on a recent kernel (rawhide or so).

Comment 24 Jeff Layton 2008-09-11 13:34:40 UTC
Setting to NEEDINFO pending results of testing from reporter.

Comment 25 Jeff Layton 2008-09-14 14:09:49 UTC
Unfortunately, I've found a bug in unicode handling with the new patch that shows up in CreateANDX calls. I'll have to track that down and respin.

Comment 27 RHEL Program Management 2009-02-16 15:45:02 UTC
Updating PM score.

Comment 29 RHEL Program Management 2009-02-17 18:07:27 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

