Description of problem:
When a CIFS/SMB server that does not support Unicode is configured with Extended Unix Code (EUC) for Japanese, CIFS cannot convert any of the filenames in a listing. The listing shows:

total 1024
-rwxrwSrwt 1 root root 24 May 16 19:18 <82><A8><82><AF><82><A2><82><B1><82>\u0302<A8><96><U+179C2><BF><8F><EE><95><F1>

At the console it is shown as:

total 1024
-rwxrwSrwt 1 root root 24 May 16 19:18 ???????????????????????

Using 'iocharset' does not help, because that option is only used when the server supports Unicode. Maybe CIFS needs a second option, as smbfs has, to also specify the server's codepage (codepage=<codepage> // convert_cp())? In our case CIFS just dumps the names as ASCII; I confirmed this by running iconv against the output too.

Tests made:
LANG=jp_JP.UTF-8 mount...
LANG=jp_JP.EUC-JP mount...

Versions tested:
CIFS 1.49 (rawhide) 2.6.22-0.21.rc7.git5.fc8 srcversion: 30616BA7D30E1F22CF9B850
CIFS 1.45 (RHEL5) 2.6.18-8.1.6.el5 srcversion: C1BB8ABDC4FF6849DC27E08

How reproducible:
Always.

Steps to Reproduce:
* These steps reproduce the problem at your site.
(1) Set up RHEL2.1 and samba.
(2) Set up smb.conf as follows:
---
[global]
coding system = euc
client code page = 932
---
(3) Create a shared folder on RHEL2.1's samba.
(4) Create a file with Japanese characters in its filename.
(5) Mount the folder via cifs from RHEL5:
# mount -t cifs -o guest,iocharset=utf8 //xxx.xxx.xxx.xxx/share /mnt/share
(Note: iocharset has no effect here and can be omitted. It is not used when the server capabilities do not include Unicode, and CIFS falls back to ASCII.)
(6) Run ls -l in a gnome-terminal whose locale is Japanese UTF-8:
# ls -l /mnt/share

Actual results:
The file with Japanese characters in its filename on RHEL2.1's samba 2.2.x is not displayed correctly.

Expected results:
The file with Japanese characters in its filename on RHEL2.1's samba 2.2.x is displayed correctly.

Business impact:
There are still a lot of samba 2.2.x servers at our customer's site. Some NAS systems are still using samba 2.2.x. Japanese RHEL5 users cannot access those samba 2.2.x resources correctly. Japanese customers may run into this problem after buying RHEL5, and customer satisfaction may decrease as a result.

Additional info:
# mount -t smbfs -o guest,codepage=932,iocharset=utf8 //xxx.xxx.xxx.xxx/share /mnt/share
This issue is not present when using smbfs with codepage=932 and iocharset=utf8.
Finally got around to looking at this again. I seem to get the *exact* same behavior from cifs and smbfs here:

...first smbfs (this is from a RHEL4 machine):

# mount -t smbfs -o guest,codepage=932,iocharset=utf8 //cherry.nrt.redhat.com/share /mnt/rhel21
# ls -l /mnt/rhel21
total 1
-rwxr-xr-x 1 root root 24 May 16 2007 ?????????̂????????
# ls -l /mnt/rhel21 | iconv -f CP932 -t UTF8
total 1
-rwxr-xr-x 1 root root 24 May 16 2007 おけいこのお役立ち情報

...now cifs (from same RHEL4 machine):

[root@dhcp231-224 ~]# mount -t cifs -o guest,iocharset=utf8 //cherry.nrt.redhat.com/share /mnt/rhel21
[root@dhcp231-224 ~]# ls -l /mnt/rhel21
total 1024
-rwxrwSrwt 1 root root 24 May 16 2007 ?????????̂????????
[root@dhcp231-224 ~]# ls -l /mnt/rhel21 | iconv -f CP932 -t UTF8
total 1024
-rwxrwSrwt 1 root root 24 May 16 2007 おけいこのお役立ち情報

...the original problem description says "This issue is not present when using smbfs and using codepage=932 and iocharset=utf8." The behavior seems to be the same to me. Do I need to do some sort of special installation on the client to make this work correctly with smbfs?
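For anyone who wants to repeat the iconv check without a mount, here is a small sketch in Python (not kernel code). It assumes the filename bytes are the ones dumped in the original report, with the two spots the pager printed as \u0302 and <U+179C2> written back out as the raw bytes CC 82 and F0 97 A7 82. The variable names are made up for illustration.

```python
# Filename bytes as dumped in the original report. The pager displayed
# CC 82 as \u0302 and F0 97 A7 82 as <U+179C2> because those runs happen
# to be valid UTF-8; here they are written back out as raw bytes.
raw_name = bytes.fromhex(
    "82a8 82af 82a2 82b1 82cc 82a8"
    "96f0 97a7 82bf 8fee 95f1"
)

# Same conversion as `iconv -f CP932 -t UTF8`.
decoded = raw_name.decode("cp932")
print(decoded)

# What a UTF-8 terminal gets when the bytes are passed through untouched:
# mostly invalid sequences, plus a stray combining circumflex (U+0302),
# which is the accent visible in the middle of the row of question marks.
as_utf8 = raw_name.decode("utf-8", errors="replace")
print("\u0302" in as_utf8)
```

This only shows that the names arrive as unconverted CP932 bytes on both mounts. It says nothing about why codepage=932 did not take effect on the smbfs mount.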
Ok, I suspect that what we need to do here is the following:

1) add a codepage= option to cifs, and have it set remote_nls, similar to how local_nls gets set.

2) turn cifs_strtoUCS and cifs_strfromUCS_le into more generic functions that take the extra step of converting strings to and from the remote_nls.

This seems to be how smbfs handles this (see the convert_cp function).
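To make the two-step idea concrete, here is a userspace sketch in Python, not the kernel implementation. It assumes the conversion pivots through Unicode in both directions: remote codepage to Unicode to local charset for names coming from the server, and the reverse for names sent to it. The function names from_remote and to_remote are hypothetical, and the sample filename is invented.

```python
def from_remote(wire_name: bytes, remote_cp: str, local_cs: str) -> bytes:
    """Server bytes -> Unicode -> bytes in the local charset."""
    return wire_name.decode(remote_cp).encode(local_cs)


def to_remote(local_name: bytes, local_cs: str, remote_cp: str) -> bytes:
    """Local bytes -> Unicode -> bytes in the server's codepage."""
    return local_name.decode(local_cs).encode(remote_cp)


# Hypothetical filename as a UTF-8 client would hold it.
local = "テスト.txt".encode("utf-8")

# What would go on the wire to a CP932 server, and back again.
wire = to_remote(local, "utf-8", "cp932")
back = from_remote(wire, "cp932", "utf-8")
print(back == local)
```

A name that has no representation in the target codepage raises an error in this sketch; how the kernel code should handle that case is a separate decision.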
Started looking over this in earnest today... Gunter Kukkukk posted some patches upstream as a start for this:

-----[snip]-----
In a first step move all repeating code sequences, which use
- cifsConvertToUCS()
to a new function
- setup_ucs_nls_name()

Then move all repeating code sequences, which use
- cifs_strtoUCS()
to the new function
- setup_ucs_nls_name()
passing the "remap" parameter as "0" (false).
-----[snip]-----

I think we need to do some underlying cleanup first. The problem is that _a_lot_ of CIFS functions take a nls_codepage arg and a "remap" arg. These are almost always taken from the local_nls field and the CIFS_MOUNT_MAP_SPECIAL_CHR flag, respectively.

If we want to do this, then we'll also need a "remote_nls" parameter, but I don't want to just add this to the argument list of all these functions. Instead, we need to change them all to take a cifs_sb arg. Then we can have a new function that does this conversion (and gets the info out of the cifs_sb instead of passing these parameters individually).

It's a big cleanup job and I'm still looking at the best way to tackle it that won't break things...
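Roughly the shape I have in mind, as a Python sketch rather than kernel C. It assumes one per-mount object stands in for the cifs_sb and carries the local charset, the remote codepage and the remap flag, so callers pass a single argument. The class and function names are hypothetical, and the remap flag is only carried along here; the special-character mapping itself is not implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountCharsets:
    """Hypothetical stand-in for the per-mount conversion settings."""
    local_nls: str          # charset used on the client side
    remote_nls: str         # codepage the server uses on the wire
    remap: bool = False     # carried only; special-char mapping not done here


def name_from_server(ctx: MountCharsets, wire_name: bytes) -> bytes:
    """Convert a name received from the server into the local charset."""
    return wire_name.decode(ctx.remote_nls).encode(ctx.local_nls)


def name_to_server(ctx: MountCharsets, local_name: bytes) -> bytes:
    """Convert a local name into the server's codepage."""
    return local_name.decode(ctx.local_nls).encode(ctx.remote_nls)


# One context per mount; every caller passes it instead of three arguments.
ctx = MountCharsets(local_nls="utf-8", remote_nls="cp932")
sent = name_to_server(ctx, "共有".encode("utf-8"))
received = name_from_server(ctx, sent)
print(received.decode("utf-8"))
```

Adding another setting later then means touching the context and the two helpers, not every function signature.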
While I hate to bump this to the next update again, I'm going to do so. I just don't see this happening for 5.3. The changes involved are likely to be substantial and it will need some upstream soak time. Moving to 5.4...
Created attachment 311754
proof-of-concept patch 1

Here's a first, very rough, proof-of-concept upstream patchset. This patch:

1) adds a codepage= option similar to smbfs', that fills out a remote_nls field in the cifs_sb

2) adds a cifs_convert_nls() function that can convert from one codepage to another

3) changes the function that extracts filenames from the cifs dir search response to convert from one codepage to another

Tested this by mounting an internal samba server with "-o codepage=cp932". With that, the files in that share with cp932 names showed up correctly.

This is still going to require a lot of work before it is ready for upstream or RHEL. The patch does basically work for readdir, however, so I think the solution I'm considering should basically work. The challenge at this point is how best to replace all of the places that convert to and from unicode with a more generic function that can handle other codepages. The existing code is a bit of a mess with a lot of duplicate cut-and-paste code, so we really need to clean that up at the same time.
Created attachment 316439
patch: add codepage option to CIFS

This is a more comprehensive patch against current upstream CIFS code. It seems to work for readdir and stat calls against the share on cherry. Unfortunately, I don't have a good place to test this more comprehensively.

The patch is still not complete. Unix symlinks on shares may not work correctly (we still need to determine how to handle the translation here), but it's probably 90% done or so.

Please test this patch and have the customer do so and report back the results. They'll probably need to test this on a recent kernel (rawhide or so).
Setting to NEEDINFO pending results of testing from reporter.
Unfortunately, I've found a bug in unicode handling with the new patch that shows up in CreateANDX calls. I'll have to track that down and respin.
Updating PM score.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.