Bug 196648 - Emacs cannot edit files in directories with names containing ä
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: emacs
Version: 5
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Chip Coldwell
Reported: 2006-06-26 03:40 EDT by Pawel Salek
Modified: 2007-11-30 17:11 EST

Doc Type: Bug Fix
Last Closed: 2006-11-06 11:48:59 EST


Attachments
enable multibyte strings in command line arguments (1.12 KB, patch)
2006-11-03 10:07 EST, Chip Coldwell

Description Pawel Salek 2006-06-26 03:40:00 EDT
Description of problem:


Version-Release number of selected component (if applicable):
emacs-21.4-14

How reproducible:
Always

Steps to Reproduce:
1. Execute:
mkdir ä; cd ä; emacs xxx
  
Actual results:
emacs says "File not found and directory write-protected".

Expected results:
emacs should allow editing the file.

Additional info:
The same holds for directory names containing ö, å, ø, but not e.g. ł, ń or ą.
Comment 1 Chip Coldwell 2006-08-03 15:36:52 EDT
Actually, it works fine if you C-x C-f the file from within a running emacs.  So
this must have something to do with expanding PWD.

Chip
Comment 2 Chip Coldwell 2006-08-14 15:40:57 EDT
Oh, joy.  It appears that we're up against the ol' ISO8859-1 (Latin-1) versus
UTF-8 encoding problem.

This does work:

LANG=en_US.ISO8859-1 emacs ä/foo

This doesn't:

LANG=en_US.UTF-8 emacs ä/foo

emacs-21 seems to be a lot happier with Latin-1 strings than with UTF-8 strings.


Comment 3 Chip Coldwell 2006-08-14 15:53:52 EDT
(In reply to comment #0)

> The same holds for directory names containing ö, å, ø, but not eg. ł, ń or ą.

The difference here seems to be the way they are encoded in UTF-8:

ä: \303\244
å: \303\245
ö: \303\266
ø: \303\270

ą: \304\205
ł: \305\202
ń: \305\204

I would guess that the important difference is the value of the first byte in
the multibyte encoding: \303 (0xC3) doesn't work, while \304 (0xC4) and higher do.

Chip
Comment 4 Chip Coldwell 2006-10-26 14:13:08 EDT
Amusingly,

emacsclient --no-wait ä/foo

works just fine (if you have the server started), even when

emacs ä/foo

doesn't.

Chip
Comment 5 Chip Coldwell 2006-10-26 15:05:33 EDT
The problem is in src/charset.h, where we have this macro defined:

#define UNIBYTE_STR_AS_MULTIBYTE_P(str, length, bytes)	\
  (((str)[0] < 0x80 || (str)[0] >= 0xA0)		\
   ? (bytes) = 1					\
   : (((bytes) = BYTES_BY_CHAR_HEAD ((str)[0])),	\
      ((bytes) > 1 && (bytes) <= (length)		\
       && (str)[0] != LEADING_CODE_8_BIT_CONTROL	\
       && !CHAR_HEAD_P ((str)[1])			\
       && ((bytes) == 2					\
	   || (!CHAR_HEAD_P ((str)[2])			\
	       && ((bytes) == 3				\
		   || !CHAR_HEAD_P ((str)[3])))))))

Really, only the first three lines matter.  I think the problem is that UTF-8
encoded multibyte sequences can have leading bytes with values >= 0xA0, and this
macro will report them as being single-byte sequences.

I don't think there is a way of writing such a macro that will work
simultaneously on UTF-8 and ISO-8859-1 (Latin-1) strings.  I think we will have
to check the LANG environment variable.

Chip
Comment 6 Chip Coldwell 2006-11-01 16:49:30 EST
I take that back.  It looks like this bug has been fixed in emacs 22, but the
approach is different from what I expected.  It's a combination of C and Lisp
code.  This is in src/emacs.c around line 600:

  Vcommand_line_args = Qnil;

  for (i = argc - 1; i >= 0; i--)
    {
      if (i == 0 || i > skip_args)
	/* For the moment, we keep arguments as is in unibyte strings.
	   They are decoded in the function command-line after we know
	   locale-coding-system.  */
	Vcommand_line_args
	  = Fcons (make_unibyte_string (argv[i], strlen (argv[i])),
		   Vcommand_line_args);
    }

N.B. the command line arguments are stored as unibyte strings here; emacs 21
instead has a call to "build_string" at this point, which will attempt to
interpret the command line arguments as ISO-8859-1 strings.

There's also an additional change in lisp/startup.el:

  ;; Convert the arguments to Emacs internal representation.
  (let ((args (cdr command-line-args)))
    (while args
      (setcar args
	      (decode-coding-string (car args) locale-coding-system t))
      (pop args)))

I'm going to try backporting these changes to emacs-21.
Comment 7 Chip Coldwell 2006-11-01 17:03:28 EST
The backport seems to have succeeded in fixing the bug.  I have some test rpms up at

http://people.redhat.com/coldwell/bugs/emacs/196648/

Could you please download and test them to verify the fix?

Thanks.

Chip
Comment 8 Pawel Salek 2006-11-01 17:11:54 EST
The test rpms work very well - thanks for tracking this down!
Comment 9 Chip Coldwell 2006-11-03 10:07:25 EST
Created attachment 140252
enable multibyte strings in command line arguments
Comment 10 Chip Coldwell 2006-11-06 11:48:59 EST
FC-5: 21.4-16.2
FC-6: 21.4-17.1
RAWHIDE: 21.4-20
