Red Hat Bugzilla – Bug 196648
Emacs cannot edit files in directories with names containing ä
Last modified: 2007-11-30 17:11:35 EST
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
mkdir ä; cd ä; emacs xxx
emacs says "File not found and directory write-protected".
emacs should allow editing the file.
The same holds for directory names containing ö, å, ø, but not e.g. Å, Å or Ä
Actually, it works fine if you C-x C-f the file from within a running emacs. So
this must have something to do with expanding PWD.
Oh, joy. It appears that we're up against the ol' ISO8859-1 (Latin-1) versus
UTF-8 encoding problem.
This does work:
LANG=en_US.ISO8859-1 emacs ä/foo
This does not:
LANG=en_US.UTF-8 emacs ä/foo
emacs-21 seems to be a lot happier with Latin-1 strings than with UTF-8 strings.
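To make the Latin-1 versus UTF-8 disagreement concrete, here is a minimal sketch (not Emacs code; the function names are hypothetical) that counts how many characters the same bytes represent under each interpretation. In UTF-8, ä is the two bytes 0xC3 0xA4; a Latin-1 reader sees those as two separate characters.

```c
#include <stddef.h>

/* Latin-1 maps every byte to exactly one character. */
size_t count_chars_latin1(const unsigned char *s, size_t len)
{
    (void)s;
    return len;
}

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx).
   Assumes well-formed UTF-8 input; no validation is done. */
size_t count_chars_utf8(const unsigned char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the directory name ä the two functions return 2 and 1 respectively, so any code that assumes the wrong encoding ends up working with a different string than the one on disk.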
(In reply to comment #0)
> The same holds for directory names containing ö, å, ø, but not e.g. Å, Å or Ä
The difference here seems to be the way they are encoded in UTF-8:
I would guess that the important difference is the value of the first byte in
the multibyte encoding: \303 (0xC3) doesn't work, \304 (0xC4) and higher does.
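The first-byte observation can be checked with a small encoder. This is only an illustrative sketch (the function name is hypothetical, and it covers just code points below U+0800): code points U+00C0 through U+00FF, which include ä, ö, å and ø, all get the lead byte 0xC3 (\303), while U+0100 through U+013F get 0xC4 (\304).

```c
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written (1 or 2), or 0 if cp is out of range. */
size_t utf8_encode_short(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* lead byte: 110xxxxx */
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* continuation: 10xxxxxx */
        return 2;
    }
    return 0;
}
```

Calling it with 0xE4 (ä) fills the buffer with 0xC3 0xA4; calling it with 0x100 (an example code point picked here, not one from the report) gives 0xC4 0x80.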
emacsclient --no-wait ä/foo
works just fine (if you have the server started), even when
The problem is in src/charset.h, where we have this macro defined:
#define UNIBYTE_STR_AS_MULTIBYTE_P(str, length, bytes) \
  (((str)[0] < 0x80 || (str)[0] >= 0xA0) \
   ? (bytes) = 1 \
   : (((bytes) = BYTES_BY_CHAR_HEAD ((str)[0])), \
      ((bytes) > 1 && (bytes) <= (length) \
       && (str)[0] != LEADING_CODE_8_BIT_CONTROL \
       && !CHAR_HEAD_P ((str)[1]) \
       && ((bytes) == 2 \
           || (!CHAR_HEAD_P ((str)[2]) \
               && ((bytes) == 3 \
                   || !CHAR_HEAD_P ((str)[3])))))))
Really, only the first three lines matter. I think the problem is that UTF-8
encoded multibyte sequences can have leading bytes with values > 0xA0, and this
macro will report them as being single byte sequences.
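A minimal sketch of that point, mirroring only the first condition of the macro quoted above (both helper names are hypothetical and are not part of Emacs):

```c
/* First test of the macro: a byte below 0x80, or at or above 0xA0,
   is taken to be a complete one-byte character. */
int taken_as_single_byte(unsigned char first)
{
    return first < 0x80 || first >= 0xA0;
}

/* In UTF-8, the lead byte of a multibyte sequence lies in 0xC2..0xF4. */
int is_utf8_multibyte_lead(unsigned char first)
{
    return first >= 0xC2 && first <= 0xF4;
}
```

For 0xC3, the lead byte of ä, both functions return true: the byte starts a two-byte UTF-8 sequence, yet the first test classifies it as a single-byte character.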
I don't think there is a way of writing such a macro that will work
simultaneously on UTF-8 and ISO-8859-1 (Latin-1) strings. I think we will have
to check the LANG environment variable.
I take that back. It looks like this bug has been fixed in emacs 22, but the
approach is different from what I expected. It's a combination of C and Lisp
code. This is in src/emacs.c around line 600:
Vcommand_line_args = Qnil;
for (i = argc - 1; i >= 0; i--)
  {
    if (i == 0 || i > skip_args)
      /* For the moment, we keep arguments as is in unibyte strings.
         They are decoded in the function command-line after we know
         locale-coding-system.  */
      Vcommand_line_args
        = Fcons (make_unibyte_string (argv[i], strlen (argv[i])),
                 Vcommand_line_args);
  }
N.B. the command line arguments are parsed as unibyte strings; emacs 21 has a
call to "build_string" here that will attempt to parse the command line
arguments as iso-8859-1 strings.
There's also an additional change in lisp/startup.el:
;; Convert the arguments to Emacs internal representation.
(let ((args (cdr command-line-args)))
  (while args
    (setcar args
            (decode-coding-string (car args) locale-coding-system t))
    (pop args)))
I'm going to try backporting these changes to emacs-21.
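The idea behind the two changes (keep the raw bytes first, decode once the coding system is known) can be sketched outside Emacs roughly as follows. This is an illustration under assumptions, not the actual patch: `enum arg_coding` is a hypothetical stand-in for locale-coding-system, `decode_arg` is a made-up name, and the UTF-8 branch does not reject overlong encodings.

```c
#include <stddef.h>

/* Hypothetical stand-in for the locale's coding system. */
enum arg_coding { ARG_LATIN1, ARG_UTF8 };

/* Decode raw argument bytes into code points once the coding is known.
   Returns the number of code points written, or (size_t)-1 on malformed
   UTF-8 or when the output buffer is too small. */
size_t decode_arg(const unsigned char *raw, size_t len, enum arg_coding coding,
                  unsigned int *out, size_t cap)
{
    size_t n = 0;
    size_t i = 0;

    while (i < len) {
        unsigned int cp;
        size_t extra = 0;               /* continuation bytes still expected */
        unsigned char b = raw[i++];

        if (coding == ARG_LATIN1 || b < 0x80) {
            cp = b;                     /* one byte, one character */
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F; extra = 1;
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F; extra = 2;
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07; extra = 3;
        } else {
            return (size_t)-1;          /* stray continuation or invalid lead byte */
        }

        while (extra-- > 0) {
            if (i >= len || (raw[i] & 0xC0) != 0x80)
                return (size_t)-1;      /* truncated or malformed sequence */
            cp = (cp << 6) | (raw[i++] & 0x3F);
        }

        if (n >= cap)
            return (size_t)-1;
        out[n++] = cp;
    }
    return n;
}
```

With the bytes of ä/foo (0xC3 0xA4 followed by "/foo"), decoding as UTF-8 gives five code points starting with U+00E4, while decoding as Latin-1 gives six, starting with U+00C3 and U+00A4.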
The backport seems to have succeeded in fixing the bug. I have some test rpms up at
Could you please download and test them to verify the fix?
The test rpms work very well - thanks for tracking this down!
Created attachment 140252 [details]
enable multibyte strings in command line arguments