Bug 196648
Summary: | Emacs cannot edit files in directories with names containing ä
---|---
Product: | [Fedora] Fedora
Reporter: | Pawel Salek <pawsa-gpa>
Component: | emacs
Assignee: | Chip Coldwell <coldwell>
Status: | CLOSED NEXTRELEASE
Severity: | medium
Priority: | medium
Version: | 5
Hardware: | i386
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2006-11-06 16:48:59 UTC
Description
Pawel Salek
2006-06-26 07:40:00 UTC
Actually, it works fine if you C-x C-f the file from within a running emacs. So this must have something to do with expanding PWD.

Chip

Oh, joy. It appears that we're up against the ol' ISO8859-1 (Latin-1) versus UTF-8 encoding problem.

This does work:

    LANG=en_US.ISO8859-1 emacs ä/foo

This doesn't:

    LANG=en_US.UTF-8 emacs ä/foo

emacs-21 seems to be a lot happier with Latin-1 strings than with UTF-8 strings.

(In reply to comment #0)
> The same holds for directory names containing ö, å, ø, but not eg. ą, ł or ń.

The difference here seems to be the way they are encoded in UTF-8:

    ä: \303\244
    å: \303\245
    ö: \303\266
    ø: \303\270
    ą: \304\205
    ł: \305\202
    ń: \305\204

I would guess that the important difference is the value of the first byte in the multibyte encoding: \303 (0xC3) doesn't work, \304 (0xC4) and higher does.

Chip

Amusingly,

    emacsclient --no-wait ä/foo

works just fine (if you have the server started), even when

    emacs ä/foo

doesn't.

Chip

The problem is in src/charset.h, where we have this macro defined:

    #define UNIBYTE_STR_AS_MULTIBYTE_P(str, length, bytes) \
      (((str)[0] < 0x80 || (str)[0] >= 0xA0) \
       ? (bytes) = 1 \
       : (((bytes) = BYTES_BY_CHAR_HEAD ((str)[0])), \
          ((bytes) > 1 && (bytes) <= (length) \
           && (str)[0] != LEADING_CODE_8_BIT_CONTROL \
           && !CHAR_HEAD_P ((str)[1]) \
           && ((bytes) == 2 \
               || (!CHAR_HEAD_P ((str)[2]) \
                   && ((bytes) == 3 \
                       || !CHAR_HEAD_P ((str)[3])))))))

Really, only the first three lines matter. I think the problem is that UTF-8 encoded multibyte sequences can have leading bytes with values > 0xA0, and this macro will report them as being single byte sequences. I don't think there is a way of writing such a macro that will work simultaneously on UTF-8 and ISO-8859-1 (Latin-1) strings. I think we will have to check the LANG environment variable.

Chip

I take that back. It looks like this bug has been fixed in emacs 22, but the approach is different from what I expected. It's a combination of C and Lisp code.
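To make the byte-level observation about the macro concrete, here is a minimal C sketch of the hypothesis Chip describes before retracting it. It is not Emacs code: `macro_first_test_is_single` only mirrors the first condition of the macro as quoted in this report, and `utf8_len_from_lead` and `count_disagreements` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* First test of UNIBYTE_STR_AS_MULTIBYTE_P as quoted in this report:
   a byte below 0x80 or at/above 0xA0 makes the macro set bytes = 1. */
static int macro_first_test_is_single(unsigned char b)
{
    return b < 0x80 || b >= 0xA0;
}

/* Length of a UTF-8 sequence judged from its lead byte alone.
   Returns 0 for a continuation byte or an invalid lead byte. */
static int utf8_len_from_lead(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: count the lead bytes that UTF-8 treats as the start
   of a multibyte sequence but that the macro's first test calls single byte. */
static size_t count_disagreements(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ) {
        int len = utf8_len_from_lead(s[i]);
        if (len > 1 && macro_first_test_is_single(s[i]))
            count++;
        /* Skip the whole sequence, or one byte if the lead byte is invalid. */
        i += (len > 0 ? (size_t)len : 1);
    }
    return count;
}
```

For the UTF-8 bytes of `ä/foo` (`\303\244/foo`), `count_disagreements` finds one disagreement, at the lead byte 0xC3. Note that 0xC4 passes the first test in exactly the same way, so this check on its own does not separate the names that work from the ones that fail.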
This is in src/emacs.c around line 600:

    Vcommand_line_args = Qnil;

    for (i = argc - 1; i >= 0; i--)
      {
        if (i == 0 || i > skip_args)
          /* For the moment, we keep arguments as is in unibyte strings.
             They are decoded in the function command-line after we know
             locale-coding-system.  */
          Vcommand_line_args
            = Fcons (make_unibyte_string (argv[i], strlen (argv[i])),
                     Vcommand_line_args);
      }

N.B. the command line arguments are parsed as unibyte strings; emacs 21 has a call to "build_string" here that will attempt to parse the command line arguments as iso-8859-1 strings.

There's also an additional change in lisp/startup.el:

    ;; Convert the arguments to Emacs internal representation.
    (let ((args (cdr command-line-args)))
      (while args
        (setcar args
                (decode-coding-string (car args) locale-coding-system t))
        (pop args)))

I'm going to try backporting these changes to emacs-21.

The backport seems to have succeeded in fixing the bug. I have some test rpms up at

http://people.redhat.com/coldwell/bugs/emacs/196648/

Could you please download and test them to verify the fix? Thanks.

Chip

The test rpms work very well - thanks for tracking this down!

Created attachment 140252
enable multibyte strings in command line arguments
FC-5: 21.4-16.2
FC-6: 21.4-17.1
RAWHIDE: 21.4-20
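The idea behind the fix discussed in this report, keeping each argument as raw bytes and decoding it only once the locale's coding system is known, can be sketched in plain C. This is an illustration under stated assumptions, not the Emacs implementation: `enum coding`, `coding_from_lang`, and `decode_arg` are hypothetical names, the LANG parsing is deliberately crude, and only one- and two-byte UTF-8 sequences are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a coding system chosen from the locale. */
enum coding { CODING_LATIN1, CODING_UTF8 };

/* Guess the coding system from a LANG-style value such as "en_US.UTF-8".
   Simplification: anything that does not mention UTF-8 is treated as Latin-1. */
static enum coding coding_from_lang(const char *lang)
{
    if (lang != NULL && (strstr(lang, "UTF-8") || strstr(lang, "utf8")))
        return CODING_UTF8;
    return CODING_LATIN1;
}

/* Decode raw argument bytes into code points. Returns the number of code
   points written, never more than cap. Bytes this sketch cannot decode
   are passed through one at a time. */
static size_t decode_arg(const char *raw, enum coding cs,
                         unsigned long *out, size_t cap)
{
    assert(raw != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)raw;
    size_t n = strlen(raw), i = 0, k = 0;
    while (i < n && k < cap) {
        unsigned char b = s[i];
        if (cs == CODING_UTF8 && b >= 0xC2 && b <= 0xDF
            && i + 1 < n && (s[i + 1] & 0xC0) == 0x80) {
            /* Two-byte UTF-8 sequence: 110xxxxx 10xxxxxx. */
            out[k++] = ((unsigned long)(b & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i += 2;
        } else {
            /* ASCII, a Latin-1 byte, or a sequence outside this sketch. */
            out[k++] = b;
            i += 1;
        }
    }
    return k;
}
```

With the bytes `\303\244/foo`, decoding as UTF-8 yields five code points starting with U+00E4 (ä), while decoding as Latin-1 yields six, starting with U+00C3 and U+00A4. The same raw bytes name different directories depending on the coding system, which is why the decision is deferred until the locale is known.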