Bug 185399

Summary: emacs forgets about utf keyboard encoding when TERM!=xterm
Product: [Fedora] Fedora
Reporter: Axel Thimm <axel.thimm>
Component: emacs
Assignee: Chip Coldwell <coldwell>
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Version: 5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-08-03 18:53:13 UTC

Description Axel Thimm 2006-03-14 12:47:00 UTC
Description of problem:
When TERM is set to xterm-16color or xterm-256color, emacs' keyboard input
encoding is set to nil, while for TERM=xterm it is "u -- utf-8 (alias of
mule-utf-8)".

Version-Release number of selected component (if applicable):
emacs-21.4-5

How reproducible:
always

Steps to Reproduce:
1. TERM=xterm emacs -nw -q
2. C-h C (describe-coding-system) -> read the encoding for keyboard input
3. Repeat with TERM=xterm-256color or TERM=xterm-16color
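Instead of reading the C-h C output, the keyboard coding system can also be
queried directly. A minimal sketch using only the built-in functions getenv and
keyboard-coding-system; evaluate it with M-: in each emacs -nw -q session from
the steps above. Per this report, TERM=xterm should show a UTF-8 coding system
and the xterm-Ncolor values nil, though the exact symbol printed may differ
between Emacs versions.

```elisp
;; Evaluate with M-: (eval-expression) in an `emacs -nw -q' session.
;; Echoes the current TERM together with the coding system Emacs uses
;; to decode keyboard input (this report sees nil for xterm-16color
;; and xterm-256color).
(message "TERM=%s keyboard-coding-system=%S"
         (getenv "TERM")
         (keyboard-coding-system))
```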
  
Actual results:
keyboard encoding drops to nil for xterm-16color or xterm-256color

Expected results:
keyboard encoding should remain at utf-8

Additional info:
This worked with FC3 and the regression came with the upgrade to FC4. It took me
some time to find out it was the TERM setting.

I also compared xterm and xterm-16color/256color with infocmp, but that didn't
reveal anything related to different keyboard definitions. I even straced the
different emacs invocations to check whether anything else was pulled in, but
found no trace there either.

My locale is:
$ locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=de_DE.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Thanks!

Comment 1 Jens Petersen 2006-03-14 13:04:17 UTC
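I think this is due to lang-coding-systems-init.el, which only sets the keyboard
coding system when TERM exactly equals a fixed string such as "xterm".

So a regexp or substring match on $TERM needs to be used there instead.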

Comment 2 Jens Petersen 2006-03-14 13:08:08 UTC
BTW what terminal defines TERM to be xterm-16color/256color?

Comment 3 Axel Thimm 2006-03-14 14:50:25 UTC
Thanks Jens! I'm now using

@@ -22,7 +22,7 @@
           (require 'un-define))
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
-        (if (equal (getenv "TERM") "xterm")
+        (if (equal (substring (getenv "TERM") 0 5) "xterm")
             (set-keyboard-coding-system 'utf-8)))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)

Could you also do the same for linux and linux-c for FC5? Probably something like

@@ -23,8 +23,8 @@
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
         (let ((term (getenv "TERM")))
-          (when (or (equal term "linux")
-                    (equal term "xterm"))
+          (when (or (equal (substring term 0 5) "linux")
+                    (equal (substring term 0 5) "xterm"))
             (set-keyboard-coding-system 'utf-8))))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)

As for where xterm-256color/16color are used: I use them with xterm for various
apps, most notably mutt. The editor called by mutt (emacs -nw) would not accept
German umlauts, which is how I stumbled over this. I don't think xterm-Ncolor
is used by default anywhere in FC4.


Comment 4 Jens Petersen 2006-03-14 15:07:49 UTC
OK, but I think those substring calls need to be protected
against the case where TERM is shorter than 5 characters.

Comment 5 Axel Thimm 2006-03-14 15:27:02 UTC
I tested on FC4 with a TERM shorter than 5 characters and the result was
nil again, which is what one wants. The Emacs Lisp manual on substring
isn't specific about what happens if END is greater than the length of the
string, but it doesn't seem to hurt.

BTW I'm no Lisp/elisp expert; substring was the first match I found in the docs.
compare-strings might be better.

Comment 6 Jens Petersen 2006-03-15 02:12:54 UTC
Hmm, OK. For me, evaluating, say,

  (substring "abc" 0 5)

gives an error:

  Debugger entered--Lisp error: (args-out-of-range "abc" 0 5)

but maybe the non-interactive behaviour is different?

Comment 7 Axel Thimm 2006-03-15 09:00:43 UTC
Then maybe something like

(substring (concat term "     ") 0 5)

to be safe?
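Padding avoids the args-out-of-range error from Comment 6. An alternative is to
compare lengths before calling substring, which also covers an unset TERM
(getenv then returns nil). A minimal sketch using only stringp, length,
substring and equal; the helper name my-term-prefix-p is hypothetical and not
part of lang-coding-systems-init.el.

```elisp
;; Hypothetical helper (not in lang-coding-systems-init.el):
;; non-nil when the string TERM starts with PREFIX.
;; The stringp check handles an unset TERM, and the length check handles
;; a TERM shorter than PREFIX, where (substring term 0 5) would signal
;; args-out-of-range.
(defun my-term-prefix-p (prefix term)
  (and (stringp term)
       (>= (length term) (length prefix))
       (equal (substring term 0 (length prefix)) prefix)))

;; (my-term-prefix-p "xterm" "xterm-256color") ; => t
;; (my-term-prefix-p "xterm" "vt100")          ; => nil
;; (my-term-prefix-p "xterm" nil)              ; => nil
```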


Comment 8 Chip Coldwell 2006-08-03 17:59:25 UTC
A regular expression seems like the way to go.

I'm committing this fix to rawhide/FC-5.

$ cvs diff -u lang-coding-systems-init.el 
Index: lang-coding-systems-init.el
===================================================================
RCS file: /cvs/dist/rpms/emacs/devel/lang-coding-systems-init.el,v
retrieving revision 1.6
diff -u -r1.6 lang-coding-systems-init.el
--- lang-coding-systems-init.el 25 Nov 2005 09:04:10 -0000      1.6
+++ lang-coding-systems-init.el 3 Aug 2006 18:07:44 -0000
@@ -23,8 +23,8 @@
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
         (let ((term (getenv "TERM")))
-          (when (or (equal term "linux")
-                    (equal term "xterm"))
+          (when (or (string-match "^linux" term)
+                    (string-match "^xterm" term))
             (set-keyboard-coding-system 'utf-8))))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)
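The two checks can also be folded into one anchored regexp. A sketch, assuming
one also wants an unset TERM to fall through quietly: string-match signals a
wrong-type-argument error when given nil, so the hypothetical helper
my-utf8-keyboard-term-p checks stringp first. It anchors with \\` (start of
string) rather than ^ (start of any line); for a TERM value, which contains no
newlines, the two behave the same.

```elisp
;; Hypothetical helper (not in lang-coding-systems-init.el).
;; Matches any TERM beginning with "linux" or "xterm", e.g. linux-c or
;; xterm-256color, and returns nil for an unset (nil) TERM instead of
;; letting string-match signal an error.
(defun my-utf8-keyboard-term-p (term)
  "Return non-nil if TERM names a linux or xterm variant."
  (and (stringp term)
       (string-match "\\`\\(linux\\|xterm\\)" term)))

;; Possible use in place of the `or' of two string-match calls:
;; (when (my-utf8-keyboard-term-p (getenv "TERM"))
;;   (set-keyboard-coding-system 'utf-8))
```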


Comment 9 Chip Coldwell 2006-08-03 18:53:13 UTC
Fixed in FC-5 (version 21.4-16.1) and rawhide (version 21.4-17).

Chip