Bug 185399

Summary:	emacs forgets about utf keyboard encoding when TERM!=xterm
Product:	[Fedora] Fedora	Reporter:	Axel Thimm <Axel.Thimm>
Component:	emacs	Assignee:	Chip Coldwell <coldwell>
Status:	CLOSED NEXTRELEASE	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	5
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Last Closed:	2006-08-03 18:53:13 UTC	Type:	---

Description Axel Thimm 2006-03-14 12:47:00 UTC

Description of problem:
When TERM is set to xterm-16color or xterm-256color, emacs' keyboard input
encoding is nil, whereas with TERM=xterm it is "u -- utf-8 (alias of
mule-utf-8)".

Version-Release number of selected component (if applicable):
emacs-21.4-5

How reproducible:
always

Steps to Reproduce:
1. TERM=xterm emacs -nw -q
2. C-h C, then read the encoding for keyboard input
3. Repeat with TERM=xterm-256color or TERM=xterm-16color

Actual results:
keyboard encoding drops to nil for xterm-16color or xterm-256color

Expected results:
keyboard encoding should remain at utf-8

Additional info:
This worked with FC3; the regression came with the upgrade to FC4. It took me
some time to figure out that the TERM setting was the cause.

I also compared xterm and xterm-16color/256color with infocmp, but that didn't
reveal any differences related to keyboard definitions. I even straced the
different emacs invocations to check whether anything else was pulled in, but
found no trace of that either.

The locale is:
$ locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=de_DE.UTF-8
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Thanks!

Comment 1 Jens Petersen 2006-03-14 13:04:17 UTC

I think this is due to lang-coding-systems-init.el.

So a regexp or substring match on $TERM needs to be used there instead of an
exact comparison.

Comment 2 Jens Petersen 2006-03-14 13:08:08 UTC

BTW what terminal defines TERM to be xterm-16color/256color?

Comment 3 Axel Thimm 2006-03-14 14:50:25 UTC

Thanks Jens! I'm now using

@@ -22,7 +22,7 @@
           (require 'un-define))
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
-        (if (equal (getenv "TERM") "xterm")
+        (if (equal (substring (getenv "TERM") 0 5) "xterm")
             (set-keyboard-coding-system 'utf-8)))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)

Could you also do the same for linux and linux-c for FC5? Probably something like

@@ -23,8 +23,8 @@
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
         (let ((term (getenv "TERM")))
-          (when (or (equal term "linux")
-                    (equal term "xterm"))
+          (when (or (equal (substring term 0 5) "linux")
+                    (equal (substring term 0 5) "xterm"))
             (set-keyboard-coding-system 'utf-8))))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)

As for where xterm-256color/16color are used: I use them with xterm for various
apps, most notably mutt. The editor called by mutt (emacs -nw) would not accept
German umlauts, which is how I stumbled over this. I don't think xterm-Ncolor
is used by default anywhere in FC4.

Comment 4 Jens Petersen 2006-03-14 15:07:49 UTC

OK, but I think those substring calls need to be protected
against the case when TERM is less than 5 chars.

Comment 5 Axel Thimm 2006-03-14 15:27:02 UTC

I tested on FC4 with a TERM of fewer than 5 characters and the result was
nil again, which is what one wants. The Emacs Lisp manual's entry for substring
isn't specific about what happens if END is beyond the length of the string, but
it doesn't seem to hurt.

BTW, I'm no Lisp/elisp expert; substring was the first match I found in the
docs, so compare-strings might be better.

Comment 6 Jens Petersen 2006-03-15 02:12:54 UTC

Hmm ok, for me evaluating say

  (substring "abc" 0 5)

gives an error:

  Debugger entered--Lisp error: (args-out-of-range "abc" 0 5)

but maybe the non-interactive behaviour is different?

Comment 7 Axel Thimm 2006-03-15 09:00:43 UTC

Then maybe something like

(substring (concat term "     ") 0 5)

to be safe?
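Either padding TERM as above or checking its length before calling substring avoids the args-out-of-range error from Comment 6. The sketch below takes the length-check route. The helper name term-has-prefix-p is hypothetical and is not part of Emacs or lang-coding-systems-init.el; the stringp guard assumes TERM may be unset, in which case getenv returns nil.

```elisp
;; Hypothetical helper (not in Emacs): non-nil when TERM starts with PREFIX.
;; The stringp guard handles an unset TERM, where getenv returns nil.
;; The length check avoids the args-out-of-range error that
;; (substring "abc" 0 5) signals for short TERM values.
(defun term-has-prefix-p (term prefix)
  (and (stringp term)
       (>= (length term) (length prefix))
       (string= (substring term 0 (length prefix)) prefix)))

;; Possible use in place of the substring calls in the patch above:
;; (when (or (term-has-prefix-p term "linux")
;;           (term-has-prefix-p term "xterm"))
;;   (set-keyboard-coding-system 'utf-8))
```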

Comment 8 Chip Coldwell 2006-08-03 17:59:25 UTC

A regular expression seems like the way to go.

I'm committing this fix to rawhide/FC-5.

$ cvs diff -u lang-coding-systems-init.el 
Index: lang-coding-systems-init.el
===================================================================
RCS file: /cvs/dist/rpms/emacs/devel/lang-coding-systems-init.el,v
retrieving revision 1.6
diff -u -r1.6 lang-coding-systems-init.el
--- lang-coding-systems-init.el 25 Nov 2005 09:04:10 -0000      1.6
+++ lang-coding-systems-init.el 3 Aug 2006 18:07:44 -0000
@@ -23,8 +23,8 @@
         (set-default-coding-systems 'utf-8)
         (set-terminal-coding-system 'utf-8)
         (let ((term (getenv "TERM")))
-          (when (or (equal term "linux")
-                    (equal term "xterm"))
+          (when (or (string-match "^linux" term)
+                    (string-match "^xterm" term))
             (set-keyboard-coding-system 'utf-8))))
        ((equal lang "ja")
         (set-default-coding-systems 'euc-jp)
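The committed diff passes the value of (getenv "TERM") straight to string-match. If TERM is unset, getenv returns nil and string-match signals a wrong-type-argument error. A minimal nil-safe sketch, assuming the same let binding of term as in the diff, folds both prefixes into one regexp anchored with \` (start of string) instead of ^ (start of any line; the two are equivalent here because a TERM value contains no newlines). The predicate name term-wants-utf-8-keyboard-p is hypothetical.

```elisp
;; Hypothetical predicate (not part of lang-coding-systems-init.el):
;; non-nil for linux, linux-c, xterm, xterm-16color, xterm-256color, etc.
;; (stringp term) keeps string-match from signalling wrong-type-argument
;; when TERM is unset and getenv returns nil.
(defun term-wants-utf-8-keyboard-p (term)
  (and (stringp term)
       (string-match "\\`\\(linux\\|xterm\\)" term)))

;; Sketch of the condition from the diff rewritten to use it:
;; (let ((term (getenv "TERM")))
;;   (when (term-wants-utf-8-keyboard-p term)
;;     (set-keyboard-coding-system 'utf-8)))
```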

Comment 9 Chip Coldwell 2006-08-03 18:53:13 UTC

Fixed in FC-5 (version 21.4-16.1) and rawhide (version 21.4-17).

Chip