Red Hat Bugzilla – Bug 75447
Nautilus doesn't offer editor for Latin-1 files
Last modified: 2015-01-07 19:00:56 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830
Description of problem:
In the menu I get when I button-3-click on an icon in nautilus, I'm normally
offered editors like emacs for text files. This does not seem to happen for
Latin-1 files, even if I have a Latin-1 locale.
Version-Release number of selected component (if applicable): 2.0.6-6
Steps to Reproduce:
1. Set locale to sv_SE
2. Run nautilus in this locale, and button-3-click on Books.latin1.
Actual Results: I don't have any editors or text file viewers in the "open with" submenu.
For plain ASCII files, and also for UTF-8 files, I'm offered emacs, vi, and
other commands in this submenu.
Created attachment 79444
A Latin-1 text file. Despite a Latin-1 locale (sv_SE), I'm not offered to open it with emacs.
Created attachment 79445
A UTF-8 text file. I'm offered to open this one with emacs.
Is it just the filename extension? Try renaming Books.latin1 to Books.txt.
If I rename it Books.txt, it is again possible to open it with GNU emacs. But
if I swap them instead, it is only the UTF-8 file I can open. If I check the
properties of the Latin-1 file (when not named .txt), it has mime type
It appears as if the extension takes precedence in deciding the mime type. If I
rename the file Books.jpg, Nautilus believes it is an image/jpeg.
But when there is no (recognised) extension, it apparently takes a look into the
file, to decide the type with some heuristics. And here, apparently again, it
fails to take locale appropriately into consideration.
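
The behaviour described above (extension first, then a look into the file) can be sketched roughly as follows. This is only an illustration of the observed behaviour, not the gnome-vfs code: the function name, the tiny extension table, and the fallback type "application/octet-stream" are all assumptions made for the example.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSION_TYPES = {".txt": "text/plain", ".jpg": "image/jpeg"}


def guess_mime_type(filename: str, head: bytes) -> str:
    """Guess a MIME type from the extension, else from the first bytes."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_TYPES:
        # A recognised extension wins, whatever the file contains.
        return EXTENSION_TYPES[ext]
    try:
        # ASCII is a subset of UTF-8, so both pass this check.
        # (A sample cut in the middle of a multi-byte sequence would
        # fail here; a real sniffer would have to allow for that.)
        head.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        # Assumed fallback type for anything that is not valid UTF-8.
        return "application/octet-stream"
```

With this sketch, a Latin-1 sample named Books.jpg comes out as image/jpeg and the same bytes named Books.txt as text/plain, while Books.latin1 falls through to the content check and is not treated as text.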
I can see adding recognition for a .latin1 extension, but
I'm not sure what else can be done here. There's no way to automatically
recognize a latin-1 text file as it just looks like binary data; it could easily
be a binary file. (Also, the MIME recognition is based on only the first few
bytes of the file; if it were based on the whole file, it would be unusably slow.)
I'm not proposing adding a .latin1 extension. I've never seen that used. It
was only something I invented to distinguish the two attachments easily.
Since UTF8 can be recognised, it seems to me character sets like Latin 1 could
too. Though nautilus should probably only try to recognise files in the chosen
locale (or UTF8 maybe in all cases); a file where all characters are printable
according to the locale could be assumed to be text. A file with nonprintable
characters is data. In the case of Latin 1, the ranges 000 to 037 and 200 to
237 are non-printable. That's not too different from ASCII, which I assume is
recognised by the absence of 000 to 037 and 200 to 377. (Well, tab, line feed,
and a few others should of course be accepted in both cases, but you get the idea.)
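
To make the idea concrete, here is a minimal sketch of the check I have in mind. The function names and the exact set of accepted control characters are my own assumptions; this is not what gnome-vfs or "file" actually does.

```python
# Control characters accepted in text: tab, line feed, form feed,
# carriage return. The exact set is an assumption of this sketch.
ALLOWED_CONTROLS = {0o011, 0o012, 0o014, 0o015}


def looks_like_latin1_text(sample: bytes) -> bool:
    """True if the sample has no bytes in octal 000-037 or 200-237."""
    for b in sample:
        if b in ALLOWED_CONTROLS:
            continue
        if b <= 0o037 or 0o200 <= b <= 0o237:
            return False
    return True


def looks_like_ascii_text(sample: bytes) -> bool:
    """True if the sample has no control bytes and no high-bit bytes."""
    for b in sample:
        if b in ALLOWED_CONTROLS:
            continue
        if b <= 0o037 or b >= 0o200:
            return False
    return True
```

The Latin-1 test would only be applied when the locale's character set is Latin-1. The sample can be just the first few bytes of the file, so the cost is the same as for the ASCII check.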
(The "file" command distinguishes between "ascii text", "international text",
and "data". I haven't dug into what heuristics it's using, or if it is locale
Trust me, there's no way to recognize latin-1 text, especially if you can't scan
the entire file. Latin-1 and UTF-8 are different. In Latin-1 all bytes in all
sequences are possible valid text (and also there's Latin-2 through Latin-15
that all look exactly like Latin-1). So almost any file is a valid Latin-1 file.
For UTF-8, very few files will just "happen" to be valid UTF-8 - if it's valid
UTF-8, it's very likely to _be_ UTF-8.
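
A small sketch of that asymmetry, using Python's built-in codecs purely as an illustration (the helper names are made up for the example, and this says nothing about how gnome-vfs is implemented):

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decodes_as_latin1(data: bytes) -> bool:
    """Always True: the latin-1 codec maps every byte to a code point."""
    data.decode("latin-1")
    return True


every_byte = bytes(range(256))
latin1_word = "Böcker".encode("latin-1")
utf8_word = "Böcker".encode("utf-8")
```

Here every_byte and latin1_word both decode as Latin-1 without complaint, but neither is valid UTF-8; only utf8_word passes the UTF-8 check. So a successful UTF-8 decode carries information, while a successful Latin-1 decode at the codec level carries none.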
> there's no way to recognize ...
> In Latin-1 all bytes in all sequences are possible valid text
If it contains control characters, it's not Latin-1, so no, not all sequences
are valid text.
The command "file" is a proof of concept that it IS possible to recognise it
with a high probability of success.
But it's not the end of the world. If you don't think it should be done, I
won't bother you more now.
Current gnome-vfs does this, if your locale is latin1.
I saw I never confirmed this, and maybe I should have: Yes, with a new
gnome-vfs this works fine now.