Bug 75447

Summary: Nautilus doesn't offer editor for Latin-1 files
Product: [Retired] Red Hat Linux
Reporter: Göran Uddeborg <goeran>
Component: nautilus
Assignee: Alexander Larsson <alexl>
Status: CLOSED RAWHIDE
QA Contact: Jay Turner <jturner>
Severity: medium
Priority: medium
Version: 8.0
CC: alexl, srevivo
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-05-27 11:56:33 UTC
Attachments:
A Latin-1 text file. Although I use a Latin-1 locale (sv_SE), I'm not offered the option to open it with emacs. (Flags: none)
A UTF-8 text file. I'm offered the option to open this one with emacs. (Flags: none)

Description Göran Uddeborg 2002-10-08 18:12:09 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
In the menu I get when I button-3-click on an icon in Nautilus, I'm normally
offered editors like emacs for text files. This does not seem to happen for
Latin-1 files, even if I have a Latin-1 locale.

Version-Release number of selected component (if applicable): 2.0.6-6


How reproducible:
Always

Steps to Reproduce:
1. Set the locale to sv_SE.
2. Run Nautilus in this locale, and button-3-click on Books.latin1.


Actual Results:  I don't have any editors or text file viewers in the "Open with"
submenu.

Additional info:

For plain ASCII files, and also for UTF-8 files, I'm offered emacs, vi, and
other commands in this submenu.

Comment 1 Göran Uddeborg 2002-10-08 18:14:13 UTC
Created attachment 79444
A Latin-1 text file. Although I use a Latin-1 locale (sv_SE), I'm not offered the option to open it with emacs.

Comment 2 Göran Uddeborg 2002-10-08 18:16:02 UTC
Created attachment 79445
A UTF-8 text file. I'm offered the option to open this one with emacs.

Comment 3 Havoc Pennington 2002-10-08 18:42:57 UTC
Is it just the filename extension? Try renaming Books.latin1 to Books.txt.


Comment 4 Göran Uddeborg 2002-10-09 20:53:36 UTC
If I rename it Books.txt, it is again possible to open it with GNU emacs. But
if I swap them instead, I can only open the UTF-8 file. If I check the
properties of the Latin-1 file (when it is not named .txt), it has MIME type
application/octet-stream.

It appears as if the extension takes precedence in deciding the MIME type. If I
rename the file Books.jpg, Nautilus believes it is an image/jpeg.

But when there is no recognised extension, it apparently looks into the file
and decides the type with some heuristics. And here, apparently, it fails to
take the locale into account.
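The behaviour observed above (a recognised extension wins; otherwise a heuristic looks at the start of the file) can be sketched roughly as follows. This is a hypothetical illustration, not Nautilus or gnome-vfs code: the function names, the 512-byte sniff size and the "valid UTF-8, no NUL bytes" content test are all assumptions chosen to reproduce the reported symptoms, and Python's built-in `mimetypes` table stands in for the real MIME database.

```python
import codecs
import mimetypes

# Hypothetical illustration only; not the Nautilus/gnome-vfs implementation.
_DB = mimetypes.MimeTypes()  # Python's built-in extension table, no system files
SNIFF_BYTES = 512            # assumed size of the sniffed file head


def read_head(path: str, n: int = SNIFF_BYTES) -> bytes:
    """Read only the first n bytes, never the whole file."""
    with open(path, "rb") as f:
        return f.read(n)


def looks_like_utf8_text(head: bytes) -> bool:
    """Placeholder content test: no NUL bytes and valid UTF-8.

    final=False lets a multi-byte character that was cut off at the end
    of the sample pass instead of being reported as an error.
    """
    if b"\x00" in head:
        return False
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


def guess_mime_type(name: str, head: bytes) -> str:
    # 1. A recognised extension wins, whatever the content is.
    mime, _encoding = _DB.guess_type(name)
    if mime is not None:
        return mime
    # 2. Otherwise fall back to a heuristic on the first bytes only.
    if looks_like_utf8_text(head):
        return "text/plain"
    return "application/octet-stream"
```

With this sketch, a Latin-1 file named Books.latin1 comes out as application/octet-stream, the same bytes named Books.txt come out as text/plain, and any content named Books.jpg is reported as image/jpeg, matching the behaviour described in the comment.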

Comment 5 Havoc Pennington 2002-10-09 21:14:36 UTC
I can see adding recognition for a .latin1 extension, but
I'm not sure what else can be done here. There's no way to automatically
recognize a Latin-1 text file, as it just looks like binary data; it could easily
be a binary file. (Also, MIME recognition is based on only the first few
bytes of the file; if it were based on the whole file, it would be unusably slow.)


Comment 6 Göran Uddeborg 2002-10-09 21:55:50 UTC
I'm not proposing adding a .latin1 extension. I've never seen that used. It
was only something I invented to distinguish the two attachments easily.

Since UTF-8 can be recognised, it seems to me that character sets like Latin-1
could be too. Nautilus should probably only try to recognise files in the
chosen locale's character set (or perhaps UTF-8 in all cases): a file in which
all characters are printable according to the locale could be assumed to be
text, and a file with non-printable characters is data. In the case of Latin-1,
the (octal) ranges 000 to 037 and 200 to 237 are non-printable. That's not too
different from ASCII, which I assume is recognised by the absence of 000 to 037
and 200 to 377. (Well, tab, line feed, and a few others should of course be
accepted in both cases, but you get the idea.)
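The per-byte test proposed above might look something like this minimal sketch. It hard-codes Latin-1 rather than consulting the locale (a locale-aware version would pick the predicate from the locale's character set, which is an assumption about how one would extend it), treats tab, line feed, carriage return and form feed as the "few others" to accept (also an assumption), and uses hypothetical function names and category labels; it is not how gnome-vfs implements its check.

```python
# Hypothetical sketch of the proposed "all bytes printable" test.
# Byte values are written in octal to match the comment above.

# Assumption: these are the "few others" that text may contain.
ALLOWED_CONTROLS = frozenset(b"\t\n\r\f")


def is_latin1_text(data: bytes) -> bool:
    """True if every byte is printable in ISO-8859-1 (Latin-1)."""
    for b in data:
        if b in ALLOWED_CONTROLS:
            continue
        # C0 controls (000-037), DEL (177) and C1 controls (200-237).
        if b <= 0o37 or b == 0o177 or 0o200 <= b <= 0o237:
            return False
    return True


def is_ascii_text(data: bytes) -> bool:
    """True if every byte is printable 7-bit ASCII (040-176)."""
    return all(b in ALLOWED_CONTROLS or 0o40 <= b <= 0o176 for b in data)


def classify(data: bytes) -> str:
    # Illustrative labels only; not the exact output of any real tool.
    if is_ascii_text(data):
        return "ascii text"
    if is_latin1_text(data):
        return "latin-1 text"
    return "data"
```

For example, `classify("Räksmörgås\n".encode("latin-1"))` returns "latin-1 text", while a buffer containing NUL bytes or a byte in the octal 200 to 237 range is classified as "data".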

(The "file" command distinguishes between "ascii text", "international text",
and "data".  I haven't dug into what heuristics it's using, or if it is locale
dependent.)

Comment 7 Havoc Pennington 2002-10-09 22:56:07 UTC
Trust me, there's no way to recognize Latin-1 text, especially if you can't scan
the entire file. Latin-1 and UTF-8 are different. In Latin-1 all bytes in all
sequences are possible valid text (and there are also ISO-8859-2 through
ISO-8859-15, which all look exactly like Latin-1). So almost any file is a valid
Latin-1 file. For UTF-8, very few files will just "happen" to be valid UTF-8; if
it's valid UTF-8, it's very likely to _be_ UTF-8.
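A small experiment makes the asymmetry described above concrete. This is an illustrative sketch, not gnome-vfs code; the helper names, the 64-byte sample size, the 1000 trials and the fixed random seed are arbitrary choices. Every random byte string decodes as Latin-1, whereas virtually none of them are valid UTF-8.

```python
import codecs
import random


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is a valid prefix of text in the given encoding.

    final=False tolerates a multi-byte sequence truncated at the end of
    the sample, as happens when only the head of a file is inspected.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def fraction_valid(encoding: str, trials: int = 1000, size: int = 64,
                   seed: int = 0) -> float:
    """Fraction of random byte strings that decode without error."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(trials):
        sample = bytes(rng.randrange(256) for _ in range(size))
        valid += decodes_as(sample, encoding)
    return valid / trials
```

`fraction_valid("latin-1")` is 1.0 by construction, since Latin-1 maps every byte to a character, while `fraction_valid("utf-8")` is essentially zero. This is why passing a UTF-8 validity check is strong evidence, and passing a Latin-1 decode is none at all; a heuristic like the printable-byte test proposed earlier is needed to say anything about Latin-1.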

Comment 8 Göran Uddeborg 2002-10-10 20:50:11 UTC
> there's no way to recognize ...

> In Latin-1 all bytes in all sequences are possible valid text

If it contains control characters, it's not Latin-1, so no, not all sequences
are valid text.

The command "file" is a proof of concept that it IS possible to recognise it
with a high probability of success.

But it's not the end of the world. If you don't think it should be done, I
won't bother you any more about it.

Comment 9 Alexander Larsson 2003-05-27 11:56:33 UTC
Current gnome-vfs does this if your locale is Latin-1.

Comment 10 Göran Uddeborg 2003-10-06 12:25:01 UTC
I noticed that I never confirmed this, and maybe I should have: yes, with a new
gnome-vfs this works fine now.