Bug 75447
| Field | Value |
|---|---|
| Summary | Nautilus doesn't offer editor for Latin-1 files |
| Product | [Retired] Red Hat Linux |
| Component | nautilus |
| Version | 8.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | medium |
| Reporter | Göran Uddeborg <goeran> |
| Assignee | Alexander Larsson <alexl> |
| QA Contact | Jay Turner <jturner> |
| CC | alexl, srevivo |
| Doc Type | Bug Fix |
| Last Closed | 2003-05-27 11:56:33 UTC |

Description
Göran Uddeborg
2002-10-08 18:12:09 UTC
Created attachment 79444 [details]
A Latin-1 text file. Although I'm in a Latin-1 locale (sv_SE), I'm not offered to open it with emacs.
Created attachment 79445 [details]
A UTF-8 text file. I'm offered to open this one with emacs.
Is it just the filename extension? Try renaming Books.latin1 to Books.txt.

If I rename it Books.txt, it is again possible to open it with GNU emacs. But if I swap the names instead, it is only the UTF-8 file I can open. If I check the properties of the Latin-1 file (when not named .txt), it has mime type application/octet-stream.

It appears as if the extension takes precedence in deciding the mime type. If I rename the file Books.jpg, Nautilus believes it is an image/jpeg. But when there is no (recognised) extension, it apparently takes a look into the file to decide the type with some heuristics. And here, apparently again, it fails to take the locale appropriately into consideration.

I can see adding recognition for a .latin1 extension, but I'm not sure what else can be done here. There's no way to automatically recognize a Latin-1 text file, as it just looks like binary data; it could easily be a binary file. (Also, the MIME recognition is based on only the first few bytes of the file; if it were based on the whole file it would be unusably slow.)

I'm not proposing adding a .latin1 extension. I've never seen that used. It was only something I invented to distinguish the two enclosures easily.

Since UTF-8 can be recognised, it seems to me character sets like Latin-1 could be too. Though nautilus should probably only try to recognise files in the chosen locale (or maybe UTF-8 in all cases); a file where all characters are printable according to the locale could be assumed to be text. A file with non-printable characters is data. In the case of Latin-1, the octal ranges 000 to 037 and 200 to 237 are non-printable. That's not too different from ASCII, which I assume is recognised by the absence of 000 to 037 and 200 to 377. (Well, tab, line feed, and a few others should of course be accepted in both cases, but you get the idea.)

(The "file" command distinguishes between "ascii text", "international text", and "data". I haven't dug into what heuristics it's using, or if it is locale dependent.)

Trust me, there's no way to recognize Latin-1 text, especially if you can't scan the entire file. Latin-1 and UTF-8 are different. In Latin-1 all bytes in all sequences are possible valid text (and also there's Latin-2 through Latin-15 that all look exactly like Latin-1). So almost any file is a valid Latin-1 file. For UTF-8, very few files will just "happen" to be valid UTF-8; if it's valid UTF-8, it's very likely to _be_ UTF-8.

> there's no way to recognize ...
> In Latin-1 all bytes in all sequences are possible valid text

If it contains control characters, it's not Latin-1, so no, not all sequences are valid text. The command "file" is a proof of concept that it IS possible to recognise it with a high probability of success.

But it's not the end of the world. If you don't think it should be done, I won't bother you more now.

Current gnome-vfs does this, if your locale is latin1.

I saw I never confirmed this, and maybe I should have: Yes, with a new gnome-vfs this works fine now.
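
The heuristic discussed in the comments above (accept valid UTF-8 in any locale, and in a Latin-1 locale treat a sample with no control bytes as text) can be sketched roughly as follows. This is only an illustration, not the actual gnome-vfs or file(1) logic: the function names, the set of tolerated control characters, and the returned MIME strings are all assumptions made for the example.

```python
import codecs

# Assumption: tab, line feed, carriage return and form feed are tolerated in text.
ALLOWED_CONTROLS = frozenset(b"\t\n\r\f")


def has_c0_controls(sample: bytes) -> bool:
    """True if the sample holds bytes in octal 000-037 other than the allowed ones."""
    return any(b <= 0o037 and b not in ALLOWED_CONTROLS for b in sample)


def has_c1_controls(sample: bytes) -> bool:
    """True if the sample holds bytes in octal 200-237 (non-printable in Latin-1)."""
    return any(0o200 <= b <= 0o237 for b in sample)


def is_valid_utf8(sample: bytes) -> bool:
    """Check UTF-8 validity, tolerating a multi-byte sequence cut off at the end,
    since only the first few bytes of a file are sniffed."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def guess_mime(sample: bytes, locale_is_latin1: bool) -> str:
    """Hypothetical classifier: 'text/plain' or 'application/octet-stream'."""
    if has_c0_controls(sample):
        return "application/octet-stream"
    if is_valid_utf8(sample):
        return "text/plain"  # valid UTF-8 is very likely to be UTF-8 text
    if locale_is_latin1 and not has_c1_controls(sample):
        return "text/plain"  # every byte is printable in the Latin-1 locale
    return "application/octet-stream"


if __name__ == "__main__":
    latin1_sample = "Böcker\n".encode("latin-1")
    utf8_sample = "Böcker\n".encode("utf-8")
    print(guess_mime(latin1_sample, locale_is_latin1=True))
    print(guess_mime(latin1_sample, locale_is_latin1=False))
    print(guess_mime(utf8_sample, locale_is_latin1=False))
```

As the thread points out, the Latin-1 branch is much weaker evidence than the UTF-8 branch, which is why this sketch only takes it when the locale asks for it.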