Bug 886408 - human-editable XML detected as application/xml MIME type
Summary: human-editable XML detected as application/xml MIME type
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: file
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jan Kaluža
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 886005
Blocks:
 
Reported: 2012-12-12 08:50 UTC by Florian Weimer
Modified: 2012-12-13 08:47 UTC
CC List: 3 users

Fixed In Version:
Clone Of: 886005
Environment:
Last Closed: 2012-12-13 08:47:33 UTC
Type: Bug
Embargoed:



Description Florian Weimer 2012-12-12 08:50:39 UTC
+++ This bug was initially created as a clone of Bug #886005 +++

Description of problem:

"svn add" adds .xml files with an svn:mime-type property of "application/xml", which prevents many useful operations (including diffing).  This happens because libmagic (or "file -i") returns "application/xml" for most XML documents, and Subversion treats application/* as binary.
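Subversion's binary/text decision for a given svn:mime-type can be modelled roughly like this. It is a simplified sketch, assuming the rule is "anything outside text/* is binary"; Subversion's real check is written in C and may include a few extra exceptions, and the helper name `is_binary_mime_type` is hypothetical.

```python
def is_binary_mime_type(mime_type: str) -> bool:
    # Simplified model (assumption): only a "text/" prefix marks content as
    # text; parameters such as "; charset=utf-8" do not change the outcome.
    return not mime_type.startswith("text/")


for mime in ("application/xml", "application/xml; charset=utf-8", "text/xml"):
    # application/* counts as binary, so "svn diff" would not print a diff.
    print(f"{mime}: binary={is_binary_mime_type(mime)}")
```

Under this model the two outcomes in the report follow directly: application/xml is classified as binary (no diff), text/xml as text.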

Version-Release number of selected component (if applicable):

subversion-1.7.7-1.fc17.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Create an XML file (e.g., a DocBook document) with an .xml extension.
2. "svn add" it.
3. "svn diff" does not print a diff.
  
Actual results:

The file gets an svn:mime-type property of "application/xml".

Expected results:

The file gets an svn:mime-type property of "text/xml".

--- Additional comment from Joe Orton on 2012-12-11 11:40:33 CET ---

You can bypass use of libmagic by setting autoprops in ~/.subversion/config.
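The autoprops workaround amounts to a config fragment that pins svn:mime-type by file-name pattern, so Subversion uses that value instead of the libmagic guess. The sketch below renders such a fragment with Python's configparser, assuming the standard `enable-auto-props` switch in `[miscellany]` and the `[auto-props]` section of ~/.subversion/config; the helper name `autoprops_config` is hypothetical, and it returns a string rather than editing the real file.

```python
import configparser
import io


def autoprops_config(pattern: str, mime_type: str) -> str:
    """Render a ~/.subversion/config fragment that sets svn:mime-type
    for files matching *pattern* when they are added."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config["miscellany"] = {"enable-auto-props": "yes"}
    config["auto-props"] = {pattern: f"svn:mime-type={mime_type}"}
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


print(autoprops_config("*.xml", "text/xml"))
```

The result contains the entry `*.xml = svn:mime-type=text/xml` under `[auto-props]`; it is meant to be merged by hand into an existing config, not to replace it.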

Is the default appropriate?  That is a large question which cannot be addressed in RH bugzilla.  An XML file may be UTF-16, for example, or it may not be usefully human-readable.

The current behaviour is by intent and by design, anyway.

--- Additional comment from Florian Weimer on 2012-12-11 12:02:15 CET ---

(In reply to comment #1)
> You can bypass use of libmagic by setting autoprops in ~/.subversion/config.
> 
> Is the default appropriate?  That is a large question which cannot be
> addressed in RH bugzilla.  An XML file may be UTF-16, for example, or it may
> not be usefully human-readable.
> 
> The current behaviour is by intent and by design, anyway.

Not sure about that; the behavior emerges from the interaction of file and Subversion.  For example, on Debian, XML documents are treated as text and no svn:mime-type is set because file reports different MIME types there.

Perhaps file could report text/xml for obviously text-like XML files (DocBook, XHTML, Ant build.xml files, GConf configuration files etc.)?  In fact, I have a hard time finding an XML file on my system which actually deserves the application/xml MIME type.
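Florian's suggestion could in principle be approximated by looking at a document's root element. The sketch is purely hypothetical (neither File nor libmagic works this way): `TEXT_LIKE_ROOTS` and `guess_xml_mime_type` are illustrative names, and the allow-list covers only the vocabularies named in the comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list of root elements for the vocabularies named above:
# DocBook (book, article, chapter), XHTML (html), Ant (project), GConf (gconf).
TEXT_LIKE_ROOTS = {"book", "article", "chapter", "html", "project", "gconf"}


def guess_xml_mime_type(data: bytes) -> str:
    """Return text/xml for a known text-like root element, else application/xml."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed: keep the conservative answer.
        return "application/xml"
    # Drop a "{namespace}" prefix, e.g. "{http://www.w3.org/1999/xhtml}html".
    local_name = root.tag.rsplit("}", 1)[-1]
    return "text/xml" if local_name in TEXT_LIKE_ROOTS else "application/xml"


print(guess_xml_mime_type(b"<project name='demo' default='build'/>"))  # text/xml
print(guess_xml_mime_type(b"<data><item/></data>"))                    # application/xml
```

The weak point is the allow-list itself: root names such as `project` are shared by unrelated vocabularies (Maven's pom.xml also uses `<project>`), which illustrates why this kind of detection is hard to make reliable.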

Comment 1 Jan Kaluža 2012-12-13 08:47:33 UTC
I think application/xml is the preferred MIME type for XML files.

- shared-mime-info (the MIME type database used by KDE, GNOME and others) uses application/xml too, and I think File should stay consistent with it (see http://lists.freedesktop.org/archives/xdg/2005-December/005962.html).
- there are attempts to deprecate text/xml in recent XML specification drafts.
- returning the text/xml MIME type without proper charset detection (which File does not do) would mean that the documents could not be parsed at all (see http://annevankesteren.nl/2005/03/text-xml).
- I don't see how File could detect that an XML document "is readable by casual users". I admit most of them probably are, but without a way to detect it, it is safer in the current situation (see the points above) to return application/xml.
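The charset point follows from the defaulting rules in RFC 3023, the XML media-type specification in force in 2012: a text/xml entity without a charset parameter must be read as us-ascii regardless of its encoding declaration, while application/xml falls back to the XML 1.0 rules. A simplified sketch of that decision, assuming only those rules plus a minimal BOM/declaration check; `effective_charset` is a hypothetical name, not code from File or any XML parser.

```python
import re

# Encoding pseudo-attribute of a leading XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>.
ENCODING_DECL = re.compile(rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def effective_charset(media_type: str, data: bytes) -> str:
    """Charset an RFC 3023 processor would use for *data* (simplified)."""
    base, _, params = media_type.partition(";")
    param = re.search(r'charset\s*=\s*"?([^";\s]+)', params, re.IGNORECASE)
    if param:
        # An explicit charset parameter wins for both media types.
        return param.group(1).lower()
    if base.strip().lower() == "text/xml":
        # RFC 3023: without a charset parameter, text/xml defaults to
        # us-ascii and the document's own encoding declaration is ignored.
        return "us-ascii"
    # application/xml: fall back to XML 1.0 autodetection (BOM, then the
    # encoding declaration, then UTF-8).
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    decl = ENCODING_DECL.match(data)
    return decl.group(1).decode("ascii").lower() if decl else "utf-8"


latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'.encode("latin-1")
print(effective_charset("text/xml", latin1_doc))         # us-ascii
print(effective_charset("application/xml", latin1_doc))  # iso-8859-1
```

For the declared Latin-1 document, the text/xml label makes the byte 0xE9 invalid under the us-ascii default, while the same bytes labelled application/xml are decoded with the declared encoding.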

> For example, on Debian, XML documents are treated as text and no
> svn:mime-type is set because file reports different MIME types there.

It probably depends on the XML file. Some of them could be detected as XHTML, in which case they get a text/* MIME type. That's the same as on Fedora.

file-5.04 on Debian:
# file -i ./xml/iso-codes/iso_15924.xml
./xml/iso-codes/iso_15924.xml: application/xml; charset=utf-8

file-5.11 on Debian:
# file -i ./xml/iso-codes/iso_15924.xml
./xml/iso-codes/iso_15924.xml: application/xml; charset=utf-8
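Both Debian builds print the same `application/xml; charset=utf-8` result. For scripting around this (say, listing which files would be guessed as a non-text type), an output line can be split with a small helper. This is a hypothetical sketch that assumes the default `path: type; charset=...` layout of `file -i` and has not been checked against other flags such as `-b`.

```python
from typing import Optional, Tuple


def parse_file_i(line: str) -> Tuple[str, str, Optional[str]]:
    """Split one line of `file -i` output into (path, mime_type, charset)."""
    # The path is everything before the last ": "; the rest is the result.
    path, _, result = line.rpartition(": ")
    mime_type, _, params = result.partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            charset = value
    return path, mime_type.strip(), charset


print(parse_file_i("./xml/iso-codes/iso_15924.xml: application/xml; charset=utf-8"))
```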

Of course you can contact File upstream ([1], [2]) and maybe they will have a different opinion. In that case I'm more than happy to backport the upstream patch. For now I will close this as NOTABUG.

[1] http://mx.gw.com/mailman/listinfo/file
[2] http://bugs.gw.com/

