Bug 886408

Summary: human-editable XML detected as application/xml MIME type
Product: [Fedora] Fedora
Reporter: Florian Weimer <fweimer>
Component: file
Assignee: Jan Kaluža <jkaluza>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: jkaluza, jorton, vanmeeuwen+fedora
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 886005
Environment:
Last Closed: 2012-12-13 08:47:33 UTC
Type: Bug
Bug Depends On: 886005    
Bug Blocks:    

Description Florian Weimer 2012-12-12 08:50:39 UTC
+++ This bug was initially created as a clone of Bug #886005 +++

Description of problem:

"svn add" adds .xml files with an svn:mime-type property of "application/xml", which prevents many useful operations (including diffing).  This happens because libmagic (or "file -i") returns "application/xml" for most XML documents, and Subversion treats application/* as binary.

Version-Release number of selected component (if applicable):

subversion-1.7.7-1.fc17.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Create an XML file (e.g., a DocBook document) with an .xml extension.
2. "svn add" it.
3. "svn diff" does not print a diff.
  
Actual results:

File has "application/xml" property.

Expected results:

File has "text/xml" property.
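
The classification described in the problem statement can be sketched in shell. This is a simplified illustration of the "text/* is text, application/* is binary" rule as stated above, not Subversion's actual code, which may handle additional cases. The function names mime_is_text and classify are hypothetical; the only external tool assumed is file(1) with its --brief and --mime-type options.

```shell
#!/bin/sh
# Hypothetical helper (not Subversion code): succeeds when a MIME type
# counts as text under the simplified "text/* is text" rule.
mime_is_text() {
    # Drop parameters such as "; charset=utf-8" before matching.
    case "${1%%;*}" in
        text/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: report how a file would be classified, based on
# the MIME type that file(1) detects for it.
classify() {
    mime=$(file --brief --mime-type "$1") || return 2
    if mime_is_text "$mime"; then
        echo "$1: $mime (text, diffable)"
    else
        echo "$1: $mime (treated as binary)"
    fi
}
```

Running `classify some-document.xml` on an affected system should show whether the detected type falls on the binary side of this rule.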

--- Additional comment from Joe Orton on 2012-12-11 11:40:33 CET ---

You can bypass use of libmagic by setting autoprops in ~/.subversion/config.

Is the default appropriate?  That is a large question which cannot be addressed in RH bugzilla.  An XML file may be UTF-16, for example, or it may not be usefully human-readable.

The current behaviour is by intent and by design, anyway.
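
The autoprops workaround mentioned in the comment above might look like the following sketch. It only prints a fragment to merge by hand into ~/.subversion/config; the *.xml pattern and the choice of text/xml are assumptions made for this example, and the function name print_autoprops_fragment is hypothetical.

```shell
#!/bin/sh
# Print a config fragment for manual merging into ~/.subversion/config.
# Assumption for this sketch: every *.xml file should be added as text/xml.
print_autoprops_fragment() {
    cat <<'EOF'
[miscellany]
enable-auto-props = yes

[auto-props]
*.xml = svn:mime-type=text/xml
EOF
}

print_autoprops_fragment
```

For a file that has already been added, the property can be changed with `svn propset svn:mime-type text/xml FILE`.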

--- Additional comment from Florian Weimer on 2012-12-11 12:02:15 CET ---

(In reply to comment #1)
> You can bypass use of libmagic by setting autoprops in ~/.subversion/config.
> 
> Is the default appropriate?  That is a large question which cannot be
> addressed in RH bugzilla.  An XML file may be UTF-16, for example, or it may
> not be usefully human-readable.
> 
> The current behaviour is by intent and by design, anyway.

Not sure about that; the behavior emerges from the interaction of file and Subversion.  For example, on Debian, XML documents are treated as text and no svn:mime-type is set, because file reports different MIME types there.

Perhaps file could report text/xml for obviously text-like XML files (DocBook, XHTML, Ant build.xml files, GConf configuration files, etc.)?  In fact, I have a hard time finding an XML file on my system which actually deserves the application/xml MIME type.

Comment 1 Jan Kaluža 2012-12-13 08:47:33 UTC
I think application/xml is the preferred MIME type for XML files.

- shared-mime-info (the MIME type database used by KDE, GNOME and others) uses application/xml too, and I think File should stay consistent with it (see http://lists.freedesktop.org/archives/xdg/2005-December/005962.html).
- There are attempts to deprecate text/xml in recent XML specification drafts.
- Returning the text/xml MIME type without proper charset detection (which File does not provide) would mean that the documents could not be parsed at all (see http://annevankesteren.nl/2005/03/text-xml).
- I don't see how File could detect that an XML file "is readable by casual users". I admit most of them probably are, but without a way to detect it, it's safer in the current situation (see the points above) to return application/xml.

> For example, on Debian, XML documents are treated as text and no
> svn:mime-type is set because file reports different MIME types there.

It probably depends on the XML file. Some of them may be detected as XHTML, in which case they get a text/* MIME type. That's the same as on Fedora.

file-5.04 on Debian:
# file -i ./xml/iso-codes/iso_15924.xml
./xml/iso-codes/iso_15924.xml: application/xml; charset=utf-8

file-5.11 on Debian:
# file -i ./xml/iso-codes/iso_15924.xml
./xml/iso-codes/iso_15924.xml: application/xml; charset=utf-8

Of course you can contact File upstream ([1], [2]), and maybe they will have a different opinion. In that case I'm more than happy to backport the upstream patch. For now I will close this as NOTABUG.

[1] http://mx.gw.com/mailman/listinfo/file
[2] http://bugs.gw.com/