Bug 465994 - file --mime-encoding seems broken
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: file
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Daniel Novotny
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-07 17:07 UTC by Lubomir Rintel
Modified: 2008-10-16 11:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-16 11:04:29 UTC
Type: ---
Embargoed:


Attachments
a patch for --mime-encoding (1.27 KB, patch)
2008-10-16 09:55 UTC, Daniel Novotny

Description Lubomir Rintel 2008-10-07 17:07:09 UTC
While investigating a regression in file --mime behavior (type and encoding are no longer separated with ";", unlike in 4.17), I noticed behavior that seems odd to me:

[lkundrak@trurl ~]$ file --mime-type /etc/passwd
/etc/passwd: text/plain
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /etc/passwd
/etc/passwd: text/plain charset=us-ascii
[lkundrak@trurl ~]$ file --mime-encoding /etc/passwd
/etc/passwd: binary
[lkundrak@trurl ~]$ 

Looking at lines 259-278 of ./src/ascmagic.c, this is no surprise, yet I don't think it makes much sense. Even though charset=us-ascii is not an encoding but part of the content type, file added it when I specified the --mime-encoding option together with --mime-type. Contrary to what one would expect, --mime-encoding on its own does not output the charset alone, but the (correct?) "binary" encoding, which is hardcoded in the source code.
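To make the expectation above concrete, here is a minimal sketch of the output I would have expected for a text file. It is not taken from file's source: the flag names WANT_TYPE and WANT_ENCODING and the function format_mime are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical flag bits for this sketch; not the constants used in file's source. */
enum { WANT_TYPE = 1, WANT_ENCODING = 2 };

/*
 * Writes the text that would follow "filename: " into out.
 * Both flags:        "type; charset=xxx" (separated with ";" as in 4.17)
 * Only WANT_TYPE:     the type alone
 * Only WANT_ENCODING: the charset alone
 * Returns the value returned by snprintf, or 0 if no flag is set.
 */
int format_mime(char *out, size_t outlen, int flags,
                const char *type, const char *charset)
{
    if ((flags & WANT_TYPE) && (flags & WANT_ENCODING))
        return snprintf(out, outlen, "%s; charset=%s", type, charset);
    if (flags & WANT_TYPE)
        return snprintf(out, outlen, "%s", type);
    if (flags & WANT_ENCODING)
        return snprintf(out, outlen, "%s", charset);
    if (outlen > 0)
        out[0] = '\0';
    return 0;
}
```

With this sketch, passing both flags for /etc/passwd would give "text/plain; charset=us-ascii", and passing only the encoding flag would give "us-ascii".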

[lkundrak@trurl ~]$ file --mime-type /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ 

For non-text (binary) files the situation seems different: the option does not make any difference, and it never seems to yield a result that could be considered correct.

Any thoughts on this?

Comment 1 Daniel Novotny 2008-10-15 10:14:46 UTC
Yes, acknowledged: the --mime-encoding option does not do anything useful right now. Printing "binary" in all cases does not help the user at all, and the other values, such as "base64" or "8bit", do not appear anywhere in the code.

The question is how deep you want to go: distinguishing "8bit" from "binary", or "7bit" from "base64", can be quite a task. I am analyzing the possible approaches.

Another thing: the -i option is the same as passing --mime-type and --mime-encoding together. Most people use -i, even in scripts, so breaking its output would be a bad idea. The best way would be to treat --mime-encoding on its own as a separate case. What do you think? I can also ask upstream.

Comment 2 Daniel Novotny 2008-10-16 09:55:54 UTC
Created attachment 320535
a patch for --mime-encoding

This patch distinguishes between "7bit" and "binary" when you run the --mime-encoding command-line option on its own.
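For readers without the attachment, the following is a minimal sketch of one way to tell "7bit" from "binary". It is not the attached patch: the function name guess_transfer_encoding is hypothetical, and the rule is a simplification of the RFC 2045 definition of 7bit data (it checks only for NUL bytes and bytes above 127, and ignores the line-length and CRLF requirements).

```c
#include <stddef.h>

/*
 * Simplified heuristic (an assumption of this sketch, not file's logic):
 * a buffer is reported as "7bit" if it contains no NUL byte and no byte
 * with a value above 127; anything else is reported as "binary".
 * An empty buffer is reported as "7bit".
 */
const char *guess_transfer_encoding(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0 || buf[i] > 127)
            return "binary";
    }
    return "7bit";
}
```

Under this rule, the contents of a plain ASCII file such as /etc/passwd would be reported as "7bit", while an ELF executable such as /bin/ls, whose header contains NUL bytes, would be reported as "binary".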

Comment 3 Daniel Novotny 2008-10-16 09:59:14 UTC
Comment on attachment 320535
a patch for --mime-encoding

Oops, I was too quick: it segfaults on directories and the like (fsmagic). I will look into this, but the overall direction is right.

Comment 4 Daniel Novotny 2008-10-16 10:29:03 UTC
Comment on attachment 320535
a patch for --mime-encoding

Oops #2: the patch is right. The segfault also occurs on vanilla file 4.26 without it, so I will file a separate Bugzilla entry for it.

Comment 5 Daniel Novotny 2008-10-16 11:04:29 UTC
file-4.26-3.fc10 is in rawhide now.
