Bug 465994

Summary: file --mime-encoding seems broken
Product: [Fedora] Fedora
Reporter: Lubomir Rintel <lkundrak>
Component: file
Assignee: Daniel Novotny <dnovotny>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: dnovotny
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-10-16 11:04:29 UTC
Attachments:
a patch for --mime-encoding (flags: none)

Description Lubomir Rintel 2008-10-07 17:07:09 UTC
While investigating a regression in file --mime behavior (type and encoding are no longer separated with ";", unlike in 4.17), I noticed behavior that seems somewhat odd to me:

[lkundrak@trurl ~]$ file --mime-type /etc/passwd
/etc/passwd: text/plain
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /etc/passwd
/etc/passwd: text/plain charset=us-ascii
[lkundrak@trurl ~]$ file --mime-encoding /etc/passwd
/etc/passwd: binary
[lkundrak@trurl ~]$ 

Looking at lines 259-278 of ./src/ascmagic.c, this is no surprise, yet I don't think it makes much sense. Even though charset=us-ascii is not considered an encoding but part of the content type, file added it when I specified the --mime-encoding option. Contrary to what one would expect, --mime-encoding on its own does not output the charset alone, but the (correct?) "binary" encoding, which is hardcoded in the source code.

[lkundrak@trurl ~]$ file --mime-type /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ 

For non-text (binary) files the situation seems different: the option makes no difference, and never seems to yield a result that could be considered correct.

Any thoughts on this?

Comment 1 Daniel Novotny 2008-10-15 10:14:46 UTC
yes, acknowledged, the --mime-encoding option does not do anything useful right now: outputting "binary" in all cases does not help the user at all, and the other values like "base64" or "8bit" do not appear anywhere in the code

the question is how deep you want to go: distinguishing "8bit" from "binary", or "7bit" from "base64", can be quite a big deal... I am analyzing possible ways to go...

another thing: the -i option is the same as turning on both --mime-type and --mime-encoding together. most people use -i, even in scripts, so it would be a bad idea to break its output. the best way will be to treat --mime-encoding on its own as a separate case. what do you think? I can also ask upstream...
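
For scripts that consume this combined output, a tolerant parser is one way to survive the separator change mentioned in the description. The following is only a sketch: parse_mime_line is a hypothetical helper, not part of file(1), and it assumes the output has the shape shown in the transcripts above, with or without the ";" between type and charset.

```python
import re

# Hypothetical helper, not part of file(1). It splits one line of
# combined type/charset output into (path, mime_type, charset).
# Assumption: the file name itself contains no ": " sequence.
_LINE = re.compile(
    r"^(?P<path>.*?):\s+"                  # file name up to the first ": "
    r"(?P<type>[^\s;]+)"                   # MIME type, e.g. text/plain
    r"(?:\s*;\s*|\s+)?"                    # ";" separator or a bare space
    r"(?:charset=(?P<charset>\S+))?\s*$"   # optional charset parameter
)


def parse_mime_line(line):
    """Return (path, mime_type, charset); charset is None when absent."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError("unrecognized output: %r" % line)
    return match.group("path"), match.group("type"), match.group("charset")
```

Both "/etc/passwd: text/plain charset=us-ascii" and the same line with "; " before charset yield the same tuple, so a script written this way does not depend on which separator a given version prints.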

Comment 2 Daniel Novotny 2008-10-16 09:55:54 UTC
Created attachment 320535 [details]
a patch for --mime-encoding

this patch distinguishes between "7bit" and "binary" when you run the --mime-encoding command-line option on its own
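
To illustrate what such a distinction can look like, here is a rough sketch based on the definitions of "7bit", "8bit" and "binary" data in RFC 2045. It is not the logic of the attached patch (the attachment is not reproduced here); classify_transfer_encoding is a hypothetical name, and the check is simplified in that it does not require lines to end in CRLF.

```python
def classify_transfer_encoding(data: bytes) -> str:
    """Label a buffer "7bit", "8bit" or "binary" (simplified RFC 2045 rules)."""
    # NUL bytes are not allowed in 7bit or 8bit data.
    if b"\x00" in data:
        return "binary"
    # Lines longer than 998 octets (line ending excluded) are not allowed either.
    # Simplification: bare LF is accepted as a line ending, not only CRLF.
    for line in data.split(b"\n"):
        if len(line.rstrip(b"\r")) > 998:
            return "binary"
    # Any octet above 127 rules out "7bit" but is still fine for "8bit".
    if any(byte > 127 for byte in data):
        return "8bit"
    return "7bit"
```

Telling "7bit" apart from "base64" is a different problem, since base64 text is itself valid 7bit data; this sketch makes no attempt at it.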

Comment 3 Daniel Novotny 2008-10-16 09:59:14 UTC
Comment on attachment 320535 [details]
a patch for --mime-encoding

oops, I was too quick, it segfaults with directories and such (fsmagic) ... will look into this, but the overall direction is right

Comment 4 Daniel Novotny 2008-10-16 10:29:03 UTC
Comment on attachment 320535 [details]
a patch for --mime-encoding

oops #2: the patch is right, the segfault also occurs on vanilla file 4.26 without it. I will create an additional bz item for this

Comment 5 Daniel Novotny 2008-10-16 11:04:29 UTC
file-4.26-3.fc10 in rawhide now