While investigating a regression in file --mime behavior (type and encoding are no longer separated with ";", unlike in 4.17), I noticed a behavior that seems somewhat odd to me:

[lkundrak@trurl ~]$ file --mime-type /etc/passwd
/etc/passwd: text/plain
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /etc/passwd
/etc/passwd: text/plain charset=us-ascii
[lkundrak@trurl ~]$ file --mime-encoding /etc/passwd
/etc/passwd: binary
[lkundrak@trurl ~]$

Looking at lines 259-278 of ./src/ascmagic.c, this is no surprise, yet I think it makes little sense. Even though charset=us-ascii is not considered an encoding but part of the content type, file added it when I specified the --mime-encoding option. Contrary to what one would expect, --mime-encoding on its own does not output the charset alone, but the (correct?) "binary" encoding, which is hardcoded in the source code.

[lkundrak@trurl ~]$ file --mime-type /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-type --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$ file --mime-encoding /bin/ls
/bin/ls: application/x-executable
[lkundrak@trurl ~]$

For non-text (binary) files the situation seems different: the option does not make any difference, and never seems to yield a result that could be considered correct.

Any thoughts on this?
Yes, acknowledged: the --mime-encoding option does not do anything useful right now. Outputting "binary" in all cases does not help the user at all, and the other values like "base64" or "8bit" do not appear anywhere in the code.

The question is how deep you want to go: distinguishing "8bit" from "binary", or "7bit" from "base64", can be quite a big deal... I am analyzing possible ways to go.

One other thing: the -i option is the same as turning on both --mime-type and --mime-encoding together. Most people use -i, even in scripts, so it would be a bad idea to break its output. The best way would be to treat --mime-encoding on its own differently, as a separate case.

What do you think? I can also ask upstream...
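To make the "how deep" question a bit more concrete, here is a rough sketch of the shallow end: telling "7bit", "8bit" and "binary" apart by looking only at the bytes. It is not taken from the file sources; the function name guess_transfer_encoding and the MAX_LINE_OCTETS constant are made up for illustration. It assumes the RFC 2045 rules (no NUL octets and lines of at most 998 octets for "7bit" and "8bit", no octets above 127 for "7bit") and, as a simplification, treats a bare LF as a line end and ignores the rule that CR and LF may only appear together.

```c
#include <stddef.h>

/* RFC 2045 limits "7bit" and "8bit" data to lines of at most 998 octets. */
#define MAX_LINE_OCTETS 998

/*
 * Hypothetical helper, not part of file(1): guess which of "7bit",
 * "8bit" or "binary" describes the buffer. Simplified on purpose:
 * a bare LF ends a line and CR is simply not counted.
 */
const char *
guess_transfer_encoding(const unsigned char *buf, size_t len)
{
	size_t i, line_len = 0;
	int high_bit = 0;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == '\0')
			return "binary";	/* NUL is not allowed in 7bit/8bit */
		if (c == '\n') {
			line_len = 0;		/* new line, restart the count */
			continue;
		}
		if (c != '\r' && ++line_len > MAX_LINE_OCTETS)
			return "binary";	/* line too long for 7bit/8bit */
		if (c > 127)
			high_bit = 1;		/* remember any non-ASCII octet */
	}
	return high_bit ? "8bit" : "7bit";
}
```

With a check along these lines, a plain ASCII file such as /etc/passwd should come out as "7bit" and an ELF executable as "binary". Recognizing "base64" or "quoted-printable" would need real content inspection on top of this, which is the deeper end of the question.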
Created attachment 320535 [details]
a patch for --mime-encoding

This patch distinguishes between "7bit" and "binary" when you run the --mime-encoding command-line option on its own.
Comment on attachment 320535 [details]
a patch for --mime-encoding

Oops, I was too quick: it segfaults with directories and such (fsmagic). I will look into this, but the overall direction is right.
Comment on attachment 320535 [details]
a patch for --mime-encoding

Oops #2: the patch is right. The segfault also occurs on vanilla file 4.26 without it, so I will create an additional bz item for this.
file-4.26-3.fc10 in rawhide now