Bug 2020715

Summary: The file command identifies a JSON file as "JSON data" without including "text" in the output
Product: Fedora
Reporter: Erkki Ruohtula <eru>
Component: file
Assignee: Vincent Mihalkovič <vmihalko>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 35
CC: jkaluza, kdudka, odubaj, svashisht, vmihalko
Keywords: Reopened
Hardware: x86_64
OS: Linux
Last Closed: 2021-12-14 12:36:29 UTC
Type: Bug

Description Erkki Ruohtula 2021-11-05 17:29:45 UTC
Description of problem:

When applied to a JSON file, the "file" command now outputs "JSON data" without
including the string "text". Earlier versions behaved differently, and this also
contradicts the documentation of file (man file), which states:

     The type printed will usually contain one of the words text (the file
     contains only printing characters and a few common control characters and
     is probably safe to read on an ASCII terminal), executable (the file
     contains the result of compiling a program in a form understandable to some
     UNIX kernel or another), or data meaning anything else (data is usually
     “binary” or non-printable).  Exceptions are well-known file formats (core
     files, tar archives) that are known to contain binary data.  When modifying
     magic files or the program itself, make sure to preserve these
     keywords.  Users depend on knowing that all the readable files in a
     directory have the word “text” printed.  Don't do as Berkeley did and
     change “shell commands text” to “shell script”.

Apparently file has now done as Berkeley did...

Version-Release number of selected component (if applicable):

file-5.39

How reproducible:

Steps to Reproduce:
1. Create a file with JSON contents, e.g.:

{
	"a": 1,
	"b": 2
}

2. Run the file command on it.
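The two reproduction steps can be scripted roughly as follows. This is a minimal sketch, not part of the original report: the names SAMPLE_JSON, write_sample and describe are hypothetical, and it assumes the file(1) binary is on PATH. It uses file's -b (brief) flag so that only the description is returned, without the leading "foo.json: ".

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical constant: the tab-indented sample document from step 1.
SAMPLE_JSON = '{\n\t"a": 1,\n\t"b": 2\n}\n'


def write_sample(path: Path) -> Path:
    """Step 1: create a small JSON file at `path` and return the path."""
    path.write_text(SAMPLE_JSON, encoding="ascii")
    return path


def describe(path: Path) -> str:
    """Step 2: return file(1)'s description of `path`.

    Uses -b (brief) so the output omits the "name: " prefix.
    Assumes the `file` command is installed and on PATH.
    """
    if shutil.which("file") is None:
        raise RuntimeError("file(1) is not installed on this system")
    result = subprocess.run(
        ["file", "-b", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `describe(write_sample(Path("foo.json")))` returns whatever the installed file version prints; for file-5.39 and file-5.40 the report below shows "JSON data".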

Actual results:

$ file foo.json 
foo.json: JSON data

Expected results:

Output should contain "text" somewhere (it could also contain "JSON").
For example, on CentOS 6, file returned just "ASCII text" for this file.
The change broke a script that used "file" to find plaintext files.
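A script like the one described typically relies on the man-page convention that readable files have the word "text" in their description. The sketch below is a hypothetical reconstruction, not the reporter's actual script: plaintext_names is an illustrative name, and it assumes file's default "name: description" output format and that no file name itself contains ": ". It matches "text" as a whole word, so "JSON data" is rejected while "ASCII text" and "JSON text data" are accepted.

```python
import re
from typing import Iterable, List

# "text" as a whole word, so e.g. "context" does not count as a match.
TEXT_WORD = re.compile(r"\btext\b")


def plaintext_names(file_output_lines: Iterable[str]) -> List[str]:
    """Return the names from `file` output lines whose description
    contains the word "text".

    Assumes each line has the default "name: description" form and
    that the name itself does not contain ": ".
    """
    names = []
    for line in file_output_lines:
        name, sep, description = line.partition(": ")
        # Skip lines that do not look like "name: description".
        if sep and TEXT_WORD.search(description):
            names.append(name)
    return names
```

The input lines could come from something like `subprocess.run(["file", *paths], capture_output=True, text=True).stdout.splitlines()`; with file-5.39 or file-5.40 a JSON file's "JSON data" line is silently dropped, which is the breakage reported here.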

Comment 1 Ben Cotton 2021-11-30 16:12:22 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 2 Erkki Ruohtula 2021-12-07 17:16:13 UTC
The problem can also be reproduced on Fedora 35 with file-5.40.

Comment 3 Vincent Mihalkovič 2021-12-10 14:05:52 UTC
Hi, 

so I talked to upstream and the result is this commit: https://github.com/file/file/commit/c49e7805fd8aa48b8d2afad98d2115560ffaaf21

We changed the output from "JSON data" to "JSON text data".

Comment 4 Erkki Ruohtula 2021-12-10 17:49:57 UTC
Thanks, the correction looks good.

Comment 5 Vincent Mihalkovič 2021-12-14 12:36:29 UTC
dist-git commit: https://src.fedoraproject.org/rpms/file/c/411766c4848c80eb8d94b0e12f48013e7ceb9de1