Bug 1014299 - file detects docx files as zip archive
Summary: file detects docx files as zip archive
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: file
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jan Kaluža
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-10-01 16:48 UTC by Alex Regan
Modified: 2015-02-17 17:26 UTC (History)

Doc Type: Bug Fix
Last Closed: 2015-02-17 17:26:56 UTC
Type: Bug


Attachments
docx archive exhibiting the "Zip archive data" issue (10.62 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-11-08 04:15 UTC, Alex Regan

Description Alex Regan 2013-10-01 16:48:54 UTC
Description of problem:

$ file report.docx
report.docx: Zip archive data, at least v2.0 to extract

Version-Release number of selected component (if applicable):

I've tried 'file' versions up through the one in fc19.

How reproducible:

Every time.


Steps to Reproduce:
1. Run 'file' on any docx file that contains [trash]/0001.dat entries.

Actual results:

'file' reports "Zip archive data, at least v2.0 to extract", as shown in the description above.

Expected results:

If I remove the [trash] directory and re-zip the docx file, then re-run file, the results are as expected:

$ file report1.docx
report1.docx: Microsoft OOXML

Additional info:

This is important because it affects amavisd-new and detection of exe files.
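
Because a mail filter depends on this classification, an order-independent cross-check may be useful. The following is only a sketch, not how 'file' or amavisd-new work: it assumes that an archive listing both [Content_Types].xml and word/document.xml can be treated as a docx, regardless of extra entries such as [trash]/0001.dat. The names looks_like_docx and build_archive are hypothetical helpers introduced for illustration.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if the bytes form a zip archive that lists the two
    parts this sketch assumes every docx carries, in any position."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        # Not a zip archive at all.
        return False
    return "[Content_Types].xml" in names and "word/document.xml" in names


def build_archive(names):
    """Hypothetical helper: build an in-memory zip with dummy members,
    written in the order given."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"<x/>")
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_archive(
        ["word/document.xml", "[trash]/0001.dat", "[Content_Types].xml"]
    )
    print(looks_like_docx(sample))
```

The check reads the member list through the zipfile module, so it does not depend on which member happens to come first in the archive.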

Comment 1 Jan Kaluža 2013-10-03 11:49:06 UTC
Can you please upload some example docx file here?

Comment 2 Alex Regan 2013-10-04 03:13:38 UTC
I only have one sample, and I can't send it as-is because it contains confidential info. As it turns out, if I re-zip it, it then properly displays with 'file' as "Microsoft OOXML".

Viewing the contents of the original docx file shows the following:

$ unzip -v UtilizationReport.docx 
Archive:  UtilizationReport.docx
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
  412276  Defl:N    22707  95% 01-01-1980 00:00 88586817  word/document.xml
     456  Stored      456   0% 01-01-1980 00:00 ffffffff  [trash]/0001.dat
    1260  Defl:N      594  53% 01-01-1980 00:00 077742eb  word/styles.xml
    2773  Defl:N      383  86% 01-01-1980 00:00 8f253e71  word/_rels/document.xml.rels
    1828  Defl:N      895  51% 01-01-1980 00:00 1b3daf6b  word/settings.xml
    3725  Defl:N      814  78% 01-01-1980 00:00 2fca90b8  word/header0.xml
    5374  Defl:N     1027  81% 01-01-1980 00:00 964537b6  word/footer1.xml
    1417  Stored     1417   0% 01-01-1980 00:00 c104c3ac  word/media/img4.png
     305  Stored      305   0% 01-01-1980 00:00 b6dcdcc4  word/media/img5.png
     278  Stored      278   0% 01-01-1980 00:00 0e6ca9e3  word/media/img6.png
    8816  Stored     8816   0% 01-01-1980 00:00 a1cd4261  word/media/img7.png
   15665  Stored    15665   0% 01-01-1980 00:00 1f19026f  word/media/img8.png
   22268  Stored    22268   0% 01-01-1980 00:00 143b0c5f  word/media/img9.png
   21867  Stored    21867   0% 01-01-1980 00:00 8a7b361a  word/media/img10.png
   11017  Stored    11017   0% 01-01-1980 00:00 8920be29  word/media/img11.png
   17448  Stored    17448   0% 01-01-1980 00:00 d54d14ff  word/media/img12.png
   13005  Stored    13005   0% 01-01-1980 00:00 abd0d93a  word/media/img13.png
   11803  Stored    11803   0% 01-01-1980 00:00 0d2a6eff  word/media/img14.png
     466  Defl:N      265  43% 01-01-1980 00:00 8a49f4b8  docProps/core.xml
   37373  Defl:N     1888  95% 01-01-1980 00:00 4e070882  word/numbering.xml
    1911  Defl:N      388  80% 01-01-1980 00:00 54a3c1b4  [Content_Types].xml
     219  Defl:S      144  34% 01-01-1980 00:00 236284bd  customXml/item1.xml
     296  Defl:S      194  35% 01-01-1980 00:00 7a393f74  customXml/_rels/item1.xml.rels
     201  Defl:S      184   9% 01-01-1980 00:00 88dc520d  customXml/itemProps2.xml
     187  Stored      187   0% 01-01-1980 00:00 ffffffff  [trash]/0000.dat
     201  Defl:S      183   9% 01-01-1980 00:00 3099dde0  customXml/itemProps1.xml
     296  Defl:S      195  34% 01-01-1980 00:00 2227965c  customXml/_rels/item2.xml.rels
    7888  Defl:S     1747  78% 01-01-1980 00:00 753d0fd1  customXml/item2.xml
     290  Defl:S      193  33% 01-01-1980 00:00 c3438b7f  customXml/item3.xml
     201  Defl:S      182  10% 01-01-1980 00:00 a1fc264b  customXml/itemProps3.xml
     522  Defl:S      301  42% 01-01-1980 00:00 4753451c  docProps/custom.xml
     296  Defl:S      195  34% 01-01-1980 00:00 a302f37b  customXml/_rels/item3.xml.rels
     595  Stored      595   0% 01-01-1980 00:00 c782df8d  _rels/.rels
--------          -------  ---                            -------
  602523           157606  74%                            33 files

I notice that when I unzip and then re-zip the contents of the original file, the new archive also stores the directory names as entries, whereas the original archive stores only the files.

If I open the original docx file, remove the confidential information and save it, it also no longer reports that it's a Zip file. The [trash] directory and its contents are gone.

Does this help?
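
The difference between the two archives (directory entries present or absent, and the position of each member) can also be inspected programmatically. Below is a minimal sketch using Python's zipfile module; entry_summary and build_archive are hypothetical helper names, and the member contents are dummies. Like 'unzip -v', it reports the order recorded in the archive's central directory.

```python
import io
import zipfile


def entry_summary(data: bytes):
    """Return (name, is_directory) pairs in central-directory order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, info.is_dir()) for info in zf.infolist()]


def build_archive(names):
    """Hypothetical helper: build an in-memory zip in the given order.
    Names ending in '/' become empty directory entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"" if name.endswith("/") else b"<x/>")
    return buf.getvalue()


if __name__ == "__main__":
    # Files only, as in the original docx.
    original_style = build_archive(
        ["word/document.xml", "[trash]/0001.dat", "[Content_Types].xml"]
    )
    # With explicit directory entries, as a re-zipped copy may have.
    rezipped_style = build_archive(
        ["word/", "word/document.xml", "[Content_Types].xml"]
    )
    for label, data in (("original", original_style), ("re-zipped", rezipped_style)):
        print(label)
        for name, is_dir in entry_summary(data):
            print("  ", "dir " if is_dir else "file", name)
```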

Comment 3 Alex Regan 2013-11-08 04:15:10 UTC
Created attachment 821424 [details]
docx archive exhibiting the "Zip archive data" issue

Comment 4 Alex Regan 2013-11-08 04:22:51 UTC
Okay, I've been able to recreate the problem.

Create a docx archive with the [Content_Types].xml file in a position in the archive other than the first:

$ unzip -v docx-test-zip.docx
Archive:  docx-test-zip.docx
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       0  Stored        0   0% 11-07-2013 23:06 00000000  docProps/
     979  Defl:X      450  54% 01-01-1980 00:00 d8c29d91  docProps/app.xml
     629  Defl:X      309  51% 01-01-1980 00:00 fb171547  docProps/core.xml
       0  Stored        0   0% 11-07-2013 23:06 00000000  _rels/
     590  Defl:X      233  61% 01-01-1980 00:00 b71a911e  _rels/.rels
       0  Stored        0   0% 11-07-2013 23:06 00000000  word/
    1186  Defl:X      425  64% 01-01-1980 00:00 4a41e916  word/fontTable.xml
    2225  Defl:X      833  63% 01-01-1980 00:00 2df2afdc  word/settings.xml
       0  Stored        0   0% 11-07-2013 23:06 00000000  word/theme/
    7076  Defl:X     1478  79% 01-01-1980 00:00 2943dd30  word/theme/theme1.xml
       0  Stored        0   0% 11-07-2013 23:06 00000000  word/_rels/
     953  Defl:X      272  72% 01-01-1980 00:00 39973b7c  word/_rels/document.xml.rels
     428  Defl:X      249  42% 01-01-1980 00:00 4e16a017  word/webSettings.xml
    1712  Defl:X      580  66% 01-01-1980 00:00 6b13ee01  word/document.xml
   14957  Defl:X     1421  91% 01-01-1980 00:00 858d234e  word/styles.xml
   15710  Defl:X     1530  90% 01-01-1980 00:00 28d89db0  word/stylesWithEffects.xml
    1422  Defl:X      358  75% 01-01-1980 00:00 82872409  [Content_Types].xml
--------          -------  ---                            -------
   47867             8138  83%                            17 files

$ file docx-test-zip.docx 
docx-test-zip.docx: Zip archive data, at least v1.0 to extract

I originally thought the problem depended on having a [trash]/0001.dat file in the archive, but that file is not necessary to trigger it.
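
The reproduction above suggests that the result depends on which member comes first in the archive. As a hedged illustration (this is not taken from the 'file' magic database), the sketch below reads the name stored in the first local file header of a zip. In the zip format that header has a 30-byte fixed part, so the name starts at byte offset 30. The helpers first_entry_name and build_archive are hypothetical names.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a zip local file header: signature, version,
# flags, method, time, date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def first_entry_name(data: bytes) -> str:
    """Return the member name stored in the local file header at offset 0."""
    fields = LOCAL_HEADER.unpack_from(data, 0)
    signature, name_len = fields[0], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError("no zip local file header at offset 0")
    start = LOCAL_HEADER.size  # 30
    return data[start:start + name_len].decode("utf-8", "replace")


def build_archive(names):
    """Hypothetical helper: build an in-memory zip in the given order.
    Names ending in '/' become empty directory entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"" if name.endswith("/") else b"<x/>")
    return buf.getvalue()


if __name__ == "__main__":
    # Same layout idea as docx-test-zip.docx: [Content_Types].xml is last.
    reordered = build_archive(
        ["docProps/", "docProps/app.xml", "word/document.xml", "[Content_Types].xml"]
    )
    print(first_entry_name(reordered))
```

Running first_entry_name on a real docx file's bytes shows what a detector that only inspects the start of the file would see first.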

The original I received was sent by a Microsoft executive, so I assume it was created with Office 2010 rather than by some non-standard file creation mechanism.

I've attached my test sample here. Please advise.

Comment 5 Jan Kaluža 2014-06-27 12:11:52 UTC
This is fixed in file-5.19 in Fedora rawhide.

Comment 6 Fedora End Of Life 2015-01-09 20:03:58 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Fedora End Of Life 2015-02-17 17:26:56 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

