Bug 1011793 - Review Request: docx2txt - Convert Docx documents to Text
Summary: Review Request: docx2txt - Convert Docx documents to Text
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christopher Meng
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-09-25 07:19 UTC by Mathieu Bridon
Modified: 2013-09-30 03:44 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-30 03:44:29 UTC
Type: ---
Embargoed:
i: fedora-review+
kevin: fedora-cvs+


Description Mathieu Bridon 2013-09-25 07:19:34 UTC
Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-1.fc20.src.rpm

Description:
Command line utility to convert Docx documents to equivalent Text documents.
It supports the following features during text extraction:

 * Character conversions (" ' < & > - ... fraction and some mathematical
   symbols etc.); currency characters are converted to respective names like
   Euro.
 * Capitalisation of text blocks.
 * Center and right justification of text fitting in a line of (configurable)
   80 columns.
 * Horizontal ruler, line breaks, paragraphs separation, tabs.
 * Indicating hyperlinked text along with the hyperlink. (configurable)
 * Naive nested list formatting - assumed 8 level nesting, however you can
   handle even deeper nesting by commenting/uncommenting appropriate lines in
   Perl script.


Fedora Account System Username: bochecha

Comment 1 Christopher Meng 2013-09-25 08:00:30 UTC
1. # Sent upstream by email on 2913-09-25

Oh...

2. Missing %config(noreplace) with %{_sysconfdir}/%{name}.config

3. BuildRequires: /usr/bin/perl

just

BuildRequires: perl

4. http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz

is the real URL.

Comment 2 Mathieu Bridon 2013-09-25 08:20:53 UTC
(In reply to Christopher Meng from comment #1)
> 2. Missing %config(noreplace) with %{_sysconfdir}/%{name}.config

Oops, good catch!

> 3. BuildRequires: /usr/bin/perl
> 
> just
> 
> BuildRequires: perl

I hesitated on that one, not sure which one is better, but ok.

> 4.
> http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz
> 
> is the real URL.

Nope: https://fedoraproject.org/wiki/Packaging:SourceURL#Sourceforge.net

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm

Comment 3 Christopher Meng 2013-09-25 08:28:16 UTC
(In reply to Mathieu Bridon from comment #2)
> > 4.
> > http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz
> > 
> > is the real URL.
> 
> Nope: https://fedoraproject.org/wiki/Packaging:SourceURL#Sourceforge.net

Well, that's the direct link I got from SF, not one I generated.

Comment 4 Haïkel Guémar 2013-09-25 08:34:20 UTC
As for the download URL, we should stick to the guidelines; other SF-generated URLs are not guaranteed to work in the future.
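
For illustration, the guideline form of the URL can be assembled from just the package name and version. A minimal shell sketch, assuming the 1.2 tarball discussed here (the variable names are made up for the example):

```shell
# Hypothetical variable names; the values come from this review request.
name=docx2txt
version=1.2

# Guideline form: no /project/ prefix and no per-release directory in the path.
source_url="http://downloads.sourceforge.net/${name}/${name}-${version}.tgz"
echo "$source_url"
```

The result matches the URL that fedora-review reports under "Source checksums" in comment 7. In a spec file the same path would normally be written with the %{name} and %{version} macros.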

Comment 5 Christophe Burgun 2013-09-25 19:47:12 UTC
Hi,

http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm isn't good
(404)

As point 1 of Christopher : 1. # Sent upstream by email on 2913-09-25
=> 2013 instead of 2913

Comment 6 Mathieu Bridon 2013-09-26 03:33:22 UTC
(In reply to Christophe Burgun from comment #5)
> http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm isn't good
> (404)

Gah, I uploaded the noarch RPM instead of the source one. >_<

Sorry about that.

> As point 1 of Christopher : 1. # Sent upstream by email on 2913-09-25
> => 2013 instead of 2913

Oh, good catch, I hadn't seen the typo even in Christopher's comment above!

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-3.fc20.src.rpm

Comment 7 Christopher Meng 2013-09-26 03:51:27 UTC
[cut]

Rpmlint
-------
Checking: docx2txt-1.2-3.fc21.noarch.rpm
          docx2txt-1.2-3.fc21.src.rpm
docx2txt.noarch: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.noarch: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.noarch: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.sh
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.pl
docx2txt.src: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.src: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.src: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.src: W: strange-permission docx2txt-1.2.tgz 0444L
2 packages and 0 specfiles checked; 2 errors, 9 warnings.




Rpmlint (installed packages)
----------------------------
# rpmlint docx2txt
docx2txt.noarch: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.noarch: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.noarch: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.sh
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.pl
1 packages and 0 specfiles checked; 2 errors, 5 warnings.
# echo 'rpmlint-done:'



Requires
--------
docx2txt (rpmlib, GLIBC filtered):
    /usr/bin/env
    /usr/bin/unzip
    config(docx2txt)
    perl(:MODULE_COMPAT_5.18.1)



Provides
--------
docx2txt:
    config(docx2txt)
    docx2txt



Source checksums
----------------
http://downloads.sourceforge.net/docx2txt/docx2txt-1.2.tgz :
  CHECKSUM(SHA256) this package     : 33649d1e8c4f86df897d478376cf76bd9f2aed27a952aaa96c615bce976488cf
  CHECKSUM(SHA256) upstream package : 33649d1e8c4f86df897d478376cf76bd9f2aed27a952aaa96c615bce976488cf


Generated by fedora-review 0.5.0 (920221d) last change: 2013-08-30
Command line :/usr/bin/fedora-review -rvn docx2txt-1.2-3.fc20.src.rpm
Buildroot used: fedora-rawhide-i386
Active plugins: Generic, Shell-api, Perl
Disabled plugins: Java, C/C++, Python, SugarActivity, R, PHP, Ruby
Disabled flags: EPEL5, EXARCH, DISTTAG

--------------------------------------------------------------

Well,

First it's not always good to see /usr/bin/env in glibc filter.

Use sed to fix shebangs in two docx2txt bins.

Second, I just realize that you install 2 bins with same name but different types.

Can you tell me why we ship 2 same function scripts? Any difference?

Thanks.

Comment 8 Mathieu Bridon 2013-09-26 04:26:17 UTC
(In reply to Christopher Meng from comment #7)
> Rpmlint
> -------
> docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
> docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config

Oops, I missed that one.

Fixed.
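
The thread does not say how this was fixed. One common way to clear both rpmlint errors is to install the config file without the executable bit. A sketch, assuming GNU coreutils and scratch directories standing in for the source tree and the buildroot (all paths and variable names here are hypothetical):

```shell
# Scratch directories stand in for the unpacked source and the RPM buildroot.
srcdir=$(mktemp -d)
buildroot=$(mktemp -d)

# Placeholder config file, deliberately created with the executable bit set.
printf '%s\n' '# placeholder settings' > "$srcdir/docx2txt.config"
chmod 0755 "$srcdir/docx2txt.config"

# Install with an explicit non-executable mode.
# -D creates the etc/ directory under the buildroot, -p preserves the timestamp.
install -D -p -m 0644 "$srcdir/docx2txt.config" "$buildroot/etc/docx2txt.config"

# Print the resulting octal mode (GNU stat).
mode=$(stat -c '%a' "$buildroot/etc/docx2txt.config")
echo "$mode"
```

In a spec file the equivalent install line would live in %install, with %{buildroot} and %{_sysconfdir} in place of the scratch paths.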

> Requires
> --------
> docx2txt (rpmlib, GLIBC filtered):
>     /usr/bin/env
>     /usr/bin/unzip
>     config(docx2txt)
>     perl(:MODULE_COMPAT_5.18.1)
>
[... snip ...]
>
> First it's not always good to see /usr/bin/env in glibc filter.

I have no idea what you mean here. /usr/bin/env has nothing to do with "glibc filtered", it is merely a requirement of the binary package.

You might be confused by the "docx2txt (rpmlib, GLIBC filtered)" line above: this is just Fedora Review listing the requirements for docx2txt, and letting you know that it filtered the rpm- and glibc-related requirements from the output, so you can focus on the ones which matter.

But the /usr/bin/env requirement is not "in glibc filter", that phrase makes no sense.

Do you mean that it's not good to have /usr/bin/env as a requirement?

If so...

> Use sed to fix shebangs in two docx2txt bins.

... fixed. (but not with sed)
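
For reference, the sed approach suggested in comment 7 might look roughly like the sketch below. This is not how the package was actually fixed (see above), and the original shebang lines are assumptions inferred from the /usr/bin/env requirement; the stand-in script bodies are made up so the example runs on its own with GNU sed.

```shell
# Stand-in copies of the two scripts; the shebang lines are assumed, not taken from upstream.
workdir=$(mktemp -d)
printf '%s\n' '#!/usr/bin/env perl' 'print "hello\n";' > "$workdir/docx2txt.pl"
printf '%s\n' '#!/usr/bin/env bash' 'echo hello' > "$workdir/docx2txt.sh"

# Rewrite only the first line of each file so it names the interpreter directly.
sed -i '1s|^#!/usr/bin/env perl|#!/usr/bin/perl|' "$workdir/docx2txt.pl"
sed -i '1s|^#!/usr/bin/env bash|#!/bin/bash|' "$workdir/docx2txt.sh"

# Show the resulting first lines.
head -n 1 "$workdir/docx2txt.pl"
head -n 1 "$workdir/docx2txt.sh"
```

With shebangs like these, the automatic dependency generator would pick up the interpreters themselves instead of /usr/bin/env.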

> Second, I just realize that you install 2 bins with same name but different
> types.
>
> Can you tell me why we ship 2 same function scripts? Any difference?

Right, that's certainly confusing.

The docx2txt.sh file is a wrapper shell script around the docx2txt.pl file.

Originally, I was thinking about packaging only the .pl file, and renaming it to /usr/bin/docx2txt.

However, the upstream documentation mentions both the .sh and .pl scripts:
  http://docx2txt.cvs.sourceforge.net/viewvc/docx2txt/docx2txt/README?view=markup

So I also find it weird to have these two scripts, but I'd rather not diverge from upstream for such a silly little thing, honestly.

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-4.fc20.src.rpm

Comment 9 Christopher Meng 2013-09-26 04:45:20 UTC
PACKAGE APPROVED.

(Note: I misunderstood something in the review template, sorry.)

Comment 10 Mathieu Bridon 2013-09-26 05:05:56 UTC
Thanks for the review, Christopher, and thank you for the comments, Christophe and Haïkel!

New Package SCM Request
=======================
Package Name: docx2txt
Short Description: Convert Docx documents to Text
Owners: bochecha
Branches: devel

Comment 11 Kevin Fenzi 2013-09-27 18:54:51 UTC
Git done (by process-git-requests).

Comment 12 Mathieu Bridon 2013-09-30 03:44:29 UTC
Thanks for the VCS creation, Kevin!

Package built in Rawhide, closing.

