Bug 1011793 - Review Request: docx2txt - Convert Docx documents to Text
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Christopher Meng
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-09-25 03:19 EDT by Mathieu Bridon
Modified: 2013-09-29 23:44 EDT
Last Closed: 2013-09-29 23:44:29 EDT
i: fedora‑review+
kevin: fedora‑cvs+


Description Mathieu Bridon 2013-09-25 03:19:34 EDT
Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-1.fc20.src.rpm

Description:
Command line utility to convert Docx documents to equivalent Text documents.
It supports the following features during text extraction:

 * Character conversions (" ' < & > - ... fraction and some mathematical
   symbols etc.); currency characters are converted to respective names like
   Euro.
 * Capitalisation of text blocks.
 * Center and right justification of text fitting in a line of (configurable)
   80 columns.
 * Horizontal ruler, line breaks, paragraphs separation, tabs.
 * Indicating hyperlinked text along with the hyperlink. (configurable)
 * Naive nested list formatting - assumed 8 level nesting, however you can
   handle even deeper nesting by commenting/uncommenting appropriate lines in
   Perl script.


Fedora Account System Username: bochecha
Comment 1 Christopher Meng 2013-09-25 04:00:30 EDT
1. # Sent upstream by email on 2913-09-25

Oh...

2. Missing %config(noreplace) with %{_sysconfdir}/%{name}.config

3. BuildRequires: /usr/bin/perl

just

BuildRequires: perl

4. http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz

is the real URL.
Comment 2 Mathieu Bridon 2013-09-25 04:20:53 EDT
(In reply to Christopher Meng from comment #1)
> 2. Missing %config(noreplace) with %{_sysconfdir}/%{name}.config

Oops, good catch!

> 3. BuildRequires: /usr/bin/perl
> 
> just
> 
> BuildRequires: perl

I hesitated on that one, not sure which one is better, but ok.

> 4. http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz
> 
> is the real URL.

Nope: https://fedoraproject.org/wiki/Packaging:SourceURL#Sourceforge.net

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm
Comment 3 Christopher Meng 2013-09-25 04:28:16 EDT
(In reply to Mathieu Bridon from comment #2)
> > 4. http://downloads.sourceforge.net/project/docx2txt/docx2txt/v1.2/docx2txt-1.2.tgz
> > 
> > is the real URL.
> 
> Nope: https://fedoraproject.org/wiki/Packaging:SourceURL#Sourceforge.net

Well, that's the direct link I got from SF, not one I generated.
Comment 4 Haïkel Guémar 2013-09-25 04:34:20 EDT
As for the download URL, we should stick to the guidelines; other SF-generated URLs are not guaranteed to work in the future.
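
For illustration, the guideline-style SourceForge URL can be assembled from the package name and version alone. This is a minimal sketch; the variable names are hypothetical, and the resulting URL matches the one fedora-review lists under "Source checksums" in comment 7.

```shell
# Assumed values, taken from the package under review.
name=docx2txt
version=1.2

# Guideline form: downloads.sourceforge.net/<project>/<tarball>,
# without the "/project/.../v1.2/" path components of the browser link.
source_url="http://downloads.sourceforge.net/${name}/${name}-${version}.tgz"

echo "$source_url"
```

In a spec file the same pattern would normally be written with the %{name} and %{version} macros rather than shell variables.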
Comment 5 Christophe Burgun 2013-09-25 15:47:12 EDT
Hi,

http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm isn't good
(404)

As point 1 of Christopher : 1. # Sent upstream by email on 2913-09-25
=> 2013 instead of 2913
Comment 6 Mathieu Bridon 2013-09-25 23:33:22 EDT
(In reply to Christophe Burgun from comment #5)
> http://bochecha.fedorapeople.org/packages/docx2txt-1.2-2.fc20.src.rpm isn't good
> (404)

Gah, I uploaded the noarch RPM instead of the source one. >_<

Sorry about that.

> As point 1 of Christopher : 1. # Sent upstream by email on 2913-09-25
> => 2013 instead of 2913

Oh, good catch, I hadn't seen the typo even in Christopher's comment above!

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-3.fc20.src.rpm
Comment 7 Christopher Meng 2013-09-25 23:51:27 EDT
[cut]

Rpmlint
-------
Checking: docx2txt-1.2-3.fc21.noarch.rpm
          docx2txt-1.2-3.fc21.src.rpm
docx2txt.noarch: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.noarch: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.noarch: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.sh
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.pl
docx2txt.src: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.src: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.src: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.src: W: strange-permission docx2txt-1.2.tgz 0444L
2 packages and 0 specfiles checked; 2 errors, 9 warnings.




Rpmlint (installed packages)
----------------------------
# rpmlint docx2txt
docx2txt.noarch: W: spelling-error %description -l en_US hyperlinked -> hyper linked, hyper-linked, hyperlink ed
docx2txt.noarch: W: spelling-error %description -l en_US uncommenting -> commenting, commentating, complimenting
docx2txt.noarch: W: invalid-url URL: http://docx2txt.sourceforge.net/ <urlopen error timed out>
docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.sh
docx2txt.noarch: W: no-manual-page-for-binary docx2txt.pl
1 packages and 0 specfiles checked; 2 errors, 5 warnings.
# echo 'rpmlint-done:'



Requires
--------
docx2txt (rpmlib, GLIBC filtered):
    /usr/bin/env
    /usr/bin/unzip
    config(docx2txt)
    perl(:MODULE_COMPAT_5.18.1)



Provides
--------
docx2txt:
    config(docx2txt)
    docx2txt



Source checksums
----------------
http://downloads.sourceforge.net/docx2txt/docx2txt-1.2.tgz :
  CHECKSUM(SHA256) this package     : 33649d1e8c4f86df897d478376cf76bd9f2aed27a952aaa96c615bce976488cf
  CHECKSUM(SHA256) upstream package : 33649d1e8c4f86df897d478376cf76bd9f2aed27a952aaa96c615bce976488cf


Generated by fedora-review 0.5.0 (920221d) last change: 2013-08-30
Command line :/usr/bin/fedora-review -rvn docx2txt-1.2-3.fc20.src.rpm
Buildroot used: fedora-rawhide-i386
Active plugins: Generic, Shell-api, Perl
Disabled plugins: Java, C/C++, Python, SugarActivity, R, PHP, Ruby
Disabled flags: EPEL5, EXARCH, DISTTAG
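
The "Source checksums" comparison in the report above can be reproduced by hand. The sketch below only shows the technique: the two files are stand-ins created on the spot (not the real tarball), and the file and variable names are hypothetical.

```shell
workdir=$(mktemp -d)

# Stand-ins for the tarball inside the SRPM and a fresh upstream download.
printf 'example tarball contents\n' > "$workdir/packaged.tgz"
cp "$workdir/packaged.tgz" "$workdir/upstream.tgz"

# sha256sum prints "<hash>  <file>"; keep only the hash field.
packaged_sum=$(sha256sum "$workdir/packaged.tgz" | cut -d ' ' -f 1)
upstream_sum=$(sha256sum "$workdir/upstream.tgz" | cut -d ' ' -f 1)

if [ "$packaged_sum" = "$upstream_sum" ]; then
    result=match
else
    result=mismatch
fi
echo "$result"
```

With the real package, the first file would be the tarball extracted from the SRPM and the second a fresh download from the Source URL.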

--------------------------------------------------------------

Well,

First it's not always good to see /usr/bin/env in glibc filter.

Use sed to fix shebangs in two docx2txt bins.

Second, I just realize that you install 2 bins with same name but different types.

Can you tell me why we ship 2 same function scripts? Any difference?

Thanks.
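
A minimal sketch of the sed approach suggested above, run against a scratch copy. It assumes the upstream script starts with "#!/usr/bin/env perl"; the thread does not show the actual shebang, and the fix eventually applied to the package (comment 8) did not use sed.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
script="$workdir/docx2txt.pl"

# Stand-in for the upstream script; its env-based shebang is an assumption.
printf '%s\n' '#!/usr/bin/env perl' 'print "hello\n";' > "$script"

# Rewrite only line 1, and only if it is the env-based shebang.
sed '1s|^#!/usr/bin/env perl$|#!/usr/bin/perl|' "$script" > "$script.fixed"

first_line=$(head -n 1 "$script.fixed")
echo "$first_line"
```

Restricting the substitution to line 1 keeps any later mention of /usr/bin/env in the script body untouched.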
Comment 8 Mathieu Bridon 2013-09-26 00:26:17 EDT
(In reply to Christopher Meng from comment #7)
> Rpmlint
> -------
> docx2txt.noarch: E: executable-marked-as-config-file /etc/docx2txt.config
> docx2txt.noarch: E: script-without-shebang /etc/docx2txt.config

Oops, I missed that one.

Fixed.
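
The thread does not show how this was fixed. One common way to clear both rpmlint errors is to install the config file without the executable bit, sketched here under assumed paths (the buildroot directory is a hypothetical stand-in for %{buildroot}, and the config content is a placeholder).

```shell
workdir=$(mktemp -d)
buildroot="$workdir/buildroot"   # hypothetical stand-in for %{buildroot}

# Stand-in for the upstream config file, deliberately created executable.
printf '%s\n' '# example setting' > "$workdir/docx2txt.config"
chmod 0755 "$workdir/docx2txt.config"

# Install with an explicit non-executable mode instead of copying the bits.
mkdir -p "$buildroot/etc"
install -p -m 0644 "$workdir/docx2txt.config" "$buildroot/etc/docx2txt.config"

if [ -x "$buildroot/etc/docx2txt.config" ]; then
    config_state=executable
else
    config_state=not-executable
fi
echo "$config_state"
```

Once the file is no longer executable, rpmlint stops treating it as a script, so the script-without-shebang error should disappear along with executable-marked-as-config-file.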

> Requires
> --------
> docx2txt (rpmlib, GLIBC filtered):
>     /usr/bin/env
>     /usr/bin/unzip
>     config(docx2txt)
>     perl(:MODULE_COMPAT_5.18.1)
>
[... snip ...]
>
> First it's not always good to see /usr/bin/env in glibc filter.

I have no idea what you mean here. /usr/bin/env has nothing to do with "glibc filtered", it is merely a requirement of the binary package.

You might be confused by the "docx2txt (rpmlib, GLIBC filtered)" line above: this is just Fedora Review listing the requirements for docx2txt, and letting you know that it filtered the rpm- and glibc-related requirements from the output, so you can focus on the ones which matter.

But the /usr/bin/env requirement is not "in glibc filter", that phrase makes no sense.

Do you mean that it's not good to have /usr/bin/env as a requirement?

If so...

> Use sed to fix shebangs in two docx2txt bins.

... fixed. (but not with sed)

> Second, I just realize that you install 2 bins with same name but different
> types.
>
> Can you tell me why we ship 2 same function scripts? Any difference?

Right, that's certainly confusing.

The docx2txt.sh file is a wrapper shell script around the docx2txt.pl file.

Originally, I was thinking about packaging only the .pl file, and renaming it to /usr/bin/docx2txt.

However, the upstream documentation mentions both the .sh and .pl scripts:
  http://docx2txt.cvs.sourceforge.net/viewvc/docx2txt/docx2txt/README?view=markup

So I also find it weird to have these two scripts, but I'd rather not diverge from upstream for such a silly little thing, honestly.

-----

Spec URL: http://bochecha.fedorapeople.org/packages/docx2txt.spec
SRPM URL: http://bochecha.fedorapeople.org/packages/docx2txt-1.2-4.fc20.src.rpm
Comment 9 Christopher Meng 2013-09-26 00:45:20 EDT
PACKAGE APPROVED.

(Note: I misunderstood something in the review template, sorry.)
Comment 10 Mathieu Bridon 2013-09-26 01:05:56 EDT
Thanks for the review, Christopher, and thank you for the comments, Christophe and Haïkel!

New Package SCM Request
=======================
Package Name: docx2txt
Short Description: Convert Docx documents to Text
Owners: bochecha
Branches: devel
Comment 11 Kevin Fenzi 2013-09-27 14:54:51 EDT
Git done (by process-git-requests).
Comment 12 Mathieu Bridon 2013-09-29 23:44:29 EDT
Thanks for the VCS creation Kevin!

Package built in Rawhide, closing.
