Bug 191060

Summary: antiword gives user a hard time with Postscript and PDF output
Product: [Fedora] Fedora Reporter: Michal Jaegermann <michal>
Component: antiword Assignee: Adrian Reber <adrian>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 8 CC: extras-qa
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-04-06 20:53:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michal Jaegermann 2006-05-08 16:12:46 UTC
Description of problem:

When LANG is <something>.UTF-8, an attempt to produce PostScript or PDF
output results in something like this:

The combination PostScript and UTF-8 is not supported
        Name: antiword
        Purpose: Display MS-Word files
        Author: (C) 1998-2005 Adri van Os
        Version: 0.37  (21 Oct 2005)
................

It is true that one can figure out that this can be worked around with
the help of the -m switch and an explicitly specified mapping file,
although the documentation does not make clear what this specification
should look like.  Still, it would be much friendlier to provide the
'antiword' executable as a shell wrapper along these lines:

#!/bin/bash

echo "$@" | egrep -q -- '-p|-a' && lang="${LANG%.UTF-8}"
$lang antiword.bin "$@"

which would pick up a reasonable encoding for PostScript and PDF on its own.
'antiword.bin' is clearly the real executable here.

Version-Release number of selected component (if applicable):
antiword-0.37-2

Comment 1 Michal Jaegermann 2006-10-19 00:41:34 UTC
Ouch!  The quoted script has obvious errors.  Bad "copy-and-waste".
Here is the corrected one:

#!/bin/sh

# a shell wrapper to make 'antiword' usage reasonable on UTF-8 systems.
#
# Michal Jaegermann, michal, 2004/Nov/03
#    - simplify and we may be printing on a PostScript printer, 2006/May/08

echo "$@" | egrep -q -w -- '-p|-a' && lang="${LANG%.UTF-8}"
LANG=$lang antiword.bin "$@"
exit

By default the 8859-1 mapping will be used, but something else can be
specified with the help of the -m option.


Comment 2 Michal Jaegermann 2006-10-19 00:55:27 UTC
BTW - /share/doc/antiword-0.37/kantiword will be affected the same way.
The simplest way to fix it is to replace the line

antiword -p $paper_size -i 0 "$@" 2>"$err_file" >"$out_file"

with

LANG= antiword -p $paper_size -i 0 "$@" 2>"$err_file" >"$out_file"

although there is no provision for supplying a mapping other than
the default one.
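
The one-line change above relies on the shell rule that a VAR=value
prefix alters the environment of that single command only. A minimal
sketch of this behaviour follows; `sh -c` is a stand-in for antiword, and
the LANG value is only an example. It assumes LANG alone decides the
locale: if LC_ALL or LC_CTYPE is set, locale-aware programs normally
give those precedence over LANG.

```shell
#!/bin/sh
# Stand-in demo: 'sh -c' plays the role of antiword here.
LANG=en_GB.UTF-8
export LANG

# The assignment prefix empties LANG for this one child process only.
child_lang=$(LANG= sh -c 'printf "%s" "$LANG"')

echo "child saw LANG='$child_lang'"
echo "parent still has LANG='$LANG'"
```

The calling shell, and therefore the rest of kantiword, keeps its
original LANG.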

Comment 3 Bug Zapper 2008-04-04 02:50:35 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 4 Michal Jaegermann 2008-04-04 03:45:19 UTC
The current version of antiword supplies the /usr/share/antiword/UTF-8.txt
mapping file and, AFAICS, this is good enough for text output.
The moment one tries to use -p or -a, the result is:

The combination PostScript and UTF-8 is not supported

followed by a usage message.

My current "wrapper" script, which prevents the above from
happening, looks as follows:

#!/bin/sh

# a shell wrapper to make 'antiword' usage reasonable on UTF-8 systems.
#
# Michal Jaegermann, michal, 2004/Nov/03
#    - simplify and we may be printing on a PostScript printer
#    - so do not use -i0, 2006/May/08

echo "$@" | egrep -q -- '-p|-a' && lang="env LANG=${LANG%.UTF-8}"
$lang antiword.bin "$@"
exit

and that works.
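
One caveat with the `echo "$@" | egrep` test is that it matches "-p" or
"-a" anywhere in the joined argument string, so a file name such as
"my-paper.doc" also triggers it. A possible alternative, sketched below,
inspects each argument separately. The helper name `wants_ps_or_pdf` and
the sample arguments are hypothetical, and the sketch assumes -p and -a
are passed as arguments of their own (not bundled with other option
letters).

```shell
#!/bin/sh
# Sketch only: wants_ps_or_pdf is a hypothetical helper, not part of antiword.
# It succeeds when any single argument starts with -p or -a, so a file
# name such as "my-paper.doc" does not trigger the LANG change.
wants_ps_or_pdf() {
    for arg in "$@"; do
        case "$arg" in
            -p*|-a*) return 0 ;;
        esac
    done
    return 1
}

# Demo with hypothetical arguments; antiword itself is never run here.
if wants_ps_or_pdf -p a4 report.doc; then
    echo "-p a4 report.doc: strip the codeset from LANG"
fi
if ! wants_ps_or_pdf -t my-paper.doc; then
    echo "-t my-paper.doc: leave LANG alone"
fi
```

In a wrapper, the result of this check could replace the egrep pipeline
in front of the `lang=` assignment.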


Comment 5 Adrian Reber 2008-04-05 15:19:13 UTC
Just tried it and it failed: LANG is set to en_GB.utf8 here. If
I change your script to use "${LANG%.utf8}", it works for me.

Do you have an idea how we could handle both cases, "UTF-8" as well as
"utf8"? I am thinking of something that removes anything starting with
"utf" or "UTF".

Comment 6 Michal Jaegermann 2008-04-05 16:47:30 UTC
> LANG is set to en_GB.utf8 and therefore it fails
Yes, this will fail. I was not aware that such variants will show up.
> Do you have an idea how we could handle both cases "UTF-8" as well as "utf8"
Likely the simplest way to account for all possible LANG values would be

    lang="env LANG=${LANG%.*}"

This removes the last '.' and everything that follows.  To erase
everything from the first dot onward, use this:

    lang="env LANG=${LANG%%.*}"

I do not think that this makes a difference here but who knows?

If your LANG is already set to en_GB, say, then nothing will be changed.

If you think that extra caution is warranted, then a construct like
this will work:

  echo "$@" | egrep -q -- '-p|-a' && \
    { LNG="${LANG%.UTF*}"; lang="env LANG=${LNG%.utf*}"; }

Take your pick.  Most likely the first proposal is just fine
(and it also covers such unlikely suffixes as "uTF8").
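
A minimal sketch comparing the two suffix-stripping expansions on sample
values. The helper names `strip_last` and `strip_first` are made up for
this illustration, and "a.b.c" is not a real locale name; it is included
only to show where % and %% differ.

```shell
#!/bin/sh
# Sketch: compare ${var%.*} and ${var%%.*} on sample values.
# "a.b.c" is not a real locale; it only shows where % and %% differ.
strip_last()  { printf '%s\n' "${1%.*}"; }    # shortest ".*" match at the end
strip_first() { printf '%s\n' "${1%%.*}"; }   # longest ".*" match at the end

for value in en_GB.UTF-8 en_GB.utf8 en_GB a.b.c; do
    printf '%s -> last-dot: %s, first-dot: %s\n' "$value" \
        "$(strip_last "$value")" "$(strip_first "$value")"
done
```

For values with a single dot the two forms agree, and a value without a
dot passes through unchanged. Either form also drops a modifier such as
"@euro" when it follows the codeset.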

Comment 7 Adrian Reber 2008-04-06 20:53:20 UTC
Changes committed to rawhide and built.