Bug 167820 - Review Request: enca - Charset analyzer and discriminator
Summary: Review Request: enca - Charset analyzer and discriminator
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: David Lawrence
URL: http://trific.ath.cx/software/enca/
Whiteboard:
Depends On:
Blocks: FE-ACCEPT
TreeView+ depends on / blocked
 
Reported: 2005-09-08 15:12 UTC by Dmitry Butskoy
Modified: 2007-12-10 16:38 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-16 11:44:55 UTC
Type: ---
Embargoed:
kevin: fedora-cvs+


Attachments (Terms of Use)

Description Dmitry Butskoy 2005-09-08 15:12:17 UTC
Spec Url: http://dmitry.butskoy.name/enca/enca.spec
SRPM Url: http://dmitry.butskoy.name/enca/enca-1.7-1.src.rpm

Some countries have several charsets in use. To handle such a situation, programs include auto-detectind code (Mozilla has it, and other). But these are mostly GUI programs, whereas a command line utility with the same functionality is currently missed in Fedora.


Description:

Enca is an Extremely Naive Charset Analyser. It detects character set and
encoding of text files and can also convert them to other encodings using
either a built-in convertor or external libraries and tools like libiconv,
librecode, or cstocs.

Currently, it has support for Belarussian, Bulgarian, Croatian, Czech,
Estonian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, and
Ukrainian and some multibyte encodings (mostly variants of Unicode)
independently on the language.

This package also contains shared Enca library other programs can make use of.

Install enca if you need to cope with text files of dubious origin
and unknown encoding and convert them to some reasonable encoding.

Comment 1 Ville Skyttä 2005-09-08 16:53:28 UTC
Will review.  I have a local package of this too, see    
http://cachalot.mine.nu/4/SRPMS/enca-1.7-0.1.src.rpm in case you wish to merge    
some bits.  Initial comments, comparing my and your specfiles, some nitpicks,  
some not:  
    
- Why no support for recode, is everything already covered by iconv?  Could be    
enabled by BuildRequiring recode-devel and removing --without-librecode.    
    
- More docs could be included, for example TODO and ChangeLog*    
    
- No %{?_smp_mflags} in %build    
    
- Test suite not run, could do "make check" in a %check section (after    
%install)    
    
- Static libs shouldn't be shipped without a good reason    
    
- -devel should depend on pkgconfig   
   
- Configuring with --disable-dependency-tracking would result in cleaner build   
output and possibly slight speedup   
   
- libexecdir could be overridden with %{_libdir} for FHS compliance (see my 
package for an example) 
   
- "discriminator" sounds weird to me in the package summary, but that might be 
just my English... what does it mean in this context? 

Comment 2 Dmitry Butskoy 2005-09-08 17:25:47 UTC
> Why no support for recode
I successfully use enca built with iconv (a part of glibc).
Build with recode produces extra package dependency, which I prefer to avoid.
> is everything already covered by iconv?
Don`t now...

> More docs could be included, for example TODO and ChangeLog*
Is it reasonable to include such files for package end-users? It seems to be
useful when accompanied with the source code only.

> Test suite not run, could do "make check" in a %check section
In your srpm it is "%check || :" -- i.e. depends nothing on it (always true)?
That what reason to do it?

> --libexecdir=%{_libdir} 
%configure provides "--libexecdir=%_libexecdir" already...

> "discriminator"
Agree, it is questionable.
May be better just: "Character sets analyzer" ? Or "auto-detector"...

> smp_mflags & disable-dependency-tracking
Will be done.


Comment 3 Ville Skyttä 2005-09-08 18:35:47 UTC
(In reply to comment #2) 
> > is everything already covered by iconv? 
> Don`t now... 
 
Okay, recode can be enabled later if someone can point out a case where iconv 
isn't up to the task. 
 
> > More docs could be included, for example TODO and ChangeLog* 
> Is it reasonable to include such files for package end-users? It seems to be 
> useful when accompanied with the source code only. 
 
ChangeLog* is not necessary in this case because there's an end user oriented 
NEWS file included, but TODO contains some information about known issues; I 
think people would be interested in that. 
 
> > Test suite not run, could do "make check" in a %check section 
> In your srpm it is "%check || :" -- i.e. depends nothing on it (always 
true)? 
> That what reason to do it? 
 
The "|| :" part is a relic, it is there to support pre-rpm 4.2 versions that 
don't understand the %check _section marker_.  In a more up to date, clean 
specfile that would read: 
 
   %check 
   make check 
 
More info: 
http://rpm.org/max-rpm-snapshot/s1-rpm-specref-scripts.html#S3-RPM-SPECREF-CHECK 
http://rpm.org/max-rpm-snapshot/s1-rpm-inside-scripts.html#S3-RPM-INSIDE-CHECK-SCRIPT 
 
> > --libexecdir=%{_libdir}  
> %configure provides "--libexecdir=%_libexecdir" already... 
 
I know.  That's why I'm overriding it with %{_libdir} (not %{_libexecdir}).  
This is not a blocker but just a note if you want the package's layout closer 
to the FHS. 
 
> > "discriminator" 
> Agree, it is questionable. 
> May be better just: "Character sets analyzer" ? Or "auto-detector"... 
 
Or "Character set detector and converter" 

Comment 4 Dmitry Butskoy 2005-09-08 19:36:38 UTC
TODO included, make check done. 

Using "--libexecdir=%{_libdir}" does not any changes in result.
And I nevertheless would prefer to trust %configure macro here...

About recode:
There are some reasons to not use it.
- The main Enca goal is to *detect* charset. There are another good programs for
conversion.
- User always can use Enca with "recode", as Enca is built with external program
support (just run "enca -E recode")
- Charsets which Enca actually operates, are limited by charsets of the problem
languages. All possible charset conversions in this context IMHO are well
supported by standard glibc's iconv. 

Summary: Character set analyzer and detector
The word "converter" is not good here, as converting is just an additional feature.

New SRPM: http://dmitry.butskoy.name/enca/enca-1.7-2.src.rpm
New SPEC: http://dmitry.butskoy.name/enca/enca.spec



Comment 5 Ville Skyttä 2005-09-08 20:51:55 UTC
(In reply to comment #4)   
> Using "--libexecdir=%{_libdir}" does not any changes in result.   
   
Hmm, right, that's because you're using %makeinstall which overrides it again.  
BTW, "make install DESTDIR=$RPM_BUILD_ROOT" is generally better than 
%makeinstall; the latter is an ugly hack created for some old config systems 
that don't support sane staged installs using DESTDIR. 
 
 
> New SRPM: http://dmitry.butskoy.name/enca/enca-1.7-2.src.rpm   
   
Unaddressed issue from comment 1:  
- Static libs shouldn't be shipped without a good reason  
http://fedoraproject.org/wiki/PackagingGuidelines#StaticLibraries  
Configuring with --disable-static should work (untested).  
  
  
Build fails, the package insists to install the HTML docs here (and even  
outside of the build root, using "make install DESTDIR=$RPM_BUILD_ROOT"  
instead of %makeinstall would fix that but not the actual issue why the docs  
are installed by "make install" in the first place):  
  Making install in devel-docs  
  make[1]: Entering directory `/home/scop/rpmbuild/BUILD/enca-1.7/devel-docs'  
  make[2]: Entering directory `/home/scop/rpmbuild/BUILD/enca-1.7/devel-docs'  
  make[2]: Nothing to be done for `install-exec-am'.  
  /bin/sh ../mkinstalldirs /usr/share/gtk-doc/html/libenca  
  mkdir /usr/share/gtk-doc/html/libenca  
 mkdir: cannot create directory `/usr/share/gtk-doc/html/libenca': Permission  
denied  
  make[2]: *** [install-data-local] Error 1  
  
  
The availability of external converter programs at build time affects what  
becomes the default external converter.  For reproducible builds, make sure 
that the default becomes consistently the same, eg. add  
"BuildRequires: recode" (assuming we don't have cstocs installed anywhere; it 
is checked before recode by the config system). 

Comment 6 Dmitry Butskoy 2005-09-09 11:18:08 UTC
> htmls & libexec
Now:

%makeinstall HTML_DIR=$RPM_BUILD_ROOT/tmp/html
rm -rf $RPM_BUILD_ROOT/tmp/html

Using HTML_DIR, we put install of docs to some place, then remove them.
(Unfortunately there is no way to omit doc install at all).

I don't want to specify libexecdir explicitly. If "--libexecdir=%{_libdir}" is
good for some Fedora distribution, it should be included into %configure macro
of this distribution. And for the similar reasons, I don't want to refuse
%makeinstall .

> static libs
Enca seems to have a good devel stuff, with html docs etc. This devel stuff
would be a little bit incomplete if I omit static libs.

I would prefer to follow static lib policy of the corresponding Core
distribution. As I understand, currently devel packages have static libs...

BTW, is there some macro (like "dist"), which can be used in "%if" to make
static libs build conditional?

> external converters' mess
What about --disable-external ? Internal + iconv should be enough (see comment #4)


Comment 7 Ville Skyttä 2005-09-09 13:45:32 UTC
(In reply to comment #6)    
> I would prefer to follow static lib policy of the corresponding Core    
> distribution. As I understand, currently devel packages have static libs...    
    
If you don't like the guidelines, I suggest voicing your concerns on    
the fedora-packaging list.  This guideline is pretty new, and has not been    
applied everywhere yet, but that doesn't mean that it shouldn't be applied to    
new packages (unless there's a good reason not to, and "good devel stuff with   
html docs" isn't one when balanced against the potential security problems   
shipping static libs can inflict on users).  More info in the  
fedora-maintainers list archives:  
https://www.redhat.com/archives/fedora-maintainers/2005-July/thread.html  
  
  
> BTW, is there some macro (like "dist"), which can be used in "%if" to make    
> static libs build conditional?    
    
The standard --with functionality of rpmbuild should work:    
    
  %configure %{!?_with_static:--disable-static}    
  [...]    
  %files    
  [...]    
  %if 0%{?_with_static:1}    
  %{_libdir}/*.a    
  %endif    
   
   
> > external converters' mess    
> What about --disable-external ? Internal + iconv should be enough (see    
comment #4)    
    
Works for me, they can be enabled later if need be.  Note that with   
--disable-external, the now useless external converter scripts are still   
installed in libexecdir and should be manually removed in the specfile.  

Comment 8 Dmitry Butskoy 2005-09-09 14:53:12 UTC
I respect guidelines, but prefer to trust precedents ;-)
OK, I already see some -devel packages in RawHide (FC5) without static libraries.

Probably, it is a good decision:
if (dist == fc3 || dist == fc4 || _with_static)  then use static libs,
else not.

New SRPM: http://dmitry.butskoy.name/enca/enca-1.7-3.src.rpm
New SPEC: http://dmitry.butskoy.name/enca/enca.spec



Comment 9 Dmitry Butskoy 2005-09-12 17:09:47 UTC
...and %makeinstall is changed to "make install DESTDIR=... HTML_DIR=..."

Comment 10 Ville Skyttä 2005-09-15 20:52:16 UTC
Approved, suggesting that no static libs are shipped for any of the distro    
versions.  A couple of cosmetic things to tweak before the first build if  
you're bored and feel like fine tuning:  
- The placement of %post(un) scriptlets in the specfile is unusual    
- Not sure if index.sgml and libenca.devhelp are useful in the docs  

Comment 11 Dmitry Butskoy 2005-09-16 11:16:02 UTC
  %post(un) moved to usual place (:-))
  Cleanup in %doc section: leave *.html and README.devel only.


  Ville,

  Thanks for review, it was a useful training for me.

Comment 12 Ville Skyttä 2005-09-17 08:02:02 UTC
Thanks.  Don't forget to add enca to owners.list in CVS. 

Comment 13 Dmitry Butskoy 2007-12-10 12:45:24 UTC
Package Change Request
======================
Package Name: enca
New Branches: EL-4 EL-5

Comment 14 Kevin Fenzi 2007-12-10 16:38:54 UTC
cvs done.


Note You need to log in before you can comment on or make changes to this bug.