Bug 1249416 - Package for uchardet?
Summary: Package for uchardet?
Keywords:
Status: CLOSED DUPLICATE of bug 1264713
Alias: None
Product: Fedora
Classification: Fedora
Component: Package Review
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nobody's working on this, feel free to take it
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-08-02 18:13 UTC by Jehan
Modified: 2015-09-21 01:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-03 11:17:54 UTC
Type: Bug
Embargoed:



Description Jehan 2015-08-02 18:13:00 UTC
Hi,

I am unsure of the procedure to request new packages. Sorry in advance if opening a report was not the right way.

mpv (already packaged in Fedora) recently got support for the uchardet library (https://github.com/mpv-player/mpv/issues/908), which is a lot more efficient and has broader language support than enca. Enca mostly supports Latin and Cyrillic languages, with the exception of Chinese. In particular, it won't recognize encodings made for Japanese or Korean (even though UTF-8 will hopefully be the only encoding in use someday, right now this is still far from the case). Moreover, enca is officially in maintenance mode with no new features planned (cf. http://blog.cihar.com/archives/2014/10/20/enca-116/).

On the other hand, uchardet is a C port of Mozilla's encoding detection, and therefore recognizes a lot of encodings quite well: https://github.com/BYVoid/uchardet

The license is the Mozilla Public License Version 1.1.

I think that uchardet (and similar libs) should supersede enca if FOSS distributions are to be used worldwide. Right now, none of the video players (mplayer, mpv, vlc, etc.) are able to read most Japanese and Korean subtitles without user tweaking. The same goes for text editors, etc. (right now gedit, for instance, is unable to recognize files found on the web in EUC-KR).
Would it be possible to package uchardet?
Thanks!

Note: Fedora already has python-chardet and jchardet/juniversalchardet, respectively Python and Java ports of the Mozilla C++ code.
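
To make the EUC-KR problem above concrete, here is a minimal sketch that uses only Python's standard codecs (it does not use uchardet or python-chardet). The sample string and the helper name try_decode are made up for illustration. It shows that EUC-KR bytes are rejected as UTF-8, silently turn into garbage when read as Latin-1, and only come out right when the correct encoding is known, which is the guess an encoding detector is supposed to make for the user.

```python
# Illustration only: the sample text and helper below are hypothetical,
# not taken from uchardet, enca, or any of the applications mentioned.
text = "한국어 자막"            # sample Korean text ("Korean subtitles")
raw = text.encode("euc_kr")     # bytes as a legacy EUC-KR file would store them


def try_decode(data, encoding):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_utf8 = try_decode(raw, "utf-8")      # fails: these bytes are not valid UTF-8
as_latin1 = try_decode(raw, "latin-1")  # never fails, but the result is mojibake
as_euckr = try_decode(raw, "euc_kr")    # correct once the encoding is known

print("utf-8:  ", as_utf8)
print("latin-1:", as_latin1)
print("euc_kr: ", as_euckr)
```

Trial decoding like this cannot tell apart single-byte encodings that accept any byte sequence, which is why a statistical detector is useful.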

Comment 1 Christopher Meng 2015-08-03 05:59:23 UTC
uchardet was written by a friend of mine who now works at Google. The project dates back at least 4 years; do you still want to use it?

Comment 2 Jehan 2015-08-03 10:26:07 UTC
Are you saying that the project is more or less abandoned? Is that the problem?

Apart from a few commits a year, I indeed don't see much activity there. But it is based on years-old Mozilla code anyway, which does not seem to have been touched since 2008 either, and that code is itself based on an algorithm and a paper written in 2002 and untouched since (http://www.mozilla.org/projects/intl/chardet.html and http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html). I guess the existing encoding detection works well and does not need to be modified much, apart from a few fixes here and there (there is still a little activity in the uchardet repo).

Of course, if you know a better C lib for encoding detection (more encoding support, better algorithm, active development even…), I'm good with any alternative and have no preference there. I just wish our distributions were more user-friendly for Asian users and that installed software would automatically detect the common encodings still in use. If such a lib were packaged everywhere, then apps would start using it.

Comment 3 Michael Schwendt 2015-08-03 11:17:54 UTC
> I am unsure of the procedure to request new packages.

https://fedoraproject.org/wiki/Package_maintainers_wishlist

The Package Review Request queue is for existing packages:
 * https://fedoraproject.org/wiki/Package_Review_Process
 * https://fedoraproject.org/wiki/Category:Package_Maintainers

Comment 4 Christopher Meng 2015-09-21 01:41:52 UTC

*** This bug has been marked as a duplicate of bug 1264713 ***

