Bug 968386 - hu_HU.aff is in a weird mixture of encodings
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: hunspell-hu
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Caolan McNamara
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-05-29 15:24 UTC by Mike FABIAN
Modified: 2013-10-08 10:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2013-10-08 10:52:43 UTC
Type: Bug

Description Mike FABIAN 2013-05-29 15:24:39 UTC
    mfabian@ari:/usr/share/myspell
    $ rpm -qf hu_HU.aff
    hunspell-hu-1.6.1-4.fc18.noarch
    mfabian@ari:/usr/share/myspell
    $ 

The file command already tells us that there
is something wrong with the encoding:

    mfabian@ari:/usr/share/myspell
    $ file  hu_HU.aff 
    hu_HU.aff: Non-ISO extended-ASCII text, with very long lines

The file claims to be UTF-8 encoded:

    mfabian@ari:/usr/share/myspell
    $ grep ^SET hu_HU.aff
    SET UTF-8     

There are indeed some lines which are UTF-8 encoded:

    mfabian@ari:/usr/share/myspell
    $ grep ^TRY hu_HU.aff
    TRY íóútaeslzánorhgkiédmyőpvöbucfjüűxwq-.à        
    mfabian@ari:/usr/share/myspell
    $ grep ^KEY hu_HU.aff
    KEY öüó|qwertzuiopőú|asdfghjkléáű|íyxcvbnm        

But there are other lines in ISO-8859-2 encoding:

    mfabian@ari:/usr/share/myspell
    $ grep ^NAME hu_HU.aff
    NAME Magyar Ispell helyes�r�si sz�t�r     
    mfabian@ari:/usr/share/myspell
    $ grep ^NAME hu_HU.aff | iconv -f iso-8859-2 -t utf-8 
    NAME Magyar Ispell helyesírási szótár     
    mfabian@ari:/usr/share/myspell
    $ grep 2002-2010 hu_HU.aff
    # 2002-2010 (c) L�szl� N�meth <nemeth> and Ferenc God�     
    mfabian@ari:/usr/share/myspell
    $ grep 2002-2010 hu_HU.aff | iconv -f iso-8859-2 -t utf-8
    # 2002-2010 (c) László Németh <nemeth> and Ferenc Godó     

There are many lines at the top of the file which are neither in UTF-8
nor in ISO-8859-2 encoding:

    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff
    AF 1262
    AF V˯j�Ln�����TtYc��l # 1
    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff | iconv -f iso-8859-2 -t utf-8
    AF 1262
    AF VËŻj×LnÓéčłÄäTtYc¸źl # 1
    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff | iconv -f iso-8859-1 -t utf-8
    AF 1262
    AF V˯j×LnÓéè³ÄäTtYc¸¼l # 1
    mfabian@ari:/usr/share/myspell
    $
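
The manual checks above can be approximated in a single pass. The following is a minimal sketch, not part of the original report, that labels each line of a byte string as pure ASCII, valid UTF-8, or not valid UTF-8. The names classify_line and summarize and the sample bytes are hypothetical. A short ISO-8859-2 line can happen to be valid UTF-8 by coincidence, so this is a heuristic rather than a definitive detector.

```python
from collections import Counter


def classify_line(raw: bytes) -> str:
    """Label one line as 'ascii', 'utf-8', or 'non-utf-8'."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "non-utf-8"
    return "utf-8"


def summarize(data: bytes) -> Counter:
    """Count how many lines of `data` fall into each class."""
    return Counter(classify_line(line) for line in data.splitlines())


# Hypothetical sample that mimics the mixture described in this report:
# one ASCII line, one UTF-8 line, one ISO-8859-2 line.
sample = (
    "SET UTF-8\n".encode("ascii")
    + "TRY íóú\n".encode("utf-8")
    + "NAME Magyar Ispell helyesírási szótár\n".encode("iso-8859-2")
)

for label, count in sorted(summarize(sample).items()):
    print(label, count)
```

To inspect a real file, the bytes could be read with open(path, "rb").read() and passed to summarize; opening the file in text mode would fail or silently replace bytes on the lines that are not valid UTF-8.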

Comment 1 Mike FABIAN 2013-05-29 15:29:14 UTC
This is *not* caused by our .spec file; the problem already exists
in the original upstream tarball hu_HU-1.6.1.tar.gz.

Comment 2 Caolan McNamara 2013-05-29 15:45:16 UTC
I think this is the result of running "makealias" over the originals to produce a more compact format, in which the data after "AF" is basically raw binary.
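
If that reading is right, the AF lines should be handled as bytes rather than decoded as text. Below is a minimal sketch under two assumptions that the report does not confirm: that the first AF line carries the alias count, and that the file uses one-byte flags, so each byte after "AF " is a single flag. The function name parse_af_aliases is hypothetical, and the sample bytes are reconstructed from the ISO-8859-1 rendering shown in the description.

```python
from typing import List


def parse_af_aliases(data: bytes) -> List[bytes]:
    """Return the flag vector of every AF alias line as raw bytes.

    Assumption: the first AF line holds the number of aliases and
    every later AF line is 'AF <flags> [# comment]'.
    """
    aliases = []
    seen_count = False
    for line in data.split(b"\n"):
        if not line.startswith(b"AF "):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        if not seen_count:
            seen_count = True  # skip the count line, e.g. b"AF 1262"
            continue
        aliases.append(fields[1])  # keep the bytes undecoded
    return aliases


# First two lines of the file, rebuilt from the description's output.
sample = (
    b"AF 1262\n"
    b"AF V\xcb\xafj\xd7Ln\xd3\xe9\xe8\xb3\xc4\xe4TtYc\xb8\xbcl # 1\n"
    b"SET UTF-8\n"
)

for flags in parse_af_aliases(sample):
    # Print each flag as a hex byte instead of guessing an encoding.
    print(" ".join(f"{b:02x}" for b in flags))
```

Treating the vector as opaque bytes avoids the question of which text encoding applies to these lines at all.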

Comment 3 Caolan McNamara 2013-10-08 10:52:43 UTC
I'm not touching this with a barge pole; it's apparently deliberate, according to nemeth.

