Bug 968386 - hu_HU.aff is in a weird mixture of encodings
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: hunspell-hu
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Caolan McNamara
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2013-05-29 11:24 EDT by Mike FABIAN
Modified: 2013-10-08 06:52 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2013-10-08 06:52:43 EDT
Type: Bug


Description Mike FABIAN 2013-05-29 11:24:39 EDT
mfabian@ari:/usr/share/myspell
    $ rpm -qf hu_HU.aff
    hunspell-hu-1.6.1-4.fc18.noarch
    mfabian@ari:/usr/share/myspell
    $ 

The file command already tells us that something is wrong
with the encoding:

    mfabian@ari:/usr/share/myspell
    $ file  hu_HU.aff 
    hu_HU.aff: Non-ISO extended-ASCII text, with very long lines

The file claims to be UTF-8 encoded:

    mfabian@ari:/usr/share/myspell
    $ grep ^SET hu_HU.aff
    SET UTF-8     

There are indeed some lines which are UTF-8 encoded:

    mfabian@ari:/usr/share/myspell
    $ grep ^TRY hu_HU.aff
    TRY íóútaeslzánorhgkiédmyőpvöbucfjüűxwq-.à        
    mfabian@ari:/usr/share/myspell
    $ grep ^KEY hu_HU.aff
    KEY öüó|qwertzuiopőú|asdfghjkléáű|íyxcvbnm        

But there are other lines in ISO-8859-2 encoding:

    mfabian@ari:/usr/share/myspell
    $ grep ^NAME hu_HU.aff
    NAME Magyar Ispell helyes�r�si sz�t�r     
    mfabian@ari:/usr/share/myspell
    $ grep ^NAME hu_HU.aff | iconv -f iso-8859-2 -t utf-8 
    NAME Magyar Ispell helyesírási szótár     
    mfabian@ari:/usr/share/myspell
    $ grep 2002-2010 hu_HU.aff
    # 2002-2010 (c) L�szl� N�meth <nemeth@openoffice.org> and Ferenc God�     
    mfabian@ari:/usr/share/myspell
    $ grep 2002-2010 hu_HU.aff | iconv -f iso-8859-2 -t utf-8
    # 2002-2010 (c) László Németh <nemeth@openoffice.org> and Ferenc Godó     
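
To see which lines break the declared SET UTF-8 encoding, one can try a strict UTF-8 decode of each raw line. This is a minimal Python sketch, not something shipped with the package; the function classify_lines and the three-line sample (modeled on the SET, TRY and NAME lines above) are hypothetical:

```python
def classify_lines(data: bytes) -> list[tuple[int, str]]:
    """Label each raw line as 'ascii', 'utf-8' or 'invalid-utf-8'."""
    result = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        if raw.isascii():
            label = "ascii"
        else:
            try:
                raw.decode("utf-8")  # strict decoding is the default
                label = "utf-8"
            except UnicodeDecodeError:
                label = "invalid-utf-8"
        result.append((lineno, label))
    return result


if __name__ == "__main__":
    # Hypothetical sample: one ASCII line, one UTF-8 line, one ISO-8859-2 line.
    sample = (
        b"SET UTF-8\n"
        + "TRY íóú\n".encode("utf-8")
        + "NAME Magyar Ispell helyesírási szótár\n".encode("iso-8859-2")
    )
    for lineno, label in classify_lines(sample):
        print(lineno, label)
```

On the real file this would be called as classify_lines(open("/usr/share/myspell/hu_HU.aff", "rb").read()); the file has to be read in binary mode because it cannot be decoded as a whole.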

There are many lines at the top of the file which are neither in UTF-8
nor in ISO-8859-2 encoding:

    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff
    AF 1262
    AF V˯j�Ln�����TtYc��l # 1
    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff | iconv -f iso-8859-2 -t utf-8
    AF 1262
    AF VËŻj×LnÓéčłÄäTtYc¸źl # 1
    mfabian@ari:/usr/share/myspell
    $ head -n 2 hu_HU.aff | iconv -f iso-8859-1 -t utf-8
    AF 1262
    AF V˯j×LnÓéè³ÄäTtYc¸¼l # 1
    mfabian@ari:/usr/share/myspell
    $
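
Note that the iconv output for the AF line proves little: ISO-8859-1 and ISO-8859-2 assign a character to every byte value, so converting from them succeeds whatever the input is. A small Python check illustrates this; decodes_as is a hypothetical helper, and Python's codecs stand in for iconv here:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes in the given encoding without errors."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    every_byte = bytes(range(256))  # all byte values 0x00..0xFF
    for encoding in ("iso-8859-1", "iso-8859-2", "utf-8"):
        print(encoding, decodes_as(every_byte, encoding))
```

UTF-8, by contrast, rejects most arbitrary byte sequences, which is why the strict UTF-8 check is the more informative one for this file.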
Comment 1 Mike FABIAN 2013-05-29 11:29:14 EDT
This is *not* caused by our .spec file; the problem already exists
in the original upstream tarball hu_HU-1.6.1.tar.gz.
Comment 2 Caolan McNamara 2013-05-29 11:45:16 EDT
I think this is the result of running "makealias" over the original files to produce a more compact format, in which the data after "AF" is basically raw binary.
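
For context: according to the hunspell(5) manual page, AF defines numbered flag-vector aliases (the first "AF n" line gives the count, each following AF line is one flag set), so dictionary entries can refer to a flag set by its number; the same page gives the default FLAG type as the extended ASCII (8-bit) character. Assuming hu_HU.aff uses that default, its AF lines hold raw flag bytes rather than UTF-8 text, which would fit the output above. The Python sketch below only illustrates the aliasing idea; build_flag_aliases is hypothetical and numbers flag sets in first-seen order, which may not be what makealias actually does:

```python
def build_flag_aliases(flag_vectors: list[str]) -> tuple[list[str], list[int]]:
    """Replace repeated flag sets with 1-based alias numbers.

    Returns the AF header lines and, for each input flag set,
    the alias number that would stand in for it.
    """
    alias_of: dict[str, int] = {}  # dicts keep insertion order
    refs: list[int] = []
    for flags in flag_vectors:
        if flags not in alias_of:
            alias_of[flags] = len(alias_of) + 1  # aliases are numbered from 1
        refs.append(alias_of[flags])
    header = [f"AF {len(alias_of)}"]
    # The "# n" comment mirrors the one seen on the AF lines in hu_HU.aff.
    header += [f"AF {flags} # {n}" for flags, n in alias_of.items()]
    return header, refs
```

With the input ["ab", "cd", "ab"] this yields the header lines "AF 2", "AF ab # 1", "AF cd # 2" and the references [1, 2, 1].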
Comment 3 Caolan McNamara 2013-10-08 06:52:43 EDT
I'm not touching this with a barge pole; according to Németh, it's apparently deliberate.
