| Summary: | hunspell has problems with input above the BMP, causes this: [abrt] ibus-typing-booster: SuggestMgr::leftcommonsubstring(char*, char const*)(): python3.5 killed by SIGSEGV |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | hunspell |
| Reporter: | Mike FABIAN <mfabian> |
| Assignee: | Caolan McNamara <caolanm> |
| Status: | CLOSED ERRATA |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified |
| Priority: | unspecified |
| Version: | 24 |
| CC: | anish.developer, caolanm, hopparz, i18n-bugs, mfabian, smaitra |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Unspecified |
| URL: | https://retrace.fedoraproject.org/faf/reports/bthash/3391ac560b2bed2b86c62550527292850e9f5fc8 |
| Whiteboard: | abrt_hash:ce549929093186e3290aca1694c819a2a6ba930e; |
| Fixed In Version: | hunspell-1.3.3-10.fc24 |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Last Closed: | 2016-09-01 16:53:11 UTC |
| Type: | --- |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |

Description
Mike FABIAN
2016-08-21 09:09:17 UTC
Created attachment 1192545 [details]
File: backtrace
Created attachment 1192546 [details]
File: cgroup
Created attachment 1192547 [details]
File: core_backtrace
Created attachment 1192548 [details]
File: dso_list
Created attachment 1192549 [details]
File: environ
Created attachment 1192550 [details]
File: exploitable
Created attachment 1192551 [details]
File: limits
Created attachment 1192552 [details]
File: maps
Created attachment 1192553 [details]
File: mountinfo
Created attachment 1192554 [details]
File: namespaces
Created attachment 1192555 [details]
File: open_fds
Created attachment 1192556 [details]
File: proc_pid_status
Created attachment 1192557 [details]
File: var_log_messages
Created attachment 1192563 [details]
python-enchant-crash.py
It crashes, because python3-enchant crashes:
$ python3 python-enchant-crash.py
['Budapest', 'Budapesti', 'Budapesté', 'Budapestű']
[b'Budapest', b'Budapesti', b'Budapest\xc3\xa9', b'Budapest\xc5\xb1']
This UTF-8 encoding can't convert to UTF-16:
𐲂𐳪𐳇𐳀𐳠𐳉𐳤𐳦
This UTF-8 encoding can't convert to UTF-16:
𐲂𐳪𐳇𐳀𐳠𐳉𐳤𐳦
This UTF-8 encoding can't convert to UTF-16:
𐲂𐳪𐳇𐳀𐳠𐳉𐳤𐳦
Segmentation fault (core dumped)
mfabian@ari:~
$
Created attachment 1192564 [details]
hunspell-conversion-problem.txt
python3-enchant probably crashes because of this problem in hunspell:
$ hunspell -d hu_HU -i utf-8 -l hunspell-conversion-problem.txt
hBudapxst
This UTF-8 encoding can't convert to UTF-16:
😇
This UTF-8 encoding can't convert to UTF-16:
😇
This UTF-8 encoding can't convert to UTF-16:
𐳠
This UTF-8 encoding can't convert to UTF-16:
𐳠
mfabian@ari:~
$
Of course the file converts to UTF-16 just fine:
$ iconv -f utf-8 -t utf-16 < hunspell-conversion-problem.txt | iconv -f utf-16 -t utf-8
Budapxst
😇
𐳠
mfabian@ari:~
$
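The same round trip can be checked from Python without iconv. This is only a sketch of the check, not part of the attached reproducer; the function name `survives_utf16_round_trip` is made up for illustration, and the three sample strings are the lines of hunspell-conversion-problem.txt as shown above.

```python
def survives_utf16_round_trip(text: str) -> bool:
    """Encode as UTF-8, convert to UTF-16 and back, then compare the bytes."""
    utf8_in = text.encode("utf-8")
    # Python's "utf-16" codec writes a BOM on encode and consumes it on decode.
    utf16 = utf8_in.decode("utf-8").encode("utf-16")
    utf8_out = utf16.decode("utf-16").encode("utf-8")
    return utf8_in == utf8_out


if __name__ == "__main__":
    for line in ["Budapxst", "😇", "𐳠"]:
        print(ascii(line), survives_utf16_round_trip(line))
```

If every line reports True, the input itself is valid and convertible, which points at the conversion code inside hunspell rather than at the file.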
It looks like hunspell has problems with characters above the BMP (Basic Multilingual Plane).
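For context, characters above the BMP are exactly those with code points above U+FFFF, and UTF-16 has to represent each of them as a surrogate pair (two 16-bit code units) rather than a single unit. The sketch below shows that difference for the inputs from this report. The helper names `utf16_code_units` and `non_bmp_chars` are hypothetical, and nothing here reflects how hunspell does its conversion internally.

```python
from typing import List


def utf16_code_units(ch: str) -> List[int]:
    """Return the UTF-16 code units (big-endian, no BOM) for a string."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def non_bmp_chars(text: str) -> List[str]:
    """Return the characters of text that lie above the BMP (> U+FFFF)."""
    return [c for c in text if ord(c) > 0xFFFF]


if __name__ == "__main__":
    for ch in ["é", "😇", "𐳠"]:
        units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
        print(f"U+{ord(ch):04X} -> {len(utf16_code_units(ch))} code unit(s): {units}")
    # A caller could use a check like this to skip such words on affected versions.
    print(non_bmp_chars("Budapest"), len(non_bmp_chars("𐲂𐳪𐳇𐳀𐳠𐳉𐳤𐳦")))
```

A BMP character such as é yields one code unit, while the emoji and the Old Hungarian letters each yield two, which is the case the error messages above complain about.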
It works on current rawhide with hunspell-1.4.1:
[mfabian@Fedora-Workstation-netinst-x86_6 ~]$ cat /etc/fedora-release
Fedora release 26 (Rawhide)
[mfabian@Fedora-Workstation-netinst-x86_6 ~]$ python3 python-enchant-crash.py
['Budapest', 'Budapesti', 'Budapesté', 'Budapestű']
[b'Budapest', b'Budapesti', b'Budapest\xc3\xa9', b'Budapest\xc5\xb1']
[]
[]
[mfabian@Fedora-Workstation-netinst-x86_6 ~]$ rpm -q hunspell
hunspell-1.4.1-1.fc25.x86_64
[mfabian@Fedora-Workstation-netinst-x86_6 ~]$ hunspell -d hu_HU -i utf-8 -l hunspell-conversion-problem.txt
Budapxst
[mfabian@Fedora-Workstation-netinst-x86_6 ~]$
Can the fix be backported to f24?

Good to know the big rework of stuff in hunspell had a practical, worthwhile effect. I'll have to bisect to find when it started working, to see what exactly the cause was and whether it's backportable in isolation.

hunspell-1.3.3-10.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-1a8b18ee44

hunspell-1.3.3-10.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-1a8b18ee44

hunspell-1.3.3-10.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.