| Summary: | iconv: the iconv interface does not allow to specify normalization | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Nikos Mavrogiannopoulos <nmavrogi> |
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
| Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | arjun.is, codonell, dj, fweimer, jakub, law, mfabian, nmavrogi, pfrankli, siddhesh |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-08 13:15:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |

Description
Nikos Mavrogiannopoulos
2016-10-05 14:47:29 UTC
Do you need a specific version of the Unicode standard for stability? Then iconv will never be what you need, because the plan is to support only the latest Unicode standard at the time of a glibc release.

No, I do not think that a specific version of the standard is required for the request above. As far as I understand, a conversion from UTF-8 to UTF-16BE//NFC would have the same output for the same characters, so that is sufficient for me regardless of the underlying standard.

My main use case for this request is being able to convert UTF-8 input to a UTF-16BE string under the NFC rules (to be used as a password, and thus the output encoding must be reproducible).

(In reply to Nikos Mavrogiannopoulos from comment #2)
> No, I do not think that a specific version of the standard is required for
> the request above. As far as I understand, a conversion from UTF-8 to
> UTF-16BE//NFC would have the same output for the same characters, so that is
> sufficient for me regardless of the underlying standard.

That's not correct. New characters may do away with the need for using combining characters to represent some glyphs, and so NFC results change.

> My main use case for this request is being able to convert UTF-8 input to a
> UTF-16BE string under the NFC rules (to be used as a password, and thus the
> output encoding must be reproducible).

That's not going to work with glibc, sorry. You need to record the specific version of Unicode/NFC to use and have support tables for that. ICU should provide this.

Hmm, if this is the normalization that is used:

http://www.unicode.org/reports/tr15/#Stability_of_Normalized_Forms

then it should be stable.

So while this is traditionally the domain of ICU, we might be able to support this in glibc.

Do you need this in iconv, or would a wchar_t *-to-wchar_t * conversion do the job as well?

(In reply to Florian Weimer from comment #4)
> Hmm, if this is the normalization that is used:
>
> http://www.unicode.org/reports/tr15/#Stability_of_Normalized_Forms
>
> then it should be stable.
> So while this is traditionally the domain of ICU, we might be able to
> support this in glibc.

As I see it, glibc already supports character conversions to and from UTF-8. Without normalization support, a modern library that has to conform to any standard involving UTF-8 has no way to specify (or even know) the normalization of the output data.

> Do you need this in iconv, or would a wchar_t *-to-wchar_t * conversion do
> the job as well?

I use only iconv(), so I cannot speak about the other APIs.

I no longer think that the libc-provided APIs are reasonable for Unicode processing. I am switching to libunistring.
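
For illustration only, the use case discussed in this thread (UTF-8 input, NFC normalization, UTF-16BE output) can be sketched outside of iconv. The sketch below uses Python's standard `unicodedata` module, not glibc or libunistring, and the function name `utf8_to_utf16be_nfc` is a hypothetical helper introduced for this example. The result depends on the Unicode tables bundled with the interpreter, which is the version concern raised in the comments above.

```python
import unicodedata


def utf8_to_utf16be_nfc(data: bytes) -> bytes:
    """Decode UTF-8 input, normalize to NFC, and encode as UTF-16BE.

    Hypothetical helper for illustration; not part of iconv or glibc.
    """
    text = data.decode("utf-8")
    normalized = unicodedata.normalize("NFC", text)
    # The "utf-16-be" codec fixes the byte order and emits no BOM.
    return normalized.encode("utf-16-be")


if __name__ == "__main__":
    # Two spellings of the same character:
    composed = "\u00e9".encode("utf-8")      # precomposed e with acute
    decomposed = "e\u0301".encode("utf-8")   # "e" + combining acute accent

    # A plain charset conversion keeps the two spellings distinct.
    assert composed.decode("utf-8").encode("utf-16-be") != \
        decomposed.decode("utf-8").encode("utf-16-be")

    # With NFC applied first, both inputs give the same bytes.
    assert utf8_to_utf16be_nfc(composed) == utf8_to_utf16be_nfc(decomposed)

    # Unicode data version used by this interpreter's normalization tables.
    print("Unicode tables:", unicodedata.unidata_version)
```

Recording the reported table version next to any stored password-derived value is one way to notice later if the normalization data has changed.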