Red Hat Bugzilla – Bug 1016338
Detect invalid UTF-8 in XML
Last modified: 2014-08-04 18:29:20 EDT
Description of problem:
If a topic contains valid XML entities that are invalid UTF-8, PressGang doesn't report any problem, but the builds in DocBuilder and elsewhere fail for no obvious reason.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a topic
2. Insert a carriage return character reference somewhere in the topic
3. Take a look in DocBuilder
Actual results:
PressGang reports no problem with the topic, but it doesn't build.
Expected results:
PressGang warns the user that there's a UTF-8 problem.
PressGang should probably still allow users to write and store valid XML that's not UTF-8 compliant. That will never build in Publican, but we should remain open to the possibility that users might want to transform their XML with some other tool that might not require UTF-8 compliance.
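A rough sketch of the kind of check that could drive such a warning, in Python. This is not PressGang code. The function name, the regular expression, and the choice of which code points to flag are all assumptions made for illustration: it flags numeric character references that resolve to C0 control characters other than tab and line feed (which covers the carriage return from the steps above), plus surrogates and values beyond U+10FFFF, which cannot be encoded as UTF-8 at all.

```python
import re

# Matches decimal (&#13;) and hexadecimal (&#xD;) numeric character references.
CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumption: tab and line feed are treated as harmless; every other
# C0 control character, including carriage return, is flagged.
ALLOWED_CONTROLS = {0x09, 0x0A}


def find_suspect_char_refs(xml_text):
    """Return (offset, reference, code_point) for each suspicious reference."""
    suspects = []
    for match in CHAR_REF.finditer(xml_text):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        is_control = code_point < 0x20 and code_point not in ALLOWED_CONTROLS
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        out_of_range = code_point > 0x10FFFF
        if is_control or is_surrogate or out_of_range:
            suspects.append((match.start(), match.group(0), code_point))
    return suspects


if __name__ == "__main__":
    topic = "<para>First line&#13;second line</para>"
    for offset, ref, code_point in find_suspect_char_refs(topic):
        # Warn only; the topic would still be saved as written.
        print(f"warning: {ref} (U+{code_point:04X}) at offset {offset}")
```

Because the function only returns a list of findings, the caller can show a warning and still save the topic, which matches the behaviour asked for above.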
Only adding this one as a blocker because of its pure nuisance value; it's easily worked around with sed before doing a mass upload. This particular CR is the only offending one I've hit so far.
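For anyone who wants the same clean-up without sed, here is a minimal Python sketch of the workaround. The exact sed command is not given in this report, so the pattern is an assumption: it removes decimal and hexadecimal carriage return references (such as &#13; or &#xD;, with optional leading zeros) and leaves everything else alone. The function name is hypothetical.

```python
import re

# Assumed forms of the carriage return reference: &#13;, &#013;, &#xD;, &#x0d;, ...
CR_REF = re.compile(r"&#(?:0*13|[xX]0*[dD]);")


def strip_cr_refs(xml_text):
    """Remove carriage return character references from the given XML text."""
    return CR_REF.sub("", xml_text)


if __name__ == "__main__":
    before = "<para>First line&#13;second line</para>"
    print(strip_cr_refs(before))
```

Running this over each file before a mass upload would take the place of the sed pass.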
Is there some documentation on character codes that are not valid UTF-8?