Bug 1298018
| Summary: | parsing empty file with XMLParser(recover=True) raises lxml.etree.XMLSyntaxError: Document is empty | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Dan Callaghan <dcallagh> |
| Component: | libxml2 | Assignee: | Daniel Veillard <veillard> |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.2 | CC: | jpopelka, lmiksik, ohudlick, tlavigne, veillard |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-13 20:07:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Dan Callaghan
2016-01-13 02:03:45 UTC
(In reply to Dan Callaghan from comment #0)
> It seems this regressed from libxml2-2.9.1-5.el7_1.2.x86_64.rpm to
> libxml2-2.9.1-6.el7_2.2.x86_64, which was a bunch of CVE fixes in libxml2.

Does it ring a bell, Daniel? (To me, none of the CVEs' names look related.)

We haven't updated python-lxml in RHEL-7 yet.

Moving to libxml2, as python-lxml is a low-profile component and is unlikely to see any update.

Hi, this is Daniel Veillard, author of libxml2.

It seems you are parsing XML, not HTML. Using recover is an abuse of the spec, and I threatened to remove it if people were using it casually instead of just for data recovery in cases where one accepts data loss or invalid data (which the XML spec goes to great lengths to avoid, as this was a design goal).

https://www.w3.org/TR/REC-xml/#NT-document

[1] document ::= prolog element Misc*

defines what an XML document is; you can't derive an empty string from it (exercise left to the reader). So libxml2 *MUST* raise a fatal error, lxml does so accordingly, and the error even seems absolutely proper.

You:
1/ you are abusing a corner case of the libxml2 API
2/ your document MUST raise a fatal error
3/ libxml2/lxml does so

=> This is not a bug, it's basic compliance with the XML spec, so closing accordingly. Fix your software to not use recover by default and, second, handle that exception, as the parser is mandated by the spec to raise it :-)

The real bug is that somehow that error wasn't raised before. Just detect the empty string and never invoke the parser on it.

Daniel Veillard
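
For readers hitting the same error, the two suggestions above (detect empty input before parsing, and handle the parser's exception) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `parse_xml_or_none` is hypothetical, and the standard library's `xml.etree.ElementTree` is used as a stand-in so the snippet runs without lxml installed. With lxml, the exception to catch would be `lxml.etree.XMLSyntaxError`, as named in the bug summary.

```python
import xml.etree.ElementTree as ET


def parse_xml_or_none(data):
    """Return the root element, or None for empty or malformed input.

    Hypothetical helper, not part of lxml or libxml2. It accepts str or bytes.
    """
    # An empty (or whitespace-only) input can never match
    # `document ::= prolog element Misc*`, since `element` is mandatory,
    # so skip the parser entirely.
    if not data or not data.strip():
        return None
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # The parser is required to report a fatal error for input that is
        # not well-formed; the caller decides what that means (here: None).
        return None


if __name__ == "__main__":
    print(parse_xml_or_none(b""))             # empty input, parser never invoked
    print(parse_xml_or_none("<a><b/></a>"))   # well-formed input, root element
```

Returning `None` for both cases is a design choice made for brevity; code that needs to tell "no document" apart from "broken document" would probably keep the empty check but let the parse error propagate.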