Bug 444802

Summary: parser error : parsing XML declaration: '?>' expected when DTD specified
Product: Red Hat Enterprise Linux 5
Reporter: Dave Malcolm <dmalcolm>
Component: libxml2
Assignee: Daniel Veillard <veillard>
Status: CLOSED NOTABUG
Severity: low
Priority: low    
Version: 5.0   
Target Milestone: rc   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-05-12 07:55:22 UTC

Description Dave Malcolm 2008-04-30 17:36:39 UTC
Description of problem:

I ran into an unexpected parser error when running xmllint with --valid on an
XML file (or XML-like; I'm not sure whether it's well-formed) with a particular
header. I reduced it to this reproducer:
$ cat test-1.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting> 

$ xmllint --noout test-1.xml
$ echo $?
0
$ xmllint --noout --valid test-1.xml
test-1.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" encoding="utf-8" standalone="no"?>
                                     ^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^

Trying without the DOCTYPE decl:
$ cat test-2.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<greeting>Hello, world!</greeting> 
$ xmllint --noout test-2.xml
$ echo $?
0
$ xmllint --noout --valid test-2.xml
test-2.xml:2: validity error : Validation failed: no DTD found !
<greeting>Hello, world!</greeting> 

Version-Release number of selected component (if applicable):
libxml2-2.6.26-2.1.2

How reproducible:
100%

Is test-1.xml well-formed?  If so, this is a bug; if not, the error is confusing
(why is it complaining about the "standalone" rather than the DOCTYPE?)

Comment 2 Daniel Veillard 2008-05-12 07:54:59 UTC
It's actually not a bug! But understanding why is a bit nasty...
It all boils down to the system identifier you used in the DOCTYPE
declaration: "", i.e. an empty string.
http://www.w3.org/TR/REC-xml/#NT-doctypedecl
and
http://www.w3.org/TR/REC-xml/#NT-ExternalID
are the productions governing the DOCTYPE.

------------------------
Definition: The SystemLiteral is called the entity's system identifier. It is
meant to be converted to a URI reference (as defined in [IETF RFC 3986]),
------------------------
Now if you go back to [IETF RFC 3986] and related RFCs, you will find that "" is
a valid URI reference meaning "the current document containing the reference".
So basically you're asking the XML parser to use your XML document as the DTD.
This leads to fatal errors, the first one being that for an external parsed
entity the XML declaration doesn't allow a 'standalone' part.
Of course, if you don't request validation, the DTD is not fetched or parsed
and the document appears well-formed.
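
As a rough illustration of the resolution step described above, here is a
minimal Python sketch. It uses the standard library's urllib.parse.urljoin as a
stand-in for URI reference resolution; it is not libxml2's own code. The helper
name resolve_system_id and the path file:///tmp/test-1.xml are assumptions made
up for this example.

```python
from urllib.parse import urljoin


def resolve_system_id(document_uri: str, system_id: str) -> str:
    """Resolve a DOCTYPE system identifier against the document's own URI.

    Hypothetical helper for illustration only; a real XML parser does
    its own resolution internally.
    """
    return urljoin(document_uri, system_id)


# Assumed location of the reproducer file (not given in the report).
doc = "file:///tmp/test-1.xml"

# An empty system identifier is a same-document reference, so the
# "DTD" to fetch ends up being the XML document itself.
print(resolve_system_id(doc, ""))

# An absolute system identifier resolves to itself.
print(resolve_system_id(doc, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

With the empty string, the resolved URI is the document's own URI, which is
why the validating parser ends up reading the XML declaration as if it were the
start of a DTD.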

The real solution is to follow the indications from XHTML1 and *never* put
an empty system identifier; instead, follow the practice indicated in the spec
itself for conformance:

http://www.w3.org/TR/xhtml1/#strict

If you don't want network fetches, the solution is to ask xmllint to forbid
them with --nonet.
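
To catch the problem before validation, one option is to scan a document for an
empty system identifier. The sketch below is only an illustration and is not
part of xmllint: it uses Python's standard xml.parsers.expat module, which
reports the DOCTYPE without fetching the DTD. The function name
has_empty_system_id and the "greeting.dtd" identifier are assumptions invented
for this example.

```python
import xml.parsers.expat


def has_empty_system_id(xml_bytes: bytes) -> bool:
    """Return True if the DOCTYPE carries an empty system identifier.

    Hypothetical helper: expat only reports the identifiers here,
    it does not load the external subset.
    """
    seen = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # system_id is None when the DOCTYPE has no external identifier.
        seen.append(system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return any(s is not None and s.strip() == "" for s in seen)


# The reproducer from the report.
BROKEN = b"""<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting>
"""

# Same document with a made-up, non-empty system identifier.
FIXED = BROKEN.replace(b'"">', b'"greeting.dtd">')

print(has_empty_system_id(BROKEN))
print(has_empty_system_id(FIXED))
```

A check like this only flags the empty identifier; whether the replacement
identifier points at a usable DTD still has to be verified with a validating
run.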

Definitely not a bug: the XML is actually not well-formed, but that can
be detected only in validating mode!

Daniel