Bug 444802 - parser error : parsing XML declaration: '?>' expected when DTD specified
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libxml2
Version: 5.0
Hardware: i386 Linux
Priority: low
Severity: low
Target Milestone: rc
Assigned To: Daniel Veillard
Reported: 2008-04-30 13:36 EDT by Dave Malcolm
Modified: 2008-05-12 03:55 EDT
Last Closed: 2008-05-12 03:55:22 EDT
Description Dave Malcolm 2008-04-30 13:36:39 EDT
Description of problem:

Ran into an unexpected parser error running xmllint with --valid on an XML file
 (or XML-like; not sure if it's well-formed) with a particular header; got it
down to this reproducer:
$ cat test-1.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting> 

$ xmllint --noout test-1.xml
$ echo $?
0
$ xmllint --noout --valid test-1.xml
test-1.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" encoding="utf-8" standalone="no"?>
                                     ^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^

Trying without the DOCTYPE decl:
$ cat test-2.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<greeting>Hello, world!</greeting> 
$ xmllint --noout test-2.xml
$ echo $?
0
$ xmllint --noout --valid test-2.xml
test-2.xml:2: validity error : Validation failed: no DTD found !
<greeting>Hello, world!</greeting> 

Version-Release number of selected component (if applicable):
libxml2-2.6.26-2.1.2

How reproducible:
100%

Is test-1.xml well-formed?  If so, this is a bug; if not, the error is confusing
(why is it complaining about the "standalone", rather than the DOCTYPE?)
Comment 2 Daniel Veillard 2008-05-12 03:54:59 EDT
It's actually not a bug! But understanding why is a bit nasty...
It all boils down to the SYSTEM identifier you used in the DOCTYPE
declaration: "", i.e. an empty string.
http://www.w3.org/TR/REC-xml/#NT-doctypedecl
and
http://www.w3.org/TR/REC-xml/#NT-ExternalID
are the productions governing the DOCTYPE.

------------------------
Definition: The SystemLiteral is called the entity's system identifier. It is
meant to be converted to a URI reference (as defined in [IETF RFC 3986]),
------------------------
Now if you go back to [RFC 3986] et al. you will find that "" is a valid
URI reference meaning "the current document containing the reference",
so basically you're asking the XML parser to use your XML file as the DTD. This
leads to a fatal error, the first one being that for an external parsed
entity the leading declaration (a text declaration) doesn't allow a
'standalone' part.
Of course if you don't specify validation, the DTD is not fetched/parsed
and the document appears well-formed.
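
The URI resolution step described above can be illustrated with a small sketch. It uses Python's standard urljoin, not libxml2's own URI code, and the file:///tmp/ location is a made-up path for the reproducer, so treat it as an illustration of the RFC 3986 rule rather than of what xmllint does internally.

```python
from urllib.parse import urljoin


def resolve_system_id(document_uri, system_id):
    """Resolve a DOCTYPE system identifier against the URI of the
    document that contains it, following RFC 3986 reference resolution."""
    return urljoin(document_uri, system_id)


# Hypothetical location of the reproducer from this report.
doc = "file:///tmp/test-1.xml"

# An empty reference resolves to the containing document itself.
print(resolve_system_id(doc, ""))              # file:///tmp/test-1.xml

# A non-empty relative reference resolves to a sibling file
# (greeting.dtd is a hypothetical name).
print(resolve_system_id(doc, "greeting.dtd"))  # file:///tmp/greeting.dtd
```

With the empty system identifier, the "DTD" a validating parser is asked to load is test-1.xml itself, which is why the first error points at the XML declaration.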

The real solution is to follow the guidance from XHTML1 and *never* use
an empty system identifier, but follow the practice shown in the spec
itself for conforming documents:

http://www.w3.org/TR/xhtml1/#strict

and if you don't want network fetches, the solution is to ask xmllint to forbid
them with --nonet
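
As a rough way to catch this before running a validating parser, the sketch below reads the DOCTYPE with Python's bundled expat bindings and flags an empty system identifier. The function names and the greeting.dtd file name are hypothetical, and the check is an assumption about how one might lint for this, not something xmllint provides. The parser is left at its default settings, so it reports the DOCTYPE without loading the external subset.

```python
import xml.parsers.expat


def doctype_system_id(xml_bytes):
    """Return the DOCTYPE system identifier, or None when the document
    has no DOCTYPE or the DOCTYPE carries no external ID."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports the literal between the quotes as written.
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return found.get("system_id")


def has_empty_system_id(xml_bytes):
    """True when a DOCTYPE is present and its system identifier is empty."""
    system_id = doctype_system_id(xml_bytes)
    return system_id is not None and system_id.strip() == ""


# test-1.xml from this report: PUBLIC identifier with an empty system identifier.
TEST_1 = (
    b'<?xml version="1.0" encoding="utf-8" standalone="no"?>\n'
    b'<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">\n'
    b'<greeting>Hello, world!</greeting>\n'
)

# Same document with a non-empty system identifier (hypothetical file name).
TEST_FIXED = (
    b'<?xml version="1.0" encoding="utf-8" standalone="no"?>\n'
    b'<!DOCTYPE greeting SYSTEM "greeting.dtd">\n'
    b'<greeting>Hello, world!</greeting>\n'
)

print(has_empty_system_id(TEST_1))
print(has_empty_system_id(TEST_FIXED))
```

A document without any DOCTYPE, such as test-2.xml, is not flagged, since there is no system identifier to resolve.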

Definitely not a bug: the XML is actually not well-formed, but that can
be detected only in validating mode!

Daniel
