Bug 444802 - parser error : parsing XML declaration: '?>' expected when DTD specified
Summary: parser error : parsing XML declaration: '?>' expected when DTD specified
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libxml2
Version: 5.0
Hardware: i386
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
: ---
Assignee: Daniel Veillard
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-04-30 17:36 UTC by Dave Malcolm
Modified: 2008-05-12 07:55 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-12 07:55:22 UTC
Target Upstream Version:
Embargoed:



Description Dave Malcolm 2008-04-30 17:36:39 UTC
Description of problem:

Ran into an unexpected parser error running xmllint with --valid on an XML file
 (or XML-like; not sure if it's well-formed) with a particular header; got it
down to this reproducer:
$ cat test-1.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting> 

$ xmllint --noout test-1.xml
$ echo $?
0
$ xmllint --noout --valid test-1.xml
test-1.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" encoding="utf-8" standalone="no"?>
                                     ^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^

Trying without the DOCTYPE decl:
$ cat test-2.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<greeting>Hello, world!</greeting> 
$ xmllint --noout test-2.xml
$ echo $?
0
$ xmllint --noout --valid test-2.xml
test-2.xml:2: validity error : Validation failed: no DTD found !
<greeting>Hello, world!</greeting> 

Version-Release number of selected component (if applicable):
libxml2-2.6.26-2.1.2

How reproducible:
100%

Is test-1.xml well-formed?  If so, this is a bug; if not, the error is confusing
(why is it complaining about the "standalone", rather than the DOCTYPE?)

Comment 2 Daniel Veillard 2008-05-12 07:54:59 UTC
It's actually not a bug! But understanding why is a bit nasty...
It all boils down to the SYSTEM identifier you used in the DOCTYPE
declaration: "", i.e. an empty string.
http://www.w3.org/TR/REC-xml/#NT-doctypedecl
and
http://www.w3.org/TR/REC-xml/#NT-ExternalID
are the productions governing the DOCTYPE.

------------------------
Definition: The SystemLiteral is called the entity's system identifier. It is
meant to be converted to a URI reference (as defined in [IETF RFC 3986]),
------------------------
Now if you go back to [IETF RFC 3986] and related specs you will find that "" is a
valid URI reference meaning "the current document containing the reference",
so basically you're asking the XML parser to use your XML file as its own DTD.
This leads to a fatal error, the first one being that for an external parsed
entity the XML declaration doesn't allow a 'standalone' part.
Of course if you don't specify validation, the DTD is not fetched or parsed
and the document appears well-formed.
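
To illustrate the URI resolution step described above, here is a minimal
sketch using Python's standard urllib.parse.urljoin, which implements
relative reference resolution. The file location file:///tmp/test-1.xml is a
hypothetical base URI chosen for the example; it does not come from the
report, and libxml2's own resolver is not being invoked here.

```python
from urllib.parse import urljoin

# Hypothetical location of the reproducer; any absolute URI behaves the same.
base = "file:///tmp/test-1.xml"

# The system identifier taken from the DOCTYPE in test-1.xml: an empty string.
system_id = ""

# Resolving an empty reference against the base yields the base document itself.
resolved = urljoin(base, system_id)
print(resolved)  # file:///tmp/test-1.xml
```

The resolved URI is the document itself, which is why a validating parser
ends up trying to read test-1.xml as its own external subset.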

The real solution is to follow the guidance from XHTML1 and *never* use
an empty system identifier, but follow the practice shown in the spec
itself for conformance to the specification:

http://www.w3.org/TR/xhtml1/#strict

and if you don't want network fetches, the solution is to ask xmllint to forbid
them with --nonet

  Definitely not a bug: the XML is actually not well-formed, but that can
be detected only in validating mode!
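
The 'standalone' complaint on line 1 can be checked against the two relevant
grammar productions of XML 1.0: XMLDecl (for the document entity) and
TextDecl (for external parsed entities, including the external DTD subset).
The regular expressions below are a simplified sketch of those productions
written for this example, not the code libxml2 uses.

```python
import re

S = r"[ \t\r\n]+"               # mandatory white space
OPT_S = r"[ \t\r\n]*"           # optional white space
EQ = OPT_S + "=" + OPT_S        # Eq ::= S? '=' S?


def quoted(body):
    """Match body inside either double or single quotes."""
    return "(?:\"{0}\"|'{0}')".format(body)


VERSION = S + "version" + EQ + quoted(r"1\.[0-9]+")
ENCODING = S + "encoding" + EQ + quoted(r"[A-Za-z][A-Za-z0-9._-]*")
STANDALONE = S + "standalone" + EQ + quoted("(?:yes|no)")

# XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(
    r"<\?xml" + VERSION + "(?:" + ENCODING + ")?(?:" + STANDALONE + ")?"
    + OPT_S + r"\?>"
)

# TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'  (no standalone part)
TEXT_DECL = re.compile(
    r"<\?xml(?:" + VERSION + ")?" + ENCODING + OPT_S + r"\?>"
)

# First line of test-1.xml from the report.
header = '<?xml version="1.0" encoding="utf-8" standalone="no"?>'

is_xml_decl = XML_DECL.fullmatch(header) is not None
is_text_decl = TEXT_DECL.fullmatch(header) is not None

print("valid as XMLDecl: ", is_xml_decl)   # True
print("valid as TextDecl:", is_text_decl)  # False
```

The same header is acceptable at the top of a document but not at the top of
an external entity, which matches the behaviour seen with and without
--valid.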

Daniel


