Bug 444802 - parser error : parsing XML declaration: '?>' expected when DTD specified
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libxml2
Version: 5.0
Hardware: i386
OS: Linux
Target Milestone: rc
Assignee: Daniel Veillard
QA Contact:
Depends On:
Reported: 2008-04-30 17:36 UTC by Dave Malcolm
Modified: 2008-05-12 07:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2008-05-12 07:55:22 UTC


Description Dave Malcolm 2008-04-30 17:36:39 UTC
Description of problem:

Ran into an unexpected parser error running xmllint with --valid on an XML file
(or XML-like; not sure if it's well-formed) with a particular header; got it
down to this reproducer:
$ cat test-1.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting> 

$ xmllint --noout test-1.xml
$ echo $?
$ xmllint --noout --valid test-1.xml
test-1.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" encoding="utf-8" standalone="no"?>
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">

Trying without the DOCTYPE decl:
$ cat test-2.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<greeting>Hello, world!</greeting> 
$ xmllint --noout test-2.xml
$ echo $?
$ xmllint --noout --valid test-2.xml
test-2.xml:2: validity error : Validation failed: no DTD found !
<greeting>Hello, world!</greeting> 

Version-Release number of selected component (if applicable):

How reproducible:

Is test-1.xml well-formed?  If so, this is a bug; if not, the error is confusing
(why is it complaining about the "standalone", rather than the DOCTYPE?)

Comment 2 Daniel Veillard 2008-05-12 07:54:59 UTC
It's actually not a bug! But understanding why is a bit nasty...
It all boils down to the SYSTEM identifier you used in the DTD
declaration: "", i.e. an empty string.
The doctypedecl and ExternalID productions of the XML 1.0 specification
are the productions governing the DOCTYPE.

Definition: The SystemLiteral is called the entity's system identifier. It is
meant to be converted to a URI reference (as defined in [IETF RFC 3986]),
Now if you go back to RFC 3986 et al., you will find that "" is a valid
URI reference meaning "the current document containing the reference",
so basically you're asking the XML parser to use your XML file as the DTD.
This leads to fatal errors, the first one being that the text declaration
of an external parsed entity doesn't allow a 'standalone' part.
Of course if you don't specify validation, the DTD is not fetched/parsed
and the document appears well-formed.
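
The URI resolution step described above can be checked outside the parser. The following is a minimal sketch using Python's standard urllib.parse.urljoin; the file:///tmp/test-1.xml base location and the greeting.dtd name are assumptions made up for illustration, and libxml2 uses its own URI code rather than this function:

```python
from urllib.parse import urljoin

# Hypothetical location of the reproducer; any absolute base URI behaves the same.
base = "file:///tmp/test-1.xml"

# An empty reference is a same-document reference: it resolves to the base itself,
# so the "DTD" to load is the XML document that contains the DOCTYPE.
resolved_empty = urljoin(base, "")
print(resolved_empty)

# A non-empty relative reference resolves to a separate resource instead.
resolved_named = urljoin(base, "greeting.dtd")
print(resolved_named)
```

With an empty system identifier the resolved location is the document itself, which is why the first error is reported against line 1 of test-1.xml.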

The real solution is to follow the indications from XHTML1 and *never* put
an empty system identifier, but follow the practice indicated on the spec
itself for conformance to the specification:


and if you don't want network fetches, the solution is to ask xmllint to forbid
them with --nonet
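
To catch this kind of DOCTYPE before running a validating parse, a quick pre-check is one option. This is only a sketch: has_empty_system_id is a hypothetical helper, not part of libxml2 or xmllint, and the regular expression is deliberately simplified (it ignores internal subsets, comments and other corner cases of the DOCTYPE grammar):

```python
import re

# Matches PUBLIC "pubid" "system" or SYSTEM "system" in a DOCTYPE declaration.
# Simplified on purpose; this is not a full implementation of the XML grammar.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+\S+\s+"
    r"(?:PUBLIC\s+(?:\"[^\"]*\"|'[^']*')\s+|SYSTEM\s+)"
    r"(?P<system>\"[^\"]*\"|'[^']*')"
)

def has_empty_system_id(xml_text: str) -> bool:
    """Return True if the first DOCTYPE found has an empty system identifier."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return False  # no DOCTYPE with an external identifier
    # Strip the surrounding quotes and check what is left.
    return match.group("system")[1:-1] == ""

reproducer = (
    '<?xml version="1.0" encoding="utf-8" standalone="no"?>\n'
    '<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">\n'
    "<greeting>Hello, world!</greeting>\n"
)
print(has_empty_system_id(reproducer))
```

A document flagged this way would need a real system identifier before validation makes sense.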

Definitely not a bug: the XML is actually not well-formed, but that can
be detected only in validating mode!

