Description of problem:
Ran into an unexpected parser error running xmllint with --valid on an XML file (or XML-like; not sure if it's well-formed) with a particular header; got it down to this reproducer:

$ cat test-1.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<greeting>Hello, world!</greeting>
$ xmllint --noout test-1.xml
$ echo $?
0
$ xmllint --noout --valid test-1.xml
test-1.xml:1: parser error : parsing XML declaration: '?>' expected
<?xml version="1.0" encoding="utf-8" standalone="no"?>
^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^
test-1.xml:2: parser error : Content error in the external subset
<!DOCTYPE greeting PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
^

Trying without the DOCTYPE decl:

$ cat test-2.xml
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<greeting>Hello, world!</greeting>
$ xmllint --noout test-2.xml
$ echo $?
0
$ xmllint --noout --valid test-2.xml
test-2.xml:2: validity error : Validation failed: no DTD found !
<greeting>Hello, world!</greeting>

Version-Release number of selected component (if applicable):
libxml2-2.6.26-2.1.2

How reproducible:
100%

Is test-1.xml well-formed? If so, this is a bug; if not, the error is confusing (why is it complaining about the "standalone", rather than the DOCTYPE?)
It's actually not a bug! But understanding why is a bit nasty...

It all boils down to the SYSTEM identifier you used in the DOCTYPE declaration: "", i.e. an empty string.

http://www.w3.org/TR/REC-xml/#NT-doctypedecl
and
http://www.w3.org/TR/REC-xml/#NT-ExternalID
are the productions governing the DOCTYPE.

------------------------
Definition: The SystemLiteral is called the entity's system identifier. It is meant to be converted to a URI reference (as defined in [IETF RFC 3986]),
------------------------

Now if you go back to [IETF RFC 3986] et al. you will find that "" is a correct URI reference meaning "the current document containing the reference", so basically you're asking the XML parser to use your XML file as the DTD. This leads to a fatal error, the first one being that the declaration at the start of an external parsed entity (here, the external subset) doesn't allow a 'standalone' part. That is why the first message points at "standalone" rather than at the DOCTYPE: the parser is reading test-1.xml a second time, as the DTD.

Of course if you don't specify validation, the DTD is not fetched/parsed and the document appears well-formed.

The real solution is to follow the indications from XHTML1 and *never* put an empty system identifier, but follow the practice indicated in the spec itself for conformance to the specification:
http://www.w3.org/TR/xhtml1/#strict
and if you don't want network fetches, the solution is to ask xmllint to forbid them with --nonet.

Definitely not a bug: the XML is actually not well-formed, but that can be detected only in validating mode!

Daniel
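
For readers who want to see the URI-resolution step in isolation, here is a minimal sketch. It uses Python's standard urllib.parse.urljoin as a stand-in for the RFC 3986 reference resolution described above; it is not libxml2's own code. The file location and the helper name resolve_system_id are hypothetical, and the XHTML 1.0 Strict DTD URL is only an example of a non-empty system identifier.

```python
from urllib.parse import urljoin

# Hypothetical location of the document that contains the DOCTYPE.
base_uri = "file:///tmp/test-1.xml"


def resolve_system_id(base: str, system_id: str) -> str:
    """Resolve a DOCTYPE system identifier against the document's base URI.

    Illustrative helper only: it mimics RFC 3986 reference resolution,
    not the actual implementation inside libxml2.
    """
    return urljoin(base, system_id)


# An empty reference resolves to the document that contains it,
# so the "DTD" to load is the XML file itself.
empty = resolve_system_id(base_uri, "")

# An absolute system identifier resolves to itself.
xhtml = resolve_system_id(
    base_uri, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
)

print(empty)
print(xhtml)
```

With the empty identifier the resolved URI is the XML file itself, which matches the behaviour reported for --valid: the file gets parsed a second time as an external subset.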