Red Hat Bugzilla – Bug 181068
Review Request: html401-dtds - HTML 4.01 document type definitions
Last modified: 2007-11-30 17:11:23 EST
SRPM Name or Url: http://cachalot.mine.nu/4/SRPMS/html401-dtds-4.01-0.2.src.rpm
Note for reviewers: the packaging approach here has been discussed with various folks over the past few years. The general ideas in it roughly follow the Debian SGML packaging draft policy, http://debian-xml-sgml.alioth.debian.org/sgml-policy/
%description improvements gratefully accepted :)
Ditto for opinions on whether the specification subpackage should be called -spec,
-doc, or -docs.
What are the differences with the policies followed by docbook-dtds and
docbook-dtds and xhtml1-dtds don't really seem to constitute a common policy. I
think the basic ideas are mostly the same in them, this package and the Debian
docbook-dtds AFAICT unnecessarily embeds full version-release strings into its
install dirs, making it hard for apps to point only to a specific version of the
dtds. It also sets SGMLDECL in a catalog that gets included in the global
catalog which is problematic because SGMLDECL (at least as implemented in
opensp) is global and the first one encountered will be used for _all_
subsequent matches -> breakage. Further, it registers itself to the private
catalogs of openjade -> most likely unneeded, needs changing between openjade
releases, and results in a build dependency loop.
xhtml1-dtds registers the dtds only to the XML catalogs (despite installing
into sgml dirs) so it's somewhat different. It also embeds a date stamp (but
not version-release) in its install dirs but that's less problematic than in
docbook-dtds because there's an xmlcatalog in an unversioned path in
Both of the above install the DTDs as executable, but that's probably just a bug.
So in a nutshell, compared to those, this package uses a dir structure that is
very unlikely to need changing, installs a self-contained catalog to its own dir
below /usr/share/sgml, uses DTDDECLs instead of SGMLDECLs in order to not
interfere with (nor be confused by) other SGML dtd packages' SGML declarations,
+ bug fixes.
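
The SGMLDECL concern described above can be illustrated with a toy model. The following Python sketch is not OpenSP's resolver; it only assumes the behaviour described in this comment (the first SGMLDECL found in the catalog chain applies globally, while a DTDDECL is tied to one public identifier). The catalog contents and file names are hypothetical.

```python
# Toy model of SGML catalog lookup; NOT OpenSP's actual algorithm.
# Assumption (taken from the discussion above): the first SGMLDECL found in
# the catalog chain is used for every document, whereas DTDDECL entries are
# looked up per public identifier.

# Hypothetical catalogs, in the order the global catalog includes them.
# Each entry is (keyword, public identifier or None, system file).
CATALOGS = [
    [("SGMLDECL", None, "docbook.dcl")],
    [("DTDDECL", "-//W3C//DTD HTML 4.01//EN", "HTML4.decl")],
]


def global_sgmldecl(catalogs):
    """Return the first SGMLDECL seen; it applies to all documents."""
    for catalog in catalogs:
        for keyword, _pubid, path in catalog:
            if keyword == "SGMLDECL":
                return path
    return None


def declaration_for(pubid, catalogs):
    """Prefer a DTDDECL matching pubid, else fall back to the global SGMLDECL."""
    for catalog in catalogs:
        for keyword, entry_pubid, path in catalog:
            if keyword == "DTDDECL" and entry_pubid == pubid:
                return path
    return global_sgmldecl(catalogs)


html = declaration_for("-//W3C//DTD HTML 4.01//EN", CATALOGS)
other = declaration_for("-//Example//DTD Other//EN", CATALOGS)
print(html)
print(other)
```

In this model the HTML 4.01 DTD gets its own declaration through DTDDECL, while any DTD without one silently inherits whichever SGMLDECL happened to come first, which is the kind of interference the package tries to stay out of.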
Yes, I intend to file some related bug reports in the future :) I also have
HTML 2.0, 3.2, 4.0, ISO-HTML, XHTML Basic 1.0, XHTML 1.1, SMIL 2.0, and RSS
*-dtds packages more or less ready to roll, some of which will probably be
you can't mix sgml and xml resources in the XML catalogs. SGML definition
will generate fatal errors when loaded by an XML parser.
Anything reachable from /etc/xml/catalog must be XML only. So html-4... just
can share things with xhtml1. W.r.t. using /usr/share/sgml for the XML catalog
this is the unfortunate result of SGML nutheads blocking XML from the LSB
standard a few years ago, so Red Hat had to keep them there instead of a
far more logical /usr/share/xml subtree !
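
The point about SGML definitions being fatal for an XML parser can be checked with a small experiment. The sketch below uses Python's standard-library parser as a stand-in for "an XML parser"; the two declarations are illustrative and not copied from the HTML 4.01 DTDs. The SGML-style one uses tag-omission indicators ("- O"), which XML DTD syntax does not allow.

```python
import xml.etree.ElementTree as ET

# An SGML-style element declaration with tag-omission indicators ("- O").
# Illustrative only; not taken verbatim from any shipped DTD.
SGML_STYLE = """<!DOCTYPE p [
  <!ELEMENT p - O (#PCDATA)>
]>
<p>text</p>"""

# The same declaration written in XML DTD syntax.
XML_STYLE = """<!DOCTYPE p [
  <!ELEMENT p (#PCDATA)>
]>
<p>text</p>"""


def parses_as_xml(document):
    """Return True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


sgml_ok = parses_as_xml(SGML_STYLE)
xml_ok = parses_as_xml(XML_STYLE)
print(sgml_ok, xml_ok)
```

The parser rejects the first document outright rather than skipping the declaration, which is why anything reachable from an XML catalog has to be XML only.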
W.r.t. XHTML 1.1 and SMIL 2.0, they are extensible languages, i.e. the basic
language defined in the DTDs is supposed to be extended with foreign elements
in different namespaces, which is actually something that doesn't work with
DTDs, so the usefulness of shipping those 2 gets very limited, as DTD based
validation will just fail in general (Relax-NG or XSD schemas would work
better, though there aren't good ways to reference them for catalog access from
Please be very careful when trying to handle XML resources if you're competent
in SGML but not really aware of XML, this is a very different field with
very different rules and specifications, this has bitten us in the past hard
and I don't want this to happen again.
Daniel (libxml2 author and member of W3C XML Core Working Group)
(In reply to comment #4)
> you can't mix sgml and xml resources in the XML catalogs. SGML definition
> will generate fatal errors when loaded by an XML parser.
Of course. I don't know where you got the impression that someone/thing wanted
to do that. It would be nice if xhtml1-dtds would register the DTDs to the SGML
catalogs in addition to XML catalogs though (for example for use in SGML
validation tools and things that grok SGML but not XML catalogs), but that's a
bit offtopic for this submission.
Of course, the html401-dtds package deals with SGML catalogs only.
> Please be very careful when trying to handle XML resources if you're competent
> in SGML but not really aware of XML, this is a very different field with
> very different rules and specifications, this has bitten us in the past hard
> and I don't want this to happen again.
I don't claim to be an SGML nor XML god, but do have more than a little
experience with both.
> Daniel (libxml2 author and member of W3C XML Core Working Group)
Ville (member of the W3C QA Tools Development Team) ;)
Note that there is no guarantee that SGML tools will handle XML correctly;
it will parse, but for example they will probably misprocess xml:base or
xml:id. Checking dependencies on those on a case-by-case basis before
pushing to both catalogs would be a good idea :-)
The package in general looks good to me.
I'd suggest either dropping the actual text of the specification entirely or
simply including it as %doc. Having a separate sub-package for this seems
Alternately I'd package up the spec and have the DTDs tag along (as they're a
normative part of the HTML 4.01 Recommendation).
The .ent(ity) files might beneficially be shipped by sgml-common -- since
they're common for HTML 2.0, 3.2, 4.0, and 4.01 -- and shared by the relevant
packages (cf. the comment at the top of the spec file).
For the %description I might use something along the lines of:
Provides the three HTML 4.01 DTDs (strict, frameset, and transitional).
The DTDs are required for processing HTML 4.01 document instances using
SGML tools such as OpenSP, OpenJade, or SGMLSpm.
Well, the specification is already in the source tarball, so let's just package
it somewhere. I don't have that strong opinions on actually _where_, but FWIW,
the DTDs are about 30kB and the documentation is 370kB.
Note that *.ent are the same "only" in 4.0, 4.01, and ISO-HTML; not 2.0 or 3.2.
I'm not against including them in sgml-common (or let's say a hypothetical
html-common) package, but I think the approach here is good enough at least for
now, and doesn't require waiting for the FC sgml-common package to be updated,
possibly also for earlier distro versions etc. And including them here makes it
easier to maintain a self-contained catalog containing only the HTML 4.01 stuff
in /usr/share/sgml/ which is always nice to have.
The description sounds good to me, will adopt for the next package revision, due
out when other things discussed here have been agreed on.
* Sat Feb 25 2006 Ville Skyttä <ville.skytta at iki.fi> - 4.01-0.3
- Improve description (#181068).
- Fold specification into main package as %doc (#181068).
* html401-dtds-4.01 seems odd. W3C specs have a Version (4.01) and each
edition of that version gets tagged with the Date of publication (19991224).
"html-dtd-4.01-19991224" or "dtd-html-4.01-19991224" seem like saner
naming and versioning conventions. Dunno what that'd be in RPM spec terms
* In any case, the rec version (4.01) should be in either the package name xor
the package version; a directory named "html401-dtds-4.01" just seems odd.
Other than that this looks good; I'd say push it as is if there isn't an easy
way to fix the versioning scheme nits.
401 needs to be in the package's name in order to make rpm and friends not
consider eg. the 4.01 DTDs an upgrade over let's say 3.2 DTDs, but to keep them
cleanly parallel installable.
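
The parallel-installability argument can be made concrete with a simplified model. This Python sketch is not how rpm is implemented: it only assumes that packages are matched by name and that a higher version replaces a lower one, and it compares plain dotted numeric versions. The names "html-dtds" and "html32-dtds" are hypothetical.

```python
# Simplified model of name-based upgrades; real rpm version comparison is
# more involved. Only dotted numeric versions are handled here.


def version_key(version):
    """Turn a dotted numeric version such as "4.01" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def install(installed, name, version):
    """Install a package; a same-named package with a lower version is replaced."""
    current = installed.get(name)
    if current is None or version_key(version) > version_key(current):
        installed[name] = version
    return installed


# Version kept out of the name: 4.01 replaces 3.2.
shared_name = {}
install(shared_name, "html-dtds", "3.2")
install(shared_name, "html-dtds", "4.01")

# Version embedded in the name: both stay installed side by side.
versioned_names = {}
install(versioned_names, "html32-dtds", "3.2")
install(versioned_names, "html401-dtds", "4.01")

print(shared_name)
print(versioned_names)
```

With a shared name the 3.2 DTDs disappear as soon as 4.01 is installed; with the version in the name, the two sets never compete.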
The package's version number is intentionally "duplicated" so that it'll be kept
unchanged between package (and possibly spec) revisions so that the
documentation dir stays the same between possible package revisions for
/usr/share/doc/html401-dtds-$version/index.html bookmarkability; I don't want to
invent another version number. This also follows the practice of the existing
Including the date stamp in the package's release field should be no problem,
will do before requesting the first build unless there are other things that
-dtds vs -dtd: there are already xhtml1-dtds and docbook-dtds, I don't see a
reason to deviate from that.
w.r.t. 401, I don't think the W3C will ever release an updated version of the
SGML HTML DTDs. All new developments have been around XML anyway.
"Cast in stone"
Fully agreed, but I'm afraid there might be some confusion here. Daniel, could
you clarify if/what you imply in practical packaging terms with "w.r.t. 401"?
To me it reinforced the fact that it's fine to not put it as a version,
we don't expect any more versions.
I also put the number in the name for xhtml1-dtds for the reasons explained in #11
Ok, so... how about a formal acceptance or an explicit list of blockers so we
could proceed here?
Well, I think the idea is fine. I didn't check the package closely; I'm
not really used to that. I don't know who makes the decision, but in principle
it's fine by me.
* Sun Jun 18 2006 Ville Skyttä <ville.skytta at iki.fi> - 4.01-19991224.1
- Include specification date in release field (#181068).
- Make doc symlinks relative.
Unfortunately I know little about SGML and can't really evaluate this package in
that context or test it. But I can evaluate it against the general set of
packaging guidelines and take Daniel's acceptance in comment 16 that it is OK from
an SGML standpoint. Hopefully that's sufficient.
The package builds fine; rpmlint only complains about the license, which is OK.
It installs and uninstalls fine and the catalog in /etc/sgml is updated properly.
The only major issue I see is that there don't seem to be any dependencies on
/usr/bin/install-catalog or sgml-common for the scriptlets. Unless I'm
misunderstanding something, this is a blocker.
perl(File::Spec) is part of base perl, so technically you don't need to BR: it,
although it certainly isn't a problem to do so. (I know you know this; I only
add it for posterity.)
This isn't really code, but I can't imagine the "code not content" rule would
* package meets naming and packaging guidelines (given the
parallel-installability argument I see the need to put the version in the name).
* specfile is properly named, is cleanly written and uses macros consistently.
* dist tag is present.
* build root is correct.
* license field matches the actual license.
* license is open source-compatible. License text not included upstream.
* source files match upstream:
* latest version is being packaged.
* BuildRequires are proper.
* package builds in mock (development, x86_64).
O rpmlint is silent except for invalid license warning.
* final provides and requires are sane:
html401-dtds = 4.01-19991224.1.fc6
* no shared libraries are present.
* package is not relocatable.
* owns the directories it creates.
* doesn't own any directories it shouldn't.
* no duplicates in %files.
* file permissions are appropriate.
* %clean is present.
* %check is not present; test suite wouldn't make much sense.
X scriptlets present but don't seem to have appropriate dependencies.
O code, not content.
* documentation is small, so no -docs subpackage is necessary.
* %docs are not necessary for the proper functioning of the package (the links
go from the doc directory into /usr/share/sgml and not the other way around).
* no headers.
* no pkgconfig files.
* no libtool .la droppings.
* not a GUI app.
* Tue Jun 20 2006 Ville Skyttä <ville.skytta at iki.fi> - 4.01-19991224.2
- Require install-catalog at post-install and pre-uninstall time (#181068).
Looks good to me; the only issue I had is fixed.
Imported, built for devel, and FC5 branch requested. Thanks!
Package Change Request
Package Name: html401-dtds
Updated Description: HTML 4.01 document type definitions
Remove reference to the spec from package description as the docs are no longer
shipped due to the W3C Documentation License being non-free:
Can you explain comment #22: the problem as well as its effects on the package?
The W3C documentation is freely copyable, so what is the problem?
See link in comment 22, the W3C Documentation License does not permit
modification of content licensed under it, making it "not okay" for Fedora. The
effect on this package is that the documentation will no longer be shipped in
the binary rpm. This is already implemented in the html401-dtds package in F7
The DTDs and related items will stay because the W3C Software (not
Documentation) License can be used for them, and that license is ok.