Bug 66179 - Command nsgmls/onsgmls returns errors for valid xhtml1 multibyte document
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: opensp
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-06-05 21:16 UTC by Sam Steingold
Modified: 2009-01-06 12:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-06 12:32:20 UTC
Type: ---
Embargoed:



Description Sam Steingold 2002-06-05 21:16:00 UTC
Description of Problem:
When I try to validate an XHTML page, I get this:

$ nsgmls -E10 -s -wxml -c /usr/share/sgml/html.soc foo.html 
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:25:21:E: "402" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:29:21:E: "913" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:30:21:E: "914" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:31:21:E: "915" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:33:21:E: "916" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:35:21:E: "917" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:36:21:E: "918" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:37:21:E: "919" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:38:21:E: "920" is
not a character number in the document character set
nsgmls:<URL>http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent:40:21:E: "921" is
not a character number in the document character set
nsgmls:I: maximum number of errors (10) reached; change with -E option
$

the page starts with

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The error messages appear to indicate that nsgmls does not support multibyte
characters.
AFAIK, openjade does have multibyte support;
please rebuild it with that support included.

Version-Release number of selected component (if applicable):
1.3.1-4

How Reproducible:
always

Steps to Reproduce:
1. cat > foo.html <<EOF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
</html>
EOF
2. nsgmls -E10 -s -wxml -c /usr/share/sgml/html.soc foo.html 


Actual Results:
errors

Expected Results:
the document should validate

Additional Information:
see comp.text.sgml:
From: "William F. Hammond" <hammond.albany.edu>

My guess is that you are using a version of SP built without enabling
multi-byte characters.  AFAIK, multi-byte with SP is still limited to
two bytes, which, long term, is a problem.  If, however, the charset
for a document instance is plane 0, a multi-byte build should work
with XHTML provided that SP sees its private version of the SGML
declaration pertaining to all XML document types.  One way to provide
this might be to make that declaration the SGMLDECL for the catalog
that is hard-wired into your build.

                             -- Bill
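
A minimal sketch of the catalog approach suggested above, assuming the XML SGML declaration is installed as /usr/share/sgml/xml.dcl and that an existing catalog can be chained for the DTD lookups. Both paths and the file name local-xhtml.soc are hypothetical and will differ between installations. It also assumes that the SGMLDECL entry in the first catalog takes effect, which has not been verified here.

```shell
# Hypothetical locations; adjust to where your distribution installs these files.
XML_DCL=${XML_DCL:-/usr/share/sgml/xml.dcl}
BASE_CATALOG=${BASE_CATALOG:-/usr/share/sgml/html.soc}

# Write a small local catalog:
#   SGMLDECL names the SGML declaration to apply,
#   CATALOG  chains to an existing catalog for public identifier lookups.
cat > local-xhtml.soc <<EOF
SGMLDECL "$XML_DCL"
CATALOG "$BASE_CATALOG"
EOF

# Show the generated catalog.
cat local-xhtml.soc
```

The resulting file could then be passed to nsgmls with -c local-xhtml.soc in place of the catalog used in the steps to reproduce.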

Comment 1 Sam Steingold 2002-06-07 14:59:22 UTC
it appears that the problem is in a different area:
when I install the files from
<http://www.htmlhelp.org/tools/validator/offline/sgml-lib.tar.gz>
(which should all be in the sgml-common package!)
everything works just fine.

Comment 2 Tim Waugh 2002-06-19 10:19:34 UTC
Where exactly do you install them?

Comment 3 Sam Steingold 2002-06-19 14:27:19 UTC
/usr/share/sgml with (possibly) appropriate modifications of catalogs in /etc/sgml
(just like all the stuff in sgml-common)

Comment 4 Bill Nottingham 2006-08-05 05:30:17 UTC
Red Hat apologizes that these issues have not been resolved yet. We do want to
make sure that no important bugs slip through the cracks.

Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/)
for security updates only. If this is a security issue, please reassign to the
'Fedora Legacy' product in bugzilla. Please note that Legacy security update
support for these products will stop on December 31st, 2006.

If this is not a security issue, please check if this issue is still present
in a current Fedora Core release. If so, please change the product and version
to match, and check the box indicating that the requested information has been
provided.

If you are currently still running Red Hat Linux 7.3 or 9, please note that
Fedora Legacy security update support for these products will stop on December
31st, 2006. You are strongly advised to upgrade to a current Fedora Core release
or Red Hat Enterprise Linux or comparable. Some information on which option may
be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be
closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a
security issue, please change the product as necessary. We thank you for your
help, and apologize again that we haven't handled these issues to this point.


Comment 5 Sam Steingold 2006-08-05 13:09:03 UTC
here is what I see for a file with non-ASCII characters (utf-8)
onsgmls -s -e -g -c/usr/share/sgml/xml.soc index.html
onsgmls:index.html:184:33:E: non SGML character number 159
onsgmls:index.html:184:33: open elements: html body[1] (p[1])
onsgmls:index.html:184:39:E: non SGML character number 129
onsgmls:index.html:184:39: open elements: html body[1] (p[1])
onsgmls:index.html:184:47:E: non SGML character number 129
onsgmls:index.html:184:47: open elements: html body[1] (p[1])
onsgmls:index.html:184:49:E: non SGML character number 130
onsgmls:index.html:184:49: open elements: html body[1] (p[1])
onsgmls:index.html:184:56:E: non SGML character number 159
onsgmls:index.html:184:56: open elements: html body[1] (p[1])
onsgmls:index.html:184:58:E: non SGML character number 128
onsgmls:index.html:184:58: open elements: html body[1] (p[1])
onsgmls:index.html:184:62:E: non SGML character number 130
onsgmls:index.html:184:62: open elements: html body[1] (p[1])
onsgmls:index.html:184:64:E: non SGML character number 143
onsgmls:index.html:184:64: open elements: html body[1] (p[1])
onsgmls:index.html:184:68:E: non SGML character number 131
onsgmls:index.html:184:68: open elements: html body[1] (p[1])
onsgmls:index.html:184:70:E: non SGML character number 130
onsgmls:index.html:184:70: open elements: html body[1] (p[1])
onsgmls:index.html:184:74:E: non SGML character number 143
onsgmls:index.html:184:74: open elements: html body[1] (p[1])
onsgmls:index.html:184:77:E: non SGML character number 159
onsgmls:index.html:184:77: open elements: html body[1] (p[1])
onsgmls:index.html:184:81:E: non SGML character number 131
onsgmls:index.html:184:81: open elements: html body[1] (p[1])
onsgmls:index.html:184:83:E: non SGML character number 130
onsgmls:index.html:184:83: open elements: html body[1] (p[1])
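
The character numbers reported above all fall between 128 and 159. A plausible reading, not confirmed in this report, is that onsgmls is treating each UTF-8 byte as a separate character, so continuation bytes in that range land on character numbers that the SGML declaration does not allow. The sketch below only shows how such byte values arise; the sample character (U+041F, written as explicit octal escapes) is an assumption chosen for illustration and is not taken from index.html.

```shell
# Print the decimal value of every byte in a string, one per line.
utf8_bytes() {
    printf '%s' "$1" | od -An -v -tu1 | tr -s ' ' '\n' | sed '/^$/d'
}

# U+041F encoded in UTF-8 is the two bytes 0xD0 0x9F (octal 320 237).
sample=$(printf '\320\237')

# The second byte is 159, one of the values seen in the error output.
utf8_bytes "$sample"
```

Running the same function over line 184 of the real file would show whether its bytes match the reported numbers.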

Comment 6 Ondrej Vasik 2008-01-20 20:53:40 UTC
Changing product version to rawhide, as it is still broken in the devel branch and
FC-5 is EOL. Changing the summary text as well: OpenSP/OpenJade IS compiled with
multibyte support, so the summary is wrong. The problem is most probably missing or
unregistered catalogs for the XHTML DTDs/stylesheets, so this is most probably the
wrong component as well, but I will check it.

Comment 7 Bug Zapper 2008-05-14 01:54:50 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 David Tardon 2009-01-04 17:29:51 UTC
Some environment variables must be set before one can process XML files in non-ASCII encodings:

  export SP_CHARSET_FIXED=yes SP_ENCODING=xml

should fix the problem. It's described in /usr/share/doc/opensp-1.5.2/xml.htm , with more details about character sets, encodings and such stuff in /usr/share/doc/opensp-1.5.2/charset.htm .
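
A minimal sketch of applying these settings to the command from Comment 5. It assumes onsgmls is on the PATH and that index.html is in the current directory; if either is missing, the validation step is skipped rather than failing.

```shell
# Settings from this comment: fix the document character set and
# let the parser pick the encoding the way an XML processor would.
export SP_CHARSET_FIXED=yes
export SP_ENCODING=xml

# Re-run the command from Comment 5 only if the tool and the file exist.
if command -v onsgmls >/dev/null 2>&1 && [ -f index.html ]; then
    onsgmls -s -e -g -c/usr/share/sgml/xml.soc index.html \
        || echo "onsgmls reported errors" >&2
else
    echo "onsgmls or index.html not available; skipping validation" >&2
fi
```

To avoid changing the environment of the whole shell session, the two assignments can instead be placed as a prefix on the onsgmls command line itself.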

Comment 9 Ondrej Vasik 2009-01-06 12:32:20 UTC
Additionally, you are validating XML with html.soc; you should use xml.soc instead.
The command "nsgmls -E10 -s -wxml -c /usr/share/sgml/xml.soc foo.html" will proceed correctly (provided the XHTML file is completed correctly). Closing NOTABUG. Feel free to add clarifying comments if you are not satisfied with the resolution.
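
A sketch of the corrected reproduction steps. It assumes that completing the file means adding the head, title, and body elements missing from the original test case; the title and paragraph text are placeholders. The validation step runs only if nsgmls is installed, and it still needs the XHTML DTD to be resolvable through the catalog or the network.

```shell
# Write a complete minimal XHTML 1.0 Strict document.
cat > foo.html <<'EOF'
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head><title>Test page</title></head>
<body><p>Hello</p></body>
</html>
EOF

# Validate against the XML catalog (xml.soc) instead of html.soc.
if command -v nsgmls >/dev/null 2>&1; then
    nsgmls -E10 -s -wxml -c /usr/share/sgml/xml.soc foo.html \
        || echo "nsgmls reported errors" >&2
else
    echo "nsgmls not installed; skipping validation" >&2
fi
```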

