Bug 768167

Summary: cannot submit job with encoding in XML declaration
Product: [Retired] Beaker
Reporter: Matěj Cepl <mcepl>
Component: scheduler
Assignee: Dan Callaghan <dcallagh>
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: unspecified
Version: 0.7
CC: bpeck, dcallagh, mcsontos, mishin, rmancy, stl
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-10-11 23:49:50 UTC

Description Matěj Cepl 2011-12-15 22:24:46 UTC
When my job control XML starts with 

<?xml version="1.0" encoding="utf-8"?>
<job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="http://fedorahosted.org/beaker/job.xsd"
	retention_tag="scratch">

(which helps the Eclipse XML editor), Beaker (on the https://beaker.engineering.redhat.com/jobs/clone page) rejects the file with "Failed to import job because of: Unicode strings with encoding declaration are not supported." It shouldn't.

Comment 1 Dan Callaghan 2011-12-18 22:46:23 UTC
You can work around this by removing encoding="utf-8" from your XML declaration.

The reason it happens is that Beaker's web UI is fully Unicode-aware and treats all submitted form data as UTF-8. So by the time the Beaker code sees your submitted job XML it has already been decoded as UTF-8 character data. Then lxml sees the encoding declaration and considers it to be an error.
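
The workaround above can also be applied mechanically before submitting a job. The following is a minimal Python 3 sketch, not anything taken from Beaker itself: the helper name strip_declared_encoding is hypothetical, and it assumes the XML declaration, if present, sits at the very start of the text.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration that
# starts at the very beginning of the document. Group 1 keeps everything
# before it (for example '<?xml version="1.0"').
_ENCODING_IN_DECL = re.compile(
    r"^(<\?xml\s[^>]*?)\s+encoding\s*=\s*(?:\"[^\"]*\"|'[^']*')"
)


def strip_declared_encoding(xml_text):
    """Return xml_text without the encoding=... part of its XML declaration.

    Hypothetical helper: text without a declaration, or with a declaration
    that has no encoding, is returned unchanged.
    """
    return _ENCODING_IN_DECL.sub(r"\1", xml_text, count=1)


if __name__ == "__main__":
    job = '<?xml version="1.0" encoding="utf-8"?>\n<job retention_tag="scratch"/>'
    print(strip_declared_encoding(job))
```

Only the declaration is touched, so an element attribute that happens to be called encoding elsewhere in the job is left alone.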

Comment 2 Matěj Cepl 2011-12-19 10:24:15 UTC
Thanks, that looks good ... I was afraid there was something broken with the namespace declaration. I don't insist on the encoding declaration.

This should be fixed but with the lowest priority.

Comment 3 Matěj Cepl 2011-12-19 20:37:40 UTC
Well, I tried

<?xml version="1.0"?>
<job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="http://fedorahosted.org/beaker/job.xsd"
	retention_tag="scratch">
	<whiteboard>
		cairo on x86_64
	</whiteboard>
	<recipeSet priority="Normal">

and got an error:

Job failed schema validation. Please confirm that you want to submit it.

It could be overcome by sheer persistence, but still it is not right.

Comment 4 Dan Callaghan 2012-09-04 22:38:08 UTC
So I think there are two issues here:

1. The encoding in XML declaration makes lxml complain, as described in comment 1. The fix is probably to either:
* re-encode the submitted XML as UTF-8 and then pass the raw bytes to the XML parser and let it decode them again according to the encoding in the XML declaration (defaulting to UTF-8 if there is no declaration); or
* use a regex to strip out encoding="UTF-8" (if it's present) before parsing, under the assumption that nobody would try submitting XML with any other encoding.

2. The xsi:noNamespaceSchemaLocation attribute violates the RELAX NG schema as a superfluous attribute. We could add this to the schema as an allowed attribute.

Matěj, I would also point out that that XSD is not maintained (and is no longer reachable). For that reason I'm not sure if it's worth fixing the issue with xsi:noNamespaceSchemaLocation.

You can use the RELAX NG schema from here instead:

http://beaker-project.org/schema/beaker-job.rng

This is the same schema which the server uses internally when validating submitted jobs.
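
For anyone who wants to keep xsi:noNamespaceSchemaLocation in their local file for editor support, one possible client-side approach is to drop the attribute just before submission. This is only a sketch using the Python 3 standard library; drop_xsi_attributes is a hypothetical name, not a Beaker function, and the sketch assumes the attribute appears on the root element only.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the xsi prefix in the job XML from the description.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def drop_xsi_attributes(job_xml_bytes):
    """Return the job XML as text with xsi:* attributes removed from the root.

    Hypothetical helper. ElementTree discards comments and the XML
    declaration while parsing, so the result is not byte-identical
    to the input.
    """
    root = ET.fromstring(job_xml_bytes)
    prefix = "{%s}" % XSI_NS
    # Copy the keys first because the dict is modified while looping.
    for name in list(root.attrib):
        if name.startswith(prefix):
            del root.attrib[name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    job = (
        b'<?xml version="1.0"?>\n'
        b'<job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        b'xsi:noNamespaceSchemaLocation="http://fedorahosted.org/beaker/job.xsd" '
        b'retention_tag="scratch"><whiteboard>cairo on x86_64</whiteboard></job>'
    )
    print(drop_xsi_attributes(job))
```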

Comment 5 Dan Callaghan 2012-09-05 05:11:05 UTC
(In reply to comment #4)
> 1. The encoding in XML declaration makes lxml complain, as described in
> comment 1. The fix is probably to either:
> * re-encode the submitted XML as UTF-8 and then pass the raw bytes to the
> XML parser and let it decode them again according to the encoding in the XML
> declaration (defaulting to UTF-8 if there is no declaration); or

Okay, I discovered while testing that xml.sax explodes if you pass it a unicode object with non-ASCII chars (which we currently do), so this option is the right one, even if re-encoding and re-decoding all over the place is inefficient.
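
The re-encoding option can be sketched as follows. This is an illustration in Python 3 with the standard library's ElementTree, not the actual patch (Beaker at the time was Python 2 code using lxml and xmltramp); parse_submitted_job is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_submitted_job(job_xml_text):
    """Parse job XML that the web layer has already decoded to text.

    Hypothetical helper: the text is re-encoded as UTF-8 and the raw
    bytes are handed to the parser, which then does its own decoding.
    """
    raw = job_xml_text.encode("utf-8")
    return ET.fromstring(raw)


if __name__ == "__main__":
    job = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<job retention_tag="scratch">'
        "<whiteboard>cairo on x86_64 for Mat\u011bj</whiteboard>"
        "</job>"
    )
    root = parse_submitted_job(job)
    print(root.tag, root.get("retention_tag"))
```

One caveat with this approach: the bytes are always UTF-8, so a declaration naming some other encoding would no longer describe them correctly.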

Comment 6 Dan Callaghan 2012-09-05 05:20:47 UTC
(In reply to comment #5)
> Okay, I discovered while testing that xml.sax explodes if you pass it a
> unicode object with non-ASCII chars 

For future reference the exception is:

2012-09-05 14:57:49,977 bkr.server.xmlrpccontroller ERROR Error handling XML-RPC method
Traceback (most recent call last):
  File "/home/dcallagh/work/beaker/Server/bkr/server/xmlrpccontroller.py", line 54, in RPC2
    response = self.process_rpc(method,params)
  File "/home/dcallagh/work/beaker/Server/bkr/server/xmlrpccontroller.py", line 43, in process_rpc
    response = obj(*params)
  File "<string>", line 3, in upload
  File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
    return fn(self, *args, **kwargs)
  File "/home/dcallagh/work/beaker/Server/bkr/server/jobs.py", line 240, in upload
    xml = xmltramp.parse(jobxml)
  File "/usr/lib/python2.6/site-packages/xmltramp.py", line 259, in parse
    return seed(StringIO(text))
  File "/usr/lib/python2.6/site-packages/xmltramp.py", line 254, in seed
    parser.parse(fileobj)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/expatreader.py", line 216, in feed
    self._parser.Parse(data, isFinal)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 122-124: ordinal not in range(128)
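
The last line of the traceback suggests the unicode object was implicitly encoded with the ASCII codec on its way into the parser. The Python 3 sketch below only imitates that step to show the failure mode and the explicit UTF-8 alternative; implicit_ascii_encode is a hypothetical stand-in, not a function from xml.sax or expat.

```python
job_xml = "<job><whiteboard>Mat\u011bj</whiteboard></job>"


def implicit_ascii_encode(text):
    """Hypothetical stand-in for a bytes-only API coercing text via ASCII."""
    return text.encode("ascii")


failed = False
error = None
try:
    implicit_ascii_encode(job_xml)
except UnicodeEncodeError as exc:
    # Same exception type and codec as in the traceback above.
    failed = True
    error = exc

# Encoding explicitly as UTF-8 succeeds and round-trips the text.
explicit = job_xml.encode("utf-8")

if __name__ == "__main__":
    print(failed, error)
```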

Comment 7 Dan Callaghan 2012-09-05 06:16:52 UTC
On Gerrit: http://gerrit.beaker-project.org/1321

Comment 8 Matěj Cepl 2012-09-05 14:50:03 UTC
(In reply to comment #4)
> 1. The encoding in XML declaration makes lxml complain, as described in
> comment 1. The fix is probably to either:

? switch to xml.etree.ElementTree (or some other XML parser in the standard library ... as if we did not have enough of them there) from XMLTramp? I mean, I am always very suspicious about using 3rd party libs when the standard one works well (and apparently the 3rd party ones are not even fully standards-compliant).

> Matěj, I would also point out that that XSD is not maintained (and is no
> longer reachable). For that reason I'm not sure if it's worth fixing the
> issue with xsi:noNamespaceSchemaLocation.

My point with XSD was that it is used by Eclipse XML editors (I don't know of any Eclipse XML plugin which would be actively using RNG) and some people may use Eclipse.

(not me anymore, I am now fully in lovely arms of vim and omni completion apparently could work only with DTD anyway :( ... http://vimdoc.sourceforge.net/htmldoc/insert.html#ft-xml-omni)

Comment 10 Dan Callaghan 2012-09-07 07:48:30 UTC
(In reply to comment #8)
> (In reply to comment #4)
> > 1. The encoding in XML declaration makes lxml complain, as described in
> > comment 1. The fix is probably to either:
> 
> ? switch to xml.etree.ElementTree (or some other XML parser in the standard
> library ... as if we did not have enough of them there) from XMLTramp? I
> mean, I am always very suspicious about using 3rd party libs when the
> standard one works well (and apparently the 3rd party ones are not even
> fully standards-compliant).

The bug here is not in any of the XML parsers but in how Beaker is calling them. Take a look at the patch in comment 7 if you're curious.

And for what it's worth, while testing this bug I found a bug in xml.dom.minidom (in the stdlib) where lxml works fine :-)
http://bugs.python.org/issue15877

> My point with XSD was that it is used by Eclipse XML editors (I don't know
> of any Eclipse XML plugin which would be actively using RNG) and some
> people may use Eclipse.
> 
> (not me anymore, I am now fully in lovely arms of vim and omni completion
> apparently could work only with DTD anyway :( ...
> http://vimdoc.sourceforge.net/htmldoc/insert.html#ft-xml-omni)

We investigated XSD, DTD, and RELAX NG a few years ago and the conclusion was that RELAX NG was the only schema format capable of representing Beaker's job XML. None of the others have an equivalent to RELAX NG's <interleave/>. But we are now getting way off topic for this bug...

Comment 11 Matěj Cepl 2012-09-07 07:58:21 UTC
(In reply to comment #10)
> And for what it's worth, while testing this bug I found a bug in
> xml.dom.minidom (in the stdlib) where lxml works fine :-)
> http://bugs.python.org/issue15877

MiniDOM is the least maintained standard XML parser and is on the verge of being deprecated; it is there mostly for legacy reasons (http://thread.gmane.org/gmane.comp.python.devel/127963).

> We investigated XSD, DTD, and RELAX NG a few years ago and the conclusion
> was that RELAX NG was the only schema format capable of representing
> Beaker's job XML. None of the others have an equivalent to RELAX NG's
> <interleave/>.

I completely agree ... RNG is by far the best schema standard. My only point was that Eclipse (which I used when filing this bug) was able to use only XML Schema for its XML Editor.

> But we are now getting way off topic for this bug...

We are.

Comment 13 Bill Peck 2012-10-02 13:47:20 UTC
I'm able to successfully submit a job with the following xml:

<?xml version="1.0"?>
<job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" product="cpe:/o:redhat:enterprise_linux:5:update8" retention_tag="audit">

As noted, the xsi attribute is not supported.

Comment 14 Dan Callaghan 2012-10-11 23:49:50 UTC
Beaker 0.9.4 has been released.