| Summary: | cannot submit job with encoding in XML declaration | ||
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | Matěj Cepl <mcepl> |
| Component: | scheduler | Assignee: | Dan Callaghan <dcallagh> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 0.7 | CC: | bpeck, dcallagh, mcsontos, mishin, rmancy, stl |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-10-11 23:49:50 UTC | Type: | --- |
Description
Matěj Cepl
2011-12-15 22:24:46 UTC
You can work around this by removing encoding="utf-8" from your XML declaration.

The reason it happens is that Beaker's web UI is fully Unicode-aware and treats all submitted form data as UTF-8. So by the time the Beaker code sees your submitted job XML it has already been decoded as UTF-8 character data. Then lxml sees the encoding declaration and considers it to be an error.

Thanks, that looks good ... I was afraid there is something broken with the namespace declaration. I don't insist on encoding declaration.

This should be fixed but with the lowest priority.

Well, I tried

    <?xml version="1.0"?>
    <job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="http://fedorahosted.org/beaker/job.xsd"
         retention_tag="scratch">
      <whiteboard>
        cairo on x86_64
      </whiteboard>
      <recipeSet priority="Normal">

and got an error:

    Job failed schema validation. Please confirm that you want to submit it.

It could be overcome by sheer persistence, but still it is not right.

So I think there are two issues here:

1. The encoding in XML declaration makes lxml complain, as described in comment 1. The fix is probably to either:
   * re-encode the submitted XML as UTF-8 and then pass the raw bytes to the XML parser and let it decode them again according to the encoding in the XML declaration (defaulting to UTF-8 if there is no declaration); or
   * use a regex to strip out encoding="UTF-8" (if it's present) before parsing, under the assumption that nobody would try submitting XML with any other encoding.
2. The xsi:noNamespaceSchemaLocation attribute violates the RELAX NG schema as a superfluous attribute. We could add this to the schema as an allowed attribute.

Matěj, I would also point out that that XSD is not maintained (and is no longer reachable). For that reason I'm not sure if it's worth fixing the issue with xsi:noNamespaceSchemaLocation.
You can use the RELAX NG schema from here instead:
http://beaker-project.org/schema/beaker-job.rng

This is the same schema which the server uses internally when validating submitted jobs.

(In reply to comment #4)
> 1. The encoding in XML declaration makes lxml complain, as described in
> comment 1. The fix is probably to either:
> * re-encode the submitted XML as UTF-8 and then pass the raw bytes to the
> XML parser and let it decode them again according to the encoding in the XML
> declaration (defaulting to UTF-8 if there is no declaration); or

Okay, I discovered while testing that xml.sax explodes if you pass it a unicode object with non-ASCII chars (which we currently do), so this option is the right one. Even if it is inefficient re-encoding and re-decoding all over the place.

(In reply to comment #5)
> Okay, I discovered while testing that xml.sax explodes if you pass it a
> unicode object with non-ASCII chars

For future reference the exception is:

    2012-09-05 14:57:49,977 bkr.server.xmlrpccontroller ERROR Error handling XML-RPC method
    Traceback (most recent call last):
      File "/home/dcallagh/work/beaker/Server/bkr/server/xmlrpccontroller.py", line 54, in RPC2
        response = self.process_rpc(method,params)
      File "/home/dcallagh/work/beaker/Server/bkr/server/xmlrpccontroller.py", line 43, in process_rpc
        response = obj(*params)
      File "<string>", line 3, in upload
      File "/usr/lib/python2.6/site-packages/turbogears/identity/conditions.py", line 249, in require
        return fn(self, *args, **kwargs)
      File "/home/dcallagh/work/beaker/Server/bkr/server/jobs.py", line 240, in upload
        xml = xmltramp.parse(jobxml)
      File "/usr/lib/python2.6/site-packages/xmltramp.py", line 259, in parse
        return seed(StringIO(text))
      File "/usr/lib/python2.6/site-packages/xmltramp.py", line 254, in seed
        parser.parse(fileobj)
      File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "/usr/lib64/python2.6/site-packages/_xmlplus/sax/expatreader.py", line 216, in feed
        self._parser.Parse(data, isFinal)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 122-124: ordinal not in range(128)

On Gerrit: http://gerrit.beaker-project.org/1321

(In reply to comment #4)
> 1. The encoding in XML declaration makes lxml complain, as described in
> comment 1. The fix is probably to either:

? switch to xml.etree.ElementTree (or some other XML parser in the standard library ... like we have not enough of them there) from XMLTramp? I mean, I am always very suspicious about using 3rd party libs when the standard one works well (and apparently 3rd party ones are not even fully standard compliant).

> Matěj, I would also point out that that XSD is not maintained (and is no
> longer reachable). For that reason I'm not sure if it's worth fixing the
> issue with xsi:noNamespaceSchemaLocation.

My point with XSD was that it is used by Eclipse XML editors (I don't know about any Eclipse XML plugin which would be actively using RNG) and some people may use Eclipse.

(not me anymore, I am now fully in lovely arms of vim and omni completion apparently could work only with DTD anyway :( ... http://vimdoc.sourceforge.net/htmldoc/insert.html#ft-xml-omni)

(In reply to comment #8)
> (In reply to comment #4)
> > 1. The encoding in XML declaration makes lxml complain, as described in
> > comment 1. The fix is probably to either:
>
> ? switch to xml.etree.ElementTree (or some other XML parser in the standard
> library ... like we have not enough of them there) from XMLTramp? I mean, I
> am always very suspicious about using 3rd party libs when the standard one
> works well (and apparently 3rd party ones are not even fully standard
> compliant).

The bug here is not in any of the XML parsers but in how Beaker is calling them. Take a look at the patch in comment 7 if you're curious.
And for what it's worth, while testing this bug I found a bug in xml.dom.minidom (in the stdlib) where lxml works fine :-)
http://bugs.python.org/issue15877

> My point with XSD was that it is used by Eclipse XML editors (I don't know about
> any Eclipse XML plugin which would be actively using RNG) and some people
> may use Eclipse.
>
> (not me anymore, I am now fully in lovely arms of vim and omni completion
> apparently could work only with DTD anyway :( ...
> http://vimdoc.sourceforge.net/htmldoc/insert.html#ft-xml-omni)

We investigated XSD, DTD, and RELAX NG a few years ago and the conclusion was that RELAX NG was the only schema format capable of representing Beaker's job XML. None of the others have an equivalent to RELAX NG's <interleave/>.

But we are now getting way off topic for this bug...

(In reply to comment #10)
> And for what it's worth, while testing this bug I found a bug in
> xml.dom.minidom (in the stdlib) where lxml works fine :-)
> http://bugs.python.org/issue15877

MiniDOM is the least maintained standard XML parser and is on the verge of being deprecated; it is there mostly for legacy reasons (http://thread.gmane.org/gmane.comp.python.devel/127963).

> We investigated XSD, DTD, and RELAX NG a few years ago and the conclusion
> was that RELAX NG was the only schema format capable of representing
> Beaker's job XML. None of the others have an equivalent to RELAX NG's
> <interleave/>.

I completely agree ... RNG is by far the best schema standard. My only point was that Eclipse (which I used when filing this bug) was able to use only XML Schema for its XML Editor.

> But we are now getting way off topic for this bug...

We are.

I'm able to successfully submit a job with the following xml:

    <?xml version="1.0"?>
    <job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         product="cpe:/o:redhat:enterprise_linux:5:update8"
         retention_tag="audit">

As noted the xsi attribute is not supported.

Beaker 0.9.4 has been released.
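
For readers who cannot reach the Gerrit change, the two fixes proposed in comment 4 can be sketched roughly as follows. This is only an illustration, not the patch that was merged: it assumes Python 3 and the standard library's xml.etree.ElementTree instead of the lxml/xmltramp code Beaker used at the time, and the helper names parse_job_xml and strip_encoding_declaration are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches an encoding="..." pseudo-attribute inside the XML declaration.
# Illustrative pattern only; it is not taken from Beaker's source.
_ENCODING_DECL = re.compile(
    r'(<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])[A-Za-z0-9._-]+\2',
    re.IGNORECASE,
)


def parse_job_xml(jobxml):
    """Option 1: re-encode already-decoded text and parse the raw bytes.

    The web layer hands over decoded character data. Encoding it back to
    UTF-8 lets the parser apply the XML declaration itself. This assumes
    the declaration, if present, names UTF-8; any other declared encoding
    would no longer match the bytes we produce here.
    """
    if isinstance(jobxml, str):
        jobxml = jobxml.encode("utf-8")
    return ET.fromstring(jobxml)


def strip_encoding_declaration(jobxml):
    """Option 2: drop the encoding pseudo-attribute before parsing text.

    Only the first match is removed, and the rest of the declaration
    (for example version="1.0") is left in place.
    """
    return _ENCODING_DECL.sub(r"\1", jobxml, count=1)
```

Calling parse_job_xml on a submitted string with or without an encoding declaration returns the root job element, including any non-ASCII whiteboard text. strip_encoding_declaration is the cheaper alternative, but it relies on the assumption stated in comment 4 that nobody submits XML declaring an encoding other than UTF-8.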