Red Hat Bugzilla – Bug 612227
Validate XML before accepting
Last modified: 2015-05-03 21:36:58 EDT
Description of problem:
2010-07-07 11:42:17,909 bkr.server.tools.beakerd DEBUG Entering new_recipes routine
2010-07-07 11:42:17,953 bkr.server.tools.beakerd ERROR Failed to commit due to :invalid literal for int(): 5000MB
I've seen a couple of xml's come in which were invalid. I would really like to see us validate the xml before accepting it.
I'm setting the target to future_maint for now since I don't know how difficult this will really be. If you have any questions let me know.
We could probably achieve this by writing up an XSD and rejecting any jobs that don't validate. It would mean a lot of work creating and maintaining the XSD though (keeping it in sync with the various bits of XML that Beaker supports).
A better option might be to refactor the code such that we fully parse and evaluate the job XML before accepting it. I'll have a look today to see if we can do that.
Created attachment 446441 [details]
Patch: evaluate hostRequires at job submission time, to catch errors in XML
The attached patch would at least take care of the kind of problem Bill pasted above (invalid values in <hostRequires/>). I have a test case too, but it depends on the selenium-tests branch for test data setup.
I was looking for other parts of the job XML like <hostRequires/> where we defer parsing/evaluation until the job is scheduled, but I couldn't see anything else.
I think Marian's XSD would be useful, but we can't use it to reject jobs unless it's 100% correct. Maybe we could use the XSD as a warning in bkr client/web UI when submitting jobs -- if it fails validation we could show the validation errors to the user and ask if they really want to proceed. That might be useful for users who submit broken job XML and need help figuring out what's wrong; at present the error messages from Beaker are a bit cryptic in these cases.
Well, that was the original idea - to use it as an advisor.
To keep XSD up-to-date: job # should be logged WHEN (user decides to proceed despite warnings and job parsing/scheduling succeeds).
For now I do not want it to fill in a BZ, as there is a potential for many duplicates, and it may be difficult to pair them...
Simple message in beaker log (and I want to see them grepped from time to time.)
There are some bugs in Marian's xsd. I'll try and get a patch together Today.
I have a patch here for the web UI to warn users when their job XML doesn't validate. I will sit on it until we can iron out the problems with the XSD.
I wrote a little script to run the XSD against existing jobs (using job.to_xml(clone=True)), it seems like the <kickstart/>, <partitions/>, and <ks_appends/> elements are new and needed to be added to the XSD.
from bkr.server.util import load_config
from bkr.server.model import *
xsd = lxml.etree.XMLSchema(lxml.etree.parse('bkr/server/xsd/beaker-job.xsd'))
for job in Job.query():
xml = job.to_xml(clone=True).toxml()
if not xsd.validate(lxml.etree.fromstring(xml)):
Created attachment 448352 [details]
Patch: add new elements to XSD
The attached patch adds <kickstart/>, <ks_appends/>, and <partitions/> to the XSD. Bill/Marian, does this look right to you?
I couldn't find any other problems using my script in comment 7, but it doesn't run to completion because it leaks memory so the VM ends up swapping like mental and going nowhere :-(
Nevertheless, if you guys are happy with this then I will push my web UI changes to warn users when their job XML fails to validate. Should it be included in 0.5.58?
The other issue is where/how to serve up the XSD to users.
We'll need to include a copy of the XSD in beaker's source, for it to use when validating. So one possibility would be to make Beaker serve it up as a static file, e.g. <http://beaker.example.com/xsd/beaker-job.xsd>.
But maybe it would be better to have a single, permanent, publicly accessible location for the XSD (on the beaker web site)? That way its location doesn't depend on having Beaker deployed.
Either way, I like Marian's idea of using xs3p to turn the XSD into something human-readable, mainly because it all happens on the client side so we don't have to write any code. If we did want to ship it inside of Beaker, there is a clause in the license (3.7 Larger Works) which I think permits us to do so, but it might make the licensing issues simpler if we were to just host a copy of xs3p on the beaker site.
Looks fine to me.
One Q: Does it work with CDATA in kickstart and ks_appends elements?
(In reply to comment #10)
> One Q: Does it work with CDATA in kickstart and ks_appends elements?
Yes it works with CDATA. My understanding is that the presence (or absence) of <![CDATA]> is only the concern of the parser; either way there are just "characters" in the document. So an element containing CDATA still validates as xs:string.
I've also added an XSD validation check to bkr job-submit. I figured we don't want to introduce a yes/no prompt ("Your job xml failed validation, are you sure you want to submit it?") because that might break people's scripts. For the same reason we probably don't want to abort the job submission when validation fails.
At present I've just got it printing a warning to stdout if the validation fails. Maybe we could also add an option to make validation errors abort the submission (--validate, or even more generally just --fatal-warnings). Is this worth adding?
Why not stderr? In case of --debug it will mix with XML data.
(In reply to comment #13)
> Why not stderr? In case of --debug it will mix with XML data.
We currently print a line like
to stdout (unless it's a dry run) so I was just following the trend. :-)
But you're right, stderr makes more sense, I'll change it.
Pushed branch bz612227 for review. The interesting commits are:
* evaluate hostRequires at job submission time, to catch errors in XML
* warn users before accepting job XML which does not validate
* validate against XSD in bkr job-submit
Merged into develop branch.