Bug 612227 - Validate XML before accepting
Summary: Validate XML before accepting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: scheduler
Version: 0.5
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Raymond Mancy
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-07-07 15:48 UTC by Bill Peck
Modified: 2019-05-22 13:34 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-30 04:56:54 UTC
Embargoed:


Attachments (Terms of Use)
Patch: evaluate hostRequires at job submission time, to catch errors in XML (2.04 KB, patch)
2010-09-10 06:24 UTC, Dan Callaghan
no flags Details | Diff
Patch: add new elements to XSD (1.75 KB, patch)
2010-09-20 05:25 UTC, Dan Callaghan
no flags Details | Diff

Description Bill Peck 2010-07-07 15:48:26 UTC
Description of problem:
2010-07-07 11:42:17,909 bkr.server.tools.beakerd DEBUG Entering new_recipes routine
2010-07-07 11:42:17,953 bkr.server.tools.beakerd ERROR Failed to commit due to :invalid literal for int(): 5000MB

Comment 1 Bill Peck 2010-09-09 15:43:13 UTC
Hi Dan,

I've seen a couple of xml's come in which were invalid.  I would really like to see us validate the xml before accepting it.


I'm setting the target to future_maint for now since I don't know how difficult this will really be.  If you have any questions let me know.

Comment 2 Dan Callaghan 2010-09-10 00:37:12 UTC
We could probably achieve this by writing up an XSD and rejecting any jobs that don't validate. It would mean a lot of work creating and maintaining the XSD though (keeping it in sync with the various bits of XML that Beaker supports).

A better option might be to refactor the code such that we fully parse and evaluate the job XML before accepting it. I'll have a look today to see if we can do that.

Comment 4 Dan Callaghan 2010-09-10 06:24:53 UTC
Created attachment 446441 [details]
Patch: evaluate hostRequires at job submission time, to catch errors in XML

The attached patch would at least take care of the kind of problem Bill pasted above (invalid values in <hostRequires/>). I have a test case too, but it depends on the selenium-tests branch for test data setup.

I was looking for other parts of the job XML like <hostRequires/> where we defer parsing/evaluation until the job is scheduled, but I couldn't see anything else.

I think Marian's XSD would be useful, but we can't use it to reject jobs unless it's 100% correct. Maybe we could use the XSD as a warning in bkr client/web UI when submitting jobs -- if it fails validation we could show the validation errors to the user and ask if they really want to proceed. That might be useful for users who submit broken job XML and need help figuring out what's wrong; at present the error messages from Beaker are a bit cryptic in these cases.

Comment 5 Marian Csontos 2010-09-10 07:03:36 UTC
Well, that was the original idea - to use it as an advisor.

To keep XSD up-to-date: job # should be logged WHEN (user decides to proceed despite warnings and job parsing/scheduling succeeds).

Logging:
For now I do not want it to fill in a BZ, as there is a potential for many duplicates, and it may be difficult to pair them...
Simple message in beaker log (and I want to see them grepped from time to time.)

Comment 6 Bill Peck 2010-09-10 13:28:58 UTC
There are some bugs in Marian's xsd.  I'll try and get a patch together Today.

Comment 7 Dan Callaghan 2010-09-13 06:46:23 UTC
I have a patch here for the web UI to warn users when their job XML doesn't validate. I will sit on it until we can iron out the problems with the XSD.

I wrote a little script to run the XSD against existing jobs (using job.to_xml(clone=True)), it seems like the <kickstart/>, <partitions/>, and <ks_appends/> elements are new and needed to be added to the XSD.


import lxml.etree
import sqlalchemy
from bkr.server.util import load_config
load_config('dev.cfg')
from bkr.server.model import *

xsd = lxml.etree.XMLSchema(lxml.etree.parse('bkr/server/xsd/beaker-job.xsd'))

for job in Job.query():
    xml = job.to_xml(clone=True).toxml()
    if not xsd.validate(lxml.etree.fromstring(xml)):
        print xsd.error_log

Comment 8 Dan Callaghan 2010-09-20 05:25:35 UTC
Created attachment 448352 [details]
Patch: add new elements to XSD

The attached patch adds <kickstart/>, <ks_appends/>, and <partitions/> to the XSD. Bill/Marian, does this look right to you?

I couldn't find any other problems using my script in comment 7, but it doesn't run to completion because it leaks memory so the VM ends up swapping like mental and going nowhere :-(

Nevertheless, if you guys are happy with this then I will push my web UI changes to warn users when their job XML fails to validate. Should it be included in 0.5.58?

Comment 9 Dan Callaghan 2010-09-20 05:34:41 UTC
The other issue is where/how to serve up the XSD to users.

We'll need to include a copy of the XSD in beaker's source, for it to use when validating. So one possibility would be to make Beaker serve it up as a static file, e.g. <http://beaker.example.com/xsd/beaker-job.xsd>.

But maybe it would be better to have a single, permanent, publicly accessible location for the XSD (on the beaker web site)? That way its location doesn't depend on having Beaker deployed.

Either way, I like Marian's idea of using xs3p to turn the XSD into something human-readable, mainly because it all happens on the client side so we don't have to write any code. If we did want to ship it inside of Beaker, there is a clause in the license (3.7 Larger Works) which I think permits us to do so, but it might make the licensing issues simpler if we were to just host a copy of xs3p on the beaker site.

Comment 10 Marian Csontos 2010-09-20 06:12:19 UTC
Looks fine to me.

One Q: Does it work with CDATA in kickstart and ks_appends elements?

Comment 11 Dan Callaghan 2010-09-20 06:36:55 UTC
(In reply to comment #10)
> One Q: Does it work with CDATA in kickstart and ks_appends elements?

Yes it works with CDATA. My understanding is that the presence (or absence) of <![CDATA[]]> is only the concern of the parser; either way there are just "characters" in the document. So an element containing CDATA still validates as xs:string.

Comment 12 Dan Callaghan 2010-09-21 03:16:09 UTC
I've also added an XSD validation check to bkr job-submit. I figured we don't want to introduce a yes/no prompt ("Your job xml failed validation, are you sure you want to submit it?") because that might break people's scripts. For the same reason we probably don't want to abort the job submission when validation fails.

At present I've just got it printing a warning to stdout if the validation fails. Maybe we could also add an option to make validation errors abort the submission (--validate, or even more generally just --fatal-warnings). Is this worth adding?

Comment 13 Marian Csontos 2010-09-21 03:49:04 UTC
Why not stderr? In case of --debug it will mix with XML data.

Comment 14 Dan Callaghan 2010-09-21 03:54:05 UTC
(In reply to comment #13)
> Why not stderr? In case of --debug it will mix with XML data.

We currently print a line like

  Submitted: ['j:13713']

to stdout (unless it's a dry run) so I was just following the trend. :-)

But you're right, stderr makes more sense, I'll change it.

Comment 15 Dan Callaghan 2010-09-21 23:57:39 UTC
Pushed branch bz612227 for review. The interesting commits are:

* evaluate hostRequires at job submission time, to catch errors in XML
http://git.fedorahosted.org/git/?p=beaker.git;a=commitdiff;h=683d8eaafd5c37b2d59535e5c51042bdd8dd2f38

* warn users before accepting job XML which does not validate
http://git.fedorahosted.org/git/?p=beaker.git;a=commitdiff;h=4ca4cb901cf0cc896c4f409202a20345b6369574

* validate against XSD in bkr job-submit
http://git.fedorahosted.org/git/?p=beaker.git;a=commitdiff;h=24876d0c6960fded6195455acb73479993e9bfeb

Comment 16 Dan Callaghan 2010-09-27 01:27:33 UTC
Merged into develop branch.


Note You need to log in before you can comment on or make changes to this bug.