Bug 1183239 - beakerd kills the database when a recipe's hostRequires has <arch/> inside <or/>
Summary: beakerd kills the database when a recipe's hostRequires has <arch/> inside <or/>
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Community
Component: scheduler
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 19.3
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-17 15:04 UTC by drohwer
Modified: 2018-02-06 00:41 UTC
CC: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-17 06:34:19 UTC



Comment 6 Bill Peck 2015-01-20 18:57:35 UTC
A little more info on this, since it happened again today and Darrell and I were able to investigate a little. The following hostRequires seems to be the culprit:

            <hostRequires>
                <and>
                    <or>
                        <system>
                            <or>
                                <arch op="=" value="ppc64"/>
                                <arch op="=" value="s390x"/>
                                <arch op="=" value="ia64"/>
                            </or>
                        </system>
                        <not>
                            <or>
                                <system>
                                    <or>
                                        <hypervisor op="!=" value=""/>
                                        <name op="like" value="%sheep%"/>
                                    </or>
                                </system>
                                <cpu>
                                    <or>
                                        <model_name op="like" value="%irtual%"/>
                                        <processors op="&lt;" value="2"/>
                                    </or>
                                </cpu>
                                <key_value key="MODEL" op="like" value="%irtual%"/>
                            </or>
                        </not>
                    </or>
                </and>
                <system_type value="Machine"/>
            </hostRequires> 

While I think it can be simplified somewhat, the point is that it's possible to send a job/query to MySQL that it can't handle, either because of temp disk space or CPU.

Any thoughts?

Comment 10 Michael Petlan 2015-01-20 21:41:37 UTC
We have been discussing the issue with Arjun. I am the developer of the MACHINE_SET__STRONG filter (= the XML snippet). I don't know much about the internal mechanism of Beaker, but I guess that the following is happening:

A job is submitted into Beaker with all the XML host filtering inside. It gets into a NEW state. Then it's transferred into the QUEUED state, and during that process the final list of possible machines for the job is created. I suppose this because when I see a NEW job in Beaker there's nothing about the machines, but when the job becomes QUEUED there's information like "Possible systems: 196".

So during the transfer to the QUEUED state the problematic query is run in order to get the list of machines that would fit the job.

The XML snippet looks pretty complicated, yes; it's a bit more complex than usual filters are, but I don't think it's the culprit. Although this filter is new, I have created some similar filters which are also not simple. But because we used to schedule only one (two, three, four max) jobs at once, no problem has ever occurred.

So I guess the problem occurred because Arjun needed to schedule tens of jobs, so they were scheduled, they were in the NEW state, and Beaker was trying to move them into the QUEUED state, doing the database queries. And there were so many queries at once that it got overloaded.

I think that a solution might be limiting (e.g. by semaphores) that activity in Beaker, so that not so many jobs are moved from the NEW state to the QUEUED one *at once*. I also used the MACHINE_SET__STRONG filter when I was testing it, and it worked without any problems.

-----------------------------------------------------------------------------

What should MACHINE_SET__STRONG do?

The problem:
There are many x86_64 machines in Beaker which are too weak for running glibc tests. The weak machines are somehow preferred, so many glibc jobs get stuck or fail.

The solution:
The MACHINE_SET__STRONG filter should avoid the slow machines, so it should allow all s390x machines (they are all the same, right?), all ppc64 machines (they are all POWERful), and select some subset of x86_64 machines consisting of reasonably strong ones only. You may run a job using that filter and you will get one s390x, one ppc64, and one x86_64 machine, each strong enough.

Although it's maybe a bit more complicated than usual requirements are, it is logically correct, the XML is correct too, and similar XML snippets work well; or at least they didn't cause any problems, because they were used one at a time and not with tens of jobs at once.

----------------------------------------------------------------------------

This is an example of a quite old and often-used XML snippet, which hasn't caused any problems yet:

<not>
    <or>
        <system>
            <hypervisor op="!=" value=""/>
        </system>
        <cpu>
            <or>
                <model_name op="like" value="%irtual%"/>
                <family op="=" value="15"/>
                <and>
                    <family op="=" value="6"/>
                    <or>
                        <model op="=" value="15"/>
                        <model op="=" value="22"/>
                        <model op="=" value="13"/>
                    </or>
                </and>
                <and>
                    <model_name op="like" value="%AMD%"/>
                    <family op="&lt;" value="15"/>
                </and>
            </or>
        </cpu>
        <key_value key="MODEL" op="like" value="%ProLiant%"/>
        <key_value key="MODEL" op="like" value="%irtual%"/>
    </or>
</not>

Comment 12 Dan Callaghan 2015-01-21 05:37:02 UTC
(In reply to Michael Petlan from comment #10)
> We have been discussing the issue with Arjun. I am the developer of the
> MACHINE_SET__STRONG filter (= the XML snippet). I don't know much about the
> internal mechanism of Beaker, but I guess that the following is happening:
> 
> A job is submitted into Beaker with all the XML host filtering inside. It
> gets into a NEW state. Then it's being transferred into the QUEUED state and
> during that process the final list of possible machines for the job is
> created. I suppose that because when I see a NEW job in Beaker, there's
> nothing about the machines, but when the job becomes QUEUED, then there's an
> information like "Possible systems: 196".
> 
> So during the transfer to the QUEUED state the problematic query is done in
> order to get the list of machines which would fit to the job.

Right.

> The XML snippet looks pretty complicated, yes, it's a bit more complex than
> usual filters are, but I don't think that it's the culprit. Although this
> filter is new, I created some similar filters which are also not simple. But
> because we used to schedule only one (two, three, four max) jobs at once, no
> problem has ever occurred.
> 
> So I guess that the problem occurred because Arjun needed to schedule tens of
> jobs, so they were scheduled, they were in NEW state and Beaker was trying
> to move them into the QUEUED state doing the database queries. And there
> were so many queries at once, that it got overloaded.
> 
> I think that a solution for that might be limiting (i.e. by semaphores) that
> activity in Beaker so not that many jobs wouldn't be moved from the NEW
> state to the QUEUED one *at once*. I also used the MACHINE_SET__STRONG
> filter when I was testing it, it worked without any problems.

The scheduler only processes a single recipe at a time (that's another long sad story...) so the problem is not caused by submitting multiple jobs with this filter.

There are a few possible explanations for why we might not have hit this before. MySQL's query optimizer is often quite surprising in its behaviour. We learned the hard way (bug 590723) that db01 has a setting, optimizer_search_depth = 6, which means that any query with more than 6 JOINs can pick a suboptimal query plan that will blow up the whole database. But the exact query plan chosen depends on many factors, including table sizes, which means that even just the growth of a table (or a subset of a table) can alter the query execution. The other thing is that the SQL emitted by Beaker for a given recipe may change if Beaker's code is changed or if SQLAlchemy is upgraded (we went from 0.6 to 0.8 in September 2014, for example).
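For reference, the setting in question can be inspected and overridden per session in MySQL; a sketch (0 tells the server to pick the search depth automatically, 62 is exhaustive, and 6 is the value db01 is described as running above):

```sql
-- Inspect the current join-optimizer search depth:
SHOW VARIABLES LIKE 'optimizer_search_depth';

-- Per-session override; 0 = choose automatically, 62 = exhaustive search.
SET SESSION optimizer_search_depth = 0;
```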

Anyway, I have successfully reproduced this on dbdevel.db.eng.bne so some further notes will follow...

Comment 13 Dan Callaghan 2015-01-21 05:49:49 UTC
The first query in comment 0 looked suspicious (a cartesian product with system_arch_map repeated 4 times). It's from recipe 1765676. The query being run is systems.count() in process_new_recipe, where systems is recipe.candidate_systems(only_in_lab=True). The whole query is not listed in the traceback, but we can see what it would have been:

user = User.by_user_name(u'ashankar')
distro_tree = DistroTree.query.join(DistroTree.distro)\
        .filter(Distro.name == u'RHEL5-Server-U11')\
        .filter(DistroTree.arch == Arch.by_name(u'i386')).one()
systems = System.all(user)
host_filter = XmlHost.from_string('''
  <hostRequires>
    <and>
      <or>
        <system>
          <or>
            <arch op="=" value="ppc64"/>
            <arch op="=" value="s390x"/>
            <arch op="=" value="ia64"/>
          </or>
        </system>
        <not>
          <or>
            <system>
              <or>
                <hypervisor op="!=" value=""/>
                <name op="like" value="%sheep%"/>
              </or>
            </system>
            <cpu>
              <or>
                <model_name op="like" value="%irtual%"/>
                <processors op="&lt;" value="2"/>
              </or>
            </cpu>
            <key_value key="MODEL" op="like" value="%irtual%"/>
          </or>
        </not>
      </or>
    </and>
    <system_type value="Machine"/>
  </hostRequires>
''')
systems = host_filter.apply_filter(systems).filter(System.status == SystemStatus.automated)
systems = systems.filter(System.can_reserve(user))
systems = systems.filter(System.compatible_with_distro_tree(distro_tree))
systems = systems.filter(System.in_lab_with_distro_tree(distro_tree))
print systems.count()

That code is basically the contents of Recipe.candidate_systems, with some branches simplified etc. It produces this lovely monster of a query:

2015-01-21 15:47:08,892 sqlalchemy.engine.base.Engine INFO SELECT count(*) AS count_1 
FROM (SELECT system.status AS system_status, system.id AS system_id, system.fqdn AS system_fqdn, system.serial AS system_serial, system.date_added AS system_date_added, system.date_modified AS system_date_modified, system.date_lastcheckin AS system_date_lastcheckin, system.location AS system_location, system.vendor AS system_vendor, system.model AS system_model, system.lender AS system_lender, system.owner_id AS system_owner_id, system.user_id AS system_user_id, system.type AS system_type, system.status_reason AS system_status_reason, system.deleted AS system_deleted, system.memory AS system_memory, system.checksum AS system_checksum, system.lab_controller_id AS system_lab_controller_id, system.mac_address AS system_mac_address, system.loan_id AS system_loan_id, system.loan_comment AS system_loan_comment, system.release_action AS system_release_action, system.reprovision_distro_tree_id AS system_reprovision_distro_tree_id, system.hypervisor_id AS system_hypervisor_id, system.kernel_type_id AS system_kernel_type_id 
FROM system_arch_map AS system_arch_map_1, system_arch_map AS system_arch_map_2, system_arch_map AS system_arch_map_3, system_arch_map AS system_arch_map_4, system LEFT OUTER JOIN lab_controller ON lab_controller.id = system.lab_controller_id LEFT OUTER JOIN system_access_policy ON system.id = system_access_policy.system_id LEFT OUTER JOIN hypervisor ON hypervisor.id = system.hypervisor_id INNER JOIN cpu ON system.id = cpu.system_id 
WHERE ((EXISTS (SELECT 1 
FROM system_access_policy_rule 
WHERE system_access_policy.id = system_access_policy_rule.policy_id AND system_access_policy_rule.permission = %s AND (%s = system_access_policy_rule.user_id OR system_access_policy_rule.user_id IS NULL AND system_access_policy_rule.group_id IS NULL OR system_access_policy_rule.group_id IN (%s)))) OR system.owner_id = %s OR system.loan_id = %s OR system.user_id = %s) AND (system.id = system_arch_map_1.system_id AND %s = system_arch_map_1.arch_id OR system.id = system_arch_map_2.system_id AND %s = system_arch_map_2.arch_id OR system.id = system_arch_map_3.system_id AND %s = system_arch_map_3.arch_id OR NOT (hypervisor.hypervisor IS NOT NULL OR system.fqdn LIKE %s OR cpu.model_name LIKE %s OR cpu.processors < %s OR (EXISTS (SELECT 1 
FROM key_value_string 
WHERE system.id = key_value_string.system_id AND %s = key_value_string.key_id AND key_value_string.key_value LIKE %s)))) AND system.type = %s AND system.status = %s AND ((EXISTS (SELECT 1 
FROM system_access_policy_rule 
WHERE system_access_policy.id = system_access_policy_rule.policy_id AND system_access_policy_rule.permission = %s AND (%s = system_access_policy_rule.user_id OR system_access_policy_rule.user_id IS NULL AND system_access_policy_rule.group_id IS NULL OR system_access_policy_rule.group_id IN (%s)))) OR system.owner_id = %s OR system.loan_id = %s) AND system.id = system_arch_map_4.system_id AND %s = system_arch_map_4.arch_id AND NOT (EXISTS (SELECT 1 
FROM exclude_osmajor 
WHERE system.id = exclude_osmajor.system_id AND %s = exclude_osmajor.osmajor_id AND %s = exclude_osmajor.arch_id)) AND NOT (EXISTS (SELECT 1 
FROM exclude_osversion 
WHERE system.id = exclude_osversion.system_id AND %s = exclude_osversion.osversion_id AND %s = exclude_osversion.arch_id)) AND (EXISTS (SELECT 1 
FROM distro_tree_lab_controller_map 
WHERE lab_controller.id = distro_tree_lab_controller_map.lab_controller_id AND %s = distro_tree_lab_controller_map.distro_tree_id))) AS anon_1
2015-01-21 15:47:08,892 sqlalchemy.engine.base.Engine INFO ('view', 2381L, 128L, 2381L, 2381L, 2381L, 5L, 7L, 3L, '%sheep%', '%irtual%', 2, 13L, '%irtual%', 'Machine', 'Automated', 'reserve', 2381L, 128L, 2381L, 2381L, 1L, 9L, 1L, 164L, 1L, 68222L)

Comment 14 Dan Callaghan 2015-01-21 05:59:10 UTC
The good news is that I can reproduce the temp table issue (out of disk space) on dbdevel with a relatively recent production database dump when I run the above query.

Even better, I can narrow it down a lot. The minimal reproducer which reveals the problem is this:

user = User.by_user_name(u'ashankar')
systems = System.all(user)
host_filter = XmlHost.from_string('''
  <hostRequires>
    <and>
      <or>
        <system>
          <arch op="=" value="ia64"/>
        </system>
        <not>
          <system>
            <hypervisor op="!=" value=""/>
          </system>
        </not>
      </or>
    </and>
    <system_type value="Machine"/>
  </hostRequires>
''')
systems = host_filter.apply_filter(systems)
print systems.count()

2015-01-21 15:54:01,168 sqlalchemy.engine.base.Engine INFO SELECT count(*) AS count_1 
FROM (SELECT system.status AS system_status, system.id AS system_id, system.fqdn AS system_fqdn, system.serial AS system_serial, system.date_added AS system_date_added, system.date_modified AS system_date_modified, system.date_lastcheckin AS system_date_lastcheckin, system.location AS system_location, system.vendor AS system_vendor, system.model AS system_model, system.lender AS system_lender, system.owner_id AS system_owner_id, system.user_id AS system_user_id, system.type AS system_type, system.status_reason AS system_status_reason, system.deleted AS system_deleted, system.memory AS system_memory, system.checksum AS system_checksum, system.lab_controller_id AS system_lab_controller_id, system.mac_address AS system_mac_address, system.loan_id AS system_loan_id, system.loan_comment AS system_loan_comment, system.release_action AS system_release_action, system.reprovision_distro_tree_id AS system_reprovision_distro_tree_id, system.hypervisor_id AS system_hypervisor_id, system.kernel_type_id AS system_kernel_type_id 
FROM system_arch_map AS system_arch_map_1, system LEFT OUTER JOIN lab_controller ON lab_controller.id = system.lab_controller_id LEFT OUTER JOIN system_access_policy ON system.id = system_access_policy.system_id LEFT OUTER JOIN hypervisor ON hypervisor.id = system.hypervisor_id 
WHERE ((EXISTS (SELECT 1 
FROM system_access_policy_rule 
WHERE system_access_policy.id = system_access_policy_rule.policy_id AND system_access_policy_rule.permission = %s AND (%s = system_access_policy_rule.user_id OR system_access_policy_rule.user_id IS NULL AND system_access_policy_rule.group_id IS NULL OR system_access_policy_rule.group_id IN (%s)))) OR system.owner_id = %s OR system.loan_id = %s OR system.user_id = %s) AND (system.id = system_arch_map_1.system_id AND %s = system_arch_map_1.arch_id OR hypervisor.hypervisor IS NULL) AND system.type = %s) AS anon_1
2015-01-21 15:54:01,168 sqlalchemy.engine.base.Engine INFO ('view', 2381L, 128L, 2381L, 2381L, 2381L, 3L, 'Machine')

The issue is that we are using a cartesian product (FROM system_arch_map, system) but the joining clause appears in only one side of an OR. Therefore MySQL resorts to computing the entire cartesian product on disk before filtering, because it's not smart enough to do better.

So the root cause is having <arch/> inside an <or/>, with some other non-arch filters as well.

The next question is whether this is a regression or not, and how hard it is to fix...

Comment 15 Dan Callaghan 2015-01-21 06:09:58 UTC
One interesting data point is that SQLAlchemy 0.6 produces a slightly different construct than 0.8 does in this case. The query itself is constructed the same, and it suffers from the same problem (a cartesian product whose join condition is not always applied), but instead of

SELECT count(*) FROM (SELECT <all columns> FROM ...)

it just produces

SELECT count(1) FROM system_arch_map, system...

which still produces the *wrong* result (36026451 instead of 70), but MySQL is able to execute the query successfully. Presumably its optimiser recognises that it only needs to count rows, and therefore avoids writing the entire cartesian product to disk.
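The difference between the two query shapes is easy to demonstrate in miniature with any SQL engine. Below is a sketch using Python's sqlite3; the table names mirror the Beaker schema but the data is invented purely for illustration:

```python
import sqlite3

# A toy schema mirroring the system / system_arch_map tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE system (id INTEGER PRIMARY KEY);
    CREATE TABLE system_arch_map (system_id INTEGER, arch_id INTEGER);
    INSERT INTO system (id) VALUES (1), (2), (3);
    INSERT INTO system_arch_map VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

# Cartesian-product form: the join condition is buried inside an OR, so
# whenever the other branch is true the join condition never prunes the
# cross product, and every (system, arch_map) pair is counted.
bad = cur.execute("""
    SELECT count(*) FROM system, system_arch_map
    WHERE 1 OR (system.id = system_arch_map.system_id
                AND system_arch_map.arch_id = 1)
""").fetchone()[0]

# Correlated EXISTS form: at most one row per system, regardless of how
# the OR evaluates.
good = cur.execute("""
    SELECT count(*) FROM system
    WHERE 1 OR EXISTS (SELECT 1 FROM system_arch_map
                       WHERE system_arch_map.system_id = system.id
                         AND system_arch_map.arch_id = 1)
""").fetchone()[0]

print(bad, good)  # 12 3: 3 systems x 4 arch rows, versus 3 systems
```

Scale the 3-row system table up to tens of thousands of systems and the cross product is what MySQL ends up materialising on disk.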

So that may explain why we have scraped by without noticing this flaw with <arch/> in the past. As mentioned above, we upgraded beaker.engineering to SQLAlchemy 0.8 in September 2014 with the Beaker 0.18 upgrade.

Comment 16 Dan Callaghan 2015-01-21 06:16:45 UTC
So XmlArch in needpropertyxml.py is just using System.arch.contains(), which seems right to me. At this point I would consider this a bug in the SQLAlchemy ORM.

This is enough to blow up:

ia64 = Arch.by_name(u'ia64')
print System.query.filter(or_(true(), System.arch.contains(ia64))).count()

2015-01-21 16:15:12,895 sqlalchemy.engine.base.Engine INFO SELECT count(*) AS count_1 
FROM (SELECT system.status AS system_status, system.id AS system_id, system.fqdn AS system_fqdn, system.serial AS system_serial, system.date_added AS system_date_added, system.date_modified AS system_date_modified, system.date_lastcheckin AS system_date_lastcheckin, system.location AS system_location, system.vendor AS system_vendor, system.model AS system_model, system.lender AS system_lender, system.owner_id AS system_owner_id, system.user_id AS system_user_id, system.type AS system_type, system.status_reason AS system_status_reason, system.deleted AS system_deleted, system.memory AS system_memory, system.checksum AS system_checksum, system.lab_controller_id AS system_lab_controller_id, system.mac_address AS system_mac_address, system.loan_id AS system_loan_id, system.loan_comment AS system_loan_comment, system.release_action AS system_release_action, system.reprovision_distro_tree_id AS system_reprovision_distro_tree_id, system.hypervisor_id AS system_hypervisor_id, system.kernel_type_id AS system_kernel_type_id 
FROM system, system_arch_map AS system_arch_map_1 
WHERE true OR system.id = system_arch_map_1.system_id AND %s = system_arch_map_1.arch_id) AS anon_1
2015-01-21 16:15:12,896 sqlalchemy.engine.base.Engine INFO (3L,)

Comment 17 Dan Callaghan 2015-01-21 06:29:52 UTC
Okay, it seems this is actually a known issue with no solution, apart from "don't use .contains() with a secondary mapping", which is what we are doing here.

https://bitbucket.org/zzzeek/sqlalchemy/issue/2177/or-ed-relationcontains-filters-produce

Instead we need to use .any(), which is slightly more awkward but produces an EXISTS subquery instead of a cartesian product with a join condition.
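A minimal, self-contained sketch of the two comparators, using a modern declarative SQLAlchemy mapping; the classes below only loosely imitate Beaker's real models and are illustrative, not Beaker's actual code:

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, or_
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Association table, analogous to Beaker's system_arch_map.
system_arch_map = Table(
    "system_arch_map", Base.metadata,
    Column("system_id", ForeignKey("system.id")),
    Column("arch_id", ForeignKey("arch.id")),
)

class Arch(Base):
    __tablename__ = "arch"
    id = Column(Integer, primary_key=True)

class System(Base):
    __tablename__ = "system"
    id = Column(Integer, primary_key=True)
    arch = relationship(Arch, secondary=system_arch_map)

other_branch = System.id > 0  # stand-in for the other side of the <or/>

# .contains() references the secondary table directly, which becomes a
# cartesian product once the criterion sits inside an OR.
contains_clause = or_(other_branch, System.arch.contains(Arch(id=3)))

# .any() compiles to a correlated EXISTS subquery instead.
any_clause = or_(other_branch, System.arch.any(Arch.id == 3))

print("EXISTS" in str(any_clause))  # True
print(str(contains_clause))
```

The SQLAlchemy docs still carry the same caveat today: contains() against a many-to-many relationship composes badly with OR and NOT, and any() is the recommended alternative.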

2015-01-21 16:28:46,970 sqlalchemy.engine.base.Engine INFO SELECT count(*) AS count_1 
FROM (SELECT system.status AS system_status, system.id AS system_id, system.fqdn AS system_fqdn, system.serial AS system_serial, system.date_added AS system_date_added, system.date_modified AS system_date_modified, system.date_lastcheckin AS system_date_lastcheckin, system.location AS system_location, system.vendor AS system_vendor, system.model AS system_model, system.lender AS system_lender, system.owner_id AS system_owner_id, system.user_id AS system_user_id, system.type AS system_type, system.status_reason AS system_status_reason, system.deleted AS system_deleted, system.memory AS system_memory, system.checksum AS system_checksum, system.lab_controller_id AS system_lab_controller_id, system.mac_address AS system_mac_address, system.loan_id AS system_loan_id, system.loan_comment AS system_loan_comment, system.release_action AS system_release_action, system.reprovision_distro_tree_id AS system_reprovision_distro_tree_id, system.hypervisor_id AS system_hypervisor_id, system.kernel_type_id AS system_kernel_type_id 
FROM system LEFT OUTER JOIN lab_controller ON lab_controller.id = system.lab_controller_id LEFT OUTER JOIN system_access_policy ON system.id = system_access_policy.system_id LEFT OUTER JOIN hypervisor ON hypervisor.id = system.hypervisor_id 
WHERE ((EXISTS (SELECT 1 
FROM system_access_policy_rule 
WHERE system_access_policy.id = system_access_policy_rule.policy_id AND system_access_policy_rule.permission = %s AND (%s = system_access_policy_rule.user_id OR system_access_policy_rule.user_id IS NULL AND system_access_policy_rule.group_id IS NULL OR system_access_policy_rule.group_id IN (%s)))) OR system.owner_id = %s OR system.loan_id = %s OR system.user_id = %s) AND ((EXISTS (SELECT 1 
FROM system_arch_map, arch 
WHERE system.id = system_arch_map.system_id AND arch.id = system_arch_map.arch_id AND arch.id = %s)) OR hypervisor.hypervisor IS NULL) AND system.type = %s) AS anon_1
2015-01-21 16:28:46,970 sqlalchemy.engine.base.Engine INFO ('view', 2381L, 128L, 2381L, 2381L, 2381L, 3L, 'Machine')

Comment 19 Dan Callaghan 2015-01-21 07:33:42 UTC
http://gerrit.beaker-project.org/3942

Comment 20 Dan Callaghan 2015-02-12 05:09:57 UTC
Steps to reproduce:
1. Submit a job containing <hostRequires/> like the example in comment 13.
2. Check that Beaker schedules the job in a reasonable amount of time and nothing blows up.

Comment 25 Jeff Burke 2015-03-11 18:20:10 UTC
Dan,
 Do you know when Beaker 19.3 will be deployed internally? If it will be several weeks, would it be possible to get a hotfix for 19.2?

Thanks in advance,
Jeff

Comment 28 Dan Callaghan 2015-03-17 06:34:19 UTC
Beaker 19.3 has been released.

