Bug 1293007
Summary: [RFE] enforce a server-side limit on number of results in a recipe

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Retired] Beaker | Reporter | Dan Callaghan <dcallagh> |
| Component | general | Assignee | Dan Callaghan <dcallagh> |
| Status | CLOSED CURRENTRELEASE | QA Contact | tools-bugs <tools-bugs> |
| Severity | urgent | Priority | unspecified |
| Version | 21 | CC | dcallagh, dowang, mjia, rjoost |
| Target Milestone | 23.0 | Keywords | FutureFeature, Patch |
| Target Release | --- | Hardware | Unspecified |
| OS | Unspecified | Doc Type | Enhancement |
| Type | Bug | Cloned as | 1358557 (view as bug list) |
| Last Closed | 2016-07-07 23:12:25 UTC | Story Points | --- |
Description (Dan Callaghan, 2015-12-19 02:12:37 UTC)
We hit this problem once before, and it seems I already filed a bug about the issues with beaker-transfer specifically; however, this seems like a better general solution which we should implement too.

It turned out that in this particular case the large number of results were *not* due to an accidental infinite loop in the task, but were in fact intentional. So I'm no longer totally sure that this is a worthwhile idea, nor, if we do enforce a limit, what the number should be. It is probably more important that we fix the issues preventing Beaker from properly handling such a large recipe in the first place (bug 1293011, bug 1240809).

I used the following query (in Teiid against our production database) to get an idea of the upper end of the distribution of results-count-per-recipe. (The constraint on recipe.id > 1600000 is just to get the query to complete in a reasonable amount of time without being killed.)

```sql
SELECT recipe.id, COUNT(recipe_task_result.id) result_count
FROM Beaker.recipe
INNER JOIN recipe_task
    ON recipe_task.recipe_id = recipe.id
INNER JOIN recipe_task_result
    ON recipe_task_result.recipe_task_id = recipe_task.id
WHERE recipe.id > 1600000
GROUP BY recipe.id
ORDER BY result_count DESC
LIMIT 20;
```

```
   id    | result_count
---------+--------------
 2373965 |        31779
 2373967 |        31457
 2044046 |        27067
 2030795 |        21781
 1962366 |        16639
 1970679 |        14669
 1622138 |        14002
 2197682 |        11539
 2066693 |        10655
 2057292 |         9763
 2373966 |         9111
 2104047 |         7995
 2104046 |         7938
 2060648 |         7429
 2070229 |         6659
 1966377 |         6408
 1617373 |         6283
 2011777 |         6081
 2060453 |         5954
 1822492 |         5425
(20 rows)
```

And it turns out there are only 23 recent recipes with more than 5000 results. Of those, 10 are deleted and 13 are not. Two of them were deleted by us because of the problems they caused with beaker-transfer in December and July.
Looking at the recipes which aren't deleted:

```sql
SELECT recipe.id, COUNT(recipe_task_result.id) result_count
FROM Beaker.recipe
INNER JOIN recipe_task
    ON recipe_task.recipe_id = recipe.id
INNER JOIN recipe_task_result
    ON recipe_task_result.recipe_task_id = recipe_task.id
INNER JOIN recipe_set ON recipe.recipe_set_id = recipe_set.id
INNER JOIN job ON recipe_set.job_id = job.id
WHERE recipe.id > 1600000 AND job.deleted IS NULL
GROUP BY recipe.id
HAVING COUNT(recipe_task_result.id) > 5000
ORDER BY result_count DESC;
```

```
   id    | result_count
---------+--------------
 2044046 |        27067
 2030795 |        21781
 1970679 |        14669
 2197682 |        11539
 2066693 |        10655
 2057292 |         9763
 2104047 |         7995
 2104046 |         7938
 2070229 |         6659
 2011777 |         6081
 2060453 |         5954
 2015907 |         5250
 2065544 |         5079
(13 rows)
```

It seems the only reason they have survived without crippling beaker-transfer is that many of their results have no logs. I'm not sure how that's possible, since rhts-report-result always sends a test log, but I guess it must be some bug in the system under test (e.g. a segfaulting Python) which also caused the infinite loop of result reporting.

The first one in the above list which doesn't fit that pattern is recipe 2104047, with 7995 results. It appears to be a performance test run which intentionally loops many times. Probably the only reason it didn't kill beaker-transfer is that there is only one log for most of the results, not two, since the avc_log is absent.

So it seems a generous limit on the number of results per recipe would be 10,000. This would allow all existing use cases we have seen. But that limit is still high enough to kill beaker-transfer and beaker-log-delete as they stand now (assuming 2 logs per result), so enforcing it seems a bit pointless. A more conservative limit would be 5000, which is enough for almost every recipe we have seen recently on our production Beaker, *and* is low enough that beaker-transfer and beaker-log-delete in their current form can actually handle it.
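The sizing argument above comes down to simple arithmetic on the assumed 2-logs-per-result figure. A minimal sketch of that back-of-the-envelope estimate (illustrative only; the function name and the per-result log count are assumptions, not anything from Beaker's code):

```python
# Rough estimate of how many log files beaker-transfer would have to
# move for a recipe at a given results limit. The 2-logs-per-result
# figure is the assumption stated above (test log plus avc_log);
# this is purely illustrative arithmetic, not Beaker code.

LOGS_PER_RESULT = 2

def estimated_logs(results_limit, logs_per_result=LOGS_PER_RESULT):
    """Approximate number of result logs in a recipe at the limit."""
    return results_limit * logs_per_result
```

On these assumptions, a 10,000-result limit still implies roughly 20,000 logs per recipe, which is why enforcing it would not actually protect beaker-transfer, while a 5,000-result limit stays within what the daemons can handle.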
Another option might be to enforce a maximum number of logs per recipe, rather than results per recipe. Logs are generally costlier for Beaker to store and handle, so that might be a nicer approach. Either way, the limit would need to be configurable so that we can easily tweak it if we get the number wrong initially.

http://gerrit.beaker-project.org/4572

I went with (configurable) limits of 7500 results per recipe and 7500 result logs per recipe. That should permit all legitimate testing (based on the maximum numbers in our production db so far) while still avoiding recipes that are large enough to start crashing things, as we have seen in bug 1240809 and elsewhere.

We have a couple of bugs related to handling very large recipes, and it probably makes sense to verify them all in one go, by submitting a job on beaker-devel which produces a similarly large recipe. I will think up a good way to craft one and post it here.

Based on those test jobs, we can see that the results limiting is working as expected. However, the log limiting is effectively defeated by beah (I guess in an error handling path somewhere), which uploads the logs against the task instead of the result when the result upload fails. So for completeness we probably need to apply the limit to the total number of task logs *and* result logs. The other thing is that Beaker should probably record a warning against the task when it hits the limits; otherwise the recipe could show up as fully Passing with no hint that some results are missing.

http://gerrit.beaker-project.org/4598 enforce limit on task logs as well as result logs
http://gerrit.beaker-project.org/4599 record a warning result if limits are reached

Beaker 23.0 has been released.
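The fix described in this bug enforces configurable per-recipe limits and records a warning result when they are hit. A minimal sketch of the shape of such a check, using entirely hypothetical names (this is not Beaker's actual implementation, and the real limits come from server configuration):

```python
# Hypothetical sketch of a server-side per-recipe limit check, in the
# spirit of the fix described above. All class/function names and the
# in-memory counters are illustrative, not Beaker's real code.

MAX_RESULTS_PER_RECIPE = 7500  # configurable in the real fix
MAX_LOGS_PER_RECIPE = 7500     # counts task logs *and* result logs

class RecipeLimitExceeded(Exception):
    """Raised when a recipe tries to exceed a configured limit."""

class Recipe:
    def __init__(self):
        self.result_count = 0
        self.log_count = 0
        self.warnings = []

def record_result(recipe):
    """Accept a new result, or refuse it once the limit is reached."""
    if recipe.result_count >= MAX_RESULTS_PER_RECIPE:
        # Record a single Warn so the recipe does not show up as
        # fully Passing with results silently missing.
        if not recipe.warnings:
            recipe.warnings.append('Too many results in recipe')
        raise RecipeLimitExceeded('results limit reached')
    recipe.result_count += 1

def register_log(recipe):
    """Accept a new log upload, counting task and result logs together
    (since beah may upload against the task when a result upload fails)."""
    if recipe.log_count >= MAX_LOGS_PER_RECIPE:
        if not recipe.warnings:
            recipe.warnings.append('Too many logs in recipe')
        raise RecipeLimitExceeded('logs limit reached')
    recipe.log_count += 1
```

The design point worth noting is the warning: rejecting the excess uploads alone would leave a truncated but apparently green recipe, which is why the follow-up change (4599) records a warning result when either limit is reached.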