Description of problem:
We have an increasing number of integration test runs with randomly failing jobs. This is tracked here: https://beaker-project.org/~dcallagh/dogfoodstats/known-issues.html

The problems with this are that:
a) every contributor needs to be aware of the table of known failures
b) it reduces trust in our QA step, the Jenkins run which executes all automated tests
c) it takes cumbersome effort to check logs and distinguish known issues from genuine test failures
d) there is no convenient way to communicate the reasons for the known issues

This bug is to look into these problems and find solutions that address them in such a way that:
a) a test failure reliably indicates an introduced regression
b) tests are not environment-sensitive and therefore fragile

Version-Release number of selected component (if applicable):
develop

How reproducible:
not always

Steps to Reproduce:
1. Run dogfood tests in Jenkins

Actual results:
Sometimes tests fail because of known issues

Expected results:
All tests pass, with no known issues

Additional info:
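One way to reduce the cumbersome log-checking described in point (c) would be to match failure logs against the known-issues table automatically. The sketch below is purely illustrative: the `KNOWN_ISSUES` patterns and bug IDs are made-up placeholders, not entries from the real table, and the real mapping would have to be maintained alongside the known-issues page.

```python
import re

# Hypothetical pattern -> bug ID mapping; the real entries would be
# derived from the known-issues table linked above.
KNOWN_ISSUES = {
    r"ConnectionError: .* beaker-server": "BZ#1111111",
    r"TimeoutError: provisioning did not complete": "BZ#2222222",
}

def classify_failure(log_text):
    """Return the matching known-issue bug ID, or None if the failure
    does not match any known issue (i.e. it may be a genuine regression)."""
    for pattern, bug_id in KNOWN_ISSUES.items():
        if re.search(pattern, log_text):
            return bug_id
    return None
```

A Jenkins post-build step could run something like this over each failed job's console log and annotate the result, so contributors only need to investigate failures that classify as `None`.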
I suggest using the known-issues graph to prioritise the issues which occur most frequently (that's why I made the graph). Each of these issues is an especially tricky problem with no obvious solution, and each will require a lot of debugging. If any of them were easily fixed, we would have fixed them already :-)
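The frequency-based prioritisation suggested above could be tallied mechanically from classified failure runs. A minimal sketch, assuming each failed run has already been labelled with a known-issue ID (or `None` for an unexplained failure):

```python
from collections import Counter

def prioritise(classified_runs):
    """Given one known-issue ID per failed run (None for failures that
    matched no known issue), return issues ordered by frequency,
    most frequent first."""
    counts = Counter(i for i in classified_runs if i is not None)
    return counts.most_common()
```

For example, `prioritise(["BZ#1", "BZ#2", "BZ#1", None, "BZ#1"])` returns `[("BZ#1", 3), ("BZ#2", 1)]`, suggesting BZ#1 is the issue worth debugging first.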