Our development instance of Beaker is under-resourced, so it typically swaps when running createrepo after a task is uploaded. As a result, uploading a task can take several minutes. I suspect that a database transaction is held open for that whole time, because I saw this exception from bkr job-submit while I was uploading a new task:

Exception: <Fault 1: "<class 'sqlalchemy.exc.OperationalError'>:(OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') 'INSERT INTO recipe_task (recipe_id, task_id, start_time, finish_time, result, status, role) VALUES (%s, %s, %s, %s, %s, %s, %s)' (572L, 6L, None, None, 'New', 'New', 'STANDLONE')">
Hmm, I wonder if this could be the cause of the current gaps in the metrics reporting (the ones that don't correspond to beakerd actually falling over). When I last refactored that code I noted the assumption that it is called inside a transaction, but I missed that it may affect the whole web server, because we call it inline from the web UI code:

http://git.beaker-project.org/cgit/beaker/tree/Server/bkr/server/model.py?h=develop#n6579

This is yet another operation that we really should be handling asynchronously, but the error handling needed to do that in this case is going to be rather painful.
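For illustration only, here is a minimal sketch of one way to keep the slow step out of the transaction: commit the task row in a "pending" state, run the repo update with no transaction open, then record the outcome in a second short transaction. This is not Beaker's code. It uses the standard-library sqlite3 module rather than SQLAlchemy and MySQL so that it is self-contained, and the table layout, state names, and the functions upload_task, slow_repo_update, and make_db are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical task table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
    return conn


def slow_repo_update(task_name):
    """Hypothetical stand-in for running createrepo; raises on failure."""
    if task_name.startswith("bad"):
        raise RuntimeError("createrepo failed for %s" % task_name)


def upload_task(conn, task_name, update_repo=slow_repo_update):
    # Transaction 1: record the task as pending and commit immediately,
    # so no row locks are held while the slow step runs.
    with conn:
        cur = conn.execute(
            "INSERT INTO task (name, state) VALUES (?, 'pending')",
            (task_name,))
        task_id = cur.lastrowid

    # Slow step: runs with no transaction open on this connection.
    try:
        update_repo(task_name)
        state = "available"
    except Exception:
        # The painful part: the row is already committed, so a failure
        # has to be recorded explicitly rather than rolled back.
        state = "failed"

    # Transaction 2: a short update that records the outcome.
    with conn:
        conn.execute(
            "UPDATE task SET state = ? WHERE id = ?", (state, task_id))
    return task_id, state
```

The trade-off is the one mentioned above: once the first transaction commits, a failed repo update can no longer be undone by a rollback, so other code would have to ignore tasks that are still "pending" or "failed".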
Closing this issue. We are not planning to address this problem in the Beaker development lifecycle. Instead, we plan to continue our effort on building Beaker.NEXT. If you have any questions, feel free to reach out to me.

Best regards,
Martin Styk