I found this in the activity records for a system: +---------+---------+---------------------+-----------------+------------+-----------+---------+-----------+-----------+ | id | user_id | created | type | field_name | service | action | old_value | new_value | +---------+---------+---------------------+-----------------+------------+-----------+---------+-----------+-----------+ | 6888810 | 1657 | 2012-12-20 12:21:56 | system_activity | Status | Scheduler | Changed | Automated | Broken | | 6888812 | 1657 | 2012-12-20 12:21:56 | system_activity | Status | Scheduler | Changed | Automated | Broken | +---------+---------+---------------------+-----------------+------------+-----------+---------+-----------+-----------+ I'm still not sure how this can happen, since beaker-provision is supposed to serialize handling power commands for a given system, and the failure reporting (which triggers mark_broken) is part of that. So this may indicate some kind of bug in beaker-provision. In any case, the server should also guard against this by doing the update atomically. Aside from the obvious side-effects which aren't too bad (duplicate entries in system history, duplicate breakage notifications sent to the owner), this also breaks the invariant of the system_status_durations table there is always exactly one "open" row (with NULL finish_time) per system. That is a more insidious problem which can cause failures much later down the track, for example in reporting code. +-------+-----------+-----------+---------------------+---------------------+ | id | system_id | status | start_time | finish_time | +-------+-----------+-----------+---------------------+---------------------+ | 23592 | 265 | Broken | 2012-12-20 12:21:56 | NULL | | 23593 | 265 | Broken | 2012-12-20 12:21:56 | 2012-12-20 15:47:16 | | 23596 | 265 | Automated | 2012-12-20 15:47:16 | NULL | +-------+-----------+-----------+---------------------+---------------------+
Can this be closed in light of 951309?
(In reply to comment #5) Bug 951309 was the cause, but I still think the server needs to protect the invariant that only one system_status_duration row has a NULL finish time. We can do it fairly easily, using the same pattern as we have been using for the other similar tables (conditional UPDATE with a row count check). It should have been done that way from the start.
On Gerrit: http://gerrit.beaker-project.org/1911
Beaker 0.13.1 has been released.