Red Hat Bugzilla – Bug 1207980
Split up large Liquibase changesets to avoid partial updates
Last modified: 2015-07-21 22:20:15 EDT
Description of problem:
Several of the changesets in 3.7 can take a very long time to run. (That is a problem in itself, for another time.) If the long operation is cancelled (eg appserver times out), mysql finishes the current change and autocommits it, but the DATABASECHANGELOGLOCK is never cleaned up. Eventually, the admin will have to clear the lock, but if the changeset includes multiple changes, Liquibase will attempt the first change again, and fail. Recovering from this is quite slow and painful, especially for anyone who isn't familiar with Liquibase or Zanata.
I suggest that we replace the affected changesets with multiple smaller changesets:
- one change per changeset.
- using a precondition to skip the change if it has already been applied.
Also, we may be able to speed up the changes overall by making multiple alterations to the same table in a single "change" (eg adding several indexes or columns to the same table).
Finally, we should log a message before beginning each large changeset, so that the admin will have some idea why Zanata is taking a long time to start.
Bug 1207994 might help with the logging. (Or if we have to write a simple logging extension, better to use the latest API.)
Assigning to pahuang, since most of the affected changesets are his.
I recommend we move the replacements for changeset email@example.com:3 (adding columns to HTextFlowTarget/History) to appear before the replacements for firstname.lastname@example.org:2 and email@example.com:3 (adding indexes to HTextFlowTarget/History). Adding columns before indexes should save the database from having to redo the indexing work.
When it comes to writing the preconditions for the split-up changesets, I recommend using <not><changeSetExecuted></not>. This should be fairly quick to execute. Preconditions which check for the existence of schema elements (eg <foreignKeyConstraintExists>) can be very slow in Liquibase, because at least some of them are implemented using database snapshots, which are quite slow in some circumstances.
I originally suggested adding multiple indexes to a single table in one changeset as a potential optimisation. This can be faster on MySQL 5.5 (or 5.1 with an updated InnoDB plugin), but I don't think Liquibase will structure the SQL to take advantage of it, and it could lead to problems on MySQL 5.1 without the plugin. So I think we should just stick to the rule: one change per changeset.
As for the logging problem, Liquibase doesn't seem to have any good hooks for this sort of logging. Liquibase itself logs a message before each changeset, but the level is DEBUG, and the category is "liquibase", so DEBUG level will include far too many irrelevant messages. I'll see if I can make an extension which will help.
I have added an extension for pre-changeset logging as part of https://github.com/zanata/zanata-server/pull/773
Reassigning to aeng, since the only changeset which is really, really slow is db.changelog-3.7.xml::3::firstname.lastname@example.org.
We definitely need this fixed for 3.7, but it really needs to be split up as soon as possible, because it really wastes a lot of time when the changeset fails part-way through for whatever reason. (In my case, I was trying to get Zanata working on WildFly 9 and didn't have the longer deployment timeout configured.)
Reminder: the "add column" changesets should be moved above the "add index" changesets, to save the database from indexing twice.
VERIFIED with Zanata 3.7.0-SNAPSHOT (git-server-3.6.2-425-g01da786)