Description of problem: Some of our database schema changes (eg adding a column to HTextFlowTarget) now take over 30 minutes against production-sized databases. Sometimes there may be more than one such change in a Zanata release, and it is only going to become worse as the database grows. We need to switch to a database which can cope with these changes without requiring long outages at upgrade time. Options to consider: 1. PostgreSQL (or other SQL database) - PostgreSQL should be able to match all of MySQL's important features, but with extra features. Includes support for adding NULL columns quickly. Also better locking models, transactions for DDL, support for JSON columns... 2. MongoDB (or other NoSQL database) - "schemaless" (with the attendant runtime complexity) but not ACID compliant. Probably not sufficient by itself for all our database needs, but might be suitable for storing text (eg HTextFlow/Target). Other considerations: a. We need to know how this will affect runtime performance for queries and updates (eg uploading a document). b. We would need to migrate all our production instances from MySQL to PostgreSQL, and provide the means (scripts, instructions) for other instances (eg openstack.org) to be migrated too.
You might find this useful to help make a decision: https://www.youtube.com/watch?v=KtmjkCuV-EU
(In reply to Luke Brooker from comment #1) > You might find this useful to help make a decision: > https://www.youtube.com/watch?v=KtmjkCuV-EU Is there a short version, or a transcript? That's a 30 minute video.
(In reply to Sean Flanigan from comment #2) I don't think I could do it justice with my own summary, but here is an article with similar ideas: https://blog.rotenberg.io/million-user-webchat-with-full-stack-flux-react-redis-and-postgresql/
And to keep this all in one place: http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/
This bug is not about switching the Zanata back-end to React, Flux and JavaScript, but if that's meant as a vote for PostgreSQL and a bit of a plug for immutable datastores or event sourcing, that's fine by me. If we want to do more sophisticated things (eg materialised views) with our database, PostgreSQL certainly looks to be a much better platform than MySQL.
Yeah, I know, just thinking about the future as well. But yes it's mainly to discuss ideas around immutable datastores, event sourcing, materialised views etc.
I think we need to start making of initial investigation into this and make a few tasks before we decide on if we are moving to PostgreSQL. Things to investigate: 1) how much to update for our liquibase changes for PostgreSQL 2) performance and queries as suggested by Sean. 3) scripts for migrating existing data into PostgreSQL. (I'm sure it's not as hard for this) 4) Does our infrastructure provides PostgreSQL DB? (just to make sure its available)
PostgreSQL is known to be a much more robust database than MySQL. The fact that the internal Bugzilla team managed to make the switch and get a performance boost should tell us something. Zanata might not currently have the same amount of records as Bugzilla in any of its instances, but if it helps reduce update time when changing columns and such, maybe its a good enough reason. +1 from me
https://github.com/zanata/zanata-server/pull/755 is partly about adding tests to ensure that our liquibase changesets work on H2 (at least on an empty database). I think ensuring our changesets work on H2 (again) would make it a little easier to get our changesets working on PostgreSQL (or another database).
+1 on this.
Just a note that RHEL 7 includes PostgreSQL 9.2, which includes support for JSON and range types: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2#JSON_datatype (JSONB support with new indexing features comes in PostgreSQL 9.4.)
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-129