Bug 1205945 - RFE: Replace Zanata's MySQL database (eg with PostgreSQL)
Summary: RFE: Replace Zanata's MySQL database (eg with PostgreSQL)
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Zanata
Classification: Retired
Component: DatabaseChange, Performance, Maintainability
Version: development
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Damian Jansen
QA Contact: Zanata-QA Mailling List
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-03-26 00:37 UTC by Sean Flanigan
Modified: 2015-07-29 02:09 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 02:09:36 UTC
Embargoed:


Attachments (Terms of Use)

Description Sean Flanigan 2015-03-26 00:37:58 UTC
Description of problem:

Some of our database schema changes (eg adding a column to HTextFlowTarget) now take over 30 minutes against production-sized databases.  Sometimes there may be more than one such change in a Zanata release, and it is only going to become worse as the database grows.  

We need to switch to a database which can cope with these changes without requiring long outages at upgrade time.


Options to consider:

1. PostgreSQL (or other SQL database) - PostgreSQL should be able to match all of MySQL's important features, but with extra features.  Includes support for adding NULL columns quickly.  Also better locking models, transactions for DDL, support for JSON columns...

2. MongoDB (or other NoSQL database) - "schemaless" (with the attendant runtime complexity) but not ACID compliant.  Probably not sufficient by itself for all our database needs, but might be suitable for storing text (eg HTextFlow/Target).


Other considerations:

a. We need to know how this will affect runtime performance for queries and updates (eg uploading a document).

b. We would need to migrate all our production instances from MySQL to PostgreSQL, and provide the means (scripts, instructions) for other instances (eg openstack.org) to be migrated too.

Comment 1 Luke Brooker 2015-03-26 01:40:21 UTC
You might find this useful to help make a decision:
https://www.youtube.com/watch?v=KtmjkCuV-EU

Comment 2 Sean Flanigan 2015-03-26 03:15:43 UTC
(In reply to Luke Brooker from comment #1)
> You might find this useful to help make a decision:
> https://www.youtube.com/watch?v=KtmjkCuV-EU

Is there a short version, or a transcript?  That's a 30 minute video.

Comment 3 Luke Brooker 2015-03-26 03:19:33 UTC
(In reply to Sean Flanigan from comment #2)

I don't think I could do it justice with my own summary, but here is an article with similar ideas:
https://blog.rotenberg.io/million-user-webchat-with-full-stack-flux-react-redis-and-postgresql/

Comment 4 Luke Brooker 2015-03-26 03:33:08 UTC
And to keep this all in one place:

http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/

Comment 5 Sean Flanigan 2015-03-26 03:46:09 UTC
This bug is not about switching the Zanata back-end to React, Flux and JavaScript, but if that's meant as a vote for PostgreSQL and a bit of a plug for immutable datastores or event sourcing, that's fine by me.  If we want to do more sophisticated things (eg materialised views) with our database, PostgreSQL certainly looks to be a much better platform than MySQL.

Comment 6 Luke Brooker 2015-03-26 03:50:40 UTC
Yeah, I know, just thinking about the future as well.

But yes it's mainly to discuss ideas around immutable datastores, event sourcing, materialised views etc.

Comment 7 Alex Eng 2015-03-26 03:55:51 UTC
I think we need to start making of initial investigation into this and make a few tasks before we decide on if we are moving to PostgreSQL.

Things to investigate:
1) how much to update for our liquibase changes for PostgreSQL
2) performance and queries as suggested by Sean.
3) scripts for migrating existing data into PostgreSQL. (I'm sure it's not as hard for this)
4) Does our infrastructure provides PostgreSQL DB? (just to make sure its available)

Comment 8 Carlos Munoz 2015-03-29 03:43:33 UTC
PostgreSQL is known to be a much more robust database than MySQL. The fact that the internal Bugzilla team managed to make the switch and get a performance boost should tell us something. Zanata might not currently have the same amount of records as Bugzilla in any of its instances, but if it helps reduce update time when changing columns and such, maybe its a good enough reason.

+1 from me

Comment 9 Sean Flanigan 2015-03-30 07:03:01 UTC
https://github.com/zanata/zanata-server/pull/755 is partly about adding tests to ensure that our liquibase changesets work on H2 (at least on an empty database).  

I think ensuring our changesets work on H2 (again) would make it a little easier to get our changesets working on PostgreSQL (or another database).

Comment 10 Alex Eng 2015-03-30 21:26:26 UTC
+1 on this.

Comment 11 Sean Flanigan 2015-04-15 07:13:33 UTC
Just a note that RHEL 7 includes PostgreSQL 9.2, which includes support for JSON and range types: 
  https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2#JSON_datatype

(JSONB support with new indexing features comes in PostgreSQL 9.4.)

Comment 12 Zanata Migrator 2015-07-29 02:09:36 UTC
Migrated; check JIRA for bug status: http://zanata.atlassian.net/browse/ZNTA-129


Note You need to log in before you can comment on or make changes to this bug.