Bug 1069438 - upgraded beaker schema does not match new beaker schema, due to column 'default' vs. 'server_default'
Summary: upgraded beaker schema does not match new beaker schema, due to column 'defau...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: 0.15
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: 19.0
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-02-25 02:38 UTC by Raymond Mancy
Modified: 2018-02-06 00:41 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-25 07:18:50 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1082452 0 unspecified CLOSED Update developer guide to talk about the use of 'default' in sqla's Column() definition. 2021-02-22 00:41:40 UTC

Internal Links: 1082452

Description Raymond Mancy 2014-02-25 02:38:35 UTC
Description of problem:

The 'default' keyword, passed to Column(), does not modify the schema of the Column in the database. We have quite likely erroneously updated the schema of these tables, during upgrade, to have defaults on the columns.

This is a problem because new installations and existing installations may have different schemas.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Add a new column and use the 'default' keyword.
2. use init_db() to initialise a new database
3.

Actual results:

I think previously we may have expected it to modify the schema

Expected results:

We shouldn't

Additional info:


See, http://docs.sqlalchemy.org/en/rel_0_6/core/schema.html?highlight=schema%20column#sqlalchemy.schema.ColumnDefault.


We should either look at updating 'server' to 'server_default', or the alternative of undoing upgrade notes that explicitly modify the schema, or both.

Comment 2 Dan Callaghan 2014-02-25 05:10:39 UTC
Some of the column defaults should be evaluated in Python land only (for example the ones that have default=datetime.utcnow) but all the others were intended to have a database-level default, and we assumed that when we wrote the upgrade notes. So I think any column with a default is affected.

Here is an example, recipe_task.status. On a new schema:

  `status` enum('New','Processed','Queued','Scheduled','Waiting','Running','Completed','Cancelled','Aborted') NOT NULL,

On an upgraded one:

  `status` enum('New','Processed','Queued','Scheduled','Waiting','Running','Completed','Cancelled','Aborted') NOT NULL default 'New',

Comment 3 Raymond Mancy 2014-02-25 07:17:01 UTC
(In reply to Dan Callaghan from comment #2)
> Some of the column defaults should be evaluated in Python land only (for
> example the ones that have default=datetime.utcnow) but all the others were
> intended to have a database-level default, and we assumed that when we wrote
> the upgrade notes. So I think any column with a default is affected.
> 

Why should datetime.utcnow() be a special case ?

Is this because of lag between the INSERT clause being generated in python,
and the the row actually being inserted?

To me it seems much more straight forward to rely (if possible) on the DB generating the default for this as well

> Here is an example, recipe_task.status. On a new schema:
> 
>   `status`
> enum('New','Processed','Queued','Scheduled','Waiting','Running','Completed',
> 'Cancelled','Aborted') NOT NULL,
> 
> On an upgraded one:
> 
>   `status`
> enum('New','Processed','Queued','Scheduled','Waiting','Running','Completed',
> 'Cancelled','Aborted') NOT NULL default 'New',

Comment 4 Dan Callaghan 2014-02-26 01:19:32 UTC
The problem with generating unpredictable defaults on the server side is that SQLAlchemy then has to do an extra roundtrip to fetch back the value that was generated on INSERT.

It has a whole bunch of features in recent versions to make this possible using RETURNING but MySQL does not support that. So I think we will either end up with the datetime missing in Python until we refresh the object, or SQLAlchemy will generate an extra SELECT just to fetch it after every insert. Neither is particularly nice.

Comment 5 Dan Callaghan 2014-03-25 23:14:06 UTC
Another reason we may want to avoid using server_default: adding or changing a server_default means a database schema upgrade, which means it cannot be done in a maintenance release. Changing a server_default to a Python-level default also needs a schema upgrade for the same reason.

Comment 6 Dan Callaghan 2014-03-25 23:17:08 UTC
(In reply to Dan Callaghan from comment #4)

We found that when server_default is used, SQLAlchemy will always issue a SELECT immediately after INSERT to fetch back the value in the newly inserted row. That applies even when the default is a literal value which SQLAlchemy already knows. So that means we incur an extra database roundtrip for every INSERT which has a server_default column.

The good news is that SQLAlchemy handles this transparently so we do not need to worry about attributes being unpopulated.

Comment 7 Raymond Mancy 2014-03-31 03:26:48 UTC
My main (well, my only) concern about using 'default', over 'server_default' is that it adds complexity. I think it adds complexity in the follow ways:

1) It's "normal" to define known default values in the SQL table. If Beaker decides to adopt a way that is not "normal" due to specific features of our ORM, than that's something that not just us beaker devs need to keep in the back of our heads, but new devs, and anyone else wanting to troubleshoot issues or add patches. (By normal I just mean the way that I think most web apps work. There are some that use a lot of SQL'isms like stored procedures etc, but I think this is less common).

2) We need to add extra statements when issuing update commands commands for the schema. Normally we would only have to do the following (which also I think is more intuitive for the reasons ourlined in #1):

    ALTER TABLE test ADD column_a INT DEFAULT 5 NOT NULL

With just 'server' we need to remember to do the following (untested):

    ALTER TABLE test ADD column_a INT;
    UPDATE test SET column_a = 5; 
    ALTER TABLE test MODIFY column_a INT NOT NULL;
However, if we do forget to do it correctly no one will be any the wiser, but we will potentially have different Beaker installations with different schemas. That's currently the case due to our ignorance of how 'default' actually works (not as we expected).

Don't get me wrong, the above problems are not terminal and I think we could work with them, if we thought it was preferable.

The main reasons why we would want to use 'default' are pre optimisations, and limited added flexibility in upgrading minor versions. This would be limited to where we want to add/change/remove a default value of an existing column (although even not always, as we may want to update all the existing rows which means we need to run SQL commands).

I think in some circumstances the optimisations may be worth it, like when we are making a very high frequency of INSERT statements that use a default. I can't think of any current examples off the top of my head that would satisfy this. However, if we did feel we really needed a 'default' generated in python, we then break our 100% consistency of using 'server_default'

Comment 8 Dan Callaghan 2014-03-31 04:44:55 UTC
So either way, new developers joining Beaker have something to learn, and we have to put something in dev guide: either to indicate that we are using server_default (which is contrary to all the examples in the SQLAlchemy docs and probably 95% of the code out in the wild), or to indicate that we are using 'default' and therefore not to put defaults in the DDL.

For 'default' the dev guideline can be quite simple and consistent: "Default values for columns are defined using the SQLAlchemy Python-level 'default' construct. Column defaults are not used at the SQL level. Therefore, when writing schema upgrades, you must use an UPDATE statement..."

Whereas the dev guidelines for 'server_default' would be messier: "Default values for columns are defined in DDL at the SQL level and also at the Python level using the SQLAlchemy 'server_default' construct, except for the following cases where 'default' is used instead: if rows are inserted frequently and we want to avoid the additional database roundtrip incurred when SELECT'ing back the INSERT'ed default; or if the default was added during a maintenance release; or ... In those cases, column defaults are not used at the SQL level. Therefore when writing schema upgrades, ..."

Comment 9 Nick Coghlan 2014-03-31 05:26:39 UTC
The main consideration to be taken into account is whether or not we consider it a supported use case to *update* the Beaker database without going through the ORM. Aside from the migration scripts, I don't think we do - Beaker data mining should all be read-only, and modifications should go through our models, not just for default value handling, but also to handle things like updating the activity logs.

In that context, Dan's preferred approach makes sense - we keep the application more database agnostic by controlling additional behaviours client side rather than including them in the schema. This is actually pretty normal for web applications - for a long time, the standard response from the MySQL devs regarding missing RDBMS features was "eh, you can just deal with that client side".

I also took a look at the implications for an eventual switch to Alembic for database updates: as near as I can tell, it should be able to cope with either approach fairly easily.

Comment 10 Raymond Mancy 2014-03-31 06:22:40 UTC
Ok, fair enough. Let's update the docs and close this ticket.

Comment 11 Dan Callaghan 2014-03-31 06:56:15 UTC
This bug needs to stay open as well, to deal with the fact that our database upgrades have historically defined the default in DDL and therefore an existing migrated database does not match a freshly created one.

The solution is probably to just add a schema upgrade which removes the default from all existing column definitions which may have one.

I'm not sure if it makes sense to go back through old schema upgrade instructions and edit them not to set the default.

Comment 12 Raymond Mancy 2014-04-01 01:25:42 UTC
I think it makes sense in that it will actually make everything consistent and less confusing.

As long as we can do the change and make sure we don't break anything, I think it could be worthwhile.

Comment 13 Dan Callaghan 2014-10-10 06:10:13 UTC
This is one of the inconsistencies which I found and fixed by running our new db migration tests against a production schema.

http://gerrit.beaker-project.org/3395

Comment 16 Dan Callaghan 2014-11-25 07:18:50 UTC
Beaker 19.0 has been released.


Note You need to log in before you can comment on or make changes to this bug.