Bug 1686818 - Altering table's column used in primary key causes data corruption
Summary: Altering table's column used in primary key causes data corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: mariadb
Version: 8.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
: 8.0
Assignee: Michal Schorm
QA Contact: Anna Khaitovich
URL:
Whiteboard:
Depends On:
Blocks: 1346768 1702602
TreeView+ depends on / blocked
 
Reported: 2019-03-08 11:59 UTC by Damien Ciabrini
Modified: 2020-03-04 10:03 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Fixed as a RHEL-8.0.0 zero day update. Now I just removed the patch and rebased to the version wich conatins the patch.
Clone Of:
: 1702602 (view as bug list)
Environment:
Last Closed: 2020-03-04 10:03:57 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
SQL commands that trigger the data corruption on disk (1.41 KB, text/plain)
2019-03-08 11:59 UTC, Damien Ciabrini
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:3708 None None None 2019-11-05 22:42:42 UTC

Description Damien Ciabrini 2019-03-08 11:59:59 UTC
Created attachment 1542092 [details]
SQL commands that trigger the data corruption on disk

Description of problem:
In OpenStack, when we create tables for OpenStack services, we do so
in an iterative fashion that consists in creating the table and
altering its schema based on each API change that happened in the
history of the OpenStack service. More specifically, a table is
initialized via a succession of SQL commands like e.g.:

    CREATE TABLE standardattributes (
            id BIGINT NOT NULL AUTO_INCREMENT,
            resource_type VARCHAR(255) NOT NULL,
            PRIMARY KEY (id)
    )ENGINE=InnoDB;
    ALTER TABLE standardattributes ADD COLUMN created_at DATETIME;
    ALTER TABLE standardattributes ADD COLUMN updated_at DATETIME;
    ALTER TABLE standardattributes ADD COLUMN description VARCHAR(255);
    ALTER TABLE standardattributes ADD COLUMN revision_number BIGINT NOT NULL DEFAULT '0';

In MariaDB 10.3.11, there are cases where this initialization pattern
triggers a bug that causes a data corruption on disk as soon as the
mysql server is restarted.  Once the data is corrupted, there's no
simple means to repair the table, thus causing an severe outage on
the OpenStack cloud.

For what we could found, the data corruption is 100% reproducible
and is triggered whenever we alter the first column of a table which
is being used in a primary key.

With the attached standalone SQL reproducer, we see that:
  . altering the only field of the primary key doesn't cause any corruption
  . altering the first column of the primary key causes a corruption on server restart
  . altering another column of the primary key doesn't cause any corruption
  . altering a random column in the table doesn't cause any corruption

Checking with upstream MariabDB package, I confirm that this bug is also
present on 10.3.11 (MariaDB-10.3.11-centos73-x86_64*.rpm), and
that it is fixed in 10.3.12 (MariaDB-10.3.12-centos73-x86_64*.rpm).


Version-Release number of selected component (if applicable):
mariadb-server-10.3.11-1.module+el8+2765+cfa4f87b.x86_64

How reproducible:
100% of the time

Steps to Reproduce:
1. create and empty db and start the server with default options:
# rm -rf /var/lib/mysql/ && mkdir -p /var/lib/mysql/ && sudo chown mysql. /var/lib/mysql/ && sudo -u mysql mysql_install_db; mysqld_safe &

2. run the attached SQL script that creates some tables and alter them.
# mysql < ~/mariadb_10_3_11_alter_table_bug.sql

3. verify that the altered table is not corrupted yet:
# mysql bug -e 'show table status;' | grep test2_broken
test2_broken    InnoDB  10      Dynamic 0       0       16384   0       0       0       NULL    2019-03-08 06:47:38     NULL    NULL    latin1_swedish_ci       NULL                    0       N

# mysql bug -e 'select * from test2_broken;'

4. restart the server to trigger the data corruption
# mysqladmin shutdown && mysqld_safe &

5. verify that the altered table is corrupted and one can't fetch data from it anymore.
# mysql bug -e 'show table status;' | grep test2_broken
test2_broken    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    Table 'bug.test2_broken' doesn't exist in engine   NULL     NULL

# mysql bug -e 'select * from test2_broken;'
ERROR 1932 (42S02) at line 1: Table 'bug.test2_broken' doesn't exist in engine


Actual results:
Table's metadata are corrupted on disk, cannot be recovered with innodb_force_recovery or anything 

Expected results:
No corruption occurs

Additional info:
As the next version of OpenStack depends on mariadb-10.3-module, we're currently blocked on that bug, for which there's no easy workaround on our side.

Comment 4 Michael Bayer 2019-03-11 14:36:18 UTC
Here's the upstream bug:

https://jira.mariadb.org/browse/MDEV-18041

Comment 16 Michal Schorm 2019-03-13 01:45:04 UTC
I accidentaly made my previous comment a "minor update" which does not send mails.

However I will need answers to all of my questions there.

Comment 25 Karel Volný 2019-03-21 13:18:15 UTC
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   TEST PROTOCOL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    Test run ID   : 8257001
    Package       : mariadb-server
    Installed     : mariadb-server-10.3.11-2.module+el8+2885+7b8bb354.x86_64
    beakerlib RPM : beakerlib-1.17-19.el8eng.noarch
    bl-redhat RPM : beakerlib-redhat-1-33.el7bkr.noarch
    Test version  : 1.0
    Test started  : 2019-03-21 09:16:40 EDT
    Test finished : 2019-03-21 09:16:49 EDT (still running)
    Test duration : 9 seconds
    Test name     : /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
    Distro        : Red Hat Enterprise Linux release 8.0 (Ootpa)
    Hostname      : host-8-246-63.host.centralci.eng.rdu2.redhat.com
    Architecture  : x86_64
    CPUs          : 1 x Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    RAM size      : 1829 MB
    HDD size      : 19.98 GB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Test description
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

PURPOSE of /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
Description: Test for BZ#1686818 (Altering table's column used in primary key causes)
Author: Karel Volný <kvolny@redhat.com>
Bug summary: Altering table's column used in primary key causes data corruption
Bugzilla link: https://bugzilla.redhat.com/show_bug.cgi?id=1686818


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:40 ] :: [   INFO   ] :: rlImport: Found 'mariadb55/basic', version '1.0-68' in /mnt/tests
:: [ 09:16:40 ] :: [   INFO   ] :: rlImport: Will try to import mariadb55/basic from /mnt/tests/CoreOS/mariadb55/Library/basic/lib.sh
:: [ 09:16:41 ] :: [   LOG    ] :: *** Library variables ***
:: [ 09:16:41 ] :: [   LOG    ] :: $RUN_ON_DB              = 
:: [ 09:16:41 ] :: [   LOG    ] :: $COLLECTIONS            = 
:: [ 09:16:41 ] :: [   LOG    ] :: $PACKAGE                = mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCollection      = 0
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCollectionName  = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbVersion         = 10.3.11
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbPkgPrefix       = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbServiceName     = mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbSclPrefix       = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbRootDir         = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnf             = /etc/my.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfDir          = /etc
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfIncDir       = /etc/my.cnf.d
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfClient       = /etc/my.cnf.d/client.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfServer       = /etc/my.cnf.d/mariadb-server.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbVarDir          = /var
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbDbDir           = /var/lib/mysql
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLockFile        = /var/lock/subsys/mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbPidFile         = /var/run/mariadb/mariadb.pid
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLog             = /var/log/mariadb/mariadb.log
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLogDir          = /var/log/mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbSocket          = /var/lib/mysql/mysql.sock
:: [ 09:16:41 ] :: [   LOG    ] :: *******************************
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'rlImport mariadb55/basic' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Creating tmp directory (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'pushd /tmp/tmp.p8AGMMqABd' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   INFO   ] :: using '/var/tmp/beakerlib-8257001/backup' as backup destination
:: [ 09:16:41 ] :: [   INFO   ] :: Restoring the log file (workaround bug BZ#832035)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'mkdir -p /var/log/mariadb' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Creating the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Fixing ownership of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Fixing permissions of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Restoring SELinux context of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   INFO   ] :: GOOD, 'rpm -Vf /var/log/mariadb/mariadb.log' does not complain'
:: [ 09:16:45 ] :: [   LOG    ] :: rlServiceStart: Service mariadb started successfully
:: [ 09:16:45 ] :: [   PASS   ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Command 'service mariadb status' (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Creating the testing database (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Creating the testing table (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Inserting some testing data (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 5s
::   Assertions: 14 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:45 ] :: [   PASS   ] :: Running the reproducer query (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Trying to read the testing table (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.tDiydkaW' should contain 'Laurel.*Hardy' 
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   LOG    ] :: rlServiceStart: Service mariadb started successfully
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'service mariadb status' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: Trying to read the testing table (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' 
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should contain 'Laurel.*Hardy' 
:: [ 09:16:47 ] :: [   PASS   ] :: Dropping the testing database (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' 
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 2s
::   Assertions: 11 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Cleanup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:49 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Command 'rlServiceRestore "mariadb"' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Command 'popd' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Removing tmp directory (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 2s
::   Assertions: 4 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:49 ] :: [   LOG    ] :: File '/var/tmp/beakerlib-8257001/journal.xml' stored here: /var/tmp/BEAKERLIB_8257001_STORED_journal.xml
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 9s
::   Phases: 3 good, 0 bad
::   OVERALL RESULT: PASS


Note You need to log in before you can comment on or make changes to this bug.