Bug 1686818

Summary: Altering table's column used in primary key causes data corruption
Product: Red Hat Enterprise Linux 8 Reporter: Damien Ciabrini <dciabrin>
Component: mariadbAssignee: Michal Schorm <mschorm>
Status: CLOSED ERRATA QA Contact: Anna Khaitovich <akhaitov>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.0CC: akhaitov, chjones, databases-maint, hhorak, jwboyer, kvolny, lmiksik, mbayer, michele, mmcgrath, toneata, vdanek, vhutsky
Target Milestone: rcKeywords: ZStream
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Fixed as a RHEL-8.0.0 zero day update. Now I just removed the patch and rebased to the version wich conatins the patch.
Story Points: ---
Clone Of:
: 1702602 (view as bug list) Environment:
Last Closed: 2020-03-04 10:03:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1346768, 1702602    
Attachments:
Description Flags
SQL commands that trigger the data corruption on disk none

Description Damien Ciabrini 2019-03-08 11:59:59 UTC
Created attachment 1542092 [details]
SQL commands that trigger the data corruption on disk

Description of problem:
In OpenStack, when we create tables for OpenStack services, we do so
in an iterative fashion that consists in creating the table and
altering its schema based on each API change that happened in the
history of the OpenStack service. More specifically, a table is
initialized via a succession of SQL commands like e.g.:

    CREATE TABLE standardattributes (
            id BIGINT NOT NULL AUTO_INCREMENT,
            resource_type VARCHAR(255) NOT NULL,
            PRIMARY KEY (id)
    )ENGINE=InnoDB;
    ALTER TABLE standardattributes ADD COLUMN created_at DATETIME;
    ALTER TABLE standardattributes ADD COLUMN updated_at DATETIME;
    ALTER TABLE standardattributes ADD COLUMN description VARCHAR(255);
    ALTER TABLE standardattributes ADD COLUMN revision_number BIGINT NOT NULL DEFAULT '0';

In MariaDB 10.3.11, there are cases where this initialization pattern
triggers a bug that causes a data corruption on disk as soon as the
mysql server is restarted.  Once the data is corrupted, there's no
simple means to repair the table, thus causing an severe outage on
the OpenStack cloud.

For what we could found, the data corruption is 100% reproducible
and is triggered whenever we alter the first column of a table which
is being used in a primary key.

With the attached standalone SQL reproducer, we see that:
  . altering the only field of the primary key doesn't cause any corruption
  . altering the first column of the primary key causes a corruption on server restart
  . altering another column of the primary key doesn't cause any corruption
  . altering a random column in the table doesn't cause any corruption

Checking with upstream MariabDB package, I confirm that this bug is also
present on 10.3.11 (MariaDB-10.3.11-centos73-x86_64*.rpm), and
that it is fixed in 10.3.12 (MariaDB-10.3.12-centos73-x86_64*.rpm).


Version-Release number of selected component (if applicable):
mariadb-server-10.3.11-1.module+el8+2765+cfa4f87b.x86_64

How reproducible:
100% of the time

Steps to Reproduce:
1. create and empty db and start the server with default options:
# rm -rf /var/lib/mysql/ && mkdir -p /var/lib/mysql/ && sudo chown mysql. /var/lib/mysql/ && sudo -u mysql mysql_install_db; mysqld_safe &

2. run the attached SQL script that creates some tables and alter them.
# mysql < ~/mariadb_10_3_11_alter_table_bug.sql

3. verify that the altered table is not corrupted yet:
# mysql bug -e 'show table status;' | grep test2_broken
test2_broken    InnoDB  10      Dynamic 0       0       16384   0       0       0       NULL    2019-03-08 06:47:38     NULL    NULL    latin1_swedish_ci       NULL                    0       N

# mysql bug -e 'select * from test2_broken;'

4. restart the server to trigger the data corruption
# mysqladmin shutdown && mysqld_safe &

5. verify that the altered table is corrupted and one can't fetch data from it anymore.
# mysql bug -e 'show table status;' | grep test2_broken
test2_broken    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    Table 'bug.test2_broken' doesn't exist in engine   NULL     NULL

# mysql bug -e 'select * from test2_broken;'
ERROR 1932 (42S02) at line 1: Table 'bug.test2_broken' doesn't exist in engine


Actual results:
Table's metadata are corrupted on disk, cannot be recovered with innodb_force_recovery or anything 

Expected results:
No corruption occurs

Additional info:
As the next version of OpenStack depends on mariadb-10.3-module, we're currently blocked on that bug, for which there's no easy workaround on our side.

Comment 4 Michael Bayer 2019-03-11 14:36:18 UTC
Here's the upstream bug:

https://jira.mariadb.org/browse/MDEV-18041

Comment 16 Michal Schorm 2019-03-13 01:45:04 UTC
I accidentaly made my previous comment a "minor update" which does not send mails.

However I will need answers to all of my questions there.

Comment 25 Karel Volný 2019-03-21 13:18:15 UTC
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   TEST PROTOCOL
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

    Test run ID   : 8257001
    Package       : mariadb-server
    Installed     : mariadb-server-10.3.11-2.module+el8+2885+7b8bb354.x86_64
    beakerlib RPM : beakerlib-1.17-19.el8eng.noarch
    bl-redhat RPM : beakerlib-redhat-1-33.el7bkr.noarch
    Test version  : 1.0
    Test started  : 2019-03-21 09:16:40 EDT
    Test finished : 2019-03-21 09:16:49 EDT (still running)
    Test duration : 9 seconds
    Test name     : /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
    Distro        : Red Hat Enterprise Linux release 8.0 (Ootpa)
    Hostname      : host-8-246-63.host.centralci.eng.rdu2.redhat.com
    Architecture  : x86_64
    CPUs          : 1 x Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
    RAM size      : 1829 MB
    HDD size      : 19.98 GB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Test description
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

PURPOSE of /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
Description: Test for BZ#1686818 (Altering table's column used in primary key causes)
Author: Karel Volný <kvolny>
Bug summary: Altering table's column used in primary key causes data corruption
Bugzilla link: https://bugzilla.redhat.com/show_bug.cgi?id=1686818


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Setup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:40 ] :: [   INFO   ] :: rlImport: Found 'mariadb55/basic', version '1.0-68' in /mnt/tests
:: [ 09:16:40 ] :: [   INFO   ] :: rlImport: Will try to import mariadb55/basic from /mnt/tests/CoreOS/mariadb55/Library/basic/lib.sh
:: [ 09:16:41 ] :: [   LOG    ] :: *** Library variables ***
:: [ 09:16:41 ] :: [   LOG    ] :: $RUN_ON_DB              = 
:: [ 09:16:41 ] :: [   LOG    ] :: $COLLECTIONS            = 
:: [ 09:16:41 ] :: [   LOG    ] :: $PACKAGE                = mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCollection      = 0
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCollectionName  = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbVersion         = 10.3.11
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbPkgPrefix       = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbServiceName     = mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbSclPrefix       = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbRootDir         = 
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnf             = /etc/my.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfDir          = /etc
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfIncDir       = /etc/my.cnf.d
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfClient       = /etc/my.cnf.d/client.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbCnfServer       = /etc/my.cnf.d/mariadb-server.cnf
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbVarDir          = /var
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbDbDir           = /var/lib/mysql
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLockFile        = /var/lock/subsys/mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbPidFile         = /var/run/mariadb/mariadb.pid
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLog             = /var/log/mariadb/mariadb.log
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbLogDir          = /var/log/mariadb
:: [ 09:16:41 ] :: [   LOG    ] :: $mariadbSocket          = /var/lib/mysql/mysql.sock
:: [ 09:16:41 ] :: [   LOG    ] :: *******************************
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'rlImport mariadb55/basic' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Creating tmp directory (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'pushd /tmp/tmp.p8AGMMqABd' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   INFO   ] :: using '/var/tmp/beakerlib-8257001/backup' as backup destination
:: [ 09:16:41 ] :: [   INFO   ] :: Restoring the log file (workaround bug BZ#832035)
:: [ 09:16:41 ] :: [   PASS   ] :: Command 'mkdir -p /var/log/mariadb' (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Creating the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Fixing ownership of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Fixing permissions of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   PASS   ] :: Restoring SELinux context of the logfile (Expected 0, got 0)
:: [ 09:16:41 ] :: [   INFO   ] :: GOOD, 'rpm -Vf /var/log/mariadb/mariadb.log' does not complain'
:: [ 09:16:45 ] :: [   LOG    ] :: rlServiceStart: Service mariadb started successfully
:: [ 09:16:45 ] :: [   PASS   ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Command 'service mariadb status' (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Creating the testing database (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Creating the testing table (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Inserting some testing data (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 5s
::   Assertions: 14 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:45 ] :: [   PASS   ] :: Running the reproducer query (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: Trying to read the testing table (Expected 0, got 0)
:: [ 09:16:45 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.tDiydkaW' should contain 'Laurel.*Hardy' 
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   LOG    ] :: rlServiceStart: Service mariadb started successfully
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: Command 'service mariadb status' (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: Trying to read the testing table (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' 
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should contain 'Laurel.*Hardy' 
:: [ 09:16:47 ] :: [   PASS   ] :: Dropping the testing database (Expected 0, got 0)
:: [ 09:16:47 ] :: [   PASS   ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' 
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 2s
::   Assertions: 11 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Cleanup
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:49 ] :: [   PASS   ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Command 'rlServiceRestore "mariadb"' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Command 'popd' (Expected 0, got 0)
:: [ 09:16:49 ] :: [   PASS   ] :: Removing tmp directory (Expected 0, got 0)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 2s
::   Assertions: 4 good, 0 bad
::   RESULT: PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 09:16:49 ] :: [   LOG    ] :: File '/var/tmp/beakerlib-8257001/journal.xml' stored here: /var/tmp/BEAKERLIB_8257001_STORED_journal.xml
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Duration: 9s
::   Phases: 3 good, 0 bad
::   OVERALL RESULT: PASS