Bug 1686818
Summary: | Altering table's column used in primary key causes data corruption | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Damien Ciabrini <dciabrin> | ||||
Component: | mariadb | Assignee: | Michal Schorm <mschorm> | ||||
Status: | CLOSED ERRATA | QA Contact: | Anna Khaitovich <akhaitov> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 8.0 | CC: | akhaitov, chjones, databases-maint, hhorak, jwboyer, kvolny, lmiksik, mbayer, michele, mmcgrath, toneata, vdanek, vhutsky | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | 8.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||
Doc Text: |
Fixed as a RHEL-8.0.0 zero day update.
Now I just removed the patch and rebased to the version wich conatins the patch.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1702602 (view as bug list) | Environment: | |||||
Last Closed: | 2020-03-04 10:03:57 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1346768, 1702602 | ||||||
Attachments: |
|
Here's the upstream bug: https://jira.mariadb.org/browse/MDEV-18041 I accidentaly made my previous comment a "minor update" which does not send mails. However I will need answers to all of my questions there. :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: TEST PROTOCOL :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: Test run ID : 8257001 Package : mariadb-server Installed : mariadb-server-10.3.11-2.module+el8+2885+7b8bb354.x86_64 beakerlib RPM : beakerlib-1.17-19.el8eng.noarch bl-redhat RPM : beakerlib-redhat-1-33.el7bkr.noarch Test version : 1.0 Test started : 2019-03-21 09:16:40 EDT Test finished : 2019-03-21 09:16:49 EDT (still running) Test duration : 9 seconds Test name : /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption Distro : Red Hat Enterprise Linux release 8.0 (Ootpa) Hostname : host-8-246-63.host.centralci.eng.rdu2.redhat.com Architecture : x86_64 CPUs : 1 x Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz RAM size : 1829 MB HDD size : 19.98 GB :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Test description :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: PURPOSE of /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption Description: Test for BZ#1686818 (Altering table's column used in primary key causes) Author: Karel Volný <kvolny> Bug summary: Altering table's column used in primary key causes data corruption Bugzilla link: https://bugzilla.redhat.com/show_bug.cgi?id=1686818 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Setup :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ 09:16:40 ] :: [ INFO ] :: rlImport: Found 'mariadb55/basic', version '1.0-68' in /mnt/tests :: [ 09:16:40 ] :: [ INFO ] :: rlImport: Will try to import mariadb55/basic from /mnt/tests/CoreOS/mariadb55/Library/basic/lib.sh :: [ 09:16:41 ] :: [ LOG ] :: *** Library variables *** :: [ 09:16:41 ] :: [ LOG ] :: $RUN_ON_DB = :: [ 09:16:41 ] :: [ LOG ] :: $COLLECTIONS = :: [ 09:16:41 ] :: [ LOG ] :: $PACKAGE = mariadb :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCollection = 0 :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCollectionName = :: [ 09:16:41 ] :: [ LOG ] :: $mariadbVersion = 10.3.11 :: [ 09:16:41 ] :: [ LOG ] :: $mariadbPkgPrefix = :: [ 09:16:41 ] :: [ LOG ] :: $mariadbServiceName = mariadb :: [ 09:16:41 ] :: [ LOG ] :: $mariadbSclPrefix = :: [ 09:16:41 ] :: [ LOG ] :: $mariadbRootDir = :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCnf = /etc/my.cnf :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCnfDir = /etc :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCnfIncDir = /etc/my.cnf.d :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCnfClient = /etc/my.cnf.d/client.cnf :: [ 09:16:41 ] :: [ LOG ] :: $mariadbCnfServer = /etc/my.cnf.d/mariadb-server.cnf :: [ 09:16:41 ] :: [ LOG ] :: $mariadbVarDir = /var :: [ 09:16:41 ] :: [ LOG ] :: $mariadbDbDir = /var/lib/mysql :: [ 09:16:41 ] :: [ LOG ] :: $mariadbLockFile = /var/lock/subsys/mariadb :: [ 09:16:41 ] :: [ LOG ] :: $mariadbPidFile = /var/run/mariadb/mariadb.pid :: [ 09:16:41 ] :: [ LOG ] :: $mariadbLog = /var/log/mariadb/mariadb.log :: [ 09:16:41 ] :: [ LOG ] :: $mariadbLogDir = /var/log/mariadb :: [ 09:16:41 ] :: [ LOG ] :: $mariadbSocket = /var/lib/mysql/mysql.sock :: [ 09:16:41 ] :: [ LOG ] :: ******************************* :: [ 09:16:41 ] :: [ PASS ] :: Command 'rlImport mariadb55/basic' (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Creating tmp directory (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Command 'pushd /tmp/tmp.p8AGMMqABd' (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0) :: [ 09:16:41 ] :: [ INFO ] :: using '/var/tmp/beakerlib-8257001/backup' as backup destination :: [ 09:16:41 ] :: [ INFO ] :: Restoring the log file (workaround bug BZ#832035) :: [ 09:16:41 ] :: [ PASS ] :: Command 'mkdir -p /var/log/mariadb' (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Creating the logfile (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Fixing ownership of the logfile (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Fixing permissions of the logfile (Expected 0, got 0) :: [ 09:16:41 ] :: [ PASS ] :: Restoring SELinux context of the logfile (Expected 0, got 0) :: [ 09:16:41 ] :: [ INFO ] :: GOOD, 'rpm -Vf /var/log/mariadb/mariadb.log' does not complain' :: [ 09:16:45 ] :: [ LOG ] :: rlServiceStart: Service mariadb started successfully :: [ 09:16:45 ] :: [ PASS ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: Command 'service mariadb status' (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: Creating the testing database (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: Creating the testing table (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: Inserting some testing data (Expected 0, got 0) :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Duration: 5s :: Assertions: 14 good, 0 bad :: RESULT: PASS :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Test :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ 09:16:45 ] :: [ PASS ] :: Running the reproducer query (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: Trying to read the testing table (Expected 0, got 0) :: [ 09:16:45 ] :: [ PASS ] :: File '/var/tmp/rlRun_LOG.tDiydkaW' should contain 'Laurel.*Hardy' :: [ 09:16:47 ] :: [ PASS ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0) :: [ 09:16:47 ] :: [ LOG ] :: rlServiceStart: Service mariadb started successfully :: [ 09:16:47 ] :: [ PASS ] :: Command 'rlServiceStart mariadb' (Expected 0, got 0) :: [ 09:16:47 ] :: [ PASS ] :: Command 'service mariadb status' (Expected 0, got 0) :: [ 09:16:47 ] :: [ PASS ] :: Trying to read the testing table (Expected 0, got 0) :: [ 09:16:47 ] :: [ PASS ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' :: [ 09:16:47 ] :: [ PASS ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should contain 'Laurel.*Hardy' :: [ 09:16:47 ] :: [ PASS ] :: Dropping the testing database (Expected 0, got 0) :: [ 09:16:47 ] :: [ PASS ] :: File '/var/tmp/rlRun_LOG.Wa5gEJ3p' should not contain 'ERROR' :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Duration: 2s :: Assertions: 11 good, 0 bad :: RESULT: PASS :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Cleanup :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ 09:16:49 ] :: [ PASS ] :: Command 'rlServiceStop "mariadb"' (Expected 0, got 0) :: [ 09:16:49 ] :: [ PASS ] :: Command 'rlServiceRestore "mariadb"' (Expected 0, got 0) :: [ 09:16:49 ] :: [ PASS ] :: Command 'popd' (Expected 0, got 0) :: [ 09:16:49 ] :: [ PASS ] :: Removing tmp directory (Expected 0, got 0) :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Duration: 2s :: Assertions: 4 good, 0 bad :: RESULT: PASS :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: /CoreOS/mariadb/Regression/bz1686818-altering-primary-key-causes-corruption :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: [ 09:16:49 ] :: [ LOG ] :: File '/var/tmp/beakerlib-8257001/journal.xml' stored here: /var/tmp/BEAKERLIB_8257001_STORED_journal.xml :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :: Duration: 9s :: Phases: 3 good, 0 bad :: OVERALL RESULT: PASS |
Created attachment 1542092 [details] SQL commands that trigger the data corruption on disk Description of problem: In OpenStack, when we create tables for OpenStack services, we do so in an iterative fashion that consists in creating the table and altering its schema based on each API change that happened in the history of the OpenStack service. More specifically, a table is initialized via a succession of SQL commands like e.g.: CREATE TABLE standardattributes ( id BIGINT NOT NULL AUTO_INCREMENT, resource_type VARCHAR(255) NOT NULL, PRIMARY KEY (id) )ENGINE=InnoDB; ALTER TABLE standardattributes ADD COLUMN created_at DATETIME; ALTER TABLE standardattributes ADD COLUMN updated_at DATETIME; ALTER TABLE standardattributes ADD COLUMN description VARCHAR(255); ALTER TABLE standardattributes ADD COLUMN revision_number BIGINT NOT NULL DEFAULT '0'; In MariaDB 10.3.11, there are cases where this initialization pattern triggers a bug that causes a data corruption on disk as soon as the mysql server is restarted. Once the data is corrupted, there's no simple means to repair the table, thus causing an severe outage on the OpenStack cloud. For what we could found, the data corruption is 100% reproducible and is triggered whenever we alter the first column of a table which is being used in a primary key. With the attached standalone SQL reproducer, we see that: . altering the only field of the primary key doesn't cause any corruption . altering the first column of the primary key causes a corruption on server restart . altering another column of the primary key doesn't cause any corruption . altering a random column in the table doesn't cause any corruption Checking with upstream MariabDB package, I confirm that this bug is also present on 10.3.11 (MariaDB-10.3.11-centos73-x86_64*.rpm), and that it is fixed in 10.3.12 (MariaDB-10.3.12-centos73-x86_64*.rpm). Version-Release number of selected component (if applicable): mariadb-server-10.3.11-1.module+el8+2765+cfa4f87b.x86_64 How reproducible: 100% of the time Steps to Reproduce: 1. create and empty db and start the server with default options: # rm -rf /var/lib/mysql/ && mkdir -p /var/lib/mysql/ && sudo chown mysql. /var/lib/mysql/ && sudo -u mysql mysql_install_db; mysqld_safe & 2. run the attached SQL script that creates some tables and alter them. # mysql < ~/mariadb_10_3_11_alter_table_bug.sql 3. verify that the altered table is not corrupted yet: # mysql bug -e 'show table status;' | grep test2_broken test2_broken InnoDB 10 Dynamic 0 0 16384 0 0 0 NULL 2019-03-08 06:47:38 NULL NULL latin1_swedish_ci NULL 0 N # mysql bug -e 'select * from test2_broken;' 4. restart the server to trigger the data corruption # mysqladmin shutdown && mysqld_safe & 5. verify that the altered table is corrupted and one can't fetch data from it anymore. # mysql bug -e 'show table status;' | grep test2_broken test2_broken NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL Table 'bug.test2_broken' doesn't exist in engine NULL NULL # mysql bug -e 'select * from test2_broken;' ERROR 1932 (42S02) at line 1: Table 'bug.test2_broken' doesn't exist in engine Actual results: Table's metadata are corrupted on disk, cannot be recovered with innodb_force_recovery or anything Expected results: No corruption occurs Additional info: As the next version of OpenStack depends on mariadb-10.3-module, we're currently blocked on that bug, for which there's no easy workaround on our side.