Bug 1667730

Summary: rocksdb engine causes mysqld to crash at startup
Product: [Fedora] Fedora
Component: mariadb
Version: 29
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: high
Priority: unspecified
Reporter: Stefan Becker <chemobejk>
Assignee: Michal Schorm <mschorm>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dciabrin, hhorak, jstanek, mbayer, mkocka, mmuzila, mschorm, praiskup, SpikeFedora
Type: Bug
Last Closed: 2019-11-27 23:20:35 UTC
Attachments: Example crash report from /var/log/mariadb/mariadb.log

Description Stefan Becker 2019-01-20 15:31:26 UTC
Created attachment 1521975
Example crash report from /var/log/mariadb/mariadb.log

Description of problem:

mariadb-server has a weak dependency on mariadb-rocksdb-engine, i.e. a normal user will get the rocksdb engine installed even though it remains unused. Although unused, the rocksdb engine fills up the disk with big empty files and never garbage collects them. Once all disk space has been used up, mysqld starts to crash with an assert from the rocksdb plugin (see attachment). The user is then left with mariadb in a state where they cannot even back up their data anymore.


TL;DR: if you are already in this situation, skip to the end for instructions on how to revive your mariadb/InnoDB instance.


As far as I can tell from the pages I have read, upstream is aware of this crash, but I was unable to determine whether a fix is available. Such a patch would of course not fix the root cause, namely that rocksdb fills up the disk space.


Version-Release number of selected component (if applicable):

mariadb-10.3.11-1.fc29.x86_64
mariadb-server-10.3.11-1.fc29.x86_64
mariadb-rocksdb-engine-10.3.11-1.fc29.x86_64


How reproducible: Always


Steps to Reproduce:
1. install mariadb-server (which drags in mariadb-rocksdb-engine)
2. systemctl restart mariadb 
3. repeat step 2 and watch the partition where /var/lib/mysql is located fill up
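
The steps above can be scripted roughly as follows. This is only a sketch: the function names `dir_usage_kb` and `restart_and_measure` are made up for illustration, the default path is the one used in this report, and the loop has to be run as root.

```shell
#!/bin/sh
# Allocated disk usage of a directory in KiB, as reported by du.
dir_usage_kb() {
    du -sk "$1" | cut -f1
}

# Restart the mariadb service N times and print the usage of the
# rocksdb data directory after each start (hypothetical helper).
restart_and_measure() {
    count=$1
    dir=${2:-"/var/lib/mysql/#rocksdb"}
    i=1
    while [ "$i" -le "$count" ]; do
        systemctl restart mariadb
        echo "restart $i: $(dir_usage_kb "$dir") KiB"
        i=$((i + 1))
    done
}

# Example (not executed here): restart_and_measure 10
```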


Actual results:

Once all disk space on that partition is used up mysqld starts to crash


Expected results:

An unused feature, dragged into the system through a weak dependency, should not endanger legitimate use cases of the main package, a small mariadb/InnoDB setup in my case.


Additional info:

I woke up this morning to find that my Fedora mythtv DVR machine had become unusable due to the above-mentioned crash. After a few hours of unsuccessful searching and reading I accidentally stumbled over a way to revive my mariadb installation without apparent data loss (well, at least I can't see any lost data). After the emergency was handled I started to look for a real fix, because I couldn't believe that a mariadb database from which mysqldump generates a ~570MB .sql backup file can use up a 20GB /var/lib/mysql partition.

I finally discovered the root cause:

 - mariadb-server has a weak dependency on mariadb-rocksdb-engine
 - in a default setup the default engine is still InnoDB, i.e. rocksdb is unused
 - every time you start the mariadb service, the rocksdb engine creates two new files: a ~120MB .log file and a .sst file
   * as far as I can tell, neither file contains any information
 - the rocksdb engine does not garbage collect/merge old .log/.sst files
   * I have been unable to discover whether such a feature/option exists

In my use case the machine is on average booted up 1-2 times a day and it took about 6 months for the 20GB /var/lib/mysql to fill up.
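
A rough plausibility check of these numbers, assuming every start leaves behind about 120MB of allocated space (the helper name `estimate_restarts` is made up for this sketch):

```shell
#!/bin/sh
# Number of service starts needed to fill a partition, given the
# partition size and the space left behind per start (both in MB).
estimate_restarts() {
    echo $(( $1 / $2 ))
}

estimate_restarts 20480 120   # 20GB partition, ~120MB per start -> prints 170
```

At 1-2 boots a day, 170 starts take roughly 3 to 6 months, which is in the same range as what I observed, especially since the first .log files were smaller and the database itself also needs space.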


My suggestions on how to address this problem:

 (a) remove the weak dependency from mariadb-server. This will mean that only users who *REALLY DO WANT TO USE* rocksdb engine will install it.
   * IMHO it is unacceptable that a weak dependency can break a working system 
   * this might also apply to mariadb-tokudb-engine, although that doesn't cause any problems
   * as this will not help existing installations, IMHO a Fedora blog entry should also be written to encourage users to uninstall mariadb-rocksdb-engine

 (b) figure out the options/configuration which tell rocksdb engine to garbage collect/merge .log & .sst files at startup
   * those should be in the default configuration file provided by the mariadb-rocksdb-engine package


----- IF YOU HAVE ALREADY RUN INTO THE SITUATION THAT YOUR MARIADB/INNODB INSTALLATION DOESN'T START ANYMORE ----

(a) if you are 100% sure you never used rocksdb engine:

 # systemctl stop mariadb
 # dnf remove mariadb-rocksdb-engine
 # rm -rf /var/lib/mysql/#rocksdb
 # systemctl start mariadb

(b) if you are not 100% certain and just want to revive your mariadb/InnoDB instance:

 # systemctl stop mariadb
 # rsync -av /var/lib/mysql/#rocksdb/ /path/to/disk/with/some/free/space/
 # rm -rf /var/lib/mysql/#rocksdb/*
 # rsync -av /path/to/disk/with/some/free/space/ /var/lib/mysql/#rocksdb/
 # restorecon -vR /var/lib/mysql/#rocksdb
 # systemctl start mariadb

Explanation: the copy reduces the .log files to their real size (in my case I have seen a reduction from 120MB to ~400 bytes).
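
To see how much of each file is real content, the apparent size can be compared with the space allocated on disk. A minimal sketch, assuming GNU coreutils (`stat -c`) and that the difference observed here is between allocated and apparent size; the function name `report_sizes` is made up:

```shell
#!/bin/sh
# For every regular file directly inside a directory, print its apparent
# size and the space actually allocated on disk (both in bytes).
# GNU stat format: %s = apparent size, %b = allocated blocks,
# %B = bytes per block, %n = file name.
report_sizes() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -exec stat -c '%s %b %B %n' {} + |
    while read -r apparent blocks blksize name; do
        echo "$name apparent=$apparent allocated=$((blocks * blksize))"
    done
}

# Example (not executed here): report_sizes "/var/lib/mysql/#rocksdb"
```

For whole-directory totals, `du -sh` and `du -sh --apparent-size` on the same directory give the two figures side by side.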

Please note that (b) will just delay the inevitable, i.e. at some point rocksdb will have created more files than the partition can hold.

Comment 1 Michal Schorm 2019-01-21 11:08:51 UTC
Not good, not good :/

--

Since at first glance it appears possible for RocksDB to potentially devour all space on the partition, it may seriously affect the whole system.

Starting work ASAP.

--

I won't remove the weak dependency entirely, but it should probably be changed from "Recommends" to "Suggests", which at minimum won't install those packages for users *by default*.
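
Which kind of weak dependency an installed package currently declares can be checked with rpm's query options. A small sketch; the helper name `lists_package` is made up, and the exact output format of rpm is not assumed beyond the package name appearing at the start of a line:

```shell
#!/bin/sh
# Succeeds if the package name given as $1 appears at the start of
# any line read from stdin (hypothetical helper).
lists_package() {
    grep -q "^$1"
}

# Usage on a Fedora system (not executed here):
#   rpm -q --recommends mariadb-server | lists_package mariadb-rocksdb-engine \
#       && echo "installed by default"
#   rpm -q --suggests mariadb-server | lists_package mariadb-rocksdb-engine \
#       && echo "suggested only"
#
# To skip weak dependencies for a single install:
#   dnf install --setopt=install_weak_deps=False mariadb-server
```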

As this is not the first serious issue with RocksDB, I'll consider its future in Fedora. The Facebook upstream with their motto "Move fast, break things" is surely successful in both parts :/

Comment 2 Michal Schorm 2019-01-21 14:11:54 UTC
I took a fresh F29 machine and ran two tests:

1)
Install all MariaDB packages 
  $ dnf install -y "*mariadb*"

and restart the server >1000 times.

2)
Same as 1), but create a table in the RocksDB engine and insert data into it after every start.

---

Both variants behaved in a similar way.
The "/var/lib/mysql/#rocksdb" directory holds a few MB of data when the database is stopped.
Once started, it grows to ~80 MB, but when the database is stopped again, it drops back to a few MB.

Each time the server starts, a log file is created (~50 - 150 KB).
Each time data is written, a "<number>.sst" file is created (~800 B).

After 1000 restarts and another 1000 restarts with writes (i.e. after a few hours) the rocksdb directory had grown by a few hundred MB, which corresponds to the two thousand log files and another thousand .sst files.
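
The file counts can be checked with something like the following sketch (the function name `count_by_ext` is made up; the path is the one from this report):

```shell
#!/bin/sh
# Count the regular files with a given extension directly inside a directory.
count_by_ext() {
    find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Example (not executed here):
#   count_by_ext "/var/lib/mysql/#rocksdb" log
#   count_by_ext "/var/lib/mysql/#rocksdb" sst
```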

That is nowhere near dangerously devouring all of the free space in a short time, which matches your description of the problem building up over half a year (i.e. across several MariaDB releases?).

---

So to sum it up for now:

* I'll change the weak dependency from "Recommends" to "Suggests" as I already mentioned.
* I'll look into the never-ending series of rocksdb log and sst files.
* I can look for some tweaks to the configuration file which would be distributed by default.
  Although I'm strongly in favor of *users* configuring the app as they need before they run it.

* I haven't observed the files being mostly empty.
  This looks like an important part of the issue, but I haven't found a way to reproduce it so far. Can it be caused by your filesystem?

* I suspect the issue may build up over several MariaDB releases.

* I don't see this as an urgent issue, but even though I haven't experienced it as severely as you have, I clearly see the possibility of such an issue in the long run.

* I don't currently plan to notify the Fedora users via blog, mailing lists or any such channel.
  At least until I find a solution / answers for better handling or garbage collecting of the produced files.


---

And thank you.
The report is well written and if anybody else bumps into the same issue I believe they will find it highly useful :)

Comment 3 Stefan Becker 2019-01-21 16:23:10 UTC
So restarting mariadb isn't the trigger that causes the rocksdb engine to generate big empty files. I remember from looking into the #rocksdb directory that the first .log files weren't that big, just like you described. But somewhere along the way during the last 6 months the file size reported by "du -sh" changed to ~120MB, and it stayed that way for every mariadb restart after that.

- the system was first installed with F28 and then upgraded to F29, all with regular updates.

- /var/lib/mysql is the mount point for a 20G ext4 filesystem, no special options

- the /var/lib/mysql partition contents stem from an older machine that ran mythtv/mysql|mariadb for ~10 years
  * transferred using dump / ssh / restorefs
  * there were no rocksdb files on the old machine, i.e. the first file is from a date after the install of the new machine

- "dnf upgrade" runs that touched mariadb were always executed with mariadb stopped

- If I remember correctly, either after the initial old machine -> F28 mariadb data transfer or after the F28 -> F29 upgrade I had to run mysql_upgrade to stop mysqld from complaining with warnings at startup. Could that be the trigger?

Comment 4 Ben Cotton 2019-10-31 18:53:28 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2019-11-27 23:20:35 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.