Bug 1335849
Summary: | MariaDB removes all databases! | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Jan ONDREJ <ondrejj> | ||||||||
Component: | mariadb | Assignee: | Jakub Dorňák <jdornak> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 23 | CC: | hhorak, jdornak, jstanek, mmuzila, praiskup | ||||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | mariadb-10.1.14-2.fc24 mariadb-10.1.14-5.fc24 mariadb-10.1.16-1.fc24 mariadb-10.0.25-3.fc23 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 1356897 (view as bug list) | Environment: | |||||||||
Last Closed: | 2016-07-24 20:19:57 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1356897 | ||||||||||
Attachments: |
|
Description
Jan ONDREJ
2016-05-13 11:44:24 UTC
I reported more things in this script. chown and selinux permissions are not applied due to running as user and specially problem of loss of data which are removed by "rm -rf ...". I don't think, that this patch fixes any of these problems. mariadb-10.1.14-2.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-1d438bc429 Often mariadb is on a separate partition. If this partition is an ext2/3/4, then there will be lost+found directory, which prevents directory creation with this new patch. Still, using rm -rf in scripts is very dangerous. Please remove it and add comment, that user should clean it's directory manually. I bet this issue still needs some care. For brainstorming, similar issues were discussed in PG bug https://bugzilla.redhat.com/show_bug.cgi?id=1247477 There seem to be some valid points in this bugreport. Also, there were problems preventing Fedora maintainers to do the db init automatically in PostgreSQL, since those days we require explicit initdb request ... Otherwise, to me, it is not fair to ask administrator to remove the partially created database in case of failure, so removing the leftovers should be done automatically (where rm -rf helps). I always prefer 'rm -rf "$var"' instead of 'rm -rf "$var"/*' if possible (rm has some safety belts), but I wouldn't remove that cleanup, if we agree that the code is written correctly. mariadb-10.1.14-2.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report. I might seem like Pavel's notice were not taken into account, but maybe you guys solved it off-line? I just want to make sure the comments are just not overseen. (In reply to Honza Horak from comment #7) > I might seem like Pavel's notice were not taken into account, but maybe you > guys solved it off-line? I just want to make sure the comments are just not > overseen. Now I read the last status change correctly, that it is actually re-opening the bug (not closing), which seems to be exactly because of the above.. sorry for the noise. (In reply to Fedora Update System from comment #6) > mariadb-10.1.14-2.fc24 has been pushed to the Fedora 24 stable repository. As I mentioned in bodhi, this update is not working. There is not only problem, that not all of problems reported in this bug are fixed, but also added a new problem. After update to this release I can't start mariadb anymore. PreExec script fails and then mariadb is not started. Please, do not play with users. On RHEL/CentOS 7 there is no other mysql compatible database for LAMP. MariaDB is an important part of RHEL7. What happened on my side. I have a separate ext4 partition for mariadb. Ext4 has an "lost+found" directory (/var/lib/mysql/lost+found). This directory should be readable only by root. You are using this code: if ! ls $datadir/* &>/dev/null; then If there is at least one non-readable directory, then ls $datadir/* fails. Trying to prepare data directory, which already exists, which is not a good idea. There are still problems to run chown and restorecon. They has no effect, because this script is run by systemd as user mysql. Don't fail, if owner is already set, but fails, if not. Warning! If you make this script runnable as root, then chown will have security problem. By default chown uses dereference on symlinks, so if an attacker can compromise mysql user, it can compromise all users on system too. My suggestion: Check, if there is $datadir/mysql directory present (mysql database in $datadir). If not, then try to init database. This should be a good test if mysql was never initialized. If prepare of new mysql fails, then please do not remove any files not created by prepare. If you don't know, which files are created by prepare and which don't, then do not remove any files. Thank you. (In reply to Jan ONDREJ from comment #9) > (In reply to Fedora Update System from comment #6) > > mariadb-10.1.14-2.fc24 has been pushed to the Fedora 24 stable repository. > > As I mentioned in bodhi, Thanks for bodhi update, the sum(karma) algorithm does not work everywhere; this is discussed frequently on fedora devel list, off topic here. Sorry for this. > this update is not working. There is not only problem, that not all of > problems reported in this bug are fixed, but also added a new problem. After > update to this release I can't start mariadb anymore. PreExec script fails > and then mariadb is not started. I was about to write here that the situation is not worse than before, but it is probably not truth. For users that use direct mount-points as --datadir, this can cause data-loss with much higher probability: the mysql_install_db is (almost) _always_ called and if that fails (which should not happen) -- data loss comes. Both from your bodhi comment and from comment #4 it looked like the issue is that the script failed to initialize. There should be imo two checks: 1. if data directory is mountpoint, warn user properly (don't fail) 2. if the directory is non-empty, fail and explain this to user You should avoid using direct mount points just because, similarly as with postgresql case, if the directory got unmounted for whatever reason - mysqld could write to files on different storage. Sub-directory will much likely not-exist on the other storage. I'm not sure whether mysqld would ever write there, but just in case .. > Please, do not play with users. On RHEL/CentOS 7 there is no other mysql > compatible database for LAMP. MariaDB is an important part of RHEL7. You raised a valid points which need to be taken into account. This is not RHEL issue (still fedora only) and we still do not have this closed, and as it looks -- we should have this fixed ASAP. Please keep having a look at this bug, your review is really appreciated. > What happened on my side. I have a separate ext4 partition for mariadb. Ext4 > has an "lost+found" directory (/var/lib/mysql/lost+found). This directory > should be readable only by root. You are using this code: > > if ! ls $datadir/* &>/dev/null; then > > If there is at least one non-readable directory, then ls $datadir/* fails. Uh, this is good point (at least I) missed previously. > Trying to prepare data directory, which already exists, which is not a good > idea. Even though it shouldn't be dangerous to run mysql_install_db again against already initialized $datadir and it should succeed, the following code might remove databases if it accidentally failed. > There are still problems to run chown and restorecon. They has no effect, > because this script is run by systemd as user mysql. Don't fail, if owner is > already set, but fails, if not. Deserves cleanup. > Warning! If you make this script runnable as root, then chown will have > security problem. By default chown uses dereference on symlinks, so if an > attacker can compromise mysql user, it can compromise all users on system > too. I think it is clear that we shouldn't make the script to much clever, so fixing permissions and running restorecon doesn't look like a good thing if the directory is already initialized... This should be fixed by admin manually. > My suggestion: > > Check, if there is $datadir/mysql directory present (mysql database in > $datadir). If not, then try to init database. I would slightly prefer completely empty $datadir, taking into account that this is downstream script (this semantic will never need additional tweaking if upstream changes things). Portable checking for empty directory in shell is not trivial, but we don't have to care about portability. > This should be a good test if mysql was never initialized. If prepare of new > mysql fails, then please do not remove any files not created by prepare. If > you don't know, which files are created by prepare and which don't, then do > not remove any files. Thank you. I agree with you here. The question is, however, how to recongize files which are "not created by prepare". That is also why I prefer running the mysql_install_db only if the datadir is really empty. (In reply to Pavel Raiskup from comment #10) > (In reply to Jan ONDREJ from comment #9) > > (In reply to Fedora Update System from comment #6) > > > mariadb-10.1.14-2.fc24 has been pushed to the Fedora 24 stable repository. > There should be imo two checks: > > 1. if data directory is mountpoint, warn user properly (don't fail) I don't think more complications are good for this script. > 2. if the directory is non-empty, fail and explain this to user Should we report normal startup? > You should avoid using direct mount points just because, similarly as with > postgresql case, if the directory got unmounted for whatever reason - mysqld > could write to files on different storage. Sub-directory will much likely > not-exist on the other storage. I'm not sure whether mysqld would ever > write there, but just in case .. For large databases (for example 5GB or more) it's still better to have separate filesystem for mysql/mariadb. Logs and user data can't fail database, it's better to monitor data usage and easier to grow storage if needed. If it's not mounted for any reason, then does not matter, if new database is created. I can change permission to unmounted directory or if it's set in fstab, then server can't boot without this dir, so it's not a point. > You raised a valid points which need to be taken into account. This is not > RHEL issue (still fedora only) and we still do not have this closed, and as > it > looks -- we should have this fixed ASAP. Please keep having a look at this > bug, your review is really appreciated. Are you sure it's not a RHEL issue? Which mysql compatible database is in RHEL7? I see mariadb-5.5.47-1.el7_2.x86_64, which uses same script. > > if ! ls $datadir/* &>/dev/null; then > > > > If there is at least one non-readable directory, then ls $datadir/* fails. > > Uh, this is good point (at least I) missed previously. > > > My suggestion: > > > > Check, if there is $datadir/mysql directory present (mysql database in > > $datadir). If not, then try to init database. > > I would slightly prefer completely empty $datadir, taking into account that > this is downstream script (this semantic will never need additional tweaking > if > upstream changes things). Portable checking for empty directory in shell is > not trivial, but we don't have to care about portability. What's wrong on checking mysql database only? Is it possible to use mysql/mariadb without mysql database? I think this is a system database and can't be changed, so checking any of system tables should be enough to check, if database has been already initialized. > > This should be a good test if mysql was never initialized. If prepare of new > > mysql fails, then please do not remove any files not created by prepare. If > > you don't know, which files are created by prepare and which don't, then do > > not remove any files. Thank you. > > I agree with you here. The question is, however, how to recongize files > which > are "not created by prepare". That is also why I prefer running the > mysql_install_db only if the datadir is really empty. Agree, hard to check files created by prepare. May be this can be done by strace or some inotify, but I don't recommend these. May be we can make a list of basic files created by prepare, but this is hard to maintain. So result is to do not remove datadir contents. After off-bugzilla discussion with Jano (thanks!), it looks like accidental data directory removal after package update to Release == 2 is really more unlikely than I thought before (additional check for user.frm prevents the 'rm -rf' on successfully initialized data). The real issue is that mariadb, after package update, fails to start for correctly initialized servers. Work-around for Jano was to chown the 'lost+found' to mariadb user... More comments FTR in-line. (In reply to Jan ONDREJ from comment #11) > (In reply to Pavel Raiskup from comment #10) > > 1. if data directory is mountpoint, warn user properly (don't fail) > > I don't think more complications are good for this script. Right, I would just prefer that warning, but it is orthogonal issue and detail anyway. > > 2. if the directory is non-empty, fail and explain this to user > > Should we report normal startup? We can not, good point. This would make sense only in case of *explicit* request for database initialization. Checking for non-emptiness is a bad idea then. And it kind of looks like automatic database initialization is too. > Are you sure it's not a RHEL issue? Which mysql compatible database is in > RHEL7? I see mariadb-5.5.47-1.el7_2.x86_64, which uses same script. Sorry, it is of course RHEL related, wrongly spelled. The latest update was Fedora-only. > What's wrong on checking mysql database only? Except for this 'mysql' could for example change to 'mariadb' in future, I agree it is not wrong (and it was done like this before). Having that check in place, however, requires us to stop cleaning the incompletely initialized data directory, which is inconvenient. > So result is to do not remove datadir contents. +1 if we do not initialize the data directory automatically by 'service mariadb start', too. Jakub, what do you think? As for the "chown and restorecon" issues -- yes these are ineffective when running as mysql user (as part of service start), but the script `mysql-prepare-db-dir` might be run by user as root as well (see user specification as arguments or getting it from mariadb.service) when somebody wants to prepare the directory before (not automatically during service start). Then the chown+restorecon is important. (the root use case is mentioned in the comment in the script, but we should make it documented better) As for the fix for the check/remove of datadir content, I've looked into history why this whole `rm` thing was introduced and it is related to discussions in bz#835131. There are cases where mysql_install_db fails during `systemctl mariadb start` and even after user fixes the issues, the service won't start anymore. It seems to me like good practice to clean after script failure, because the information about the mysql_install_db script failure is not stored anywhere and the datadir might look like initialized from the first sight and user won't have idea what is wrong. However, I agree we should do it *only* when we are able to do so (i.e. we're sure we don't remove anything else what was not created by the script). That is we can remove the content of the dir only when the dir was empty before running the mysql_install_db. So what would be my preferred solution is: * check whether dir is empty (or it includes only expected directory lost+found) * remove content of the dir (except lost+found) when mysql_install_db fails * if the directory is not empty but mysql/ directory is missing, then print error and let user to initialize manually or empty the directory I also like printing a warning when the datadir is mounting point, but it can be solved later. The same for better documentation of running mysql-prepare-db-dir as root manually. Created attachment 1179483 [details] Proposed fix: Check and empty datadir more carefully This is my attempt to make the script more careful for the tricky cases. Jan/Pavel/Jakub, what do you think? Scratch build on the way: http://koji.fedoraproject.org/koji/taskinfo?taskID=14887928 Thanks Honzo! (In reply to Honza Horak from comment #13) > So what would be my preferred solution is: > * check whether dir is empty (or it includes only expected directory > lost+found) Done. > * remove content of the dir (except lost+found) when mysql_install_db fails Command: find "$datadir"/* -not -name 'lost+found' -delete removes lost+found contents. This could be used: find "$datadir" -mindepth 1 -maxdepth 1 -not -name lost+found \ -exec rm -rf {} + There is check that we can run this "cleanup" action: if [ ! -e "$datadir/mysql/user.frm" ] && [ -d "$datadir" ] ; then... I'm not sure why [ -d "$datadir" ], perhaps check for non-empty $datadir..? > * if the directory is not empty but mysql/ directory is missing, then print > error and let user to initialize manually or empty the directory This looked like a good idea to me, but was not implemented, the warning is always written to standard output. Created attachment 1179620 [details] Proposed fix: Check and empty datadir more carefully (In reply to Pavel Raiskup from comment #15) > Thanks Honzo! > > (In reply to Honza Horak from comment #13) > > So what would be my preferred solution is: > > * check whether dir is empty (or it includes only expected directory > > lost+found) > > Done. > > > * remove content of the dir (except lost+found) when mysql_install_db fails > > Command: find "$datadir"/* -not -name 'lost+found' -delete > removes lost+found contents. > This could be used: > find "$datadir" -mindepth 1 -maxdepth 1 -not -name lost+found \ > -exec rm -rf {} + Uups, thanks for catching and suggestion. Fixed in updated patch. > There is check that we can run this "cleanup" action: > if [ ! -e "$datadir/mysql/user.frm" ] && [ -d "$datadir" ] ; then... > I'm not sure why [ -d "$datadir" ], perhaps check for non-empty $datadir..? This was meant for avoiding disaster when $datadir was anything else than what we expect (e.g. empty string) from some weird reason. > > * if the directory is not empty but mysql/ directory is missing, then print > > error and let user to initialize manually or empty the directory > > This looked like a good idea to me, but was not implemented, the warning is > always written to standard output. Correct, I've added extra check for this case when $datadir/mysql/ is missing and let the script fail in that case. Please, see the updated patch. Latest script looks OK. I tested some possible situation and works as expected. I can only suggest one more feature, which will prevent data deletion of user data. If you need to know, which files have been created by init_db script, then you should remember date/time before init_db and delete only files created after this date. CURRENT_TIME=`LANG=C date -u` mysql_install_db ... ... find "$datadir" -mindepth 1 -maxdepth 1 -newermt "$CURRENT_TIME" -not -name lost+found -exec rm -rf {} + Still, user can have wrong date at startup, but this gives at least one more condition. That sounds good to me, I'll include it into the patch. Thanks for the update Jan and Honza. The latest patch looks fine to me, too. The timestamp check sounds interesting, two suggestions in case we would go this direction ... I would use 'touch SOMEFILE' and 'find -newer SOMEFILE' (to avoid date parsing issues) and also, we should take care of filesystems that have only 1 second resolution in {c,a,m}time fields... So, probably 'sleep 1' (before mysqld_install_db) would be work-around. (In reply to Pavel Raiskup from comment #19) > The timestamp check sounds interesting, two suggestions in case we would go > this > direction ... I would use 'touch SOMEFILE' and 'find -newer SOMEFILE' (to > avoid date parsing issues) I think this could add more problems, if you create somefile on full filesystems or with any other error. Parsing of "C" date should be OK, or you can use @timestamp syntax. > and also, we should take care of filesystems > that have only 1 second resolution in {c,a,m}time fields... > So, probably 'sleep 1' (before mysqld_install_db) would be work-around. +1 for sleep 1 (In reply to Jan ONDREJ from comment #20) > I think this could add more problems, if you create somefile on full > filesystems or with any other error. OK. Sounds like good reason why mariadb_install_db might fail. Created attachment 1179942 [details]
Proposed fix: Check and empty datadir more carefully
Another iteration with time check.
Comment on attachment 1179942 [details]
Proposed fix: Check and empty datadir more carefully
Looks good and simple, but still there are possibilities to remove files under subdirectories, if directories are modified, but files inside can be older.
Instead of:
find "$datadir" -mindepth 1 -maxdepth 1 -newermt "$CURRENT_TIME" -not -name "lost+found" -exec rm -rf {} +
it would be better to have:
find "$datadir" -newermt "$CURRENT_TIME" -delete
For find -delete implies -depth, which processes directory contents before directory itself.
Also having "-not -name lost+found" has no effect. This directory has been created before timestamp, so no need to specify it exactly. Another problem is, that find is still trying to access lost+found and writes an error, even if it's excluded via -not -name and fails with error code 1.
This can be avoided with -prune, but -prune is not compatible with -depth, which is implied by -delete. I'm not sure, how to deal with this. I think we can only ignore error messages and return code from find. :-(
May be rename my example CURRENT_TIMESTAMP to INITDB_TIMESTAMP.
(In reply to Jan ONDREJ from comment #23) > Comment on attachment 1179942 [details] > Proposed fix: Check and empty datadir more carefully > > Looks good and simple, but still there are possibilities to remove files > under subdirectories, if directories are modified, but files inside can be > older. I tend to disagree here, it is not simple anymore :(. As long as we agree that (except for lost+found) the directory is empty before running mariadb_install_db, the check for timestamp is rather pedantic. It is just matter of additional check, so I don't complain.. however I doubt it is useful to iterate longer, there are other issues worht having a look :). For me, it is enough to check that the directory is not empty. > find "$datadir" -mindepth 1 -maxdepth 1 -newermt "$CURRENT_TIME" -not > -name "lost+found" -exec rm -rf {} + > it would be better to have: > find "$datadir" -newermt "$CURRENT_TIME" -delete > For find -delete implies -depth, which processes directory contents before > directory itself. Why it would be better in this case (except for additional forks)? > Also having "-not -name lost+found" has no effect. This directory has been > created before timestamp, so no need to specify it exactly. Another problem > is, that find is still trying to access lost+found and writes an error, even > if it's excluded via -not -name and fails with error code 1. The -depth is bad in this case. The problem we face now is that the find command is wrongly spelled, you must use logical operators between find expressions: -> find DIR -min/max depth EXPR -and EXPR probably: find /var/lib/mysql -mindepth 1 -maxdepth 1 \ -not -name lost+found \ -and \ -newermt "$date" But I really doubt this is worth having in ; directory is empty before we got into it, so we can remove everything (except for lost+found). OK, I was wrong here, the '-and' seems to be there by default. Well, so far there are these commits pushed: http://pkgs.fedoraproject.org/cgit/rpms/mariadb.git/commit/?h=f24&id=ff0bbd1d4f677f9987a2065c3db02cfb050d6a90 http://pkgs.fedoraproject.org/cgit/rpms/mariadb.git/commit/?h=f24&id=a8ba852168a3dfbbc05c411789c2bd1ce32c8d88 http://pkgs.fedoraproject.org/cgit/rpms/mariadb.git/commit/?h=f24&id=dac8b4cb1edb15efed4df6ed4eb615467d58a6d3 To me, it seems good enough, but we can improve it if necessary. OK, first check doest not allow running removal on directory, which is not empty (except lost+found), so file age checking is only an improvement, no need to make it perfect. :-) Another related commit for using the sleep correcty (found by Pavel): http://pkgs.fedoraproject.org/cgit/rpms/mariadb.git/commit/?id=d12b2c5e157ddf6e4e0968877155f9ab9ab63da1 mariadb-10.1.14-5.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-7591801574 mariadb-10.0.25-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a84a3e4936 mariadb-10.0.25-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a84a3e4936 mariadb-10.1.14-5.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-7591801574 mariadb-10.1.16-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-084f825e7b mariadb-10.0.25-3.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-0513b8170f mariadb-10.0.25-2.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a84a3e4936 mariadb-10.1.14-5.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report. mariadb-10.0.25-3.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-0513b8170f mariadb-10.1.16-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-084f825e7b mariadb-10.0.25-2.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a84a3e4936 mariadb-10.1.16-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report. mariadb-10.0.25-3.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report. |