Bug 1953675
Summary: | Akonadi does not start due to DB failure | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ben Cotton <bcotton> |
Component: | kf5-akonadi-server | Assignee: | Daniel Vrátil <me> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 34 | CC: | awilliam, cberlinger, damien.ciabrini, hhorak, jreznik, kde-sig, ljavorsk, mbayer, me, mkocka, mmuzila, mschorm, mwc, ngompa13, patrick, sgrubb, SpikeFedora, tomastrnka |
Target Milestone: | --- | Keywords: | CommonBugs |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | https://fedoraproject.org/wiki/Common_F34_bugs#akonadi-upgrade-fail | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-06-07 23:54:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ben Cotton
2021-04-26 15:51:22 UTC
This appears to be an upgrade issue. Moving `~/.config/share/akonadi` and running `akonadictl start` allows akonadi to launch successfully. It also works fine from a new F34 install. Curiously, the contents of `~/.config/share/akonadi/mysql.conf` are identical between the old and newly-created directories. It doesn't look like a change to the InnoDB support. It looks like the Akonadi database instance didn't shut down cleanly before or during the upgrade, so now it's dirty and MariaDB is refusing to replay the redo log because of the version change. That error prevents the InnoDB engine from loading, resulting in the "Unknown/unsupported storage engine" error. That sounds right. I renamed the ib log file in .local/share/akonadi/db_data, set innodb_force_recovery=1 in .local/share/akonadi/mysql.conf, and started MariaDB through akonadictl. I then ran mysql_upgrade with the socket that akonadictl opened, and akonadi seems completely happy now. I also hit this after upgrading F33→F34. Checking ~/.local/share/akonadi/db_data/mysql.err.old revealed that the DB shutdown before the upgrade was unclean: 2021-05-01 20:03:11 0 [Note] InnoDB: Buffer pool(s) load completed at 210501 20:03:11 ← last startup on F33 2021-05-02 15:08:02 0 [Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown ← start of shutdown sequence 2021-05-02 15:08:02 0 [Note] InnoDB: FTS optimize thread exiting. 2021-05-02 20:55:18 0 [Note] InnoDB: Uses event mutexes ← startup on F34 Before the upgrade, I normally logged out from Plasma, logged in as root on tty and ran dnf system-upgrade download+reboot there, so there was really no good reason for the DB not to exit cleanly. Looking further back in my mysql.err.old revealed shockingly many unclean shutdowns. It looks like something is hard-killing mysqld before it manages to finish the whole shutdown sequence (which takes just a few seconds at most). Perhaps systemd or something is cleaning up leftover processes a bit too early or something. This is a separate bug that needs further investigation. Anyway, so InnoDB managed to recover from unclean shutdown dozens of times on F33, but the 10.5 version on F34 doesn't know how to read the redo log written by 10.4 due to a format change, so it is absolutely crucial to shut down 10.4 cleanly before the upgrade. Of course one could try to just nuke the log as suggested in Comment #3 and hope for the best, but given that the redo log was truly dirty (or 10.5 would have started normally), there will be some data loss. It might be completely inconsequential (perhaps all the changes in the log were already written to the main data file and only a commit record was missing, or the only uncommitted change was a message getting marked as read) or it might lead to major corruption of the database and render it unusable. There's no way to be sure in advance, so I certainly wouldn't recommend trying that approach. Instead, a clean way to recover is to temporarily downgrade to 10.4, recover the log, shutdown cleanly and upgrade back to 10.5: 1. $ akonadictl stop 2. # dnf module switch-to mariadb:10.4 3. $ akonadictl start 4. $ akonadictl stop 5. Now check ~/.local/share/akonadi/db_data/mysql.err to confirm the database shut down cleanly: [Note] InnoDB: Starting crash recovery from checkpoint LSN=42740240443 … [Note] InnoDB: 10.4.18 started; log sequence number 42740240452; transaction id 92259863 [Note] /usr/libexec/mysqld: ready for connections. Version: '10.4.18-MariaDB' socket: '/run/user/500/akonadi/mysql.socket' port: 0 MariaDB Server … [Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown [Note] InnoDB: FTS optimize thread exiting. [Note] InnoDB: Starting shutdown... … [Note] InnoDB: Shutdown completed; log sequence number 42740296470; transaction id 92259875 [Note] /usr/libexec/mysqld: Shutdown complete 6. # dnf module reset mariadb 7. # dnf upgrade mariadb-* 8. $ akonadictl start After this, MariaDB 10.5 will do the necessary modifications to the log and start up cleanly, as shown in mysql.err: [Note] InnoDB: Upgrading redo log: 67108864 bytes; LSN=42740296482 [Note] InnoDB: Starting to delete and rewrite log file. [Note] InnoDB: Setting log file ./ib_logfile101 size to 67108864 bytes [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 [Note] InnoDB: New log file created, LSN=42740296482 … [Note] InnoDB: 10.5.9 started; log sequence number 42740296470; transaction id 92259876 Note that if the affected machine also uses MariaDB for other purposes than just the Akonadi database (such as a standalone SQL server instance), you should consider stopping all such instances before attempting the downgrade to 10.4. It shouldn't be strictly necessary (10.4 will refuse to start with data that was already upgraded to 10.5), but it certainly won't hurt. @me have you had a chance to look into this? Nominating as a Prioritized Bug: https://docs.fedoraproject.org/en-US/program_management/prioritized_bugs/ This was accepted as a Prioritized Bug in today's meeting. Comment #3 and Comment #4 Hit the nail on the head. So there's 2 issues (at least): 1. akonadi'd db shutdown is unclean sometimes (arguably too often). Sometimes there's little to do if users experience session crashes. 2. If (1) occurs for the last akonadi shutdown prior to f33->f34 upgrade, then manual process to fix things is required. This bug is mostly about item 2. comments 3 and 4 give a general outline of how to do the fixing, and I only know enough maria/mysql-fu to be dangerous... will likely need some input from maria db maintainer(s) (or any proxy for a more knowledgeable maria person than I). (temporarily) assigning to mariadb, hoping to get some feedback for how to handle mariadb db upgrades after f33->f34 is done (and migration from f33 content was not yet successful). Hi, we do upgrade documentation for the RHEL releases on how to upgrade the MariaDB server (it's the same for the Fedora), so you could check if you did those steps correctly. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/deploying_different_types_of_servers/using-databases#proc_upgrading-from-mariadb-10-3-to-mariadb-10-5_using-mariadb (In reply to Rex Dieter from comment #9) > (temporarily) assigning to mariadb, hoping to get some feedback Thanks for the heads-up (In reply to Ben Cotton from comment #0) > [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.14. > ... > [ERROR] Unknown/unsupported storage engine: innodb > [ERROR] Aborting Since "Upgrade after a crash is not supported", we have to avoid that completely. (In reply to Rex Dieter from comment #8) > 1. akonadi'd db shutdown is unclean sometimes That's Akonadi problem, I'd say, and since this is a long term recurring issue, I'd suggest Akonadi developers to reach following MariaDB's upstream community mailing list which consist of both proficient users and the developers: https://launchpad.net/~maria-discuss (In reply to Rex Dieter from comment #8) > 2. If (1) occurs for the last akonadi shutdown prior to f33->f34 upgrade, > then manual process to fix things is required. The Fedora Linux version upgrade after Akonadi's MariaDB DB unclean InnoDB's SE shutdown is something we can look into. I don't think we can now and here *swiftly* solve why the Akonadi have unclean DB shutdowns in the first place. However we can try to workaround the consequences. Among the first of my thoughts is whether we could "clean up" the shutdown during %pretrans or some phase like that? Would it be possible to restart Akonadi during KDE GUI user session running? Are we able to reproduce generating DB unclean shutdowns to test or verify any potential solution ? (In reply to Michal Schorm from comment #11) > Among the first of my thoughts is whether we could "clean up" the shutdown > during %pretrans or some phase like that? Since the DB is in users' home directories, I'm not sure how feasible it would be to catch this in an RPM transaction, but I'll let @rdieter or another member of the KDE SIG weigh in on that. It is always unfortunate when you (your package) have to tinker with anything under the user's hands. By that I mean to tinker with content that is somehow expected to be managed by the user. E.g. it is usually okay to fix or modify the application, but not okay to modify the data it produced or the configuration user was expected to customize. However, if I assume, that users are NOT expected to tinker with Akonadi and it's back-end; assume that it is a service that users just expect to be running fine, without any user involvement (or knowledge it exists), then I would say, it is justified to fix such bugs even by intervention in the user home directory. I have to add that I do not work in the area of GUI and users home directories. I do not have the experience nor confidence to fully defend this approach as the best one. Here is a bit from the Fedora Packaging Guidelines which hints of what may go wrong: | /home/$USER as users can arbitrarily modify the files in their home directories | and rpm packages that modify those files run the risk of destroying user data. | Additionally, sites may be nfs mounting /home on many different types of systems | or even network automounting them on demand. Modifying files in home directories | that are thus configured will have negative effects in both of these situations. https://docs.fedoraproject.org/en-US/packaging-guidelines/#_no_files_or_directories_under_srv_usrlocal_or_homeuser We might ask experienced packagers (e.g. proven-packagers ?) for their counselings. If the only question would be "how" to do it, we also may ask on fedora-devel mailing list. (Though I don't think we have even decided what exactly we want to do yet) In today's Prioritized Bugs meeting, we decided to remove this as a Prioritized Bug. The cause is sufficiently unknown and mitigation too complex to fix downstream. This should be handled more robustly upstream. https://meetbot.fedoraproject.org/fedora-meeting-1/2021/fedora-meeting-1.2021-08-25-15.00.html Assigning back to 'kf5-akonadi-server'. I don't have anything more to say to it for now. This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed. Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. Thank you for reporting this bug and we are sorry it could not be fixed. |