1953675 – Akonadi does not start due to DB failure

Bug 1953675 - Akonadi does not start due to DB failure

Summary: Akonadi does not start due to DB failure

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kf5-akonadi-server
Sub Component:
Version:	34
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Daniel Vrátil
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	https://fedoraproject.org/wiki/Common...
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2021-04-26 15:51 UTC by Ben Cotton
Modified:	2022-06-07 23:54 UTC (History)
CC List:	18 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2022-06-07 23:54:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Ben Cotton 2021-04-26 15:51:22 UTC

Initially reported on Reddit:
https://www.reddit.com/r/Fedora/comments/myrj52/akonadi_fails_to_start_after_upgrade_to_f34_kde/

`akonadictl start` fails. It appears because it expects the database to use InnoDB and MariaDB doesn't know what it's talking about:

bcotton@fpgm ~> cat ~/.local/share/akonadi/db_data/mysql.err
2021-04-26 11:23:22 0 [Note] InnoDB: Uses event mutexes
2021-04-26 11:23:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-04-26 11:23:22 0 [Note] InnoDB: Number of pools: 1
2021-04-26 11:23:22 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-04-26 11:23:22 0 [Note] mysqld: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2021-04-26 11:23:22 0 [Note] InnoDB: Using Linux native AIO
2021-04-26 11:23:22 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2021-04-26 11:23:22 0 [Note] InnoDB: Completed initialization of buffer pool
2021-04-26 11:23:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2021-04-26 11:23:22 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.14.
2021-04-26 11:23:22 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2021-04-26 11:23:22 0 [Note] InnoDB: Starting shutdown...
2021-04-26 11:23:22 0 [ERROR] Plugin 'InnoDB' init function returned error.
2021-04-26 11:23:22 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2021-04-26 11:23:22 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-04-26 11:23:22 0 [ERROR] Unknown/unsupported storage engine: innodb
2021-04-26 11:23:22 0 [ERROR] Aborting

Upstream lists some innodb-related changes between versions 10.4 (F33, works) and 10.5 (F34, broken):
https://mariadb.com/kb/en/upgrading-from-mariadb-104-to-mariadb-105/#incompatible-changes-between-104-and-105

It's not clear to me if that's relevant. The MariaDB 10.5 Change proposal does not explicitly mention anything innodb-related: https://fedoraproject.org/wiki/Changes/MariaDB_10.5

Comment 1 Ben Cotton 2021-04-26 16:06:02 UTC

This appears to be an upgrade issue. Moving `~/.config/share/akonadi` and running `akonadictl start` allows akonadi to launch successfully. It also works fine from a new F34 install.

Curiously, the contents of `~/.config/share/akonadi/mysql.conf` are identical between the old and newly-created directories.

Comment 2 Patrick Barnes 2021-04-29 20:15:20 UTC

It doesn't look like a change to the InnoDB support. It looks like the Akonadi database instance didn't shut down cleanly before or during the upgrade, so now it's dirty and MariaDB is refusing to replay the redo log because of the version change. That error prevents the InnoDB engine from loading, resulting in the "Unknown/unsupported storage engine" error.

Comment 3 Michael Carney 2021-04-30 15:14:36 UTC

That sounds right. I renamed the ib log file in .local/share/akonadi/db_data, set innodb_force_recovery=1 in
.local/share/akonadi/mysql.conf, and  started MariaDB through akonadictl. I then ran mysql_upgrade with the socket
that akonadictl opened, and akonadi seems completely happy now.

Comment 4 Tomáš Trnka 2021-05-04 19:00:11 UTC

I also hit this after upgrading F33→F34. Checking ~/.local/share/akonadi/db_data/mysql.err.old revealed that the DB shutdown before the upgrade was unclean:

2021-05-01 20:03:11 0 [Note] InnoDB: Buffer pool(s) load completed at 210501 20:03:11       ← last startup on F33
2021-05-02 15:08:02 0 [Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown   ← start of shutdown sequence
2021-05-02 15:08:02 0 [Note] InnoDB: FTS optimize thread exiting.
2021-05-02 20:55:18 0 [Note] InnoDB: Uses event mutexes                                     ← startup on F34

Before the upgrade, I normally logged out from Plasma, logged in as root on tty and ran dnf system-upgrade download+reboot there, so there was really no good reason for the DB not to exit cleanly. Looking further back in my mysql.err.old revealed shockingly many unclean shutdowns. It looks like something is hard-killing mysqld before it manages to finish the whole shutdown sequence (which takes just a few seconds at most). Perhaps systemd or something is cleaning up leftover processes a bit too early or something. This is a separate bug that needs further investigation.

Anyway, so InnoDB managed to recover from unclean shutdown dozens of times on F33, but the 10.5 version on F34 doesn't know how to read the redo log written by 10.4 due to a format change, so it is absolutely crucial to shut down 10.4 cleanly before the upgrade. Of course one could try to just nuke the log as suggested in Comment #3 and hope for the best, but given that the redo log was truly dirty (or 10.5 would have started normally), there will be some data loss. It might be completely inconsequential (perhaps all the changes in the log were already written to the main data file and only a commit record was missing, or the only uncommitted change was a message getting marked as read) or it might lead to major corruption of the database and render it unusable. There's no way to be sure in advance, so I certainly wouldn't recommend trying that approach.

Instead, a clean way to recover is to temporarily downgrade to 10.4, recover the log, shutdown cleanly and upgrade back to 10.5:

1. $ akonadictl stop
2. # dnf module switch-to mariadb:10.4
3. $ akonadictl start
4. $ akonadictl stop

5. Now check ~/.local/share/akonadi/db_data/mysql.err to confirm the database shut down cleanly:

[Note] InnoDB: Starting crash recovery from checkpoint LSN=42740240443
…
[Note] InnoDB: 10.4.18 started; log sequence number 42740240452; transaction id 92259863
[Note] /usr/libexec/mysqld: ready for connections.
Version: '10.4.18-MariaDB'  socket: '/run/user/500/akonadi/mysql.socket'  port: 0  MariaDB Server
…
[Note] /usr/libexec/mysqld (initiated by: unknown): Normal shutdown
[Note] InnoDB: FTS optimize thread exiting.
[Note] InnoDB: Starting shutdown...
…
[Note] InnoDB: Shutdown completed; log sequence number 42740296470; transaction id 92259875
[Note] /usr/libexec/mysqld: Shutdown complete

6. # dnf module reset mariadb
7. # dnf upgrade mariadb-*
8. $ akonadictl start

After this, MariaDB 10.5 will do the necessary modifications to the log and start up cleanly, as shown in mysql.err:

[Note] InnoDB: Upgrading redo log: 67108864 bytes; LSN=42740296482
[Note] InnoDB: Starting to delete and rewrite log file.
[Note] InnoDB: Setting log file ./ib_logfile101 size to 67108864 bytes
[Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
[Note] InnoDB: New log file created, LSN=42740296482
…
[Note] InnoDB: 10.5.9 started; log sequence number 42740296470; transaction id 92259876

Note that if the affected machine also uses MariaDB for other purposes than just the Akonadi database (such as a standalone SQL server instance), you should consider stopping all such instances before attempting the downgrade to 10.4. It shouldn't be strictly necessary (10.4 will refuse to start with data that was already upgraded to 10.5), but it certainly won't hurt.

Comment 5 Ben Cotton 2021-05-10 15:11:57 UTC

@me have you had a chance to look into this?

Comment 6 Ben Cotton 2021-05-14 14:28:45 UTC

Nominating as a Prioritized Bug: https://docs.fedoraproject.org/en-US/program_management/prioritized_bugs/

Comment 7 Ben Cotton 2021-05-19 16:04:00 UTC

This was accepted as a Prioritized Bug in today's meeting.

Comment 8 Rex Dieter 2021-07-14 13:58:37 UTC

Comment #3 and Comment #4
Hit the nail on the head.  So there's 2 issues (at least):

1.  akonadi'd db shutdown is unclean sometimes (arguably too often).  Sometimes there's little to do if users experience session crashes.

2.  If (1) occurs for the last akonadi shutdown prior to f33->f34 upgrade, then manual process to fix things is required.

This bug is mostly about item 2.  comments 3 and 4 give a general outline of how to do the fixing, and I only know enough maria/mysql-fu to be dangerous... will likely need some input from maria db maintainer(s) (or any proxy for a more knowledgeable maria person than I).

Comment 9 Rex Dieter 2021-07-26 17:12:15 UTC

(temporarily) assigning to mariadb, hoping to get some feedback for how to handle mariadb db upgrades after f33->f34 is done (and migration from f33 content was not yet successful).

Comment 10 Lukas Javorsky 2021-07-27 08:29:31 UTC

Hi, we do upgrade documentation for the RHEL releases on how to upgrade the MariaDB server (it's the same for the Fedora), so you could check if you did those steps correctly.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/deploying_different_types_of_servers/using-databases#proc_upgrading-from-mariadb-10-3-to-mariadb-10-5_using-mariadb

Comment 11 Michal Schorm 2021-07-27 09:19:27 UTC

(In reply to Rex Dieter from comment #9)
> (temporarily) assigning to mariadb, hoping to get some feedback

Thanks for the heads-up


(In reply to Ben Cotton from comment #0)
> [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.14.
> ...
> [ERROR] Unknown/unsupported storage engine: innodb
> [ERROR] Aborting

Since "Upgrade after a crash is not supported", we have to avoid that completely.


(In reply to Rex Dieter from comment #8)
> 1.  akonadi'd db shutdown is unclean sometimes

That's Akonadi problem, I'd say, and since this is a long term recurring issue, I'd suggest Akonadi developers to reach following MariaDB's upstream community mailing list which consist of both proficient users and the developers:
  https://launchpad.net/~maria-discuss


(In reply to Rex Dieter from comment #8)
> 2.  If (1) occurs for the last akonadi shutdown prior to f33->f34 upgrade,
> then manual process to fix things is required.

The Fedora Linux version upgrade after Akonadi's MariaDB DB unclean InnoDB's SE shutdown is something we can look into.
I don't think we can now and here *swiftly* solve why the Akonadi have unclean DB shutdowns in the first place. However we can try to workaround the consequences.

Among the first of my thoughts is whether we could "clean up" the shutdown during %pretrans or some phase like that?
Would it be possible to restart Akonadi during KDE GUI user session running?

Are we able to reproduce generating DB unclean shutdowns to test or verify any potential solution ?

Comment 12 Ben Cotton 2021-08-11 15:41:15 UTC

(In reply to Michal Schorm from comment #11)

> Among the first of my thoughts is whether we could "clean up" the shutdown
> during %pretrans or some phase like that?

Since the DB is in users' home directories, I'm not sure how feasible it would be to catch this in an RPM transaction, but I'll let @rdieter or another member of the KDE SIG weigh in on that.

Comment 13 Michal Schorm 2021-08-11 16:05:29 UTC

It is always unfortunate when you (your package) have to tinker with anything under the user's hands.
By that I mean to tinker with content that is somehow expected to be managed by the user.
E.g. it is usually okay to fix or modify the application, but not okay to modify the data it produced or the configuration user was expected to customize.

However, if I assume, that users are NOT expected to tinker with Akonadi and it's back-end;
assume that it is a service that users just expect to be running fine, without any user involvement (or knowledge it exists),
then I would say, it is justified to fix such bugs even by intervention in the user home directory.

I have to add that I do not work in the area of GUI and users home directories. I do not have the experience nor confidence to fully defend this approach as the best one.
Here is a bit from the Fedora Packaging Guidelines which hints of what may go wrong:

| /home/$USER as users can arbitrarily modify the files in their home directories
| and rpm packages that modify those files run the risk of destroying user data.
| Additionally, sites may be nfs mounting /home on many different types of systems
| or even network automounting them on demand. Modifying files in home directories
| that are thus configured will have negative effects in both of these situations.

https://docs.fedoraproject.org/en-US/packaging-guidelines/#_no_files_or_directories_under_srv_usrlocal_or_homeuser

We might ask experienced packagers (e.g. proven-packagers ?) for their counselings.
If the only question would be "how" to do it, we also may ask on fedora-devel mailing list. (Though I don't think we have even decided what exactly we want to do yet)

Comment 14 Ben Cotton 2021-08-25 15:37:09 UTC

In today's Prioritized Bugs meeting, we decided to remove this as a Prioritized Bug. The cause is sufficiently unknown and mitigation too complex to fix downstream. This should be handled more robustly upstream. https://meetbot.fedoraproject.org/fedora-meeting-1/2021/fedora-meeting-1.2021-08-25-15.00.html

Comment 15 Michal Schorm 2021-08-30 09:35:51 UTC

Assigning back to 'kf5-akonadi-server'.
I don't have anything more to say to it for now.

Comment 16 Ben Cotton 2022-05-12 16:11:58 UTC

This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 17 Ben Cotton 2022-06-07 23:54:09 UTC

Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.