Bug 2018882 - io-uring losing write requests
Summary: io-uring losing write requests
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-01 06:16 UTC by Daniel Black
Modified: 2022-06-07 22:50 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-06-07 22:50:21 UTC
Type: Bug
Embargoed:


Attachments:
dmesg-5.14.15.txt (low utility) (94.86 KB, text/plain)
2021-11-01 06:16 UTC, Daniel Black

Description Daniel Black 2021-11-01 06:16:41 UTC
Created attachment 1838803
dmesg-5.14.15.txt (low utility)

1. Please describe the problem:

While using the kernel io_uring interface, a write request is lost: its completion is never received by user space, so MariaDB asserts after 10 minutes of waiting.

2. What is the Version-Release number of the kernel:

5.14.15-200.fc34.x86_64 and previously
5.14.14-200.fc34.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Noticed as faulty in 5.13.16-200.fc34.x86_64 (MDEV-26555).

This was working at some stage, as I was building MariaDB-10.6 and testing frequently on fc33 and fc34 without incident.

Generally reproducible on other distros. 5.11 appears unaffected, so the regression was introduced sometime after that.

On a 5.15-rc kernel it can still be reproduced sometimes, though much less reliably.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:


Yes.

https://jira.mariadb.org/browse/MDEV-26555
https://jira.mariadb.org/browse/MDEV-26674

Marko from MariaDB has validated that the user-space stack traces are missing write requests.

In these MDEVs I tested against a variety of distro-provided and locally built liburing versions without differing test results.

I have started engagement upstream - https://marc.info/?l=linux-block&m=163489378723217&w=2

A recent build set from our CI:

https://ci.mariadb.org/19583/amd64-fedora-34-rpm-autobake/rpms/

Install MariaDB-server and MariaDB-test (client, common and shared might also be needed).

Validate that ldd /usr/sbin/mariadbd includes liburing.
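
A minimal sketch of that check, assuming a POSIX shell. The helper name uses_liburing is hypothetical (it is not part of MariaDB or liburing); it only scans ldd output on stdin for a liburing.so entry:

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints "yes" if any
# line names a liburing shared object, otherwise "no".
uses_liburing() {
    if grep -q 'liburing\.so'; then
        echo yes
    else
        echo no
    fi
}
```

Usage: ldd /usr/sbin/mariadbd | uses_liburing
A "no" would suggest the server binary is not linked against liburing, in which case the tests below would presumably not exercise io_uring at all.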

To test run:

cd /usr/share/mysql/mysql-test
./mtr --vardir=/tmp/var   --parallel=4 encryption.innochecksum{,,,,,}
./mtr --vardir=/tmp/var   --parallel=4 stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb stress.ddl_innodb
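
The {,,,,,} suffix relies on bash brace expansion: five commas give six empty alternatives, so the test name is passed to mtr six times. For shells without brace expansion, a rough equivalent could look like this (repeat_test_name is a hypothetical helper, not part of mtr):

```shell
# Hypothetical helper: print test name $1 repeated $2 times, separated by
# spaces, as a portable stand-in for bash's name{,,,} brace expansion.
repeat_test_name() {
    name=$1
    count=$2
    out=
    i=0
    while [ "$i" -lt "$count" ]; do
        out="${out:+$out }$name"   # add a space only between entries
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

Usage: ./mtr --vardir=/tmp/var --parallel=4 $(repeat_test_name encryption.innochecksum 6)
Running several copies of the same test in parallel is what makes the lost write more likely to show up.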

A test failure after the timeout (10 min) results in the MariaDB error:

2021-10-21  9:08:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/


More complete example: https://marc.info/?l=linux-block&m=163516012119400&w=2


MariaDB-10.6.5+ (due out soon) has a kernel check that disables native_aio by default and issues a warning if it is forced on. Unit test with it forced:

$  mysql-test/mtr  --mysqld=--innodb_use_native_aio=1 --nowarnings   --parallel=4 --force encryption.innochecksum{,,,,,}


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Did a cursory test on 5.15.0-0.rc7.20211028git1fc596a56b33.56.fc36.x86_64 without incident; however, I will test again.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

Comment 1 Daniel Black 2021-11-06 03:35:10 UTC
Tested the unit tests and sysbench oltp_update_index and was unable to reproduce on 5.15.0-0.rc7.20211028git1fc596a56b33.56.fc36.x86_64.

Comment 3 Daniel Black 2021-11-24 02:56:30 UTC
Still reproducing this:

$ uname -a
Linux localhost.localdomain 5.14.20-200.fc34.x86_64 #1 SMP Thu Nov 18 22:03:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


Name         : kernel
Version      : 5.14.20
Release      : 200.fc34
Architecture : x86_64
Size         : 0.0  
Source       : kernel-5.14.20-200.fc34.src.rpm

I thought this had been fixed in 5.14.19:
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.14.19

Will follow up with upstream.

$  mysql-test/mtr  --mysqld=--innodb_use_native_aio=1 --nowarnings   --parallel=4 --force encryption.innochecksum{,,,,,}
Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl  --mysqld=--innodb_use_native_aio=1 --nowarnings --parallel=4 --force encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum
VS config: 
vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
Checking supported features...
MariaDB Version 10.6.6-MariaDB
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Installing system database...

==============================================================================

TEST                                  WORKER RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w1 [ pass ]   5797
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w2 [ pass ]   5812

....
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w4 [ fail ]
        Test ended at 2021-11-24 12:53:17
CURRENT_TEST: encryption.innochecksum
mysqltest: At line 40: query 'INSERT INTO t2 SELECT * FROM t1' failed: <Unknown> (2013): Lost connection to server during query

The result from queries just before the failure was:
SET GLOBAL innodb_file_per_table = ON;
set global innodb_compression_algorithm = 1;
# Create and populate a tables
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t3 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED ENCRYPTED=NO;
CREATE TABLE t4 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1;
CREATE TABLE t5 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1 ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t6 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB;


Server [mysqld.1 - pid: 79580, winpid: 79580, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2021-11-24 12:53:16 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
211124 12:53:16 [ERROR] mysqld got signal 6 ;

Comment 4 Daniel Black 2021-11-25 03:03:06 UTC
One final patch is needed to fix this for 5.14 - https://lore.kernel.org/linux-block/CABVffEOXe=mhyW_-Ynz4Z9g_UxvVAms662vQjN9UBfF9NhWu8g@mail.gmail.com/T/#m480893c8e4f5f007f03f8505b404c701c0e90d2d

With the stable series for 5.14.20 finished, if further 5.14 kernels are coming then including this patch is recommended.

Otherwise a 5.15.3+ kernel is also sufficient to close this.
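
For scripting the "is this kernel new enough" decision, a rough sketch in POSIX shell follows. The 5.15.3 threshold is taken from this comment; the function names are hypothetical, and the comparison ignores distro backports as well as the fact that kernels from before the regression (5.11 appeared unaffected) are not broken either. This is not the check MariaDB itself performs.

```shell
# Convert a kernel release such as "5.14.20-200.fc34.x86_64" into one integer
# (major*1000000 + minor*1000 + patch) so releases can be compared numerically.
version_key() {
    v=${1%%-*}                  # drop the distro suffix after the first "-"
    major=${v%%.*}
    rest=${v#*.}
    minor=${rest%%.*}
    case $rest in
        *.*) patch=${rest#*.} ;;
        *)   patch=0 ;;         # release had no patch level, e.g. "5.15"
    esac
    patch=${patch%%[!0-9]*}     # keep only the leading digits of the patch level
    echo $(( major * 1000000 + minor * 1000 + ${patch:-0} ))
}

# Hypothetical helper: prints "fixed" for 5.15.3 and later, "suspect" otherwise.
io_uring_fix_status() {
    if [ "$(version_key "$1")" -ge "$(version_key 5.15.3)" ]; then
        echo fixed
    else
        echo suspect
    fi
}
```

Usage: io_uring_fix_status "$(uname -r)"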

Comment 5 Justin M. Forbes 2021-11-27 15:56:53 UTC
(In reply to Daniel Black from comment #4)
> 
> With the stable series for 5.14.20 finished, if further 5.14 kernels are
> coming then including this patch is recommended.
> 
> Otherwise a 5.15.3+ kernel is also sufficient to close this.

All Fedora releases are on 5.15.4+ at this point if someone would like to verify the fix and close this.

Comment 6 Daniel Black 2022-02-04 07:22:19 UTC
Yes, sorry for the delay.

All kernel fixes in 5.15.4+ are good.

Comment 7 Ben Cotton 2022-05-12 15:56:23 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 8 Ben Cotton 2022-06-07 22:50:21 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

