Bug 2090016

Summary: block layer: io is issued to low level driver in reversed order when plug is applied
Product: Red Hat Enterprise Linux 9
Reporter: Ming Lei <minlei>
Component: kernel
Assignee: Ming Lei <minlei>
kernel sub component: Block Layer
QA Contact: ChanghuiZhong <czhong>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: cwei, czhong, jmoyer, minlei
Version: 9.1
Keywords: Triaged
Target Milestone: rc
Target Release: 9.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kernel-5.14.0-319.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-07 08:38:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ming Lei 2022-05-25 00:45:35 UTC
Description of problem:

When IOs are queued on a plug list, they are issued to the low-level driver in reversed order; see the upstream report:

https://lore.kernel.org/linux-nvme/CAHex0coMNvFa1TPzK+5mZHsiie4d1Jd0Z8ejcZk1Vi1_4F7eRg@mail.gmail.com/T/#t


Version-Release number of selected component (if applicable):


How reproducible:

100%


Steps to Reproduce:

- run libaio with submit/complete batch, for example:
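
A minimal sketch of such a run (the target device, block size, and batch counts are
illustrative assumptions, not taken from the original report):

# several direct writes submitted in one libaio batch so they are plugged together
fio --name=plugbatch --filename=/dev/nvme0n1 --rw=write --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=8 --iodepth_batch_submit=8 \
    --iodepth_batch_complete=8 --size=32k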


Actual results:

io is issued in reverse order


Expected results:

io is issued in sequential order


Additional info:

- v5.18 has the issue, and it is not yet fixed in the current Linus tree
- introduced in the commit: bc490f81731e ("block: change plugging to use a singly linked list")

Comment 4 Ming Lei 2023-05-16 11:10:00 UTC
Hi Changhui,

Can you test this MR so that we can move on?

thanks,

Comment 5 ChanghuiZhong 2023-05-18 08:06:06 UTC
(In reply to Ming Lei from comment #4)
> Hi Changhui,
> 
> Can you test this MR so that we can move on?
> 
> thanks,

Hi, Ming

I got reverse order both on 5.14.0-312.el9 and 5.14.0-298.2356_835743992.el9,

[root@storageqe-101 ~]# uname -r
5.14.0-312.el9.x86_64
[root@storageqe-101 ~]# 
<...>-3030    [014] .......  1475.336100: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=24, cmdid=771, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3030    [014] .......  1475.336106: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=24, cmdid=770, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3030    [014] .......  1475.336106: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=24, cmdid=769, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-3030    [014] .......  1475.336107: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=24, cmdid=768, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)

[root@storageqe-101 ~]# uname -r
5.14.0-298.2356_835743992.el9.x86_64
[root@storageqe-101 ~]# 
fio-7637    [002] .......  2490.325932: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=18, cmdid=707, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-7637    [002] .......  2490.325937: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=18, cmdid=706, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-7637    [002] .......  2490.325938: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=18, cmdid=705, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
fio-7637    [002] .......  2490.325939: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=18, cmdid=704, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
             

             
terminal 1:
cd /sys/kernel/debug/tracing/events/nvme/nvme_setup_cmd
echo 'disk=="nvme0n1"' > filter
echo 1 > enable
cat /sys/kernel/debug/tracing/trace_pipe

terminal 2:
echo 32 > /sys/block/nvme0n1/queue/max_sectors_kb
fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 --bs=128K --ioengine=libaio --direct=1 --size=1M

Is there something wrong with my steps to reproduce?

Thanks,

Comment 6 Ming Lei 2023-05-22 14:22:36 UTC
(In reply to ChanghuiZhong from comment #5)
> (In reply to Ming Lei from comment #4)
> > Hi Changhui,
> > 
> > Can you test this MR so that we can move on?
> > 
> > thanks,
> 
> Hi, Ming
> 
> I got reverse order both on 5.14.0-312.el9 and 5.14.0-298.2356_835743992.el9,
> 
> [root@storageqe-101 ~]# uname -r
> 5.14.0-312.el9.x86_64
> [root@storageqe-101 ~]# 
> <...>-3030    [014] .......  1475.336100: nvme_setup_cmd: nvme0:
> disk=nvme0n1, qid=24, cmdid=771, nsid=1, flags=0x0, meta=0x0,
> cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-3030    [014] .......  1475.336106: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=24, cmdid=770, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write
> slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-3030    [014] .......  1475.336106: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=24, cmdid=769, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64,
> len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-3030    [014] .......  1475.336107: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=24, cmdid=768, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0,
> len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> 
> [root@storageqe-101 ~]# uname -r
> 5.14.0-298.2356_835743992.el9.x86_64
> [root@storageqe-101 ~]# 
> fio-7637    [002] .......  2490.325932: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=18, cmdid=707, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write
> slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-7637    [002] .......  2490.325937: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=18, cmdid=706, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write
> slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-7637    [002] .......  2490.325938: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=18, cmdid=705, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64,
> len=63, ctrl=0x0, dsmgmt=0, reftag=0)
> fio-7637    [002] .......  2490.325939: nvme_setup_cmd: nvme0: disk=nvme0n1,
> qid=18, cmdid=704, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0,
> len=63, ctrl=0x0, dsmgmt=0, reftag=0)
>              
> 
>              
> terminal 1:
> cd /sys/kernel/debug/tracing/events/nvme/nvme_setup_cmd
> echo 'disk=="nvme0n1"' > filter
> echo 1 > enable
> cat /sys/kernel/debug/tracing/trace_pipe
> 
> terminal 2:
> echo 32 > /sys/block/nvme0n1/queue/max_sectors_kb
> fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 --bs=128K
> --ioengine=libaio --direct=1 --size=1M
> 
> is there something wrong with my steps to reproduce?

You are testing split IO order, but this MR addresses request order when flushing the
plug list; the two are not the same.

Link [1] mentioned that:

Submitting requests in reverse order has negative impact on
performance for rotational disks (when BFQ is not in use). We observe
10-25% regression in random 4k write throughput, as well as ~20%
regression in MariaDB OLTP benchmark on rotational storage on btrfs
filesystem.


[1] https://lore.kernel.org/all/20230313093002.11756-1-jack@suse.cz/
[PATCH] block: do not reverse request order when flushing plug list
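
A possible way to verify the fix (a sketch; the device and fio parameters are assumptions):
combine the nvme_setup_cmd trace from comment 5 with a batched libaio workload, which
exercises the plug flush path rather than bio splitting, and check that the traced slba
values increase.

# terminal 1: nvme_setup_cmd tracing as in comment 5
# terminal 2: several separate 4k writes submitted in one libaio batch
fio --name=plugorder --filename=/dev/nvme0n1 --rw=write --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=8 --iodepth_batch_submit=8 \
    --iodepth_batch_complete=8 --size=32k
# on a kernel with the fix (kernel-5.14.0-319.el9 per the Fixed In Version field),
# the traced slba values should increase (0, 8, 16, ...) rather than decrease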

Comment 13 errata-xmlrpc 2023-11-07 08:38:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583