Bug 2247140 - RHEL9 random write performance significantly less than RHEL8
Summary: RHEL9 random write performance significantly less than RHEL8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NVMeOF
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.1
Assignee: Aviv Caro
QA Contact: Paul Cuzner
Docs Contact: ceph-doc-bot
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-10-30 21:58 UTC by Paul Cuzner
Modified: 2024-10-12 04:25 UTC
CC List: 6 users

Fixed In Version: ceph-18.2.1-157.el9cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-13 14:22:40 UTC
Embargoed:


Attachments
- rhel8 log captures (90.57 KB, application/gzip) - 2023-10-30 22:12 UTC, Paul Cuzner
- rhel9 log captures (87.78 KB, application/gzip) - 2023-10-30 22:13 UTC, Paul Cuzner
- fio output from RHEL8 client (8.20 KB, text/plain) - 2023-10-30 22:14 UTC, Paul Cuzner
- fio output from RHEL9 client (8.07 KB, text/plain) - 2023-10-30 22:14 UTC, Paul Cuzner
- tcpdumps from rhel8 and rhel9 during connect (12.85 MB, application/gzip) - 2023-11-01 01:43 UTC, Paul Cuzner
- ceph nvmeof gw conf file (1.20 KB, text/plain) - 2023-11-01 02:17 UTC, Paul Cuzner
- output from get_bdevs (78.33 KB, text/plain) - 2023-11-01 02:28 UTC, Paul Cuzner
- nvme list output from rhel9 (3.74 KB, text/plain) - 2023-11-01 02:31 UTC, Paul Cuzner


Links
- Red Hat Issue Tracker RHCEPH-7820 (last updated 2023-10-30 22:01:41 UTC)
- Red Hat Product Errata RHSA-2024:3925 (last updated 2024-06-13 14:22:43 UTC)

Description Paul Cuzner 2023-10-30 21:58:43 UTC
Description of problem:
With a random write workload, RHEL8 delivers approximately 3x the IOPS of the same workload on RHEL9. ESXi clients also see this drop in performance.

Version-Release number of selected component (if applicable):
7.0

How reproducible:
Every time

Steps to Reproduce:
1. Create RHEL8.8 and RHEL9.2 clients connecting to different nvmeof subsystems (same gateway)
2. Provide 8 namespaces to each client
3. Use fio to run a 4KB-blocksize random write workload across all namespaces concurrently, with each namespace loaded to a queue depth of 128 (a job-file sketch is included under Additional info below)
4. Review fio results file

Actual results:
Examples
RHEL8 IOPS = 313,924
RHEL9 IOPS = 134,174


Expected results:
For the same workload some variance is expected, but not to this degree. The normal expectation would be a 5-10% difference.


Additional info:
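For reference, a minimal fio job file along the lines of the workload above might look like the sketch below; the device paths, job names, and runtime are assumptions, not the exact configuration behind the reported numbers.

   ; randwrite-nvmeof.fio - hypothetical job file: 4 KB random writes,
   ; queue depth 128 per namespace, one job section per nvmeof namespace
   [global]
   ioengine=libaio
   direct=1
   rw=randwrite
   bs=4k
   iodepth=128
   time_based=1
   runtime=300
   group_reporting=1

   [ns1]
   filename=/dev/nvme1n1

   [ns2]
   filename=/dev/nvme1n2

   ; ...repeat for the remaining namespaces (8 per client)
   ; run with: fio randwrite-nvmeof.fio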

Comment 1 Paul Cuzner 2023-10-30 22:03:17 UTC
To help identify the issue, output from the following commands has been requested.
1. During the connect:
   - tcpdump
2. Shortly after the connect:
   - dmesg -T
   - nvmf_get_transports
   - nvmf_get_subsystems
   - rpc_nvmf_subsystem_get_qpairs
   - rpc_nvmf_subsystem_get_controllers
   - rpc_nvmf_subsystem_get_listeners
3. During I/O:
   - nvmf_get_stats
   - bdev_get_iostat

Output from these commands is attached as two separate tar files, one for rhel8 and the other for rhel9
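A rough sketch of how this output might be collected is below; the capture interface, gateway container name, rpc.py location, and subsystem NQN are all assumptions (4420 is simply the default NVMe/TCP port).

   # on the client, during the connect
   tcpdump -i <iface> -w connect.pcap 'tcp port 4420'
   dmesg -T > dmesg.txt

   # on the gateway, shortly after the connect, via SPDK's rpc.py inside the
   # nvmeof gateway container (container name and rpc.py path will vary)
   podman exec <nvmeof-gw> rpc.py nvmf_get_transports
   podman exec <nvmeof-gw> rpc.py nvmf_get_subsystems
   podman exec <nvmeof-gw> rpc.py nvmf_subsystem_get_qpairs <subsystem-nqn>
   podman exec <nvmeof-gw> rpc.py nvmf_subsystem_get_controllers <subsystem-nqn>
   podman exec <nvmeof-gw> rpc.py nvmf_subsystem_get_listeners <subsystem-nqn>

   # on the gateway, while the fio workload is running
   podman exec <nvmeof-gw> rpc.py nvmf_get_stats
   podman exec <nvmeof-gw> rpc.py bdev_get_iostat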

Comment 6 Paul Cuzner 2023-10-31 03:27:49 UTC
I have tried different parameters on the connect command in RHEL9 (full invocations are sketched below):
   -i 8 -Q 1024
   -W 8 -Q 1024

and also different tuned profiles:
   network-latency
   throughput-performance
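A sketch of those invocations, assuming the standard nvme-cli flags (-i/--nr-io-queues, -W/--nr-write-queues, -Q/--queue-size); the gateway address, port, and subsystem NQN are placeholders:

   # 8 I/O queues, queue size 1024 per queue
   nvme connect -t tcp -a <gw-ip> -s 4420 -n <subsystem-nqn> -i 8 -Q 1024

   # 8 write queues, queue size 1024 per queue
   nvme connect -t tcp -a <gw-ip> -s 4420 -n <subsystem-nqn> -W 8 -Q 1024

   # tuned profiles switched between runs
   tuned-adm profile network-latency
   tuned-adm profile throughput-performance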

Comment 7 Paul Cuzner 2023-11-01 01:43:44 UTC
Created attachment 1996497 [details]
tcpdumps from rhel8 and rhel9 during connect

Comment 8 Paul Cuzner 2023-11-01 02:10:48 UTC
I looked at the tcpdumps with Wireshark, and when I applied the nvme-tcp filter to the rhel9 capture, nothing was shown! With RHEL8 it just worked, and I could see the nvme/tcp packets.

Although the "problem" client is already RHEL9.2, I registered the server and ran an update, which pulled in some interesting package updates: kernel (from 5.14.0-284.11.1 to 5.14.0-284.30.1), nvme-cli and libnvme.
However, a repeat of the test run did NOT close the gap with the RHEL8 IOPS result.
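For anyone re-checking the attached captures, the same comparison can be made with tshark and the nvme-tcp display filter; the capture file names here are placeholders.

   # count packets the NVMe/TCP dissector recognises in each capture
   tshark -r rhel8-connect.pcap -Y nvme-tcp | wc -l
   tshark -r rhel9-connect.pcap -Y nvme-tcp | wc -l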

Comment 9 Paul Cuzner 2023-11-01 02:17:30 UTC
Created attachment 1996499 [details]
ceph nvmeof gw conf file

Comment 10 Paul Cuzner 2023-11-01 02:28:30 UTC
Created attachment 1996502 [details]
output from get_bdevs

Comment 11 Paul Cuzner 2023-11-01 02:31:48 UTC
Created attachment 1996503 [details]
nvme list output from rhel9

Comment 12 Aviv Caro 2024-04-17 16:02:26 UTC
I think we need to try to reproduce this in 7.1?

Comment 16 Paul Cuzner 2024-05-02 20:18:58 UTC
I don't have any free hardware in the Scalelab to test this - everything is ESX8 or RHEL9.

I didn't think RHEL8 was going to be supported anyway - so perhaps this issue should just move to the upstream backlog?

@aviv.caro what do you think?

Comment 17 Rahul Lepakshi 2024-05-22 05:01:58 UTC
Per Paul's comment at https://ibm-systems-storage.slack.com/archives/C05AM6G7ZF1/p1716200930999259 these can be closed.

Comment 18 errata-xmlrpc 2024-06-13 14:22:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

Comment 19 Red Hat Bugzilla 2024-10-12 04:25:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

