Bug 2059206

Summary: NFS Ganesha slow small-file performance on RHCS 5.0
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: mcurrier
Component: NFS-Ganesha
Assignee: Frank Filz <ffilz>
Status: CLOSED WORKSFORME
QA Contact: Vidushi Mishra <vimishra>
Severity: high
Docs Contact:
Priority: high
Version: 5.0
CC: bniver, gfarnum, gfidente, gmeno, gouthamr, kkeithle, madam, mbenjamin, rraja, vereddy
Target Milestone: ---
Keywords: Performance
Target Release: 5.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-01 15:27:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description mcurrier 2022-02-28 14:00:07 UTC
Description of problem:

Recent project testing of CephFS with the smallfile workload showed that
NFS Ganesha performance is very slow compared to native CephFS. See

https://docs.google.com/document/d/1IuOnChnWJxvMtkDHX_tNJCZsFJYFzoS-B1sxWb5EjE4/edit

for details.

NFS achieves 3.75% of native CephFS performance for create.
NFS achieves 14.5% of native CephFS performance for read.
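
To put the two percentages above in more familiar terms, the short sketch below converts "X% of baseline" into "N times slower". It uses only the two figures quoted in this description; the function name slowdown_factor is an illustrative choice, not something from the test report.

```python
def slowdown_factor(percent_of_baseline: float) -> float:
    """Convert 'X% of baseline throughput' into 'N times slower'."""
    if percent_of_baseline <= 0:
        raise ValueError("percentage must be positive")
    return 100.0 / percent_of_baseline


# Percentages quoted in the description (NFS Ganesha relative to native CephFS).
for op, pct in (("create", 3.75), ("read", 14.5)):
    print(f"{op}: {pct}% of CephFS is about {slowdown_factor(pct):.1f}x slower")
```

By this arithmetic, create is roughly 26.7x slower and read roughly 6.9x slower over NFS than over native CephFS.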



Version-Release number of selected component (if applicable):

RHCS 5.0
ceph version 16.2.0-146.el8cp pacific (stable)

How reproducible:


Steps to Reproduce:
1. Configure NFS Ganesha.
2. Run the smallfile workload over 4 NFS mount points.
3. See https://docs.google.com/document/d/1bYoYfuemHQOEOElv-PsGZhjh5PoaTZwEG1In47uZGa0/edit# for details.
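
As a rough illustration of step 2, the sketch below builds smallfile_cli.py command lines that spread a run across four NFS mount points. The mount paths, thread count, file count, and file size are all placeholder assumptions; the actual parameters used in this testing are in the linked document, not reproduced here. It relies on smallfile's --top option accepting a comma-separated list of directories, which should be checked against the smallfile version in use.

```python
import shlex

# Hypothetical mount points; the report only says four NFS mounts were used.
MOUNT_POINTS = [f"/mnt/nfs{i}" for i in range(1, 5)]


def smallfile_command(top_dirs, operation, threads=4, files=1000, file_size_kb=4):
    """Build an argv list for smallfile_cli.py; all numeric values are placeholders."""
    return [
        "python3", "smallfile_cli.py",
        "--top", ",".join(top_dirs),       # comma-separated, no whitespace
        "--operation", operation,          # e.g. "create" or "read"
        "--threads", str(threads),
        "--files", str(files),             # files per thread
        "--file-size", str(file_size_kb),  # in KB
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed first.
    for op in ("create", "read"):
        print(shlex.join(smallfile_command(MOUNT_POINTS, op)))
```

Running the same commands with --top pointing at native CephFS mounts would give the baseline for comparison.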

Actual results:


Expected results:


Additional info:

Comment 1 Yaniv Kaul 2022-02-28 15:08:32 UTC
Raising severity and priority to High/High, as the numbers are really significantly different. I wonder if we have previous release data to compare. After all, OSP is using NFS (in both Manila and Manila CSI use cases) and should have complained already.

Comment 2 Frank Filz 2022-03-01 18:42:48 UTC
Some time ago I did some performance testing with Ganesha V3.3, using vdbench to test with multiple clients. I was unable to do much metadata testing, though vdbench does report on the creates while setting up the files. It doesn't use many files (really one file per running thread) and they are larger files, but it is still a useful throughput benchmark.

A report I did:

https://docs.google.com/document/d/1hq_3-o8FEYB9ChDTBLbVRMLa7WztOFoknS82v1FyF3Y/edit?usp=sharing

There is a link to more raw data in that report. I saved off ALL the raw data.

From what I saw on that, yes, Ganesha CephFS performance is not great, but it is better than 50% of Ganesha's FSAL_VFS throughput on an XFS file system and that was better than knfsd's throughput on the same XFS file system. 

Looking at the raw data, I see that the create rate and create response time are comparable to FSAL_VFS, but that doesn't represent a huge create load.

We would definitely need more examination to see what might be going on here.

Comment 3 Frank Filz 2022-03-04 22:27:40 UTC
I also just realized that with RHCS 5.0 this is with Ganesha V2.5, which really is ancient; significant performance enhancements are available in V3.x.

Comment 4 Ben England 2022-03-07 18:42:50 UTC
Thx Frank, that sounds very reasonable. So getting an up-to-date NFS Ganesha sounds like step 1. But that has a big impact on OpenStack (and soon ODF), so we have to check with those teams before just doing it. cc'ing Giulio Fidente (OpenStack) and Michael Adam (ODF); they can update cc to whoever would be interested in this.

Comment 8 Ben England 2022-03-08 14:41:11 UTC
Also added the word "small-file" to the title to make it clearer what this bug is about.