Bug 1924337 - RFE: virtiofs performs worse than nfs
Summary: RFE: virtiofs performs worse than nfs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 9.0
Assignee: Vivek Goyal
QA Contact: xiagao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-02 22:43 UTC by Alex Williamson
Modified: 2022-08-31 01:39 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-02 07:28:09 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments

Description Alex Williamson 2021-02-02 22:43:05 UTC
Description of problem:

This was actually observed on a Fedora host with a RHEL 8 guest, but Dave Gilbert asked me to file a bug anyway.

Evaluating virtiofs for sharing a development tree from host to guest using the following libvirt XML:

     <filesystem type='mount' accessmode='passthrough'>
       <driver type='virtiofs' queue='1024'/>
       <binary path='/usr/libexec/virtiofsd' xattr='on'>
         <cache mode='always'/>
         <lock posix='on' flock='on'/>
       </binary>
       <source dir='/home/alwillia/Work/'/>
       <target dir='/mnt/virtiofs/Work'/>
       <alias name='fs0'/>
       <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
     </filesystem>
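
(For readers setting this up without libvirt: a rough manual equivalent of the XML above is sketched below. The socket path, memory size, and tag name are illustrative assumptions, not values taken from this report; vhost-user requires a shared memory backend, hence memory-backend-memfd with share=on.)

Host, start the daemon and attach it to the guest:

# /usr/libexec/virtiofsd --socket-path=/tmp/vfsd.sock \
      -o source=/home/alwillia/Work -o cache=always -o xattr &
# qemu-system-x86_64 ... \
      -chardev socket,id=char0,path=/tmp/vfsd.sock \
      -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=Work \
      -object memory-backend-memfd,id=mem,size=8G,share=on \
      -numa node,memdev=mem

Guest, mount by tag:

# mount -t virtiofs Work /mnt/virtiofs/Work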

I used a kernel build test:

# make O=/tmp/build defconfig
# time make O=/tmp/build -j16

I ran this test within the VM on (a) a local fs on a raw image file backed by NVMe storage, (b) a virtiofs mount backed by ext4 on the host on the same NVMe drive, and (c) an NFS export of the same directory over virtio networking in the guest across a bridge on the host.  Results:

(a) 2 min
(b) 5 min (+3 min ~ 150%)
(c) 3.5 min (+1.5 min ~ 75%)

I.e., virtiofs is 2x worse than NFS.

NB, xattr='off' does not meaningfully change the results

bonnie++ comparison:
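
(The exact bonnie++ invocation isn't recorded here; output in this format typically comes from something along the lines of the runs below, pointed at each mount in turn. The flags and username are assumptions, not taken from this report.)

# bonnie++ -d /mnt/virtiofs/Work -u alwillia    # (b) virtiofs mount
# bonnie++ -d /mnt/nfs -u alwillia              # (c) NFS mount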

(b)

Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
rhel8vm         63G   34k  15  159m  23 77.4m  21 4537k  98  676m  35 909.1 2273
Latency               306ms     155ms     760ms    4911us   14050us    8663ms
Version  1.98       ------Sequential Create------ --------Random Create--------
rhel8vm             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  19 +++++ +++ 724249376  23     0  20 +++++ +++ 724249376  20
Latency              3305us    1004us    1418us    8692us     138us    1527us
1.98,1.98,rhel8vm,1,1612296254,63G,,8192,5,34,15,162746,23,79274,21,4537,98,691995,35,909.1,2273,16,,,,,6219,19,+++++,+++,12740,23,5408,20,+++++,+++,11100,20,306ms,155ms,760ms,4911us,14050us,8663ms,3305us,1004us,1418us,8692us,138us,1527us

(c)

Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
rhel8vm         63G 2249k  99  1.0g  62  383m  24 1131k  98  627m  25 +++++ +++
Latency              8726us   79113us     439ms    1998ms   13400us   13499us
Version  1.98       ------Sequential Create------ --------Random Create--------
rhel8vm             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  16 +++++ +++ 724249376  21     0  17 +++++ +++ 724249376  18
Latency             10159us    9647us     840us    4388us     953us    1668us
1.98,1.98,rhel8vm,1,1612271133,63G,,8192,5,2249,99,1057659,62,391862,24,1131,98,641886,25,+++++,+++,16,,,,,3266,16,+++++,+++,12911,21,3323,17,+++++,+++,11985,18,8726us,79113us,439ms,1998ms,13400us,13499us,10159us,9647us,840us,4388us,953us,1668us

NFS server simply configured with defaults:

$ grep -v ^# /etc/nfs.conf
[general]
[exports]
[exportfs]
[gssd]
use-gss-proxy=1
[lockd]
[mountd]
[nfsdcld]
[nfsdcltrack]
[nfsd]
[statd]
[sm-notify]

Exported as:

/home/alwillia/Work xxx.xx.xx.0/24(rw,async)

Mounted in guest:

omen:/home/alwillia/Work on /mnt/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=xxx.xx.xx.xxx,local_lock=none,addr=xxx.xx.xx.xxx)
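
(For completeness, after adding the export line above to /etc/exports, the host and guest side would be set up roughly as follows; the mount options seen above are the nfs4 defaults negotiated at mount time.)

Host:
# systemctl enable --now nfs-server
# exportfs -ra

Guest:
# mount -t nfs4 omen:/home/alwillia/Work /mnt/nfs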

virtiofs mount:

/mnt/virtiofs/Work on /mnt/virtiofs/Work type virtiofs (rw,relatime)

host bridge:

$ brctl show bridge0
bridge name	bridge id		STP enabled	interfaces
bridge0		8000.16e483508dd2	yes		eno1
							vnet4


Version-Release number of selected component (if applicable):

Host:
kernel-5.10.7-200.fc33.x86_64
qemu-system-x86-5.2.0-4.fc33.x86_64
libvirt-daemon-7.0.0-1.fc33.x86_64

Guest:
kernel-4.18.0-240.8.1.el8_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. As described above

Actual results:
NFS is a better choice than virtiofs for environments where a virtual network connection to the host is acceptable.

Expected results:
Expected virtiofs to be a high performance alternative to network filesystems.

Additional info:
Is the niche of virtiofs only to share a filesystem with the guest without network connectivity to the host, or is it also meant to compete performance-wise with network solutions?

Comment 1 Dr. David Alan Gilbert 2021-02-03 09:18:21 UTC
Thanks for filing it; yes, we should look at this sometime. While being slower than a block device doesn't surprise me, we should be able to find a way to match NFS; conceptually they're similar (stuff file operations into a protocol, shuffle them over a channel and handle them in a host daemon).

Comment 2 Vivek Goyal 2021-02-04 17:35:38 UTC
Running slower than NFS is definitely surprising. I have yet to set up NFS for my VM. I am experimenting with different configurations of virtiofs and trying to figure out which runs fastest for this workload. With cache=auto, I am seeing a runtime of around 4 minutes, and with DAX enabled it came close to 3.5 minutes, so some improvement. Note, DAX support is in the upstream kernel but not in QEMU yet; it is still being worked on.

cache=auto, xattr=off,thread-pool-size=0
----------------------------------------

real    4m4.506s
user    19m0.169s
sys     26m21.216s

cache=auto, dax=on, xattr=off, thread-pool-size=0
------------------------------------------------
real    3m36.421s
user    18m55.979s
sys     24m2.325s
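
(For reference, the cache=auto / xattr=off / thread-pool-size=0 configuration above corresponds to launching the daemon roughly like this; the socket path is an illustrative assumption.)

# /usr/libexec/virtiofsd --socket-path=/tmp/vfsd.sock \
      -o source=/home/alwillia/Work -o cache=auto -o no_xattr \
      --thread-pool-size=0 &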


Maybe NFS is cutting down on the number of round trips (rather than on per-round-trip time), and that probably helps with speed. This will require more investigation.

Comment 3 Vivek Goyal 2021-02-04 19:40:57 UTC
Tried with NFS now; it is faster. Results vary significantly from run to run. Here are the results of 3 runs.

real    3m7.838s
user    18m51.170s
sys     15m51.583s

real	3m3.128s
user	18m51.658s
sys	15m51.310s

real	2m46.605s
user	18m38.966s
sys	15m27.063s

So yes, NFS seems faster... Using DAX with virtiofs reduces the gap, but NFS is still faster.

Comment 4 Vivek Goyal 2021-02-04 21:46:51 UTC
Looks like virtiofs is not too bad if the build output is going back over the filesystem as well. IOW, so far we have been testing with O=/tmp/build. Alex, try sending writes back over the filesystem, e.g. O=/mnt/nfs/linux/build/. I notice that the difference is not as large any more. I suspect we are not as efficient on the read path as we could be.

In fact, if I test this with DAX enabled, I see numbers better than NFS. So here are the numbers with the build output going back to the host.

nfs
===
real    4m6.918s
user    19m24.189s
sys     17m12.816s

real    4m3.101s
user    19m17.690s
sys     16m52.923s

real    4m10.282s
user    19m22.076s
sys     16m56.548s

virtiofs (cache=auto, -o xattr, --thread-pool-size=0)
=====================================================

real    4m34.014s
user    18m18.311s
sys     19m12.187s

real    4m25.076s
user    18m24.972s
sys     19m14.242s

real    4m16.685s
user    18m27.491s
sys     19m8.420s

virtiofs (cache=auto, -o xattr, --thread-pool-size=0, dax enabled)
==================================================================
real    3m26.864s
user    18m29.511s
sys     16m42.756s

real    3m31.442s
user    18m25.636s
sys     16m44.484s

real    3m26.130s
user    18m25.922s
sys     16m40.312s


So without DAX, the difference between NFS and virtiofs is not as large. And with DAX, virtiofs performs better for this particular workload/setting.
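
(Note: at this point DAX for virtiofs was in the upstream guest kernel but not in upstream QEMU, so the dax-enabled runs above need the experimental virtio-fs QEMU branch, where the DAX window is set with a cache-size property on the device, plus -o dax on the guest mount. A rough sketch; the 2G window size and tag name are illustrative assumptions:)

Host (experimental QEMU build):
    -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=Work,cache-size=2G

Guest:
# mount -t virtiofs -o dax Work /mnt/virtiofs/Work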

Comment 5 Alex Williamson 2021-02-04 21:56:15 UTC
TBH, this looks like you're slowing down nfs to show that virtiofs can be proportionally less worse.

Comment 6 Vivek Goyal 2021-02-04 22:03:54 UTC
(In reply to Alex Williamson from comment #5)
> TBH, this looks like you're slowing down nfs to show that virtiofs can be
> proportionally less worse.

I think a fairer characterization would be that with a slightly different workload, virtiofs is faster. So our task at hand is to see if we can improve the performance of virtiofs for the workload you threw at it.

Comment 8 Dr. David Alan Gilbert 2021-02-09 10:25:04 UTC
One thing I remembered is that while virtiofsd is a user-space process, NFS is done in the kernel. I don't know what percentage of the benefit comes from that, but it's something to keep in mind.

Comment 9 Miklos Szeredi 2021-02-19 13:41:31 UTC
One thing that the NFS server allows is delegations: a kind of lease granted to the client.  If a client receives a read (or write) delegation, it can open/read/stat/(write/chmod/...) the file without going out to the network.

The downside for virtualized clients is that a broken client can DoS the server, though I guess there are timeouts involved, as with other NFS operations.

It would be interesting to compare the number of round trips made by NFS and virtiofs for a certain workload.  RTT statistics would also be interesting.
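
(A rough way to gather those numbers in the guest: nfs-utils exposes per-operation counts and RTT statistics for NFS, and FUSE request submissions can be counted for virtiofs. The kprobe symbol names below are an assumption and vary by kernel version.)

# nfsstat -c                  # per-operation counts for the NFS client
# mountstats /mnt/nfs         # per-operation counts plus RTT statistics
# bpftrace -e 'kprobe:fuse_simple_request,kprobe:fuse_simple_background { @[probe] = count(); }'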

Comment 10 xiagao 2021-03-16 08:42:31 UTC
Hi @Yanhui,
I see this bug is about performance and we don't cover it in our tests; can you take it?

Comment 11 Klaus Heinrich Kiwi 2021-09-08 20:09:27 UTC
I'm going to assign this to Vivek and raise the severity and priority, given that I think this needs to be at least well understood before we can generally support virtiofs...

Comment 12 Klaus Heinrich Kiwi 2022-01-27 17:23:38 UTC
(In reply to Klaus Heinrich Kiwi from comment #11)
> I'm going to assign this to Vivek and raise the severity and priority, given
> that I think this needs to be at least well understood before we can
> generally support virtiofs...

Vivek,

 have you had a chance to check if we're any better with the latest iterations? I've tagged this with RFE as I don't think it's a blocking bug for RHEL 9 GA, but let me know if we need to discuss.

Comment 14 Vivek Goyal 2022-01-27 21:44:35 UTC
(In reply to Klaus Heinrich Kiwi from comment #12)
> (In reply to Klaus Heinrich Kiwi from comment #11)
> > I'm going to assign this to Vivek and raise the severity and priority, given
> > that I think this needs to be at least well understood before we can
> > generally support virtiofs...
> 
> Vivek,
> 
>  have you had a chance to check if we're any better with the latest
> iterations? I've tagged this with RFE as I don't think it's a blocking bug
> for RHEL 9 GA, but let me know if we need to discuss.

Hi Klaus,

I don't think this is a blocking bug. There is no hard requirement that virtiofs has to be faster than NFS in all cases. It would be nice, though.

I have not had a chance to test it on the latest configurations. In general, I have an internal TODO item to do some investigation and see how virtiofs performance can be improved.

One of the things I was thinking about is: what if we run virtiofs as part of QEMU (and not as a separate vhost-user device)? Could that give us some performance boost?

Another thing we are relying on is DAX support in QEMU, which can give us a performance boost.

Vivek

Comment 20 xiagao 2022-07-04 02:37:04 UTC
Hello Vivek,
This bug will be auto-closed; if you want to keep it open, could you please extend the 'stale date'?

Regards,
Xiaoling

Comment 21 Vivek Goyal 2022-07-05 12:22:24 UTC
Hi Xiaoling,

I think we can let this bug auto-close. Improving the performance of virtiofs is an ongoing item. DAX support (QEMU side) is in progress as well. So it's not one thing that needs to be done; over a period of time, our target is to keep improving the performance of virtiofs as we identify bottlenecks.

I consider this to be more of a desirable property of virtiofs (and of any software component in general) and not necessarily a bug.

So I think it's better that this bug is closed.

Vivek

Comment 22 RHEL Program Management 2022-08-02 07:28:09 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

