Bug 1622616 - Inconsistency over fio utility benchmarking
Summary: Inconsistency over fio utility benchmarking
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Satellite Maintain
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Kavita
QA Contact: Nikhil Kathole
URL:
Whiteboard:
Depends On:
Blocks: 1122832 1619394
 
Reported: 2018-08-27 15:33 UTC by anerurka
Modified: 2024-03-25 15:07 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-02 12:32:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 24959 0 Normal New Inconsistency over fio utility benchmarking 2020-12-14 16:22:56 UTC
Red Hat Bugzilla 1563284 0 unspecified CLOSED Remove use of hdparm for IO tests - misleading results 2023-09-15 00:07:19 UTC

Internal Links: 1563284

Description anerurka 2018-08-27 15:33:15 UTC
Description of problem:

Inconsistency over fio utility benchmarking 


Version-Release number of selected component (if applicable):

satellite-installer-6.4.0.7-1.beta.el7sat.noarch
fio-3.1-2.el7.x86_64

How reproducible:

Steps to Reproduce:

1.  # foreman-maintain upgrade check --target-version 6.4

Actual results:

pre-upgrade-step fails because we are using a VM:


- Check for recommended disk speed of pulp, mongodb, pgsql dir.:
- Finished

Disk speed : 24 MB/sec                                                [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vg_app1-lv_pulp.
             Actual disk speed: 24 MB/sec
             Expected disk speed: 80 MB/sec.


Expected results:

- Fio test looks extremely inconsistent while calculating Disk Speed, please review additional information for detailed analysis.

Additional info:

Fio results:

-----------
root@server1:~#  sudo fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1

Run status group 0 (all jobs):
   READ: bw=24.0MiB/s (26.2MB/s), 24.0MiB/s-24.0MiB/s (26.2MB/s-26.2MB/s), io=1024MiB (1074MB), run=40996-40996msec
-----------

>> A fio test on another server that has both a local disk and a SAN disk. The SAN disk reports the same read performance.

root@server2 ~$ sudo fio --name=job1 --rw=read --size=1g --direct=1 --directory=/hana/log

Run status group 0 (all jobs):
   READ: bw=30.8MiB/s (32.3MB/s), 30.8MiB/s-30.8MiB/s (32.3MB/s-32.3MB/s), io=1024MiB (1074MB), run=33264-33264msec
   
root@server2 ~$ sudo fio --name=job1 --rw=read --size=1g --direct=1 --directory=/var/tmp

Run status group 0 (all jobs):
   READ: bw=92.0MiB/s (96.5MB/s), 92.0MiB/s-92.0MiB/s (96.5MB/s-96.5MB/s), io=1024MiB (1074MB), run=11129-11129msec

>> Executing on the same hardware setup with different fio parameters gives a different view.
-----------

>> Also testing different block sizes on the VM gives already different results:

root@server1:~#  sudo fio --name=job1 --rw=read --bs=4k --size=1g --directory=/var/lib/pulp --direct=1

Run status group 0 (all jobs):
   READ: bw=24.0MiB/s (26.2MB/s), 24.0MiB/s-24.0MiB/s (26.2MB/s-26.2MB/s), io=1024MiB (1074MB), run=41011-41011msec

-----------
root@server1:~#  sudo fio --name=job1 --rw=read --bs=8k --size=1g --directory=/var/lib/pulp --direct=1

Run status group 0 (all jobs):
   READ: bw=46.6MiB/s (48.9MB/s), 46.6MiB/s-46.6MiB/s (48.9MB/s-48.9MB/s), io=1024MiB (1074MB), run=21954-21954msec

-----------
root@server1:~#  sudo fio --name=job1 --rw=read --bs=16k --size=1g --directory=/var/lib/pulp --direct=1

Run status group 0 (all jobs):
   READ: bw=84.3MiB/s (88.4MB/s), 84.3MiB/s-84.3MiB/s (88.4MB/s-88.4MB/s), io=1024MiB (1074MB), run=12141-12141msec

-----------
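The three block-size runs above can be reproduced in one pass with a fio job file; this is only a sketch following the commands above (the file and job names are illustrative), where `stonewall` serializes the jobs so each block size is measured on its own:

```ini
; bs-sweep.fio — sketch reproducing the block-size sweep above
[global]
rw=read
size=1g
direct=1
directory=/var/lib/pulp

[bs-4k]
bs=4k
stonewall

[bs-8k]
bs=8k
stonewall

[bs-16k]
bs=16k
stonewall
```

Run with `fio bs-sweep.fio`; fio prints a separate bandwidth line per job, which makes the block-size dependence visible in one report.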

A review of the current IO tests is needed; it should be clarified exactly what is measured:

e.g. bandwidth vs. latency, which block sizes are used under load, and whether access is sequential or random.

On physical hardware, the difference was that in the current test mode the local physical HDD was outperforming the enterprise-class all-flash SAN.

Adding a bit more randomness and more jobs shows that the SAN beats the local physical HDD by a factor of 8x.

-----------

root@server2 ~$ sudo fio --name=randread --rw=randread --direct=1 --bs=8k --numjobs=16 --size=1G --runtime=30 --group_reporting --directory=/var/tmp

Run status group 0 (all jobs):
   READ: bw=40.4MiB/s (42.4MB/s), 40.4MiB/s-40.4MiB/s (42.4MB/s-42.4MB/s), io=1215MiB (1274MB), run=30048-30048msec

-----------
root@server2 ~$ sudo fio --name=randread --rw=randread --direct=1 --bs=8k --numjobs=16 --size=1G --runtime=30 --group_reporting --directory=/hana/log

Run status group 0 (all jobs):
   READ: bw=291MiB/s (306MB/s), 291MiB/s-291MiB/s (306MB/s-306MB/s), io=8745MiB (9169MB), run=30002-30002msec
-----------

Based on the above results, it looks like the fio test cannot be trusted for real workload simulation.

Comment 2 Kavita 2018-09-17 13:32:29 UTC
Created redmine issue http://projects.theforeman.org/issues/24959 from this bug

Comment 3 Satellite Program 2018-09-17 14:06:40 UTC
Upstream bug assigned to kgaikwad

Comment 6 Sitsofe Wheeler 2019-01-13 08:56:02 UTC
(Random passerby comment)

At least the first fio job listed here (fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1) has the same issue as described over in bug 1641784 comment 12 - you're doing synchronous reads where you're ONLY sending a 4 KByte block down as direct I/O and waiting for it to come back before sending any more. You're bypassing the Linux page cache because you asked for direct I/O (so no coalescing of those tiny I/Os, and you won't get/benefit from readahead). You aren't sending I/O down in parallel. Are you sure something like a spinning disk can really do more than 24 MBytes a second under such tight constraints?
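To illustrate the point above: the documented command keeps exactly one small I/O in flight, while an asynchronous engine with a deeper queue lets the device overlap requests. A minimal sketch as a fio job file, assuming libaio is available; the job names and queue depth are illustrative, not what foreman-maintain actually runs:

```ini
; queue-depth.fio — hypothetical comparison of qd=1 vs qd=16 direct reads
[global]
rw=read
bs=4k
size=1g
direct=1
directory=/var/lib/pulp

[sync-qd1]
ioengine=sync
stonewall

[libaio-qd16]
ioengine=libaio
iodepth=16
stonewall
```

On most devices the second job reports far higher bandwidth, not because the disk got faster, but because requests overlap instead of waiting one at a time.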

> Based on the above results, it looks like the fio test cannot be trusted for real workload simulation.

That's an extreme statement which might need some qualification words around it :-) Perhaps the fio jobs being requested haven't been fully understood? Maybe someone could sit down and take in the huge range of options and caveats mentioned over in the fio help - https://fio.readthedocs.io/en/latest/fio_doc.html - and chat with your Linux (disk I/O) folks about how these affect kernel submission...

Comment 7 Mark Jackson 2019-05-13 16:01:03 UTC
The tool provided for pre-upgrade checks also reported this for me. I was able to get results that were in line with what I would expect by using a bigger block size and/or more threads. The problem, however, is that the upgrade tool and documentation [https://access.redhat.com/solutions/3397771] are misleading. Looking at the documentation, you are told to run 'fio --name=job1 --rw=read --size=1g --directory=/var --direct=1'. If something else should be run, the documentation and the upgrade check tool (assuming it is just running fio) should also be updated.
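The adjustment described above (a bigger block size and/or more threads) could be sketched as a fio job file like the following; the exact values are assumptions for illustration only, not what the shipped check or the linked article prescribe:

```ini
; variant.fio — hypothetical variant of the documented check,
; using a larger block size and parallel jobs (illustrative values)
[job1]
rw=read
bs=1m
numjobs=4
group_reporting
size=1g
direct=1
directory=/var
```

Run with `fio variant.fio`; `group_reporting` aggregates the four jobs into a single bandwidth figure comparable to the documented single-job output.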

Comment 14 Bryan Kearney 2020-03-04 14:08:09 UTC
The Satellite Team is attempting to provide an accurate backlog of bugzilla requests which we feel will be resolved in the next few releases. We do not believe this bugzilla will meet that criteria, and have plans to close it out in 1 month. This is not a reflection on the validity of the request, but a reflection of the many priorities for the product. If you have any concerns about this, feel free to contact Red Hat Technical Support or your account team. If we do not hear from you, we will close this bug out. Thank you.

Comment 15 Bryan Kearney 2020-04-02 12:32:34 UTC
Thank you for your interest in Satellite 6. We have evaluated this request, and while we recognize that it is a valid request, we do not expect this to be implemented in the product in the foreseeable future. This is due to other priorities for the product, and not a reflection on the request itself. We are therefore closing this out as WONTFIX. If you have any concerns about this, please do not reopen. Instead, feel free to contact Red Hat Technical Support. Thank you.

