1641784 – Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)



Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Satellite Maintain
Version:	6.4.0
Hardware:	All
OS:	All
Priority:	high
Severity:	medium
Target Milestone:	Released
Assignee:	Kavita
QA Contact:	Jameer Pathan
Duplicates (2):	1640705 1807153

Reported:	2018-10-22 18:27 UTC by Rajan Gupta
Modified:	2023-12-15 16:11 UTC
CC List:	30 users
Fixed In Version:	rubygem-foreman_maintain-0.3.3
Last Closed:	2019-05-14 19:57:47 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	26236	0	Normal	Closed	Make changes to disk check so that it should display warning instead of failure for versions > sat 6.2	2021-02-19 10:34:10 UTC
Red Hat Knowledge Base (Solution)	3662601	0	Upgrade	None	Red Hat Satellite upgrade check from 6.3 to 6.4 fails at "disk-performance" stage	2019-02-18 05:39:47 UTC

Description Rajan Gupta 2018-10-22 18:27:52 UTC

Description of problem:
Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)

- With rubygem-foreman_maintain-0.2.11-1:

~~~~~~~~~~~~~~~~~~~~
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
\ Finished

Disk speed : 25 MB/sec                                                [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vgsys-lgsysvar.
             Actual disk speed: 25 MB/sec
             Expected disk speed: 60 MB/sec.
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in failing state:

  [disk-performance]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="disk-performance"
~~~~~~~~~~~~~~~~~~~~

- With rubygem-foreman_maintain-0.1.6-1:
~~~~~~~~~~~~~~~~~~~~
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
/ Finished

/var/lib/pulp : 296 MB/sec
/var/lib/mongodb : 342 MB/sec
/var/lib/pgsql : 2360 MB/sec                                          [OK]
~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Update rubygem-foreman_maintain to version 0.2.11-1.
2. Run foreman-maintain health check --label disk-performance

Actual results:
rubygem-foreman_maintain-0.2.11-1 always shows low speed and upgrade check fails.
rubygem-foreman_maintain-0.1.6-1 always shows good speed.

Expected results:
rubygem-foreman_maintain-0.2.11-1 should always report the correct disk speed, and the upgrade check should pass.

Additional info:

Comment 1 Robert Morrow 2018-10-23 16:03:07 UTC

It looks like this was caused by foreman-maintain switching from hdparm to fio only.

Looking at strace, version 0.1.6-1 uses hdparm:
execve("/bin/sh", ["sh", "-c", "hdparm -t /dev/mapper/sansatvg-pulp | awk 'NF' 2>&1"]

and version 0.2.11-1 uses fio:
execve("/bin/sh", ["sh", "-c", "fio --name=job1 --rw=read --size=1g --output-format=json                  --directory=/var/lib/pulp --direct=1 2>&1"]

A commit on June 7th in the foreman_maintain GitHub repository removed hdparm so that only fio is used. The values reported by fio and hdparm are not close to each other.

Comment 2 Rajan Gupta 2018-10-23 16:57:16 UTC

The customer runs the upgrade check before running the actual upgrade command and expects the disk check to report the correct speed, as the older version of rubygem-foreman_maintain (0.1.6-1) does.

Comment 3 jcallaha 2018-10-24 19:32:27 UTC

I recommend downgrading the severity of this issue, as you can skip this check by adding --whitelist="disk-performance" to the command.

Comment 7 Brad Buckingham 2018-10-26 14:30:45 UTC

*** Bug 1640705 has been marked as a duplicate of this bug. ***

Comment 8 Brad Buckingham 2018-10-26 14:32:01 UTC

When resolving this bugzilla, please be aware of bug 1640705, which was marked as a duplicate.  That bugzilla is also raising concern over the new behavior.

Comment 9 Mike McCune 2018-11-01 15:28:11 UTC

We plan to raise the block size used in this test, most likely to 16k, which better represents real-world use cases. The default of 4k often produces a result that is too low, which can be a false negative.

Users can run:

fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k

and check whether the result is in the 60 MB/sec range or above. If it is, it is safe to whitelist this check.
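
A minimal sketch of that comparison, assuming fio is run with --output-format=json (as foreman-maintain does in comment 1) and that the first job's read bandwidth appears under jobs[0].read.bw in KiB/s. The function names and the trimmed sample are illustrative and not part of foreman-maintain; 60 MB/sec is the expected speed printed by the check.

```python
import json

EXPECTED_MB_PER_SEC = 60  # expected speed printed by the disk-performance check


def read_speed_mb_per_sec(fio_json: str) -> float:
    """Return the first job's read bandwidth in MB/s (10**6 bytes per second).

    Assumes fio's JSON layout: jobs[0]["read"]["bw"] holds KiB/s.
    """
    report = json.loads(fio_json)
    kib_per_sec = report["jobs"][0]["read"]["bw"]
    return kib_per_sec * 1024 / 1_000_000


def safe_to_whitelist(fio_json: str, expected: float = EXPECTED_MB_PER_SEC) -> bool:
    """True when the measured read speed reaches the expected speed."""
    return read_speed_mb_per_sec(fio_json) >= expected


# Hypothetical, heavily trimmed fio output, for illustration only.
sample = '{"jobs": [{"jobname": "job1", "read": {"bw": 40528}}]}'
print(round(read_speed_mb_per_sec(sample), 1))  # 41.5
print(safe_to_whitelist(sample))                # False
```

Real fio JSON contains many more fields; only the bandwidth value is read here.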

Comment 10 Ben 2018-11-01 15:39:13 UTC

I just tried that and got 

[root@satellite1 ~]# fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [R(1)][96.2%][r=37.9MiB/s,w=0KiB/s][r=2423,w=0 IOPS][eta 00m:01s]
job1: (groupid=0, jobs=1): err= 0: pid=23806: Thu Nov  1 15:32:51 2018
   read: IOPS=2533, BW=39.6MiB/s (41.5MB/s)(1024MiB/25869msec)
    clat (usec): min=169, max=59639, avg=385.58, stdev=466.23
     lat (usec): min=170, max=59641, avg=387.21, stdev=466.26
    clat percentiles (usec):
     |  1.00th=[  249],  5.00th=[  273], 10.00th=[  281], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  326], 60.00th=[  338],
     | 70.00th=[  355], 80.00th=[  388], 90.00th=[  461], 95.00th=[  603],
     | 99.00th=[ 1647], 99.50th=[ 2442], 99.90th=[ 4490], 99.95th=[ 5997],
     | 99.99th=[18482]
   bw (  KiB/s): min=24992, max=48609, per=99.83%, avg=40463.96, stdev=4711.74, samples=51
   iops        : min= 1562, max= 3038, avg=2528.80, stdev=294.45, samples=51
  lat (usec)   : 250=1.01%, 500=91.09%, 750=4.80%, 1000=1.28%
  lat (msec)   : 2=1.13%, 4=0.56%, 10=0.11%, 20=0.02%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=4.57%, sys=12.18%, ctx=65538, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65536,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=39.6MiB/s (41.5MB/s), 39.6MiB/s-39.6MiB/s (41.5MB/s-41.5MB/s), io=1024MiB (1074MB), run=25869-25869msec

Disk stats (read/write):
    dm-2: ios=64830/339, merge=0/0, ticks=21628/496, in_queue=22132, util=84.23%, aggrios=65536/337, aggrmerge=0/2, aggrticks=21803/488, aggrin_queue=22100, aggrutil=83.33%
  sdb: ios=65536/337, merge=0/2, ticks=21803/488, in_queue=22100, util=83.33%

I'm having trouble parsing this, not having used fio before, but... 41.5MB/s? If so, it appears I have a problem, which is a surprise, as this VMware guest is hosted on a Dell/EMC Compellent, almost certainly on SSDs. The old foreman-maintain usually averages ~530MB/s...
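
The headline numbers in this output can be cross-checked with plain arithmetic. The sketch below uses only values printed above (65536 reads issued, 16 KiB blocks, a 25869 msec run, IOPS=2533); the variable names are illustrative.

```python
BLOCK_SIZE = 16 * 1024   # bytes, from bs=(R) 16.0KiB
ISSUED_READS = 65536     # from "issued rwt: total=65536,0,0"
RUNTIME_SEC = 25.869     # from run=25869-25869msec
IOPS = 2533              # from "read: IOPS=2533"

total_bytes = BLOCK_SIZE * ISSUED_READS
bytes_per_sec = total_bytes / RUNTIME_SEC

mib_per_sec = bytes_per_sec / 2**20   # binary units, shown by fio as MiB/s
mb_per_sec = bytes_per_sec / 10**6    # decimal units, shown by fio as MB/s

print(total_bytes // 2**20)                  # 1024 (MiB read in total)
print(round(mib_per_sec, 1))                 # 39.6
print(round(mb_per_sec, 1))                  # 41.5
print(round(IOPS * BLOCK_SIZE / 10**6, 1))   # 41.5
```

So 39.6MiB/s and 41.5MB/s are the same measurement in binary and decimal units, and it is below the 60 MB/sec that the check expects.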

Comment 12 Sitsofe Wheeler 2018-11-17 08:18:34 UTC

(Random internet passerby comment. I am not speaking for or representing Red Hat in any way. These comments are my own. Red Hat employees - feel free to say if the following is wrong/unhelpful etc)

The fio job you posted is using direct I/O but is only allowing one outstanding I/O at any given time. You are likely using the psync I/O engine (https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-ioengine ) as that's the default on Linux if you don't specify an ioengine (https://github.com/axboe/fio/blob/46bfd4e5170ec950c1eb2e27c2ae67fa9b84ee12/os/os.h#L160 ) and iodepth defaults to 1 when unset (see https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-iodepth but changing the iodepth on synchronous ioengines doesn't tend to make sense). So you've actually asked fio to measure the speed you can do sending down only a single block of size 16Kbytes at any given time and then waiting for it to be acknowledged as done by the disk before sending the next 16Kbytes. If you were using buffered I/O (direct=0) you would see better speeds as the kernel will take care of batching I/Os when you're doing writes and speeds would likely be higher (but they would also be heavily distorted by the kernel's cache).
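
To illustrate the point about a single outstanding I/O: with one request in flight, throughput is roughly one block per average completion latency. This is a rough sketch using the figures from comment 10 (16 KiB blocks, "lat (usec): ... avg=387.21"); the function name is illustrative.

```python
def sync_throughput_mb_per_sec(block_size_bytes: int, avg_latency_usec: float) -> float:
    """Estimate throughput when only one I/O is in flight at a time.

    Each request must complete before the next is issued, so the rate is
    one block per average latency.
    """
    ios_per_sec = 1_000_000 / avg_latency_usec
    return ios_per_sec * block_size_bytes / 1_000_000


# Values from comment 10: bs=16 KiB, average total latency 387.21 usec.
estimate = sync_throughput_mb_per_sec(16 * 1024, 387.21)
print(round(estimate, 1))  # 42.3
```

The estimate of about 42 MB/s is close to the 41.5MB/s that fio reported, which is consistent with the run being bound by per-request latency rather than by the bandwidth of the storage.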

If you are "just" looking for ever-so-slightly higher numbers that are still trying to avoid the kernel's cache then you will likely see them by using the libaio ioengine with direct I/O and a higher iodepth (e.g. 32 but the correct depth will depend on your setup). Also note your chosen file size is very small and may be satisfied entirely (or at least partially) by the cache in things like fancy controllers or even the disks themselves.
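
As a sketch of that suggestion, the helper below assembles an fio command line that adds --ioengine=libaio and --iodepth=32 to the command from comment 9. The helper name and defaults are hypothetical, the depth of 32 is only the example value mentioned above, and this is not what foreman-maintain itself runs.

```python
import shlex


def fio_read_command(directory, blocksize="16k", ioengine="libaio", iodepth=32, size="1g"):
    """Build an fio argv for a direct sequential read with a deeper queue."""
    return [
        "fio", "--name=job1", "--rw=read", f"--size={size}",
        f"--directory={directory}", "--direct=1",
        f"--blocksize={blocksize}",
        f"--ioengine={ioengine}", f"--iodepth={iodepth}",
    ]


cmd = fio_read_command("/var/lib/pulp")
print(shlex.join(cmd))  # printable form; pass cmd to subprocess.run to execute
```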

Good luck!

Comment 13 Ben 2018-11-20 12:27:42 UTC

With the above comment in mind, is there a chance of a new version of rubygem-foreman_maintain that uses more useful settings?

Comment 14 whitedm 2019-02-08 21:18:35 UTC

FWIW, this is also happening on an upgrade from Satellite 6.2 -> 6.3. 

Same version of rubygem-foreman_maintain:

Name        : rubygem-foreman_maintain
Arch        : noarch
Version     : 0.2.11
Release     : 1.el7sat
Size        : 337 k
Repo        : installed
From repo   : rhel-7-server-satellite-maintenance-6-rpms

Comment 19 Bryan Kearney 2019-04-04 12:05:23 UTC

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26236 has been resolved.

Comment 24 Jameer Pathan 2019-05-02 09:03:04 UTC

Verified
@satellite 6.5.0 snap 26
@rubygem-foreman_maintain-0.3.3-1.el7sat.noarch

steps:
- Run foreman-maintain health check --label disk-performance

Observation:
1. If the disk speed is below the threshold, foreman_maintain reports a warning for satellite > 6.2 and a fail result for satellite <= 6.2.

2. for satellite 6.5.0
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.: 
| Finished                                                                      

Disk speed : 33 MB/sec                                                [WARNING]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vg_dhcp201-lv_root.
             Actual disk speed: 33 MB/sec
             Expected disk speed: 60 MB/sec.
WARNING: Low disk speed might have a negative impact on the system.
See https://access.redhat.com/solutions/3397771 before proceeding
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in warning state:

  [disk-performance]

The steps in warning state itself might not mean there is an error,
but it should be reviewed to ensure the behavior is expected

3. for satellite 6.2.16
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.: 
/ Finished                                                                      

Disk speed : 6 MB/sec                                                 [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/rhel_hp--nehalem--02-root.
             Actual disk speed: 6 MB/sec
             Expected disk speed: 60 MB/sec.
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in failing state:

  [disk-performance]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="disk-performance"
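
The behaviour observed above (a warning for satellite > 6.2, a failure for satellite <= 6.2) can be sketched as a small decision function. This is a hypothetical illustration, not foreman_maintain's actual code; it assumes that a speed equal to the expected value passes and that only the major and minor version are compared.

```python
def disk_check_status(speed_mb_per_sec, satellite_version, expected=60):
    """Return "OK", "WARNING" or "FAIL" following the behaviour verified above.

    Hypothetical helper. satellite_version is a (major, minor) tuple,
    for example (6, 5) for 6.5.0 or (6, 2) for 6.2.16.
    """
    if speed_mb_per_sec >= expected:  # assumption: meeting the threshold passes
        return "OK"
    return "WARNING" if satellite_version > (6, 2) else "FAIL"


print(disk_check_status(33, (6, 5)))  # WARNING
print(disk_check_status(6, (6, 2)))   # FAIL
```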

Additional info:
- see comment #18 and comment #16

Comment 25 Bryan Kearney 2019-05-14 19:57:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

Comment 26 Ryan Deussing 2021-07-20 15:54:58 UTC

*** Bug 1807153 has been marked as a duplicate of this bug. ***
