Bug 1641784

Summary: Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)
Product: Red Hat Satellite
Reporter: Rajan Gupta <rajgupta>
Component: Satellite Maintain
Assignee: Kavita <kgaikwad>
Status: CLOSED ERRATA
QA Contact: Jameer Pathan <jpathan>
Severity: medium
Priority: high
Version: 6.4.0
CC: ahumbe, ajoseph, akarimi, alexander.lackner, apatel, aperotti, ben.argyle, brian.millett, ddf-bot, egolov, foremar1, inecas, iwindon, janarula, kgaikwad, ktordeur, kupadhya, mbacovsk, mschwabe, omankame, pdwyer, rchauhan, ricardo.reyes, rjerrido, Robert-Morrowjr, sadas, shisingh, t.h.amundsen, whitedm, zhunting
Target Milestone: Released
Keywords: PrioBumpGSS, Triaged, UserExperience
Target Release: Unused
Hardware: All
OS: All
Fixed In Version: rubygem-foreman_maintain-0.3.3
Doc Type: If docs needed, set a value
Last Closed: 2019-05-14 19:57:47 UTC
Type: Bug

Description Rajan Gupta 2018-10-22 18:27:52 UTC
Description of problem:
Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)

- With rubygem-foreman_maintain-0.2.11-1:

# foreman-maintain health check --label disk-performance
~~~~~~~~~~~~~~~~~~~~
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
\ Finished

Disk speed : 25 MB/sec                                                [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vgsys-lgsysvar.
             Actual disk speed: 25 MB/sec
             Expected disk speed: 60 MB/sec.
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in failing state:

  [disk-performance]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="disk-performance"
~~~~~~~~~~~~~~~~~~~~

- With rubygem-foreman_maintain-0.1.6-1:
~~~~~~~~~~~~~~~~~~~~
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
/ Finished

/var/lib/pulp : 296 MB/sec
/var/lib/mongodb : 342 MB/sec
/var/lib/pgsql : 2360 MB/sec                                          [OK]
~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Update the rubygem-foreman_maintain to version 0.2.11-1.
2. Run foreman-maintain health check --label disk-performance

Actual results:
rubygem-foreman_maintain-0.2.11-1 always reports a low disk speed, so the upgrade check fails.
rubygem-foreman_maintain-0.1.6-1 always reports a good speed.

Expected results:
rubygem-foreman_maintain-0.2.11-1 should report the correct disk speed, and the upgrade check should pass.

Additional info:

Comment 1 Robert Morrow 2018-10-23 16:03:07 UTC
It looks like this was caused by foreman-maintain switching from hdparm to using only fio.

An strace shows that version 0.1.6-1 uses hdparm:
execve("/bin/sh", ["sh", "-c", "hdparm -t /dev/mapper/sansatvg-pulp | awk 'NF' 2>&1"]

and version 0.2.11-1 uses fio:
execve("/bin/sh", ["sh", "-c", "fio --name=job1 --rw=read --size=1g --output-format=json                  --directory=/var/lib/pulp --direct=1 2>&1"]

There is a commit from Jun 7th in the foreman_maintain GitHub repository that removes hdparm and uses only fio. The values reported by fio and hdparm are nowhere near the same.
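
For anyone who wants to see the discrepancy on their own system, both measurements can be reproduced from the commands in the strace above (the device and directory names are the ones from this report; substitute your own). hdparm -t does sequential reads straight from the block device, while fio reads a 1 GiB file with direct I/O in 4k blocks by default, so the two numbers are not comparable:

~~~~~~~~~~~~~~~~~~~~
# what 0.1.6-1 measured: sequential reads from the block device
hdparm -t /dev/mapper/sansatvg-pulp

# what 0.2.11-1 measures: direct I/O against a 1 GiB file (default 4k blocks)
fio --name=job1 --rw=read --size=1g --output-format=json \
    --directory=/var/lib/pulp --direct=1
~~~~~~~~~~~~~~~~~~~~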

Comment 2 Rajan Gupta 2018-10-23 16:57:16 UTC
The customer is running the upgrade check before running the actual upgrade command, and the expectation is that the disk check reports correct information, as the older version of rubygem-foreman_maintain (0.1.6-1) does.

Comment 3 jcallaha 2018-10-24 19:32:27 UTC
I recommend downgrading the severity of this issue, as you can skip this check by adding --whitelist="disk-performance" to the command.
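
For example (the health check form below is the one shown in this report; the upgrade check form with --target-version is my assumption about the command the customer is running):

~~~~~~~~~~~~~~~~~~~~
foreman-maintain health check --whitelist="disk-performance"
foreman-maintain upgrade check --target-version 6.4 --whitelist="disk-performance"
~~~~~~~~~~~~~~~~~~~~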

Comment 7 Brad Buckingham 2018-10-26 14:30:45 UTC
*** Bug 1640705 has been marked as a duplicate of this bug. ***

Comment 8 Brad Buckingham 2018-10-26 14:32:01 UTC
When resolving this bugzilla, please be aware of bug 1640705, which was marked as a duplicate. That bugzilla also raises concerns about the new behavior.

Comment 9 Mike McCune 2018-11-01 15:28:11 UTC
We are going to bump up the block size used in this test, most likely to 16k, which better represents real-world use cases. The default of 4k often produces a result that is too low and can be a false negative.

Users can run:

fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k

and check whether the result is in the 60 MB/sec range or above; if so, it is safe to whitelist this check.
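
If you want to script that comparison, here is a minimal sketch. It assumes jq is installed and that fio's JSON output exposes the read bandwidth in KiB/s at .jobs[0].read.bw (both are assumptions about the environment, not something foreman_maintain itself does):

~~~~~~~~~~~~~~~~~~~~
# Run the suggested 16k job with JSON output and compare against 60 MB/sec.
bw_kib=$(fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp \
             --direct=1 --blocksize=16k --output-format=json \
         | jq '.jobs[0].read.bw')
echo "Read bandwidth: $((bw_kib / 1024)) MB/sec"
if [ $((bw_kib / 1024)) -ge 60 ]; then
    echo "At or above the 60 MB/sec threshold; whitelisting should be safe"
else
    echo "Below the 60 MB/sec threshold"
fi
~~~~~~~~~~~~~~~~~~~~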

Comment 10 Ben 2018-11-01 15:39:13 UTC
I just tried that and got 

[root@satellite1 ~]# fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [R(1)][96.2%][r=37.9MiB/s,w=0KiB/s][r=2423,w=0 IOPS][eta 00m:01s]
job1: (groupid=0, jobs=1): err= 0: pid=23806: Thu Nov  1 15:32:51 2018
   read: IOPS=2533, BW=39.6MiB/s (41.5MB/s)(1024MiB/25869msec)
    clat (usec): min=169, max=59639, avg=385.58, stdev=466.23
     lat (usec): min=170, max=59641, avg=387.21, stdev=466.26
    clat percentiles (usec):
     |  1.00th=[  249],  5.00th=[  273], 10.00th=[  281], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  326], 60.00th=[  338],
     | 70.00th=[  355], 80.00th=[  388], 90.00th=[  461], 95.00th=[  603],
     | 99.00th=[ 1647], 99.50th=[ 2442], 99.90th=[ 4490], 99.95th=[ 5997],
     | 99.99th=[18482]
   bw (  KiB/s): min=24992, max=48609, per=99.83%, avg=40463.96, stdev=4711.74, samples=51
   iops        : min= 1562, max= 3038, avg=2528.80, stdev=294.45, samples=51
  lat (usec)   : 250=1.01%, 500=91.09%, 750=4.80%, 1000=1.28%
  lat (msec)   : 2=1.13%, 4=0.56%, 10=0.11%, 20=0.02%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=4.57%, sys=12.18%, ctx=65538, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65536,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=39.6MiB/s (41.5MB/s), 39.6MiB/s-39.6MiB/s (41.5MB/s-41.5MB/s), io=1024MiB (1074MB), run=25869-25869msec

Disk stats (read/write):
    dm-2: ios=64830/339, merge=0/0, ticks=21628/496, in_queue=22132, util=84.23%, aggrios=65536/337, aggrmerge=0/2, aggrticks=21803/488, aggrin_queue=22100, aggrutil=83.33%
  sdb: ios=65536/337, merge=0/2, ticks=21803/488, in_queue=22100, util=83.33%

I'm having trouble parsing that, not having used fio before, but... 41.5 MB/s? In that case, it appears I have a problem, which is a surprise, as this VMware guest is hosted on a Dell/EMC Compellent array, almost certainly on SSDs. The old foreman-maintain usually averages ~530 MB/s...

Comment 12 Sitsofe Wheeler 2018-11-17 08:18:34 UTC
(Random internet passerby comment. I am not speaking for or representing Red Hat in any way. These comments are my own. Red Hat employees - feel free to say if the following is wrong/unhelpful etc)

The fio job you posted is using direct I/O but is only allowing one outstanding I/O at any given time. You are likely using the psync I/O engine (https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-ioengine ), as that's the default on Linux if you don't specify an ioengine (https://github.com/axboe/fio/blob/46bfd4e5170ec950c1eb2e27c2ae67fa9b84ee12/os/os.h#L160 ), and iodepth defaults to 1 when unset (see https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-iodepth ; changing the iodepth on synchronous ioengines doesn't tend to make sense anyway). So you have actually asked fio to measure the speed achievable when sending down only a single 16 Kbyte block at any given time and waiting for the disk to acknowledge it before sending the next 16 Kbytes. If you were using buffered I/O (direct=0) you would likely see higher speeds, as the kernel takes care of batching I/Os when you're doing writes, but the results would also be heavily distorted by the kernel's cache.

If you are "just" looking for ever-so-slightly higher numbers that still avoid the kernel's cache, then you will likely see them by using the libaio ioengine with direct I/O and a higher iodepth (e.g. 32, but the correct depth will depend on your setup). Also note that your chosen file size is very small and may be satisfied entirely (or at least partially) by the cache in things like fancy controllers or even the disks themselves.
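
A hedged example of that suggestion (the iodepth of 32 and the larger --size are illustrative only; the right values depend on your storage):

~~~~~~~~~~~~~~~~~~~~
fio --name=job1 --rw=read --size=4g --directory=/var/lib/pulp \
    --direct=1 --blocksize=16k --ioengine=libaio --iodepth=32
~~~~~~~~~~~~~~~~~~~~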

Good luck!

Comment 13 Ben 2018-11-20 12:27:42 UTC
With the above comment in mind, is there a chance of a new version of rubygem-foreman_maintain that uses more useful settings?

Comment 14 whitedm 2019-02-08 21:18:35 UTC
FWIW, this is also happening on an upgrade from Satellite 6.2 -> 6.3. 

Same version of rubygem-foreman_maintain:

Name        : rubygem-foreman_maintain
Arch        : noarch
Version     : 0.2.11
Release     : 1.el7sat
Size        : 337 k
Repo        : installed
From repo   : rhel-7-server-satellite-maintenance-6-rpms

Comment 19 Bryan Kearney 2019-04-04 12:05:23 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26236 has been resolved.

Comment 24 Jameer Pathan 2019-05-02 09:03:04 UTC
Verified
@satellite 6.5.0 snap 26
@rubygem-foreman_maintain-0.3.3-1.el7sat.noarch

steps:
- Run foreman-maintain health check --label disk-performance

Observation:
1. If the disk speed does not meet the threshold, foreman_maintain reports a warning instead of failing for Satellite > 6.2; for Satellite <= 6.2, it still gives a FAIL result.

2. for satellite 6.5.0
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.: 
| Finished                                                                      

Disk speed : 33 MB/sec                                                [WARNING]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vg_dhcp201-lv_root.
             Actual disk speed: 33 MB/sec
             Expected disk speed: 60 MB/sec.
WARNING: Low disk speed might have a negative impact on the system.
See https://access.redhat.com/solutions/3397771 before proceeding
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in warning state:

  [disk-performance]

The steps in warning state itself might not mean there is an error,
but it should be reviewed to ensure the behavior is expected

3. for satellite 6.2.16
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.: 
/ Finished                                                                      

Disk speed : 6 MB/sec                                                 [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/rhel_hp--nehalem--02-root.
             Actual disk speed: 6 MB/sec
             Expected disk speed: 60 MB/sec.
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in failing state:

  [disk-performance]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="disk-performance"

Additional info:
- see comment #18 and comment #16

Comment 25 Bryan Kearney 2019-05-14 19:57:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222

Comment 26 Ryan Deussing 2021-07-20 15:54:58 UTC
*** Bug 1807153 has been marked as a duplicate of this bug. ***