Bug 1641784 - Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)
Status: ASSIGNED
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Foreman Maintain   
Version: 6.4.0
Hardware: All
OS: All
Priority: high
Severity: medium
Target Milestone: 6.5.0
Assignee: Kavita
QA Contact: Nikhil Kathole
URL:
Whiteboard:
Keywords: PrioBumpGSS, Triaged, UserExperience
Duplicates: 1640705
Depends On:
Blocks:
Reported: 2018-10-22 18:27 UTC by Rajan Gupta
Modified: 2019-03-22 03:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Foreman Issue Tracker 26236 None None None 2019-03-05 14:26 UTC
Red Hat Knowledge Base (Solution) 3662601 Upgrade None Red Hat Satellite upgrade check from 6.3 to 6.4 fails at "disk-performance" stage 2019-02-18 05:39 UTC

Description Rajan Gupta 2018-10-22 18:27:52 UTC
Description of problem:
Disk performance task during foreman-maintain upgrade check fails with low disk speed when using the latest version of rubygem-foreman_maintain (0.2.11-1)

- With rubygem-foreman_maintain-0.2.11-1:

~~~~~~~~~~~~~~~~~~~~
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
\ Finished

Disk speed : 25 MB/sec                                                [FAIL]
Slow disk detected /var/lib/pulp mounted on /dev/mapper/vgsys-lgsysvar.
             Actual disk speed: 25 MB/sec
             Expected disk speed: 60 MB/sec.
--------------------------------------------------------------------------------
Scenario [ForemanMaintain::Scenario::FilteredScenario] failed.

The following steps ended up in failing state:

  [disk-performance]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="disk-performance"
~~~~~~~~~~~~~~~~~~~~

- With rubygem-foreman_maintain-0.1.6-1:
~~~~~~~~~~~~~~~~~~~~
# foreman-maintain health check --label disk-performance
Running ForemanMaintain::Scenario::FilteredScenario
================================================================================
Check for recommended disk speed of pulp, mongodb, pgsql dir.:
/ Finished

/var/lib/pulp : 296 MB/sec
/var/lib/mongodb : 342 MB/sec
/var/lib/pgsql : 2360 MB/sec                                          [OK]
~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Update the rubygem-foreman_maintain to version 0.2.11-1.
2. Run foreman-maintain health check --label disk-performance

Actual results:
rubygem-foreman_maintain-0.2.11-1 always shows low speed and upgrade check fails.
rubygem-foreman_maintain-0.1.6-1 always shows good speed.

Expected results:
rubygem-foreman_maintain-0.2.11-1 should report the correct disk speed, and the upgrade check should pass.

Additional info:

Comment 1 Robert Morrow 2018-10-23 16:03:07 UTC
It looks like this was caused by foreman-maintain switching from hdparm to use only fio. 

Looking at strace, version 0.1.6-1 is using hdparm:
execve("/bin/sh", ["sh", "-c", "hdparm -t /dev/mapper/sansatvg-pulp | awk 'NF' 2>&1"]

and version 0.2.11-1 is using fio:
execve("/bin/sh", ["sh", "-c", "fio --name=job1 --rw=read --size=1g --output-format=json                  --directory=/var/lib/pulp --direct=1 2>&1"]

There is a commit from Jun 7th in the foreman_maintain GitHub repository that removes hdparm and uses only fio. The fio and hdparm values are not close to each other.

Comment 2 Rajan Gupta 2018-10-23 16:57:16 UTC
The customer runs an upgrade check before running the actual upgrade command and expects the disk check to report the correct disk speed, as the older version of rubygem-foreman_maintain (0.1.6-1) does.

Comment 3 jcallaha 2018-10-24 19:32:27 UTC
I recommended downgrading the severity of this issue, as you can skip this check by adding --whitelist="disk-performance" to the command.

Comment 7 Brad Buckingham 2018-10-26 14:30:45 UTC
*** Bug 1640705 has been marked as a duplicate of this bug. ***

Comment 8 Brad Buckingham 2018-10-26 14:32:01 UTC
When resolving this bugzilla, please be aware of bug 1640705, which was marked as a duplicate.  That bugzilla is also raising concern over the new behavior.

Comment 9 Mike McCune 2018-11-01 15:28:11 UTC
We are going to bump up the block size used in this test, most likely to 16 KB, which better represents real-world use cases. The default of 4k often produces too low a test result, which can be a false negative.

Users can run:

fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k

and check to see if they get in the 60MB/sec range or above, then it is safe to whitelist this check.
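
For anyone who prefers to script the comparison, here is a minimal sketch that reads the result from fio's JSON report (--output-format=json, as in the command quoted in comment 1) and compares it with the 60 MB/sec threshold. It assumes the "bw" field under "read" is reported in KiB/s and that MB means 10^6 bytes. The function names and the sample report are hypothetical, and this is not necessarily how foreman-maintain computes its figure.

```python
import json

EXPECTED_MB_PER_SEC = 60  # threshold printed by the disk-performance check


def read_bandwidth_mb_per_sec(fio_json_text):
    """Return the read bandwidth of the first fio job in MB/s (10^6 bytes/s)."""
    report = json.loads(fio_json_text)
    # Assumption: "bw" under "read" is expressed in KiB/s.
    bw_kib_per_sec = report["jobs"][0]["read"]["bw"]
    return bw_kib_per_sec * 1024 / 1_000_000


def is_fast_enough(fio_json_text, expected=EXPECTED_MB_PER_SEC):
    """True when the measured read bandwidth meets the expected speed."""
    return read_bandwidth_mb_per_sec(fio_json_text) >= expected


# Hypothetical, heavily trimmed report; a real one has many more fields.
sample = json.dumps({"jobs": [{"jobname": "job1", "read": {"bw": 40464}}]})
print(round(read_bandwidth_mb_per_sec(sample), 1), is_fast_enough(sample))
```

With the sample value of 40464 KiB/s the result is roughly 41.4 MB/s, which is below the threshold.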

Comment 10 Ben 2018-11-01 15:39:13 UTC
I just tried that and got 

[root@satellite1 ~]# fio --name=job1 --rw=read --size=1g --directory=/var/lib/pulp --direct=1 --blocksize=16k
job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [R(1)][96.2%][r=37.9MiB/s,w=0KiB/s][r=2423,w=0 IOPS][eta 00m:01s]
job1: (groupid=0, jobs=1): err= 0: pid=23806: Thu Nov  1 15:32:51 2018
   read: IOPS=2533, BW=39.6MiB/s (41.5MB/s)(1024MiB/25869msec)
    clat (usec): min=169, max=59639, avg=385.58, stdev=466.23
     lat (usec): min=170, max=59641, avg=387.21, stdev=466.26
    clat percentiles (usec):
     |  1.00th=[  249],  5.00th=[  273], 10.00th=[  281], 20.00th=[  293],
     | 30.00th=[  306], 40.00th=[  314], 50.00th=[  326], 60.00th=[  338],
     | 70.00th=[  355], 80.00th=[  388], 90.00th=[  461], 95.00th=[  603],
     | 99.00th=[ 1647], 99.50th=[ 2442], 99.90th=[ 4490], 99.95th=[ 5997],
     | 99.99th=[18482]
   bw (  KiB/s): min=24992, max=48609, per=99.83%, avg=40463.96, stdev=4711.74, samples=51
   iops        : min= 1562, max= 3038, avg=2528.80, stdev=294.45, samples=51
  lat (usec)   : 250=1.01%, 500=91.09%, 750=4.80%, 1000=1.28%
  lat (msec)   : 2=1.13%, 4=0.56%, 10=0.11%, 20=0.02%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=4.57%, sys=12.18%, ctx=65538, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65536,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=39.6MiB/s (41.5MB/s), 39.6MiB/s-39.6MiB/s (41.5MB/s-41.5MB/s), io=1024MiB (1074MB), run=25869-25869msec

Disk stats (read/write):
    dm-2: ios=64830/339, merge=0/0, ticks=21628/496, in_queue=22132, util=84.23%, aggrios=65536/337, aggrmerge=0/2, aggrticks=21803/488, aggrin_queue=22100, aggrutil=83.33%
  sdb: ios=65536/337, merge=0/2, ticks=21803/488, in_queue=22100, util=83.33%

Which I'm having trouble parsing, not having used fio before, but... 41.5MB/s? In which case, it appears I have a problem. That is a surprise, as this VMware guest is hosted on a Dell/EMC Compellent, almost certainly on SSDs. The old foreman-maintain usually averages ~530MB/s...
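
As a cross-check on how to read that output, the bandwidth line follows directly from the IOPS figure and the block size: 2533 I/Os per second at 16 KiB each. A small sketch of the arithmetic (the function name is only illustrative):

```python
KIB = 1024
MIB = 1024 * 1024
MB = 1_000_000


def bandwidth_from_iops(iops, block_size_bytes):
    """Return (MiB/s, MB/s) for a given I/O rate and block size."""
    bytes_per_sec = iops * block_size_bytes
    return bytes_per_sec / MIB, bytes_per_sec / MB


# Values taken from the "read: IOPS=2533" line and --blocksize=16k.
mib_s, mb_s = bandwidth_from_iops(2533, 16 * KIB)
print(f"{mib_s:.1f} MiB/s, {mb_s:.1f} MB/s")  # 39.6 MiB/s, 41.5 MB/s
```

So the 39.6MiB/s and 41.5MB/s figures are the same measurement in binary and decimal units, and both are below the expected 60 MB/sec.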

Comment 12 Sitsofe Wheeler 2018-11-17 08:18:34 UTC
(Random internet passerby comment. I am not speaking for or representing Red Hat in any way. These comments are my own. Red Hat employees - feel free to say if the following is wrong/unhelpful etc)

The fio job you posted is using direct I/O but is only allowing one outstanding I/O at any given time. You are likely using the psync I/O engine (https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-ioengine ) as that's the default on Linux if you don't specify an ioengine (https://github.com/axboe/fio/blob/46bfd4e5170ec950c1eb2e27c2ae67fa9b84ee12/os/os.h#L160 ) and iodepth defaults to 1 when unset (see https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-iodepth but changing the iodepth on synchronous ioengines doesn't tend to make sense). So you've actually asked fio to measure the speed you can do sending down only a single block of size 16Kbytes at any given time and then waiting for it to be acknowledged as done by the disk before sending the next 16Kbytes. If you were using buffered I/O (direct=0) you would see better speeds as the kernel will take care of batching I/Os when you're doing writes and speeds would likely be higher (but they would also be heavily distorted by the kernel's cache).
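
To make the point about a single outstanding I/O concrete, here is a rough sketch of the ceiling that per-I/O latency puts on throughput. It uses the average latency from the posted output (387.21 usec) and the 16 KiB block size. The linear scaling with iodepth is an idealization that assumes latency stays constant as more I/Os are in flight, which real devices will not honor exactly. The function name is hypothetical.

```python
def latency_bound_mb_per_sec(block_size_bytes, avg_latency_usec, iodepth=1):
    """Idealized throughput ceiling when each I/O takes avg_latency_usec.

    With `iodepth` I/Os in flight, and assuming latency does not grow
    with depth, the I/O rate is iodepth / latency.
    """
    ios_per_sec = iodepth * 1_000_000 / avg_latency_usec
    return ios_per_sec * block_size_bytes / 1_000_000


# One 16 KiB read in flight at a time, 387.21 usec average latency.
single = latency_bound_mb_per_sec(16 * 1024, 387.21)
print(f"{single:.1f} MB/s")
```

This comes out at about 42 MB/s, close to the 41.5 MB/s that fio measured, which suggests the result reflects per-request latency rather than what the storage can sustain with more I/O in flight.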

If you are "just" looking for ever-so-slightly higher numbers that are still trying to avoid the kernel's cache then you will likely see them by using the libaio ioengine with direct I/O and a higher iodepth (e.g. 32 but the correct depth will depend on your setup). Also note your chosen file size is very small and may be being satisfied entirely (or even partially) by the cache in things like fancy controllers or even the disks themselves.
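
A sketch of what such an invocation might look like, built as an argument list so it can be passed to a process runner. It takes the command from comment 9 and adds --ioengine=libaio and --iodepth=32. The depth of 32 is only the example value mentioned above, not a tuned recommendation, and the helper name is hypothetical.

```python
import shlex


def build_fio_read_command(directory, ioengine="libaio", iodepth=32,
                           blocksize="16k", size="1g"):
    """Return the fio argument list for a direct-I/O sequential read test."""
    return [
        "fio", "--name=job1", "--rw=read", f"--size={size}",
        f"--directory={directory}", "--direct=1",
        f"--blocksize={blocksize}",
        # The two options below are the change suggested in this comment.
        f"--ioengine={ioengine}", f"--iodepth={iodepth}",
    ]


command = build_fio_read_command("/var/lib/pulp")
print(shlex.join(command))
```

The printed line can be pasted into a shell on the Satellite host. The size is left at 1g here to match the earlier command, so the caveat about small files being served from controller or disk caches still applies.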

Good luck!

Comment 13 Ben 2018-11-20 12:27:42 UTC
With the above comment in mind, is there a chance of a new version of rubygem-foreman_maintain that uses more useful settings?

Comment 14 whitedm 2019-02-08 21:18:35 UTC
FWIW, this is also happening on an upgrade from Satellite 6.2 -> 6.3. 

Same version of rubygem-foreman_maintain:

Name        : rubygem-foreman_maintain
Arch        : noarch
Version     : 0.2.11
Release     : 1.el7sat
Size        : 337 k
Repo        : installed
From repo   : rhel-7-server-satellite-maintenance-6-rpms

