Bug 1299050 - Inaccurate ec2 disk activity [NEEDINFO]
Status: CLOSED NOTABUG
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: C&U Capacity and Utilization
Version: 5.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.6.0
Assigned To: Marcel Hild
QA Contact: Nandini Chandra
Whiteboard: ec2:c&u
Depends On:
Blocks:
Reported: 2016-01-15 14:50 EST by Nandini Chandra
Modified: 2016-08-17 09:41 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-17 09:41:30 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
mhild: needinfo? (nachandr)

Attachments: None

Description Nandini Chandra 2016-01-15 14:50:02 EST
Description of problem:
----------------------
It is my understanding that CFME captures disk activity only for ec2 instances with ephemeral storage [1]. On an F20 instance with an attached ephemeral volume, a script writes data to a file on that volume. The iostat output (kB_wrtn/s) from the F20 instance shows an average of about 10,000 kB being written per second over each 60-second interval. The metrics database table and the C&U graphs don't reflect these values.

Ephemeral volume mounted on /mnt/CU
/dev/xvdc1 on /mnt/CU type ext3 (rw,relatime,seclabel,data=ordered)

iostat output when no data is being written to xvdc
----------------------------
[root@ip-10-142-253-73 CU]# iostat 60 
Linux 3.11.10-301.fc20.x86_64 (ip-10-142-253-73.ec2.internal) 	01/15/2016 	_x86_64_	(1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.40    0.00    0.37   99.22

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc              0.58         0.03        72.62       2836    5954624

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc              0.00         0.00         0.00          0          


iostat output while data is being written to xvdc
-----------------------------------
[root@ip-10-142-253-73 fedora]# iostat 60
Linux 3.11.10-301.fc20.x86_64 (ip-10-142-253-73.ec2.internal) 	01/15/2016 	_x86_64_	(1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.05    0.01    0.97   97.96

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc              1.57         0.03       197.88       2904   16452864

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   51.89    0.86   47.24    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc             79.66         0.00     10143.50          0     645228

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   52.16    0.27   47.57    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc             80.98         0.00     10300.99          0     655452
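
To make the comparison with the metrics table less error-prone, the iostat figures can be reduced to a single number. The sketch below is only an illustration, not part of CFME: the function names are made up here, and it assumes the default `iostat` column layout shown above (device, tps, kB_read/s, kB_wrtn/s, ...). It skips the first report, because the first report of an `iostat` run covers the time since boot rather than the sampling interval, and it treats one iostat kB as 1024 bytes when converting a rate into bytes per period.

```python
def parse_iostat_write_rates(text, device):
    """Return the kB_wrtn/s value for `device` from every report in `text`."""
    rates = []
    for line in text.splitlines():
        fields = line.split()
        # Device rows look like: name tps kB_read/s kB_wrtn/s kB_read kB_wrtn
        if len(fields) >= 4 and fields[0] == device:
            rates.append(float(fields[3]))
    return rates


def mean_interval_write_rate(text, device):
    """Average kB_wrtn/s, ignoring the first report (averages since boot)."""
    rates = parse_iostat_write_rates(text, device)[1:]
    return sum(rates) / len(rates) if rates else 0.0


def expected_bytes_written(kb_per_s, period_s):
    """Bytes accumulated over `period_s` seconds at `kb_per_s` (1 kB = 1024 bytes)."""
    return kb_per_s * 1024 * period_s


# Device rows copied from the iostat run above (while data was being written).
SAMPLE = """\
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc              1.57         0.03       197.88       2904   16452864
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc             79.66         0.00     10143.50          0     645228
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvdc             80.98         0.00     10300.99          0     655452
"""

if __name__ == "__main__":
    rate = mean_interval_write_rate(SAMPLE, "xvdc")
    print(f"mean write rate: {rate:.2f} kB/s")
    print(f"bytes per 60 s at that rate: {expected_bytes_written(rate, 60):.0f}")
```

The resulting bytes-per-period value is what one would expect a byte counter for the same period to show, which gives a concrete number to hold against the C&U graphs.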


[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html

Version-Release number of selected component (if applicable):
------------------------
5.5.0.13


How reproducible:
-----------------
Always


Steps to Reproduce:
------------------
1. Attach ephemeral storage to an ec2 instance.
2. Generate some disk activity.
3. View disk io graphs for the instance on CFME.
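
For step 2, the script that was actually used is not attached to this bug. A minimal stand-in, assuming any sustained sequential write to the ephemeral mount is sufficient, could look like the sketch below. The function name and the file name `load.bin` are hypothetical; `/mnt/CU` is the mount point from the description.

```python
import os
import time


def write_steadily(path, total_bytes, chunk_bytes=1024 * 1024, pause_s=0.0):
    """Append `total_bytes` zero bytes to `path`, syncing after each chunk.

    Returns the number of bytes written.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "ab") as handle:
        while written < total_bytes:
            block = chunk[: total_bytes - written]
            handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out to the device
            written += len(block)
            if pause_s:
                time.sleep(pause_s)  # optional throttle between chunks
    return written


if __name__ == "__main__":
    # Hypothetical target file on the ephemeral volume from the description.
    write_steadily("/mnt/CU/load.bin", 600 * 1024 * 1024)
```

Syncing after every chunk is a deliberate choice here: it keeps the writes from sitting in the page cache, so the activity shows up in iostat on the device while the script is running.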


Actual results:
---------------
The ec2 disk activity seen on CFME is not accurate.


Expected results:
----------------
The ec2 disk activity seen on CFME should be accurate.


Additional info:
---------------
Comment 4 Marcel Hild 2016-02-11 15:04:37 EST
Nandini, nice investigation on the mojo page, good job!

Have you checked the logs for something odd?

This is the place where something might go wrong reading the metrics data:
https://github.com/durandom/manageiq/blob/7d0ed09773801afa98c87eb2b6cc21f96122b36f/app/models/metric/ci_mixin/capture.rb#L161-L176

And this is where the processing and inserting into our DB happens:
https://github.com/durandom/manageiq/blob/40aeadcac0cedefd84f739694f144a0c9d467b42/app/models/metric/ci_mixin/processing.rb

Look for `_log.` calls and try to match them with your logs.

If you don't find anything suspicious, can you attach a log excerpt of a capture run?
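
One way to pull a capture run out of a large log is a small filter like the sketch below. It is only an illustration: the search strings and the two sample lines are assumptions made up for the example, not taken from a real log, and should be replaced with the message text found next to the `_log.` calls in the files linked above.

```python
import re


def find_log_lines(lines, needles):
    """Yield (line_number, line) for every line containing any of `needles`."""
    if not needles:
        return
    pattern = re.compile("|".join(re.escape(needle) for needle in needles))
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "INFO -- : something unrelated\n",
        "INFO -- : perf_capture started for instance\n",
    ]
    for number, line in find_log_lines(sample, ["perf_capture"]):
        print(number, line)
```

For a real log, pass an open file object instead of the sample list, for example `find_log_lines(open(path_to_log), [...])`, where `path_to_log` is wherever the appliance log lives.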
Comment 6 Nandini Chandra 2016-02-29 15:04:51 EST
Emailed info to Marcel.
Comment 8 Marcel Hild 2016-04-27 07:28:20 EDT
@nandini: Now that this is a high-priority bug and we are targeted for GM, could you have a look at my response?
I am tempted to close this as NOTABUG. What do you think?
Comment 9 Marcel Hild 2016-08-17 09:41:30 EDT
Closing due to inactivity. If there is more info, please reopen.