Bug 1297494

Summary: pcp's pmiostat does not output all block devices as expected
Product: Red Hat Enterprise Linux 6
Component: pcp
Version: 6.7
Reporter: Dwight (Bud) Brown <bubrown>
Assignee: Nathan Scott <nathans>
QA Contact: Miloš Prchlík <mprchlik>
CC: brolley, fche, lberk, mbenitez, mgoodwin, mprchlik
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pcp-3.10.9-4.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-10 21:13:28 UTC

Description Dwight (Bud) Brown 2016-01-11 16:20:47 UTC
Description of problem:
While iostat reports cciss, fio, and nvme devices, pmiostat on RHEL 6 only reports sd and dm devices.

iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await   svctm  %util 
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00 104572.00    0.00 418288.00     0.00     8.00     2.93    0.03   0.00  27.80
nvme0n1           0.00     0.00 156114.29    0.00 624457.14     0.00     8.00     4.06    0.03   0.00  41.04
nvme0n1           0.00     0.00 288139.13    0.00 1152556.52     0.00     8.00     6.88    0.02   0.00  76.74
nvme0n1           0.00     0.00 299450.00    0.00 1197800.00     0.00     8.00     7.22    0.02   0.00  80.10
nvme0n1           0.00     0.00 223078.00    0.00 892312.00     0.00     8.00     5.38    0.02   0.00  57.00
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

$ pmiostat       | grep nvme
$ pmiostat -x dm | grep nvme
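
For reference, a quick way to list every block device name the kernel exposes, for comparison with the pmiostat output above; this is a minimal sketch that only reads /proc/diskstats and is not part of pcp:

#!/usr/bin/env python
# Minimal sketch (not part of pcp): list every block device name the kernel
# exposes in /proc/diskstats, grouped by a rough driver prefix, so the list
# can be compared against what pmiostat reports.
import re
from collections import defaultdict

names_by_prefix = defaultdict(list)
with open('/proc/diskstats') as f:
    for line in f:
        fields = line.split()
        if len(fields) < 3:
            continue
        name = fields[2]                  # e.g. sda, dm-0, nvme0n1, cciss/c0d0
        m = re.match(r'[a-z!/]+', name)   # strip trailing digits/partition suffix
        names_by_prefix[m.group(0) if m else name].append(name)

for prefix in sorted(names_by_prefix):
    print('%-10s %s' % (prefix, ' '.join(names_by_prefix[prefix])))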

Version-Release number of selected component (if applicable):
latest in RH release stream:
pcp-libs-3.10.3-3.el6.x86_64
pcp-conf-3.10.3-3.el6.x86_64
python-pcp-3.10.3-3.el6.x86_64
pcp-3.10.3-3.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. On a system with nvme (or cciss/fio) block devices, run iostat and confirm the devices are reported.
2. Run pmiostat on the same system.
3. Observe that the nvme/cciss/fio devices are missing from the pmiostat output.

Actual results:
No nvme devices appear in the pmiostat output.

Expected results:
pmiostat should also report nvme and other non-sdX, non-dm devices (cciss, fio, etc.) that are present in /proc/diskstats.

Additional info:

Comment 2 Nathan Scott 2016-01-12 01:11:39 UTC
Hi Bud,

As we discussed, I believe the NVME device aspect of this is covered by recently merged code from folk at Intel, which will be in the rebase.  I need a copy of /proc/diskstats from running systems with cciss and fio devices present to verify a fix for those ... hence, marking this BZ as needinfo for now.  If anyone can supply that info (please also make a note of kernel version from whence it came), that'd be greatly appreciated - thanks!

cheers.

Comment 3 Mark Goodwin 2016-01-12 01:30:58 UTC
We can get the needed info from the kernel source, anywhere add_disk() is called, e.g. for cciss:

/*
 * cciss_add_disk sets up the block device queue for a logical drive
 */
static int cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
                                int drv_index)
{
        disk->queue = blk_init_queue(do_cciss_request, &h->lock);
        if (!disk->queue)
                goto init_queue_failure;
        sprintf(disk->disk_name, "cciss/c%dd%d", h->ctlr, drv_index);
...
        add_disk(disk);
        return 0;

Comment 4 Nathan Scott 2016-01-12 01:52:02 UTC
Good point, thanks Mark.  I'm not seeing a fio driver below drivers/block though, and it'd be good to get actual /proc/diskstats anyway (for both) just in case there's some other name manipulation before exposure via /proc.

Comment 5 Nathan Scott 2016-01-12 01:54:53 UTC
The contents of /sys/block/<bdev> for some cciss and fio devices would be good to have a copy of, if possible.
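
If it helps gather that, a rough sketch along these lines will dump a few common /sys/block attributes (assuming the devices of interest appear there with a cciss/fio/nvme prefix; which attribute files exist varies by kernel):

#!/usr/bin/env python
# Rough sketch: snapshot a few /sys/block attributes for cciss/fio/nvme
# devices.  Missing attribute files are simply skipped, since availability
# differs between kernel versions.
import os

ATTRS = ('dev', 'size', 'removable', 'queue/rotational', 'queue/hw_sector_size')

for bdev in sorted(os.listdir('/sys/block')):
    # older kernels expose cciss devices as cciss!c0d0 under sysfs
    if not bdev.startswith(('cciss', 'fio', 'nvme')):
        continue
    print(bdev)
    for attr in ATTRS:
        path = os.path.join('/sys/block', bdev, attr)
        if os.path.isfile(path):
            with open(path) as f:
                print('  %-22s %s' % (attr, f.read().strip()))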

Comment 6 Nathan Scott 2016-01-12 02:15:52 UTC
With a manually tweaked /proc/diskstats to inject a cciss pattern (but no code changes)...

$ pminfo -f disk.dev.blktotal
    [...]
    inst [1 or "cciss/c12d11"] value 512800

So, it looks like this aspect at least may be OK in the pmdalinux code, but perhaps it is being filtered in pmiostat, Mark?
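
For cross-checking on a live system, a throwaway script like the following (a sketch driving the pminfo CLI used above, not pcp test code) lists names that appear in /proc/diskstats but not in the disk.dev instance domain. Partitions are expected to show up in the difference, since disk.dev covers whole disks only:

#!/usr/bin/env python
# Sketch: compare kernel device names against the pmdalinux disk.dev
# instance domain by parsing 'pminfo -f disk.dev.blktotal' output.
import re
import subprocess

out = subprocess.check_output(['pminfo', '-f', 'disk.dev.blktotal'])
exported = set(re.findall(r'inst \[\d+ or "([^"]+)"\]', out.decode()))

kernel = set()
with open('/proc/diskstats') as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 3:
            kernel.add(fields[2])

print('in /proc/diskstats but not in disk.dev.blktotal:')
for name in sorted(kernel - exported):
    print('  ' + name)          # partitions will be listed here too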

Comment 7 Mark Goodwin 2016-01-12 03:14:26 UTC
The fusion-io driver isn't open source yet, apparently (Fusion-IO is now owned by Sandisk, which is now owned by Western Digital), but the kernel module binary for the block driver can be downloaded, see https://access.redhat.com/articles/40948

Comment 8 Nathan Scott 2016-01-12 03:24:49 UTC
(In reply to Nathan Scott from comment #6)
> So, it looks like this aspect at least may be OK in the pmdalinux code, but
> perhaps filtered in pmiostat Mark?

That aspect looks OK too ...

# Device      rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await %util
cciss/c12d11     0.0     0.0    0.0    0.0      0.0      0.0     0.00     0.00     0.0     0.0     0.0   0.0
sda              0.0     0.0    0.0    0.0      0.0      0.0     0.00     0.00     0.0     0.0     0.0   0.0
cciss/c12d11     0.0     0.0    0.0    0.0      0.0      0.0     0.00     0.00     0.0     0.0     0.0   0.0
sda              0.0     0.0    0.0    0.0      0.0      0.0     0.00     0.00     0.0     0.0     0.0   0.0

I think we may be done here, Bud - with the NVME fix already included in the 6.8 rebase.  Unless there's been an observed fio driver problem, I'll close this one out.

Comment 9 Mark Goodwin 2016-01-12 04:18:43 UTC
pmiostat doesn't do any client-side filtering, other than to report *either* dm devices (using the disk.dm.* instance domain) or all other block devices (the disk.dev.* instance domain).

See also -

BZ #1293642 - "RFE: pmiostat -x dm does not display all dm devices", where we will be adding support for pmiostat -xdm,sd to report all block devices listed in /proc/diskstats (both dm and non-dm). The current default is to report only non-dm block devices. We might need to rethink this a bit, since "non-dm" devices actually include more than just scsi disk devices (e.g. sd, cciss, nvme, fio, etc.), so perhaps the default should be all block devices, with -xdm to restrict it to just dm devices.

BZ #1293444 - "RFE: need hba and fc target aggregation" where we will implement filtering and aggregation functionality. With this feature, filtering would provide a superset of the new functionality proposed in BZ #1293642.

So overall perhaps BZ #1293642 and this BZ #1297494 should be dup'd to BZ #1293444 and we make the default (in the absence of any filtering) include both dm and non-dm devices, i.e. all block devices.
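
For clarity, the proposed selection rule boils down to something like this sketch (illustrative only, not the actual pmiostat code; the instance names in the example are made up):

# Sketch of the proposed device selection: with no -x option pmiostat would
# report every block device; '-x dm' would restrict it to device-mapper only.
def select_devices(dev_instances, dm_instances, extended_opts):
    # dev_instances: names from the disk.dev.* instance domain
    # dm_instances:  names from the disk.dm.* instance domain
    if 'dm' in extended_opts:
        return sorted(dm_instances)                   # -x dm: dm devices only
    return sorted(dev_instances + dm_instances)       # default: all block devices

# example:
#   select_devices(['sda', 'nvme0n1', 'cciss/c0d0'], ['dm-0', 'dm-1'], [])
#   -> ['cciss/c0d0', 'dm-0', 'dm-1', 'nvme0n1', 'sda']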

Comment 10 Nathan Scott 2016-01-12 04:36:13 UTC
(In reply to Mark Goodwin from comment #9)
> [...]   Might need to rethink this a bit since
> "non-dm" devices actually includes more than just scsi disk devices (e.g sd,
> cciss, nvme, fio, etc.) so perhaps the default should be all block devices,
> with -xdm to restrict it to just dm devices.

Yep, that'd be ideal I think.

> BZ #1293444 - "RFE: need hba and fc target aggregation" where we will
> implement filtering and aggregation functionality. With this feature,
> filtering would provide a superset of the new functionality proposed in BZ
> #1293642.
> 
> So overall perhaps BZ #1293642 and this BZ #1297494 should be dup'd to BZ
> #1293444 and we make the default (in the absence of any filtering) include
> both dm and non-dm devices, i.e. all block devices.

+1

Comment 11 Dwight (Bud) Brown 2016-01-12 21:26:39 UTC
(In reply to Nathan Scott from comment #8)
> I think we may be done here, Bud - with the NVME fix already included in the
> 6.8 rebase already.  Unless there's been an observed fio driver problem,
> I'll close this one out.

Please wait until after the code ships and is publicly available before closing this. Assuming the rebased/shipped kit in 6.8 will be pcp-3.10.9-3.el6?

Comment 12 Nathan Scott 2016-01-12 21:38:51 UTC
(In reply to Dwight (Bud) Brown from comment #11)
> Assuming the rebased/shipped kit in 6.8 will be pcp-3.10.9-3.el6?

Yep (-3 or later), and the NVME driver fix was in upstream pcp-3.10.9.

Comment 13 Dwight (Bud) Brown 2016-01-12 22:44:52 UTC
From a procfs grab on 2.6.18-348.el5 (the field format is the same for RHEL 6):

$ cat proc/diskstats
   1    0 ram0 0 0 0 0 0 0 0 0 0 0 0
   1    1 ram1 0 0 0 0 0 0 0 0 0 0 0
   1    2 ram2 0 0 0 0 0 0 0 0 0 0 0
   1    3 ram3 0 0 0 0 0 0 0 0 0 0 0
   1    4 ram4 0 0 0 0 0 0 0 0 0 0 0
   1    5 ram5 0 0 0 0 0 0 0 0 0 0 0
   1    6 ram6 0 0 0 0 0 0 0 0 0 0 0
   1    7 ram7 0 0 0 0 0 0 0 0 0 0 0
   1    8 ram8 0 0 0 0 0 0 0 0 0 0 0
   1    9 ram9 0 0 0 0 0 0 0 0 0 0 0
   1   10 ram10 0 0 0 0 0 0 0 0 0 0 0
   1   11 ram11 0 0 0 0 0 0 0 0 0 0 0
   1   12 ram12 0 0 0 0 0 0 0 0 0 0 0
   1   13 ram13 0 0 0 0 0 0 0 0 0 0 0
   1   14 ram14 0 0 0 0 0 0 0 0 0 0 0
   1   15 ram15 0 0 0 0 0 0 0 0 0 0 0
   8    0 sda 14181 14972 898418 125977 19819 52939 582524 2007972 0 294789 2133949
   8    1 sda1 74 997 2160 501 2 0 4 24 0 419 525
   8    2 sda2 14085 13958 895946 125308 19817 52939 582520 2007948 0 294450 2133257
   8   16 sdb 200 199 3192 394 0 0 0 0 0 391 394
   8   17 sdb1 32 78 880 89 0 0 0 0 0 89 89
   8   18 sdb2 136 55 1528 266 0 0 0 0 0 266 266
 104    0 cciss/c0d0 177 416 4744 529 0 0 0 0 0 372 529
 104    1 cciss/c0d0p1 34 78 896 107 0 0 0 0 0 107 107
 104    2 cciss/c0d0p2 34 78 896 70 0 0 0 0 0 70 70
 104    3 cciss/c0d0p3 34 78 896 204 0 0 0 0 0 204 204
 104    4 cciss/c0d0p4 35 77 896 72 0 0 0 0 0 72 72
 104   16 cciss/c0d1 63 453 1716 159 0 0 0 0 0 159 159
 104   17 cciss/c0d1p1 32 372 820 80 0 0 0 0 0 80 80
 104   32 cciss/c0d2 36 78 912 103 0 0 0 0 0 103 103
 104   48 cciss/c0d3 159 2457 3510 364 0 0 0 0 0 248 363
 104   49 cciss/c0d3p1 36 796 850 100 0 0 0 0 0 100 100
 104   50 cciss/c0d3p2 45 787 850 92 0 0 0 0 0 83 92
 104   51 cciss/c0d3p3 45 787 850 88 0 0 0 0 0 70 88
 105    0 cciss/c1d0 37 78 920 108 0 0 0 0 0 108 108
 253    0 dm-0 27860 0 894618 539113 72815 0 582520 10384808 0 294260 10923926
 253    1 dm-1 129 0 1032 1173 0 0 0 0 0 157 1173
  11    0 sr0 0 0 0 0 0 0 0 0 0 0 0
  11    1 sr1 0 0 0 0 0 0 0 0 0 0 0
   8   32 sdc 48 31 632 31 0 0 0 0 0 31 31
   8   48 sdd 47 31 624 89 0 0 0 0 0 89 89
   8   64 sde 62 194 892 133 0 0 0 0 0 133 133
   8   65 sde1 23 172 404 65 0 0 0 0 0 65 65
   8   80 sdf 167 808 2214 293 0 0 0 0 0 139 293
   8   81 sdf1 123 777 1614 247 0 0 0 0 0 93 247
   8   85 sdf5 0 0 0 0 0 0 0 0 0 0 0
   8   86 sdf6 0 0 0 0 0 0 0 0 0 0 0
   8   87 sdf7 0 0 0 0 0 0 0 0 0 0 0
   8   96 sdg 62 194 892 122 0 0 0 0 0 122 122
   8   97 sdg1 23 172 404 62 0 0 0 0 0 62 62
   8  112 sdh 65 100 912 127 0 0 0 0 0 127 127
   8  113 sdh1 24 78 408 91 0 0 0 0 0 91 91
   8  128 sdi 47 31 624 10 0 0 0 0 0 10 10
   8  144 sdj 54 46 800 10 0 0 0 0 0 10 10
   8  160 sdk 48 31 632 5 0 0 0 0 0 5 5
   8  176 sdl 54 46 800 9 0 0 0 0 0 9 9
   8  192 sdm 60 52 896 13 0 0 0 0 0 13 13
   8  193 sdm1 49 31 640 11 0 0 0 0 0 11 11
   8  208 sdn 48 31 632 40 0 0 0 0 0 40 40
   8  224 sdo 47 31 624 31 0 0 0 0 0 31 31
   8  240 sdp 62 194 892 33 0 0 0 0 0 33 33
   8  241 sdp1 23 172 404 19 0 0 0 0 0 19 19
  65    0 sdq 167 808 2214 315 0 0 0 0 0 206 315
  65    1 sdq1 123 777 1614 234 0 0 0 0 0 125 234
  65    5 sdq5 0 0 0 0 0 0 0 0 0 0 0
  65    6 sdq6 0 0 0 0 0 0 0 0 0 0 0
  65    7 sdq7 0 0 0 0 0 0 0 0 0 0 0
  65   16 sdr 62 194 892 38 0 0 0 0 0 38 38
  65   17 sdr1 23 172 404 26 0 0 0 0 0 26 26
  65   32 sds 65 100 912 64 0 0 0 0 0 64 64
  65   33 sds1 24 78 408 33 0 0 0 0 0 33 33
  65   48 sdt 47 31 624 9 0 0 0 0 0 9 9
  65   64 sdu 54 46 800 7 0 0 0 0 0 7 7
  65   80 sdv 48 31 632 5 0 0 0 0 0 5 5
  65   96 sdw 54 46 800 6 0 0 0 0 0 6 6
  65  112 sdx 60 52 896 12 0 0 0 0 0 12 12
  65  113 sdx1 49 31 640 8 0 0 0 0 0 8 8
   9    0 md0 0 0 0 0 0 0 0 0 0 0 0
 253    2 dm-2 15 0 120 37 0 0 0 0 0 37 37
 253    3 dm-3 15 0 120 13 0 0 0 0 0 13 13
 253    4 dm-4 15 0 120 37 0 0 0 0 0 37 37
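
For reference, that is the standard 14-field /proc/diskstats layout (major, minor, device name, then eleven I/O counters); a minimal parsing sketch, independent of pcp, showing the cciss entries parse like any other device:

#!/usr/bin/env python
# Sketch: parse the 14-field /proc/diskstats layout shown above and print
# reads completed / sectors read per device.
FIELDS = ('reads', 'reads_merged', 'sectors_read', 'ms_reading',
          'writes', 'writes_merged', 'sectors_written', 'ms_writing',
          'ios_in_progress', 'ms_doing_io', 'ms_weighted')

with open('/proc/diskstats') as f:
    for line in f:
        parts = line.split()
        if len(parts) < 14:
            continue                      # skip short or blank lines
        name = parts[2]                   # e.g. sda, cciss/c0d0, dm-0
        stats = dict(zip(FIELDS, (int(v) for v in parts[3:14])))
        print('%-16s reads=%d sectors_read=%d' %
              (name, stats['reads'], stats['sectors_read']))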


Can't help with fio... well maybeeee I can, I'll bug the perf group and if they aren't running tests on their fio array I may be able to get on their system to grab the data.  But most of the time the system is busy running performance tests with 3rd party software packages like SAP, Oracle, etc.

Comment 14 Nathan Scott 2016-01-13 02:24:27 UTC
(In reply to Dwight (Bud) Brown from comment #13)
> From a procfs grab on 2.6.18-348.el5 (but field format is the same for RHEL
> 6):
> 
> $ cat proc/diskstats
> [...]

Thanks Bud, I've added that to the automated regression tests for PCP.

> Can't help with fio... well maybeeee I can, I'll bug the perf group and if
> they aren't running tests on their fio array I may be able to get on their
> system to grab the data.  But most the time the system is busy running
> performance tests with 3rd party software packages like SAP, Oracle, etc.

That'd be great if you can manage it.

cheers.

Comment 16 Miloš Prchlík 2016-02-08 09:45:08 UTC
Verified for build pcp-3.10.9-5.el6.

Comment 18 errata-xmlrpc 2016-05-10 21:13:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0825.html