Bug 1779419
| Summary: | dstat does not show disk statistics | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | olaf.weiser <olaf.weiser> |
| Component: | pcp | Assignee: | Nathan Scott <nathans> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik> |
| Severity: | medium | Priority: | unspecified |
| Version: | 8.1 | CC: | agerstmayr, chorn, jkurik, mgoodwin, mnewsome, myllynen, nathans, patrickm |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged |
| Target Release: | 8.2 | Flags: | pm-rhel: mirror+ |
| Hardware: | ppc64le | OS: | Unspecified |
| Fixed In Version: | pcp-5.0.2 | Doc Type: | No Doc Update |
| Last Closed: | 2020-04-28 15:40:22 UTC | Type: | Bug |
| Attachments: | output of commands (attachment 1642094) | | |
Description olaf.weiser@de.ibm.com 2019-12-03 23:34:51 UTC
Hi there Olaf,

The values in the disk columns come from ...

$ head -14 /etc/pcp/dstat/disk
#
# pcp-dstat(1) configuration file - see pcp-dstat(5)
#

[disk]
label = dsk/%I
printtype = b
precision = 0
grouptype = 2
reads = disk.dev.read_bytes
reads.label = read
writes = disk.dev.write_bytes
writes.label = writ
...

Could you attach the output from the following commands on this system?

$ pminfo -L -f disk.dev.read_bytes
$ cat /proc/diskstats

Thanks!

[root@ ~]# pminfo -L -f disk.dev.read_bytes | wc -l
1274
[root@ ~]# pminfo -L -f disk.dev.read_bytes | head -20
disk.dev.read_bytes
inst [0 or "sda"] value 46088
inst [1 or "sdb"] value 4148913
inst [2 or "sde"] value 40380656
inst [3 or "sdc"] value 36606744
inst [4 or "sdr"] value 91868924
inst [5 or "sdk"] value 40163592
inst [6 or "sdd"] value 35397968
inst [7 or "sdaf"] value 40063352
inst [8 or "sdaa"] value 54474488
inst [9 or "sdah"] value 55986024
inst [10 or "sdg"] value 40295152
inst [11 or "sdaj"] value 40170128
inst [12 or "sdx"] value 39838808
inst [13 or "sdab"] value 55742472
inst [14 or "sdn"] value 91288616
inst [15 or "sdp"] value 56656768
inst [16 or "sdl"] value 91364784
inst [17 or "sdh"] value 61786808
[root@ ~]# cat /proc/diskstats | wc -l
2551
[root@ ~]# cat /proc/diskstats | head -20
8 0 sda 982 0 92177 3212 115 8 8176 733 0 4560 2310 0 0 0 0
8 1 sda1 750 0 72585 2059 87 8 8176 523 0 3870 1520 0 0 0 0
8 16 sdb 73077 11217 8297826 443241 4639861 305612 214851048 40944798 0 26608180 27116860 0 0 0 0
8 17 sdb1 1298 0 161232 8308 0 0 0 0 0 5720 5480 0 0 0 0
8 18 sdb2 662 0 170089 4408 44 0 4528 415 0 3590 3130 0 0 0 0
8 19 sdb3 70899 11217 7948705 425815 3838946 305612 214846520 22308160 0 26163540 12570110 0 0 0 0
8 64 sde 1224255 7781 80761312 745075 1274 0 5169680 34945 0 481770 440650 0 0 0 0
8 65 sde1 13992 0 449568 2445 0 0 0 0 0 2490 10 0 0 0 0
8 32 sdc 19625 0 73213488 784906 186683 0 762787208 12988736 0 3694200 12747410 0 0 0 0
65 16 sdr 1248817 10671 183737848 1908524 204695 0 837042984 17617101 0 4600100 17979960 0 0 0 0
65 17 sdr1 14014 0 540032 3426 0 0 0 0 0 2990 610 0 0 0 0
8 160 sdk 1224443 7542 80327184 706375 1266 0 5144880 33811 0 477930 403840 0 0 0 0
8 161 sdk1 13992 0 449568 2481 0 0 0 0 0 2520 20 0 0 0 0
8 48 sdd 19095 0 70795936 735641 181047 0 740351848 10617225 0 3605480 10357910 0 0 0 0
65 240 sdaf 1222902 9016 80126704 769254 1243 0 5053904 33106 0 485770 448070 0 0 0 0
65 241 sdaf1 13965 27 449568 2510 0 0 0 0 0 2560 50 0 0 0 0
65 160 sdaa 1232979 8369 108948976 1199837 176427 0 721319160 10416379 0 3781320 10351750 0 0 0 0
65 161 sdaa1 14009 0 519472 3102 0 0 0 0 0 2930 480 0 0 0 0
66 16 sdah 1234511 7510 111972048 1212308 196714 1 804880552 14425492 0 4105150 14257540 0 0 0 0
66 17 sdah1 14014 0 540032 3587 0 0 0 0 0 3000 670 0 0 0 0
Please tell me if you need all lines ..
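An editorial aside on how the two outputs above line up: the disk.dev.read_bytes values are in Kbytes, derived from the "sectors read" field (512-byte sectors) in /proc/diskstats, so sda gives 92177 / 2 = 46088 and sdb gives 8297826 / 2 = 4148913, exactly the pminfo values shown. A minimal stdlib-only sketch of that mapping (not PCP code, just an illustration):

```python
def read_kbytes(path='/proc/diskstats'):
    """Map device names to Kbytes read, the way disk.dev.read_bytes
    reports them: the 'sectors read' field (512-byte sectors) / 2.
    Note /proc/diskstats lists partitions (sda1, ...) as well, while
    the disk.dev instance domain covers whole disks only."""
    values = {}
    with open(path) as statsfile:
        for line in statsfile:
            fields = line.split()
            # fields: major minor name reads merged sectors_read ...
            values[fields[2]] = int(fields[5]) // 2
    return values

if __name__ == '__main__':
    print(read_kbytes().get('sda'))   # 46088 on the system in this report
```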
> Please tell me if you need all lines ..

Yes please (via 'Add an attachment' in Bugzilla) - that should allow me to reproduce the problem exactly as you see it and confirm any fix we come up with.

Created attachment 1642094 [details]
output of commands
Thanks Olaf, I've been able to reproduce the problem with those files. I also have a workaround for you.
The fundamental issue is that you have more devices than a hard limit imposed by one of our Python libraries, and the way dstat interfaces with that library is not dynamic enough to cope with that limit (resolving this is what I'll work on next).
The temporary workaround is to edit your local dstat (python) script as follows:
diff --git a/src/pcp/dstat/pcp-dstat.py b/src/pcp/dstat/pcp-dstat.py
index 29d63bf01..632f83795 100755
--- a/src/pcp/dstat/pcp-dstat.py
+++ b/src/pcp/dstat/pcp-dstat.py
@@ -1024,7 +1024,7 @@ class DstatTool(object):
             sys.exit(1)

         self.pmconfig.validate_common_options()
-        self.pmconfig.validate_metrics()
+        self.pmconfig.validate_metrics(False, 2048)

         for i, plugin in enumerate(self.totlist):
             for name in self.metrics:
This raises an internal limit to cater for your case of more than 1024 disk instances. In the meantime, I'll work on making the code dynamically adjust this value based on the number of instances observed.
cheers.
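For illustration only, a minimal sketch of that dynamic idea - this is not the upstream patch, just the shape of it: derive the limit from the number of device lines the kernel currently exposes, keep the old default as a floor, and pass the result as the second argument to validate_metrics() as in the workaround above.

```python
import math

def instance_limit(path='/proc/diskstats', floor=1024):
    """Derive a per-metric instance limit from the number of block
    device lines the kernel currently exposes, rounded up to the
    next power of two, with the old default of 1024 as a floor."""
    with open(path) as statsfile:
        ndev = sum(1 for line in statsfile if line.strip())
    return max(floor, 2 ** math.ceil(math.log2(max(ndev, 1))))

if __name__ == '__main__':
    # On the system in this report (2551 diskstats lines, 1274
    # disk.dev instances) this yields 4096, comfortably above the
    # hard-coded 1024 that dstat was hitting.
    print(instance_limit())
```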
Hallo Nathan,

I changed that line as proposed .. and now .. it works like a charm, thank you very much.

To document what it looked like before:

[root@ ~]# dstat
You did not select any stats, using -cdngy by default.
----total-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
1 2 97 0 0| | 33k 133k| 0 0 |5309 3772
1 1 98 0 0| | 26k 191k| 0 0 |4844 3597
0 1 99 0 0| | 14k 1628 | 0 0 |1705 3341
^C

After, with the changed line:

[root@ ~]# dstat
You did not select any stats, using -cdngy by default.
----total-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
0 1 98 0 0| 0 0 | 13k 642 | 0 0 |1347 3222
1 1 98 0 0| 291M 0 | 12k 306 | 0 0 |1541 1383
1 1 98 0 0| 251M 0 | 11k 330 | 0 0 |1208 1837
1 1 98 0 0| 20M 36k| 12k 330 | 0 0 |1387 3110
^C

Nathan, anything else you need from our side?

Nothing else thanks Olaf - the complete fix is now merged upstream and will make its way into the next PCP RHEL update (later this week, for 8.2).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1628

Hi Olaf and Nathan,
just an FYI that we are hitting this on RHEL 7 as well.
Given the RHEL 7 life cycle, this probably does not meet the RHEL 7.x errata criteria.
Reproducer for anybody who wants to replicate (a KVM guest with RHEL 7 is enough):
# modprobe scsi-debug add_host=16 num_parts=32 num_tgts=64
# wc -l /proc/partitions
5130 /proc/partitions
# pcp dstat
You did not select any stats, using -cdngy by default.
----total-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
1 3 96 0 0| | 63 159 | 0 0 | 236 158
0 1 99 0 0| | 398 666 | 0 0 | 103 106
0 1 99 0 0| | 66 308 | 0 0 | 105 79
The patch from comment #5 works on the reproducer.
Christian
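For anyone replaying this, a quick way to confirm the reproducer has pushed the system past the old 1024-instance ceiling, without running dstat at all, is to count the entries in /proc/partitions. A small stdlib-only sketch:

```python
def count_partitions(path='/proc/partitions'):
    """Count device entries, skipping the header line and blanks."""
    with open(path) as partsfile:
        lines = [line for line in partsfile if line.strip()]
    return len(lines) - 1   # drop the 'major minor #blocks name' header

if __name__ == '__main__':
    # With the scsi_debug module loaded as above this reports several
    # thousand devices, well past the old 1024-instance limit.
    print('%d devices; old dstat limit was 1024' % count_partitions())
```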
Thanks Christian,

One more thing worth mentioning for anyone reading this BZ... Since dstat (aka pcp-dstat) is now a PCP client tool, you can use a remote (newer) system for running analysis. In other words, if you have a production system that cannot be upgraded, you can run pcp-dstat from a remote, recent (even Fedora or Ubuntu) distribution and point it at the production host:

$ pcp --host <hostname> dstat

Thus, you can use the newer, fixed version of pcp-dstat against the pmcd and metrics of a RHEL 7 host. Similarly, we can take PCP archives from that production host and analyse them with more recent versions of PCP:

$ pcp --archive <path> dstat --all --time

cheers.

Great idea, had not thought of that. kbase modified accordingly; that might help some affected customers.
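As a footnote to Nathan's remote-analysis suggestion above: the same "PCP client" idea works programmatically too. A rough sketch with the PCP Python bindings (python3-pcp), fetching the per-disk counters from a remote pmcd; the hostname is a placeholder and error handling (pmapi.pmErr) is omitted for brevity:

```python
from pcp import pmapi
import cpmapi as c_api

# Connect to pmcd on the (possibly older, e.g. RHEL 7) production host.
ctx = pmapi.pmContext(c_api.PM_CONTEXT_HOST, 'acme.example.com')

pmids = ctx.pmLookupName('disk.dev.read_bytes')
descs = ctx.pmLookupDescs(pmids)
insts, names = ctx.pmGetInDom(descs[0])     # instance ids -> disk names
byname = dict(zip(insts, names))

result = ctx.pmFetch(pmids)
for i in range(result.contents.get_numval(0)):
    value = result.contents.get_vlist(0, i)
    atom = ctx.pmExtractValue(result.contents.get_valfmt(0), value,
                              descs[0].contents.type, c_api.PM_TYPE_U64)
    print(byname.get(value.inst, value.inst), atom.ull)
ctx.pmFreeResult(result)
```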