Description of problem:
Since collectd is case sensitive and defaults to uppercase hostnames, collectd calls can fail if salt knows the host with lowercase letters.

Example error:
May 26 05:18:00 rhs-c skyring[6122]: 2016-05-26T05:18:00+0000 INFO saltwrapper.py:32 saltwrapper.wrapper] rv={'rhcs2': {'PercentUsed': "command execution failed\ncommand: ['collectdctl', 'getval', 'rhcs2/cpu/percent-user']\nexit code: 255\nstderr: ERROR: Server error: No such value\n\nstdout: \n"}}

On the host itself we see:
[root@rhcs2 vagrant]# collectdctl getval rhcs2/cpu/percent-user
ERROR: Server error: No such value
[root@rhcs2 vagrant]# collectdctl getval cpu/percent-user
value=1.584689e+01
[root@rhcs2 vagrant]# collectdctl getval RHCS2/cpu/percent-user
value=1.584689e+01

So either we need to convert the hostname to uppercase letters before sending it, or we omit it (might not work with every command).

Version-Release number of selected component (if applicable): Most recent
How reproducible: Every time
Actual results: Graphs in RHS-C UI are empty
Expected results: Graphs get populated
Additional info:
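To illustrate the mismatch described above, here is a minimal sketch. It assumes collectd identifiers have the form host/plugin/type and are compared case-sensitively, as the collectdctl output suggests. The helper names and the sample identifier list are hypothetical and are not taken from skyring.

```python
def build_identifier(host, plugin, type_instance):
    """Join the parts into a collectd-style identifier: host/plugin/type."""
    return "/".join([host, plugin, type_instance])


def find_identifier(wanted, known):
    """Return the stored identifier matching `wanted`, ignoring case in the host part only."""
    wanted_host, _, wanted_rest = wanted.partition("/")
    for ident in known:
        host, _, rest = ident.partition("/")
        if host.lower() == wanted_host.lower() and rest == wanted_rest:
            return ident
    return None


# Hypothetical sample of identifiers as collectd stores them (uppercase host)
known = ["RHCS2/cpu/percent-user", "RHCS2/memory/percent-used"]

# Identifier built from the lowercase salt id
requested = build_identifier("rhcs2", "cpu", "percent-user")

print(requested in known)                  # exact, case-sensitive lookup
print(find_identifier(requested, known))   # lookup that ignores case in the host part
```

A case-insensitive lookup like this is only one possible workaround; making both sides use the same host identifier avoids the need for it.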
I think the easiest way to fix this is to synchronise the way we identify hosts.

When we request the information, we currently use the salt id:
https://github.com/skyrings/skyring/blob/master/salt_module/collectd.py#L41

When we tell collectd its hostname, however, we use the fqdn returned by salt grains:
https://github.com/skyrings/skyring/blob/master/backend/salt/conf/collectd/collectd.conf#L10

Suggestion: Use the salt id in the template for the collectd.conf file.
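A rough sketch of that suggestion follows. It uses Python's string.Template as a stand-in for the real salt template engine. The `$host` placeholder, the helper names, and the grain values are made up for illustration and are not copied from the skyring template.

```python
from string import Template

# Stand-in for the Hostname line of collectd.conf (placeholder name is hypothetical)
CONF_TEMPLATE = Template('Hostname "$host"\n')

# Made-up grain values for a host whose fqdn differs in case from its salt id
grains = {"id": "rhcs2", "fqdn": "RHCS2"}


def render_conf(host):
    """Render the Hostname line with the given host name."""
    return CONF_TEMPLATE.substitute(host=host)


def configured_identifier(conf_text, resource):
    """Identifier under which collectd would store the value, given the rendered config."""
    host = conf_text.split('"')[1]
    return "%s/%s" % (host, resource)


def query_identifier(salt_id, resource):
    """Identifier used when requesting the value, built from the salt id."""
    return "%s/%s" % (salt_id, resource)


resource = "cpu/percent-user"
query = query_identifier(grains["id"], resource)
with_fqdn = configured_identifier(render_conf(grains["fqdn"]), resource)
with_id = configured_identifier(render_conf(grains["id"]), resource)

print(query == with_fqdn)  # config rendered from the fqdn grain
print(query == with_id)    # config rendered from the salt id
```

When both the config and the query are derived from the salt id, the two identifiers agree by construction, regardless of the case of the system hostname.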
QA doesn't see this issue because our environment has lowercase hostnames only. Could you please check whether you still see this issue with the latest builds?
I'm currently busy with an engagement, so I will probably not get to this till Friday.
Tested by code review of https://review.gerrithub.io/#/c/278729/3/backend/salt/conf/collectd/collectd.conf and I think it can be fixed. Let's wait on Chris testing on Friday.
It seems that there is still this issue:

2016-08-11T23:32:15.75+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of ping.ping-MKUDLEJ-USM3-SERVER resource of MKUDLEJ-USM3-MON3
2016-08-11T23:32:15.773+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-MON3
2016-08-11T23:32:15.777+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-MON3
2016-08-11T23:32:15.782+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-MON3
2016-08-11T23:32:15.786+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of ping.ping-MKUDLEJ-USM3-SERVER resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:32:15.791+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.read resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:32:15.795+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:32:15.81+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:32:15.814+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of ping.ping-MKUDLEJ-USM3-SERVER resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:32:15.818+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.read resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:32:15.822+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:32:15.826+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:32:15.843+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of ping.ping-MKUDLEJ-USM3-SERVER resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:32:15.847+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.read resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:32:15.851+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:32:15.855+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:32:15.859+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:899d2db7-ecf3-4cab-8975-46bb030e689c - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:14.974+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-memory-sum.memory resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.009+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of memory.percent-used resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.016+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of memory.memory-used resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.047+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-memory-sum.memory resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.052+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-memory-sum.memory resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.052+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of cpu.percent-user resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.053+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-memory-sum.memory resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.08+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of memory.memory-used resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.095+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of cpu.percent-user resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.115+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.swap-used resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.115+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of memory.memory-used resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.122+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of cpu.percent-user resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.128+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.swap-used resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.128+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of cpu.percent-user resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.133+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.swap-used resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.133+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.percent-used resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.139+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.percent-used resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.143+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.percent-used resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.179+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-swap-sum.swap resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.161+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-swap-sum.swap resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.191+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth_used resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.197+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.percent-used resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.2+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.207+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.207+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth_used resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.269+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-swap-sum.swap resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.27+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.275+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.304+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.304+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth_used resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.323+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.361+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-NODE3
2016-08-11T23:35:15.395+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of disk-*.disk_ops.read resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.412+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.422+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-MON2
2016-08-11T23:35:15.483+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-NODE4
2016-08-11T23:35:15.484+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-MON1
2016-08-11T23:35:15.489+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.548+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-NODE1
2016-08-11T23:35:15.55+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.617+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-MON3
2016-08-11T23:35:15.93+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-memory-sum.memory resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.934+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of memory.memory-used resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.938+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of cpu.percent-user resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.955+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of swap.percent-used resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.959+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of aggregation-swap-sum.swap resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.963+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth_used resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.966+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.bytes-total_bandwidth resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.971+02:00 WARNING monitoring.go:271 FetchStatFromGraphiteWithErrorIndicate] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface-average.percent-network_utilization resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.976+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of disk-*.disk_ops.read resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.98+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of disk-*.disk_ops.write resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.984+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.rx resource of MKUDLEJ-USM3-NODE2
2016-08-11T23:35:15.988+02:00 WARNING monitoring.go:284 FetchAggregatedStatsFromGraphite] skyring:84e26be4-2487-4c8d-9a4d-0dac5267d946 - Error Failed to get the instant stat of interface*.if_octets.tx resource of MKUDLEJ-USM3-NODE2
Martin, these warnings are logged because the console is not able to fetch the stats from graphite. After cluster creation it takes roughly 30 minutes to 1 hour for all stats to appear in graphite, and until then a warning is logged whenever a stat fetch fails. It would be helpful if you could share the setup where you hit the issue in this bug.
Martin, could you also please check whether the value after "Hostname" in collectd.conf (/etc/collectd.conf on RHEL, /etc/collectd/collectd.conf on Ubuntu) and the output of the hostname command are the same?
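The check requested above could be scripted roughly as follows. This is a sketch that assumes the Hostname directive appears as a single uncommented line such as Hostname "name"; the function names are hypothetical, and only the two config paths mentioned above are tried.

```python
import re
import socket

# Matches an uncommented line such as: Hostname "myhost"
HOSTNAME_RE = re.compile(r'^\s*Hostname\s+"?([^"\s]+)"?\s*$', re.MULTILINE)


def configured_hostname(conf_text):
    """Return the value of the Hostname directive, or None if it is not set."""
    match = HOSTNAME_RE.search(conf_text)
    return match.group(1) if match else None


def hostnames_match(conf_text, system_hostname):
    """Case-sensitive comparison of the configured and the system hostname."""
    return configured_hostname(conf_text) == system_hostname


if __name__ == "__main__":
    system_hostname = socket.gethostname()
    for path in ("/etc/collectd.conf", "/etc/collectd/collectd.conf"):
        try:
            with open(path) as handle:
                text = handle.read()
        except IOError:
            continue  # file not present or not readable on this host
        print(path, configured_hostname(text), system_hostname,
              hostnames_match(text, system_hostname))
```

The comparison is deliberately case-sensitive, since a difference in case alone is enough to make the lookup fail.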
Martin, as we discussed, things are working fine and no change is required. So I am moving it to ON QA; can you please mark it verified?
It seems that these errors were in the logs because of the 30-minute delay before the data is updated. --> VERIFIED
Do you still need my input?