Bug 1631300
| Summary: | metrics data is empty in CRI-O env | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Installer | Assignee: | Russell Teague <rteague> |
| Installer sub component: | openshift-ansible | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aos-bugs, dapark, jokerman, mmccomas, sdodson, ssadhale, xtian |
| Version: | 3.11.0 | Keywords: | Regression, Reopened |
| Target Milestone: | --- | ||
| Target Release: | 3.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-03-05 14:39:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
|
Description
Junqi Zhao
2018-09-20 11:09:58 UTC
Blocks metrics testing on HA env.

Created attachment 1485103
metrics pods log
After some initial debugging, it appears that heapster is not sending metrics for any project. The only metric data stored in Cassandra is for the _system tenant, which does come from heapster. This explains why the graphs are empty. I increased logging for heapster and am not seeing any errors. We need to figure out whether heapster is actually collecting pod-level metrics from the different projects. Based on the logs, I suspect it is not collecting pod metrics.

Tested with
openshift_crio_var_sock: "/var/run/crio/crio.sock"
The configmap qe-master in openshift-node still has the unix prefix:
*****************************************************
container-runtime-endpoint:
- unix:///var/run/crio/crio.sock
image-service-endpoint:
- unix:///var/run/crio/crio.sock
*****************************************************
and /etc/origin/node/node-config.yaml on all nodes has the unix prefix:
*****************************************************
container-runtime-endpoint:
- unix:///var/run/crio/crio.sock
image-service-endpoint:
- unix:///var/run/crio/crio.sock
*****************************************************
After removing the unix prefix and restarting atomic-openshift-node.service on all nodes, we could see the data in the web UI.
Maybe there are other playbooks that need to be changed.
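For illustration only, the manual workaround above (dropping the unix:// prefix from the two endpoint entries) can be sketched in Python. This is a minimal sketch that rewrites config text in memory; the function name `strip_unix_prefix` is hypothetical, and it does not touch the cluster, the config maps, or the node service.

```python
import re

# Matches the "unix://" scheme at the start of a YAML list item whose value
# is an absolute socket path, e.g. "- unix:///var/run/crio/crio.sock".
_UNIX_ITEM = re.compile(r"^([ \t]*-[ \t]*)unix://(?=/)", re.MULTILINE)


def strip_unix_prefix(config_text: str) -> str:
    """Return config_text with 'unix://' removed from list-item socket paths.

    Hypothetical helper for illustration; it only edits the text it is given.
    """
    return _UNIX_ITEM.sub(r"\1", config_text)


# Fragment copied from the node config shown in this report.
fragment = """container-runtime-endpoint:
- unix:///var/run/crio/crio.sock
image-service-endpoint:
- unix:///var/run/crio/crio.sock"""

print(strip_unix_prefix(fragment))
```

Lines that already hold a bare path are left unchanged, so running the helper twice gives the same result as running it once.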
For different nodes, we should also modify the configmaps for those nodes:

# cat /etc/sysconfig/atomic-openshift-node | grep BOOTSTRAP_CONFIG_NAME

To change an existing cluster, change the config maps for the nodes (i.e. BOOTSTRAP_CONFIG_NAME) and restart the atomic-openshift-node service on each node. If you don't want to restart the services, the sync DS pod should restart the kubelet when it eventually notices (3m max) that the config map has changed.

Tested again; the issue is actually fixed. See the attached picture: metrics data is shown in the web UI. The reason I thought it was not fixed and needed the workaround mentioned in Comment 9 is that there was an error in our OCP environment build template.

Although the issue is fixed, we can see a warning that recommends using the format "unix:///var/run/crio/crio.sock".

# master-logs etcd etcd
W0925 07:16:36.726552 19184 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".

# openshift version
openshift v3.11.14

openshift-ansible-3.11.14-1

Please change it to ON_QA, then we can close it.

Created attachment 1486724
metrics data is shown in web UI
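As a rough sketch of the BOOTSTRAP_CONFIG_NAME lookup mentioned above (the grep against /etc/sysconfig/atomic-openshift-node), the following Python reads the value from sysconfig-style text. The helper name `bootstrap_config_name` and the sample file content are assumptions made for illustration; only the variable name and the qe-master config map name come from this report.

```python
from typing import Optional


def bootstrap_config_name(sysconfig_text: str) -> Optional[str]:
    """Return the BOOTSTRAP_CONFIG_NAME value from sysconfig-style text.

    Hypothetical helper: skips comments and lines without '=', and strips
    optional surrounding quotes from the value.
    """
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "BOOTSTRAP_CONFIG_NAME":
            return value.strip().strip("\"'")
    return None


# Sample content is an assumption; a real file may contain other settings.
sample = "# node settings\nBOOTSTRAP_CONFIG_NAME=qe-master\n"

print(bootstrap_config_name(sample))
```

The returned name identifies which config map in openshift-node a given node reads, which is the one that would need to be changed for that node.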
Adding the CRI-O version: cri-o://1.11.5

Three namespaces (kube-system, openshift-node, openshift-sdn) show the network metrics graph, but it is empty for the other namespaces. See the attached pictures. The workaround is to restart atomic-openshift-node on every node; the network metrics graph is then shown. Maybe there should be a fix in openshift-ansible.

Created attachment 1498395
network metrics graph could be shown - kube-system
Created attachment 1498396
There is no network metrics graph - openshift-infra
Created attachment 1498397
network metrics graph could be shown after restarting atomic-openshift-node.service - openshift-infra
per c#15

Related fix: https://github.com/openshift/origin/pull/21398
This bug is tracking the openshift-ansible fix, so this is just FYI.

(In reply to Seth Jennings from comment #26)
> Related fix:
> https://github.com/openshift/origin/pull/21398

Does this PR fix the empty network graph issue?

My understanding of the matter is that the installer, in order to stop emitting deprecation warnings, updated the default CRI-O socket to unix:///var/run/crio/crio.sock, and that broke metrics gathering. Seth's PRs to openshift-ansible reverted that change, so configs are now generated with a format that works for metrics gathering. Anyone who had previously generated their configmaps will need to either patch or re-create them in order for this change to be distributed via the sync pod and services restarted automatically.

Further, the PR referenced in comment 26 updates the kubelet to work with the new format. It has not merged and I have no idea when it will, so we have to rely on the installer fix for now to address this issue.

Will verify this bug after the PR referenced in comment 26 is merged.

I think this bug should be tested as is. We have resolved the problem for new installs and upgrades. The pull request referenced in comment 26 is supplementary but not required.

The issue in Comment 20 is reported in Bug 1646886; closing this defect. |