Bug 1755915
| Summary: | VM count is missing at the prometheus | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | guy chen <guchen> |
| Component: | Virtualization | Assignee: | Daniel Belenky <dbelenky> |
| Status: | CLOSED ERRATA | QA Contact: | guy chen <guchen> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1.0 | CC: | cnv-qe-bugs, dbelenky, fdeutsch, fromani, igulina, ipinto, mgoldboi, msivak, ncredi, rmohr, sgordon, sgott, stirabos |
| Target Milestone: | --- | | |
| Target Release: | 2.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | hyperconverged-cluster-operator-container-v2.2.0-37, virt-operator-container-v2.2.0-15 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-30 16:27:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1761501 | | |
Description
guy chen
2019-09-26 12:53:09 UTC
Description of problem:
Installed a system and created a VM, which started running. Opened Prometheus and looked for the cnv:vmi_status_running:count metric. There was no vmi_status_running:count metric.

Version-Release number of selected component (if applicable):
kubevirt-ssp-operator:v2.1.0-17, virt-operator:v2.1.0-12

How reproducible:
Always

Steps to Reproduce:
1. Install a system.
2. Create a VM.
3. Start the VMI.
4. Open Prometheus and search for the cnv:vmi_status_running:count metric.

Actual results:
There is no vmi_status_running:count metric.

Expected results:
The vmi_status_running:count metric shows 1.

Additional info:
After investigation with fromani, the fix is: https://github.com/MarSik/kubevirt-ssp-operator/pull/98

The documentation part is handled in https://bugzilla.redhat.com/show_bug.cgi?id=1761501

The upstream backport to the stable branch is here: https://github.com/MarSik/kubevirt-ssp-operator/pull/112

The VM count is still missing; the kubevirt metrics are attached.

Steps:
1. Create and run a VM.
2. Connect to Prometheus.
3. Query the VM count.

Results: Missing metric

SSP version: container-native-virtualization-kubevirt-ssp-operator:v2.2.0-14

Created attachment 1646641
prometheus kubevirt list
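The "Query the VM count" step can be sketched against the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). This Python sketch assumes a reachable Prometheus base URL and bearer-token authentication; both are hypothetical here, since this bug does not describe how the OpenShift Prometheus route is exposed or authenticated. It treats an empty result vector as the "missing metric" symptom reported above and sums the sample values so the expected count of 1 can be checked. The two response bodies in the demo are hand-written samples, not output captured for this bug.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Recording-rule name taken from this bug report.
METRIC = "cnv:vmi_status_running:count"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch(url: str, token: str) -> str:
    """GET the URL with a bearer token (assumed auth scheme) and return the body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def running_vmi_count(response_body: str) -> Optional[float]:
    """Return the summed sample value, or None if the metric has no series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return None  # empty vector: the "missing metric" symptom
    # Each sample in a vector result is [unix_timestamp, "value-as-string"].
    return sum(float(series["value"][1]) for series in result)


if __name__ == "__main__":
    # Hypothetical host; with a live cluster, pass this URL to fetch().
    print(build_query_url("https://prometheus.example.com", METRIC))
    missing = '{"status":"success","data":{"resultType":"vector","result":[]}}'
    present = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"__name__": METRIC}, "value": [1579000000.0, "1"]},
        ]},
    })
    print(running_vmi_count(missing))  # None
    print(running_vmi_count(present))  # 1.0
```

Summing across series is a deliberate choice: if the recording rule happens to emit one series per label set, the total still reflects all running VMIs.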
Daniel, can you please take a look?

Absolutely.

OpenShift's monitoring operator looks for resources of kind "PrometheusRule" in namespaces that are labeled openshift.io/cluster-monitoring="true". In the testing environment, a custom deploy script is used to deploy the HCO (https://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/plain/marketplace-qe-testing.sh?h=cnv-2.2-rhel-8). When I added the label to the namespace (openshift-cnv), the monitoring operator picked it up the next time it ran its reconcile loop, and the metric appeared in the metrics dashboard as expected.

I don't want to simply add the label to QE's deploy script, because I first need to understand why QE uses a different deploy script from the one we give to our customers. Otherwise we risk false positives, and the fix should go into the code our customers use.

A PR to patch the deployment *upstream only*: https://github.com/kubevirt/hyperconverged-cluster-operator/pull/389. I still need to figure out how our customers are instructed to install the HCO, because someone still has to create its namespace manually.

Sent a patch to the OpenShift docs: https://github.com/openshift/openshift-docs/pull/18806. Once it's merged, we can adjust the QE deploy script accordingly. We ask users to add the label to their namespace, rather than doing it from one of our components (i.e., virt-operator), because adding labels to namespaces requires a cluster role that can update namespaces.

I've spent more time on this issue and found a more elegant solution: virt-operator will take care of patching the labels on the namespace: https://github.com/kubevirt/kubevirt/pull/2952/

I took a different approach at first because I thought that permission to patch namespaces at cluster scope might be too broad for virt-operator. However, with OpenShift Projects we can deploy a Role in our namespace that lets us operate on our own Project only.
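The root cause and the fix described above can be modelled in a few lines of Python. This is a simplified sketch, not the monitoring operator's or virt-operator's actual code. `is_monitored` reduces the monitoring operator's namespace selection to the single label check described in the comment. `label_patch` builds a JSON merge patch (RFC 7386) body that sets openshift.io/cluster-monitoring="true" on the namespace, and `apply_merge_patch` applies such a patch locally so the before/after effect can be checked without a cluster. `ROLE` is a hypothetical namespaced RBAC Role (its name is made up) of the kind the last comment describes, assuming, as that comment implies, that a Role in openshift-cnv can grant access to the openshift-cnv namespace object itself.

```python
import json

# Label that, per this bug, OpenShift's monitoring operator looks for when
# deciding which namespaces' PrometheusRule objects to pick up.
MONITORING_LABEL = "openshift.io/cluster-monitoring"


def is_monitored(namespace: dict) -> bool:
    """Simplified model: a namespace is watched only if the label is "true"."""
    labels = namespace.get("metadata", {}).get("labels") or {}
    return labels.get(MONITORING_LABEL) == "true"


def label_patch() -> dict:
    """JSON merge patch body that sets the monitoring label."""
    return {"metadata": {"labels": {MONITORING_LABEL: "true"}}}


def apply_merge_patch(target, patch):
    """Minimal RFC 7386 merge: objects merge recursively, None deletes a key."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


# Hypothetical Role letting an operator get/patch only its own namespace
# object; the Role name is an illustrative assumption.
ROLE = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "example-namespace-labeler", "namespace": "openshift-cnv"},
    "rules": [{
        "apiGroups": [""],
        "resources": ["namespaces"],
        "resourceNames": ["openshift-cnv"],
        "verbs": ["get", "patch"],
    }],
}


if __name__ == "__main__":
    ns = {"metadata": {"name": "openshift-cnv"}}
    print(is_monitored(ns))  # False: rules in this namespace are ignored
    ns = apply_merge_patch(ns, label_patch())
    print(is_monitored(ns))  # True: rules are picked up on the next reconcile
    print(json.dumps(label_patch()))
```

Against a real cluster, the same body would be sent as a PATCH request with the `application/merge-patch+json` content type; because a merge patch only touches the keys it names, any existing labels on the namespace are preserved.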
Verified with:
- container-native-virtualization-kubevirt-ssp-operator:v2.2.0-21
- container-native-virtualization-virt-operator:v2.2.0-15
- container-native-virtualization-hyperconverged-cluster-operator:v2.2.0-12

Metric: vmi_status_running:count - exists. See attached.

Created attachment 1654838
vmi_count metric
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

Removing old need info.