Bug 1755915

Summary: VM count is missing at the prometheus
Product: Container Native Virtualization (CNV)
Reporter: guy chen <guchen>
Component: Virtualization
Assignee: Daniel Belenky <dbelenky>
Status: CLOSED ERRATA
QA Contact: guy chen <guchen>
Severity: high
Docs Contact:
Priority: high
Version: 2.1.0
CC: cnv-qe-bugs, dbelenky, fdeutsch, fromani, igulina, ipinto, mgoldboi, msivak, ncredi, rmohr, sgordon, sgott, stirabos
Target Milestone: ---   
Target Release: 2.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hyperconverged-cluster-operator-container-v2.2.0-37 virt-operator-container-v2.2.0-15
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-30 16:27:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1761501    
Attachments:
Description               Flags
prometheus kubevirt list  none
vmi_count metric          none

Description guy chen 2019-09-26 12:53:09 UTC

Comment 1 guy chen 2019-09-26 13:00:03 UTC
Description of problem:
Installed a system and created a VM that started to run.
Opened Prometheus and looked for the cnv:vmi_status_running:count metric.
There was no vmi_status_running:count metric.

Version-Release number of selected component (if applicable):

kubevirt-ssp-operator:v2.1.0-17
virt-operator:v2.1.0-12

How reproducible:
Always

Steps to Reproduce:
1. Install a system
2. Create a VM
3. Start the VMI
4. Open Prometheus and search for the cnv:vmi_status_running:count metric.

Actual results:
There was no vmi_status_running:count metric.

Expected results:
The vmi_status_running:count metric shows 1.

Additional info:

After investigating with fromani, the fix is:
https://github.com/MarSik/kubevirt-ssp-operator/pull/98
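
The check in step 4 can also be scripted against the Prometheus HTTP API instead of the web UI. The following is a minimal sketch, not part of the original report: the base URL is a placeholder, and authentication (which an OpenShift Prometheus route normally requires) is left out. It only builds the instant-query URL and interprets a response body; an empty result list corresponds to the "metric missing" symptom described above.

```python
import json
from typing import List
from urllib.parse import urlencode

# Assumption: placeholder base URL, not taken from the bug report.
PROMETHEUS_URL = "https://prometheus.example.invalid"
# Recording-rule name as quoted in the steps to reproduce.
METRIC = "cnv:vmi_status_running:count"


def build_query_url(base_url: str, expr: str) -> str:
    """Return the instant-query URL (/api/v1/query) for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def metric_values(response_body: str) -> List[float]:
    """Extract sample values from an instant-query JSON response.

    An empty list means no series matched, i.e. the metric is missing.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return [float(sample["value"][1]) for sample in data["result"]]


if __name__ == "__main__":
    # Print the URL one would fetch; fetching it is left to the caller.
    print(build_query_url(PROMETHEUS_URL, METRIC))
```

With one running VMI, the expected result in this report would be a single sample with value 1; the actual result was an empty list.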

Comment 8 Martin Sivák 2019-10-21 14:30:28 UTC
The documentation part is handled in https://bugzilla.redhat.com/show_bug.cgi?id=1761501

Comment 9 Martin Sivák 2019-10-21 14:51:43 UTC
The upstream backport to the stable branch is here: https://github.com/MarSik/kubevirt-ssp-operator/pull/112

Comment 15 Israel Pinto 2019-12-19 18:11:11 UTC
The VM count is still missing; the list of kubevirt metrics is attached.

Steps:
1. Create and run VM
2. Connect to prometheus 
3. Query the vm count

results:
Missing metric
 
SSP version:
container-native-virtualization-kubevirt-ssp-operator:v2.2.0-14

Comment 16 Israel Pinto 2019-12-19 18:11:57 UTC
Created attachment 1646641
prometheus kubevirt list

Comment 17 Fabian Deutsch 2019-12-20 15:10:27 UTC
Daniel, can you please take a look?

Comment 18 Daniel Belenky 2019-12-21 09:19:38 UTC
Absolutely.

Comment 19 Daniel Belenky 2019-12-22 16:27:40 UTC
OpenShift's monitoring-operator looks for resources of kind "PrometheusRule" only in namespaces that carry the openshift.io/cluster-monitoring="true" label. The testing environment deploys the HCO with a custom script (https://pkgs.devel.redhat.com/cgit/containers/hco-bundle-registry/plain/marketplace-qe-testing.sh?h=cnv-2.2-rhel-8). After I added the label to the namespace (openshift-cnv), the monitoring-operator picked it up the next time it ran its reconcile loop, and the metric appeared in the metrics board as expected.
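
For illustration, the label change described above can be expressed as a JSON merge patch on the Namespace object. This is a sketch under assumptions, not the deploy script or the eventual fix: the helper names are hypothetical, and the small merge function only simulates locally what the API server would do with such a patch.

```python
import json

# Label key and value as quoted in this comment.
MONITORING_LABEL = "openshift.io/cluster-monitoring"


def is_monitored(namespace: dict) -> bool:
    """Return True if the Namespace object carries the cluster-monitoring label."""
    labels = namespace.get("metadata", {}).get("labels") or {}
    return labels.get(MONITORING_LABEL) == "true"


def monitoring_label_patch() -> str:
    """Return a JSON merge patch body that sets only the monitoring label."""
    return json.dumps({"metadata": {"labels": {MONITORING_LABEL: "true"}}})


def apply_merge_patch(target: dict, patch: dict) -> dict:
    """Minimal JSON merge patch (RFC 7386) for local simulation only."""
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            # A null value in a merge patch removes the key.
            result.pop(key, None)
        elif isinstance(value, dict):
            base = result.get(key)
            result[key] = apply_merge_patch(base if isinstance(base, dict) else {}, value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical Namespace object; the "example" label is made up.
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": "openshift-cnv", "labels": {"example": "kept"}},
    }
    patched = apply_merge_patch(ns, json.loads(monitoring_label_patch()))
    print(is_monitored(ns), is_monitored(patched))
```

A merge patch is used in the sketch because it adds the one label without replacing any labels already present on the namespace.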

I don't want to just add the label to QE's deploy script, because I first need to understand why QE uses a different deploy script from the one we give to our clients. We risk false positives here, and the fix should go into the code our clients use.

Comment 20 Daniel Belenky 2019-12-23 08:53:48 UTC
A PR to patch the deployment *upstream only*: https://github.com/kubevirt/hyperconverged-cluster-operator/pull/389.
I still need to figure out how our customers are instructed to install the HCO because someone still has to create the namespace for it manually.

Comment 21 Daniel Belenky 2019-12-24 08:28:56 UTC
Sent a patch to OpenShift docs: https://github.com/openshift/openshift-docs/pull/18806.
Once it's merged, we can adjust the QE deploy script accordingly.

The reason we ask users to add the label to their namespace, rather than adding it from one of
our components (e.g. virt-operator), is that adding labels to namespaces requires
a cluster role that can update namespaces.

Comment 22 Daniel Belenky 2019-12-26 22:45:13 UTC
I've spent more time on this issue and found a more elegant solution:
virt-operator will take care of patching labels on the namespace.

https://github.com/kubevirt/kubevirt/pull/2952/

The reason I took a different approach at first was that I thought
that permission to patch namespaces at cluster scope might be
"too much" for virt-operator. But with OpenShift
Projects we can deploy a Role in our namespace that lets us
operate on our Project only.
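
As a rough illustration of the namespace-scoped approach, the sketch below generates a Role and RoleBinding that allow get and patch on the Namespace object of a single namespace. It is not taken from the linked PR: the Role name and service account name are hypothetical, and whether such a Role is sufficient in a given cluster is the assumption stated in this comment, not something the sketch verifies.

```python
import json

NAMESPACE = "openshift-cnv"              # namespace named in this bug
ROLE_NAME = "namespace-label-patcher"    # hypothetical name
SERVICE_ACCOUNT = "virt-operator"        # hypothetical service account name


def namespace_patch_role(namespace: str, name: str) -> dict:
    """Build a namespaced Role limited to reading and patching one Namespace object."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],               # core API group
                "resources": ["namespaces"],
                "resourceNames": [namespace],    # restrict to this namespace only
                "verbs": ["get", "patch"],
            }
        ],
    }


def role_binding(namespace: str, role_name: str, service_account: str) -> dict:
    """Bind the Role to a service account in the same namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be fed to tools that accept manifests.
    for manifest in (
        namespace_patch_role(NAMESPACE, ROLE_NAME),
        role_binding(NAMESPACE, ROLE_NAME, SERVICE_ACCOUNT),
    ):
        print(json.dumps(manifest, indent=2))
```

The point of the design is that both objects are kind Role and RoleBinding rather than ClusterRole and ClusterRoleBinding, so the grant does not extend beyond the operator's own namespace.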

Comment 33 Israel Pinto 2020-01-23 12:47:45 UTC
Verified with:
container-native-virtualization-kubevirt-ssp-operator:v2.2.0-21
container-native-virtualization-virt-operator:v2.2.0-15
container-native-virtualization-hyperconverged-cluster-operator:v2.2.0-12

Metric: vmi_status_running:count  - Exists
See attached.

Comment 34 Israel Pinto 2020-01-23 12:48:23 UTC
Created attachment 1654838
vmi_count metric

Comment 37 errata-xmlrpc 2020-01-30 16:27:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

Comment 38 Simone Tiraboschi 2020-05-28 09:43:25 UTC
Removing old needinfo.