Bug 1853133

Summary: [CNV-2.4] Deployment fails on KubeVirtMetricsAggregationNotAvailable
Product: Container Native Virtualization (CNV)
Reporter: Lukas Bednar <lbednar>
Component: SSP
Assignee: Karel Šimon <ksimon>
Status: CLOSED ERRATA
QA Contact: Israel Pinto <ipinto>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2.4.0
CC: cnv-qe-bugs, dollierp, irose, lbednar, ncredi, nunnatsa, oyahud, rnetser, stirabos, talayan
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: 2.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kubevirt-ssp-operator-container-v2.4.0-66, hco-bundle-registry-container-v2.3.0-445
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-28 19:10:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Lukas Bednar 2020-07-02 04:38:46 UTC
Description of problem:

    - lastHeartbeatTime: "2020-07-02T04:35:15Z"
      lastTransitionTime: "2020-07-02T00:58:04Z"
      message: missing "Available" condition
      reason: KubeVirtMetricsAggregationNotAvailable
      status: "False"
      type: Available


Version-Release number of selected component (if applicable):
HCO-v2.3.0-433
OCP-4.5


How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV

Actual results: Failing on KubeVirtMetricsAggregationNotAvailable


Expected results: CNV deployed successfully


Additional info:

Comment 1 Nahshon Unna-Tsameret 2020-07-02 06:40:28 UTC
What is the timeout? How long does it run before failing the test?

Comment 2 Tareq Alayan 2020-07-02 07:31:04 UTC
*** Bug 1853185 has been marked as a duplicate of this bug. ***

Comment 3 Simone Tiraboschi 2020-07-02 07:50:16 UTC
After more than 6 hours the issue is still there,
and indeed the KubevirtMetricsAggregation CR is not available:


 [cloud-user@ocp-psi-executor ~]$ oc get KubeVirtMetricsAggregation -n openshift-cnv   metrics-aggregation-kubevirt-hyperconverged -o yaml
 apiVersion: ssp.kubevirt.io/v1
 kind: KubevirtMetricsAggregation
 metadata:
   creationTimestamp: "2020-07-02T00:58:04Z"
   generation: 1
   labels:
     app: kubevirt-hyperconverged
   managedFields:
   - apiVersion: ssp.kubevirt.io/v1
     fieldsType: FieldsV1
     fieldsV1:
       f:metadata:
         f:labels:
           .: {}
           f:app: {}
         f:ownerReferences: {}
       f:spec: {}
     manager: hyperconverged-cluster-operator
     operation: Update
     time: "2020-07-02T00:58:04Z"
   - apiVersion: ssp.kubevirt.io/v1
     fieldsType: FieldsV1
     fieldsV1:
       f:status:
         .: {}
         f:conditions: {}
     manager: ansible-operator
     operation: Update
     time: "2020-07-02T07:30:06Z"
   name: metrics-aggregation-kubevirt-hyperconverged
   namespace: openshift-cnv
   ownerReferences:
   - apiVersion: hco.kubevirt.io/v1alpha1
     blockOwnerDeletion: true
     controller: true
     kind: HyperConverged
     name: kubevirt-hyperconverged
     uid: 5231311f-39f4-42bb-af2e-1a4d1533c0bb
   resourceVersion: "8270927"
   selfLink: /apis/ssp.kubevirt.io/v1/namespaces/openshift-cnv/kubevirtmetricsaggregations/metrics-aggregation-kubevirt-hyperconverged
   uid: ab78b85b-4d1f-447c-ac71-ce9fd006a52f
 spec: {}
 status:
   conditions:
   - lastTransitionTime: "2020-07-02T07:30:01Z"
     message: Running reconciliation
     reason: Running
     status: "False"
     type: Running
   - ansibleResult:
       changed: 0
       completion: 2020-07-02T07:30:05.990898
       failures: 1
       ok: 2
       skipped: 0
     lastTransitionTime: "2020-07-02T07:30:06Z"
     message: |
       The task includes an option with an undefined variable. The error was: 'operator_version' is undefined
 
       The error appears to be in '/opt/ansible/roles/KubevirtMetricsAggregation/tasks/main.yml': line 2, column 3, but may
       be elsewhere in the file depending on the exact syntax problem.
 
       The offending line appears to be:
 
       ---
       - name: Set operatorVersion and targetVersion
         ^ here
     reason: Failed
     status: "True"
     type: Failure


In SSP operator logs:
 --------------------------- Ansible Task StdOut -------------------------------
 
  TASK [Set operatorVersion and targetVersion] ******************************** 
 fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'operator_version' is undefined\n\nThe error appears to be in '/opt/ansible/roles/KubevirtCommonTemplatesBundle/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Set operatorVersion and targetVersion\n  ^ here\n"}
 
 -------------------------------------------------------------------------------
 {"level":"error","ts":1593675251.394438,"logger":"logging_event_handler","msg":"","name":"common-templates-kubevirt-hyperconverged","namespace":"openshift","gvk":"ssp.kubevirt.io/v1, Kind=KubevirtCommonTemplatesBundle","event_type":"runner_on_failed","job":"8769843336475300403","EventData.Task":"Set operatorVersion and targetVersion","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/KubevirtCommonTemplatesBundle/tasks/main.yml:2","error":"[playbook task failed]","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/events.loggingEventHandler.Handle\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/events/log_events.go:87"}
 {"level":"error","ts":1593675251.5331721,"logger":"runner","msg":"\u001b[0;34mansible-playbook 2.9.10\u001b[0m\r\n\u001b[0;34m  config file = /etc/ansible/ansible.cfg\u001b[0m\r\n\u001b[0;34m  configured module search path = [u'/opt/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']\u001b[0m\r\n\u001b[0;34m  ansible python module location = /usr/lib/python2.7/site-packages/ansible\u001b[0m\r\n\u001b[0;34m  executable location = /usr/bin/ansible-playbook\u001b[0m\r\n\u001b[0;34m  python version = 2.7.5 (default, Sep 26 2019, 13:23:47) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]\u001b[0m\r\n\u001b[0;34mUsing /etc/ansible/ansible.cfg as config file\u001b[0m\r\n\r\nPLAYBOOK: kubevirtnodelabeller.yaml ********************************************\n\u001b[0;34m1 plays in /opt/ansible/kubevirtnodelabeller.yaml\u001b[0m\n\r\nPLAY [localhost] ***************************************************************\n\r\nTASK [Gathering Facts] *********************************************************\r\n\u001b[1;30mtask path: /opt/ansible/kubevirtnodelabeller.yaml:1\u001b[0m\n\u001b[0;32mok: [localhost]\u001b[0m\n\u001b[0;34mMETA: ran handlers\u001b[0m\n\r\nTASK [KubevirtCircuitBreaker : Extract the CR info] ****************************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtCircuitBreaker/tasks/main.yml:3\u001b[0m\n\u001b[0;32mok: [localhost] => {\"ansible_facts\": {\"cr_info\": {\"apiVersion\": \"ssp.kubevirt.io/v1\", \"kind\": \"KubevirtNodeLabellerBundle\", \"metadata\": {\"creationTimestamp\": \"2020-07-02T00:58:04Z\", \"generation\": 1, \"labels\": {\"app\": \"kubevirt-hyperconverged\"}, \"managedFields\": [{\"apiVersion\": \"ssp.kubevirt.io/v1\", \"fieldsType\": \"FieldsV1\", \"fieldsV1\": {\"f:metadata\": {\"f:labels\": {\".\": {}, \"f:app\": {}}, \"f:ownerReferences\": {}}, \"f:spec\": {}}, \"manager\": \"hyperconverged-cluster-operator\", \"operation\": \"Update\", \"time\": \"2020-07-02T00:58:04Z\"}, {\"apiVersion\": \"ssp.kubevirt.io/v1\", 
\"fieldsType\": \"FieldsV1\", \"fieldsV1\": {\"f:status\": {\".\": {}, \"f:conditions\": {}}}, \"manager\": \"ansible-operator\", \"operation\": \"Update\", \"time\": \"2020-07-02T07:34:01Z\"}], \"name\": \"node-labeller-kubevirt-hyperconverged\", \"namespace\": \"openshift-cnv\", \"ownerReferences\": [{\"apiVersion\": \"hco.kubevirt.io/v1alpha1\", \"blockOwnerDeletion\": true, \"controller\": true, \"kind\": \"HyperConverged\", \"name\": \"kubevirt-hyperconverged\", \"uid\": \"5231311f-39f4-42bb-af2e-1a4d1533c0bb\"}], \"resourceVersion\": \"8273277\", \"selfLink\": \"/apis/ssp.kubevirt.io/v1/namespaces/openshift-cnv/kubevirtnodelabellerbundles/node-labeller-kubevirt-hyperconverged\", \"uid\": \"c22dd1a9-f565-4832-9d9a-2965410c41e2\"}, \"spec\": {}, \"status\": {\"conditions\": [{\"ansibleResult\": {\"changed\": 0, \"completion\": \"2020-07-02T07:17:21.089307\", \"failures\": 1, \"ok\": 6, \"skipped\": 0}, \"lastTransitionTime\": \"2020-07-02T07:17:21Z\", \"message\": \"The task includes an option with an undefined variable. 
The error was: 'operator_version' is undefined\\n\\nThe error appears to be in '/opt/ansible/roles/KubevirtNodeLabeller/tasks/main.yml': line 2, column 3, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n---\\n- name: Set operatorVersion and targetVersion\\n  ^ here\\n\", \"reason\": \"Failed\", \"status\": \"False\", \"type\": \"Failure\"}, {\"lastTransitionTime\": \"2020-07-02T07:34:01Z\", \"message\": \"Running reconciliation\", \"reason\": \"Running\", \"status\": \"True\", \"type\": \"Running\"}]}}}, \"changed\": false}\u001b[0m\n\r\nTASK [KubevirtCircuitBreaker : Extract the disable info] ***********************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtCircuitBreaker/tasks/main.yml:6\u001b[0m\n\u001b[0;32mok: [localhost] => {\"ansible_facts\": {\"is_paused\": false}, \"changed\": false}\u001b[0m\n\u001b[0;34mMETA: \u001b[0m\n\r\nTASK [KubevirtRepoInfo : Extract the image name] *******************************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtRepoInfo/tasks/main.yml:3\u001b[0m\n\u001b[0;32mok: [localhost] => {\"ansible_facts\": {\"operator_image_name\": \"registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-ssp-operator@sha256:d4b4133c2c3e9402c895856a968495a3539b32c87e3947268bbdb9763994956a\"}, \"changed\": false}\u001b[0m\n\r\nTASK [KubevirtRepoInfo : Extract the SSP registry] *****************************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtRepoInfo/tasks/main.yml:6\u001b[0m\n\u001b[0;32mok: [localhost] => {\"ansible_facts\": {\"image_name_prefix\": \"container-native-virtualization-\", \"ssp_registry\": \"registry-proxy.engineering.redhat.com/rh-osbs\"}, \"changed\": false}\u001b[0m\n\r\nTASK [KubevirtRepoInfo : Show the SSP registry] ********************************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtRepoInfo/tasks/main.yml:10\u001b[0m\n\u001b[0;32mok: [localhost] => 
{\u001b[0m\r\n\u001b[0;32m    \"msg\": \"registry: registry-proxy.engineering.redhat.com/rh-osbs prefix: container-native-virtualization-\"\u001b[0m\r\n\u001b[0;32m}\u001b[0m\n\r\nTASK [KubevirtNodeLabeller : Set operatorVersion and targetVersion] ************\r\n\u001b[1;30mtask path: /opt/ansible/roles/KubevirtNodeLabeller/tasks/main.yml:2\u001b[0m\n\u001b[0;31mfatal: [localhost]: FAILED! => {\"msg\": \"The task includes an option with an undefined variable. The error was: 'operator_version' is undefined\\n\\nThe error appears to be in '/opt/ansible/roles/KubevirtNodeLabeller/tasks/main.yml': line 2, column 3, but may\\nbe elsewhere in the file depending on the exact syntax problem.\\n\\nThe offending line appears to be:\\n\\n---\\n- name: Set operatorVersion and targetVersion\\n  ^ here\\n\"}\u001b[0m\n\r\nPLAY RECAP *********************************************************************\r\n\u001b[0;31mlocalhost\u001b[0m                  : \u001b[0;32mok=6   \u001b[0m changed=0    unreachable=0    \u001b[0;31mfailed=1   \u001b[0m skipped=0    rescued=0    ignored=0   \r\n\n","job":"583103985801850551","name":"node-labeller-kubevirt-hyperconverged","namespace":"openshift-cnv","error":"exit status 2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/runner.(*runner).Run.func1\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/runner/runner.go:239"}
 
 --------------------------- Ansible Task Status Event StdOut  -----------------
 
 PLAY RECAP *********************************************************************
 localhost                  : ok=6    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   


This definitely looks like a bug on the SSP side; moving it there.
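
A minimal sketch, in Python, of how the failing condition could be picked out of a CR status such as the one pasted above. The function name failed_conditions and the trimmed sample_cr dict are illustrative assumptions, not part of the SSP operator or of any oc tooling; the dict only mirrors the fields shown in the YAML.

```python
def failed_conditions(cr):
    """Return (reason, first message line) for each condition of type
    'Failure' whose status is "True"."""
    conditions = cr.get("status", {}).get("conditions", [])
    failures = []
    for cond in conditions:
        if cond.get("type") == "Failure" and cond.get("status") == "True":
            message = cond.get("message", "").strip()
            first_line = message.splitlines()[0] if message else ""
            failures.append((cond.get("reason", ""), first_line))
    return failures


# Hand-trimmed copy of the status block from the YAML output (assumption:
# only the fields needed for the check are kept).
sample_cr = {
    "kind": "KubevirtMetricsAggregation",
    "status": {
        "conditions": [
            {
                "type": "Running",
                "status": "False",
                "reason": "Running",
                "message": "Running reconciliation",
            },
            {
                "type": "Failure",
                "status": "True",
                "reason": "Failed",
                "message": (
                    "The task includes an option with an undefined variable. "
                    "The error was: 'operator_version' is undefined\n\n"
                    "The error appears to be in "
                    "'/opt/ansible/roles/KubevirtMetricsAggregation/tasks/main.yml': "
                    "line 2, column 3\n"
                ),
            },
        ]
    },
}

for reason, summary in failed_conditions(sample_cr):
    print(f"{reason}: {summary}")
```

With real data, the dict would come from parsing the output of the oc get command with -o json instead of -o yaml.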

Comment 4 Simone Tiraboschi 2020-07-02 07:52:44 UTC
I think the issue is caused by a missing value for OPERATOR_VERSION in the SSP deployment in the CSV.
If so, it should already be addressed by:
https://github.com/MarSik/kubevirt-ssp-operator/pull/198
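
To illustrate the hypothesis in this comment, here is a rough Python sketch that lists the containers in a CSV's deployments that have no literal value for a given environment variable. It assumes the usual OLM layout (spec.install.spec.deployments[].spec.template.spec.containers[].env). The function name containers_missing_env and the sample CSV content, including the WATCH_NAMESPACE entry, are made up for the example and are not taken from the actual SSP CSV.

```python
def containers_missing_env(csv, env_name):
    """Return (deployment name, container name) pairs whose env list has no
    non-empty literal value for env_name.

    Note: variables set through valueFrom are reported as missing too,
    because only the literal 'value' field is inspected.
    """
    missing = []
    deployments = (
        csv.get("spec", {}).get("install", {}).get("spec", {}).get("deployments", [])
    )
    for deployment in deployments:
        pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            values = [
                entry.get("value")
                for entry in container.get("env", [])
                if entry.get("name") == env_name
            ]
            if not any(values):
                missing.append((deployment.get("name"), container.get("name")))
    return missing


# Hypothetical CSV fragment (assumption): one deployment without OPERATOR_VERSION.
sample_csv = {
    "spec": {
        "install": {
            "spec": {
                "deployments": [
                    {
                        "name": "kubevirt-ssp-operator",
                        "spec": {
                            "template": {
                                "spec": {
                                    "containers": [
                                        {
                                            "name": "kubevirt-ssp-operator",
                                            "env": [
                                                {"name": "WATCH_NAMESPACE", "value": ""}
                                            ],
                                        }
                                    ]
                                }
                            }
                        },
                    }
                ]
            }
        }
    }
}

print(containers_missing_env(sample_csv, "OPERATOR_VERSION"))
```

An empty result would mean every container defines the variable with a literal value.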

Comment 5 Omer Yahud 2020-07-02 08:30:38 UTC
The issue is a missing variable in the downstream _defaults.yaml Ansible file; I am working on a fix right now.

Comment 6 Lukas Bednar 2020-07-02 10:49:20 UTC
Seeing a slightly different message with HCO-v2.3.0-434:

    - lastHeartbeatTime: "2020-07-02T10:47:10Z"
      lastTransitionTime: "2020-07-02T09:59:39Z"
      message: missing "Available" condition
      reason: KubeVirtMetricsAggregationNotAvailable
      status: "False"
      type: Available

Comment 7 Omer Yahud 2020-07-02 11:01:14 UTC
The new ssp-operator build contains the missing variable; moved to ON_QA.

Comment 8 Denis Ollier 2020-07-03 12:20:31 UTC
Still failing with kubevirt-ssp-operator:v2.4.0-65:

>    Failed to import the required Python library (openshift >= 0.9.2) on kubevirt-ssp-operator-69c7fdb484-s97hc's Python /usr/bin/python2.
>    This is required for apply.
>    Please read module documentation and install in the appropriate location.
>    If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter

Comment 9 Denis Ollier 2020-07-03 12:39:34 UTC
According to the Kubernetes Ansible collection, the required version of the openshift Python module is 0.9.2 or later (see file /opt/ansible/.ansible/collections/ansible_collections/community/kubernetes/plugins/module_utils/raw.py).

However, the kubevirt-ssp-operator image contains version 0.8.11:

> rpm -q python2-openshift
> python2-openshift-0.8.11-1.el7.noarch
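
The mismatch can be checked with a short Python sketch. It compares plain dotted numeric versions as integer tuples, which is an assumption that holds for 0.8.11 and 0.9.2 but not for versions with suffixes; the helper names parse_version and meets_minimum are illustrative only.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.8.11' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """True if the installed version is greater than or equal to the required one."""
    return parse_version(installed) >= parse_version(required)


installed = "0.8.11"   # from the rpm -q output above
required = "0.9.2"     # from the Ansible error message (openshift >= 0.9.2)

print(meets_minimum(installed, required))
```

Comparing integer tuples rather than raw strings matters for cases such as 0.10.0 against 0.9.2, where a plain string comparison would give the wrong answer.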

Comment 10 Omer Yahud 2020-07-03 21:07:03 UTC
Where do you see this error? I'm running kubevirt-ssp-operator-container-v2.4.0-65 and it is working fine.

Comment 11 Omer Yahud 2020-07-03 21:26:08 UTC
I managed to reproduce the issue; I will work on a solution on Sunday.

Comment 12 Omer Yahud 2020-07-05 15:03:21 UTC
Upstream workaround PR can be found here: https://github.com/MarSik/kubevirt-ssp-operator/pull/200
operator-sdk bug can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1853915

Comment 13 Omer Yahud 2020-07-07 06:50:27 UTC
A new SSP build is available: kubevirt-ssp-operator-container-v2.4.0-66. I am not sure whether we have to wait for an HCO build as well.

Comment 14 Denis Ollier 2020-07-07 11:39:24 UTC
Since CVP is broken at the moment, I deployed CNV from OSBS instead of Brew.

CSV kubevirt-hyperconverged-operator.v2.4.0:
- createdAt: "2020-07-06 17:46:10"
- hyperconverged-cluster-operator:v2.4.0-62
- kubevirt-ssp-operator:v2.4.0-66

CNV deployment is successful with those versions.

Comment 15 Omer Yahud 2020-07-07 12:18:23 UTC
New builds were provided (see Fixed In Version), moving to ON_QA.

Comment 16 Ruth Netser 2020-07-08 12:22:09 UTC
Verified on OCP 4.5.0-rc., SSP v2.4.0-66:
Clean installation and after upgrade


$ oc get KubeVirtMetricsAggregation -n openshift-cnv   metrics-aggregation-kubevirt-hyperconverged -o yaml
apiVersion: ssp.kubevirt.io/v1
kind: KubevirtMetricsAggregation
metadata:
  creationTimestamp: "2020-07-08T11:07:30Z"
  generation: 1
  labels:
    app: kubevirt-hyperconverged
  managedFields:
  - apiVersion: ssp.kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
        f:ownerReferences: {}
      f:spec: {}
    manager: hyperconverged-cluster-operator
    operation: Update
    time: "2020-07-08T11:07:30Z"
  - apiVersion: ssp.kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:observedVersion: {}
        f:operatorVersion: {}
        f:targetVersion: {}
    manager: Swagger-Codegen
    operation: Update
    time: "2020-07-08T11:08:21Z"
  - apiVersion: ssp.kubevirt.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}
    manager: ansible-operator
    operation: Update
    time: "2020-07-08T11:08:23Z"
  name: metrics-aggregation-kubevirt-hyperconverged
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: hco.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: HyperConverged
    name: kubevirt-hyperconverged
    uid: 463aa27a-76e5-43c5-a88c-f8792c04724a
  resourceVersion: "119186"
  selfLink: /apis/ssp.kubevirt.io/v1/namespaces/openshift-cnv/kubevirtmetricsaggregations/metrics-aggregation-kubevirt-hyperconverged
  uid: 203a87d8-7828-462e-9be9-490ee4a990d5
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-07-08T11:08:23Z"
    message: KubevirtMetricsAggregation is available.
    reason: available
    status: "True"
    type: Available
  - lastTransitionTime: "2020-07-08T11:08:23Z"
    message: KubevirtMetricsAggregation progressing
    reason: progressing
    status: "False"
    type: Progressing
  - lastTransitionTime: "2020-07-08T11:08:23Z"
    message: KubevirtMetricsAggregation degraded
    reason: degraded
    status: "False"
    type: Degraded
  - ansibleResult:
      changed: 6
      completion: 2020-07-08T11:08:23.59718
      failures: 0
      ok: 9
      skipped: 0
    lastTransitionTime: "2020-07-08T11:07:57Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
  observedVersion: v2.4.0
  operatorVersion: v2.4.0
  targetVersion: v2.4.0
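
As a rough illustration of what "verified" means for this status block, the Python sketch below treats a CR as healthy when Available is "True" and both Progressing and Degraded are "False". The function name is_healthy and the trimmed condition lists are assumptions made for the example; they only mirror the fields visible in the YAML outputs in this bug.

```python
def is_healthy(conditions):
    """True when Available is "True" and Progressing/Degraded are "False".

    A condition type that is absent counts as unhealthy, which matches the
    'missing "Available" condition' message reported in this bug.
    """
    expected = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    by_type = {cond.get("type"): cond.get("status") for cond in conditions}
    return all(by_type.get(ctype) == status for ctype, status in expected.items())


# Trimmed from the verified output (SSP v2.4.0-66).
fixed_conditions = [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "False"},
    {"type": "Running", "status": "True"},
]

# Trimmed from the failing output in comment 3.
broken_conditions = [
    {"type": "Running", "status": "False"},
    {"type": "Failure", "status": "True"},
]

print(is_healthy(fixed_conditions), is_healthy(broken_conditions))
```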

Comment 19 errata-xmlrpc 2020-07-28 19:10:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194