2058167 – Post deploy on a baremetal cluster SSP is looping attempting to reconcile

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	SSP
Sub Component:
Version:	4.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Andrej Krejcir
QA Contact:	Geetika Kapoor
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-02-24 12:21 UTC by Debarati Basu-Nag
Modified:	2022-03-16 16:07 UTC
CC List:	7 users
Fixed In Version:	kubevirt-ssp-operator-container-v4.10.0-50, hco-bundle-registry-4.10.0-696
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-16 16:07:15 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
ssp log (14.15 MB, text/plain) 2022-02-24 12:21 UTC, Debarati Basu-Nag

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt ssp-operator pull 316	None	Merged	DataSources: Set app labels only when auto-update is disabled	2022-02-25 14:53:56 UTC
Github	kubevirt ssp-operator pull 317	None	Merged	[release-v0.13] DataSources: Set app labels only when auto-update is disabled	2022-02-25 14:53:54 UTC
Red Hat Issue Tracker	CNV-16644	None	None	None	2022-03-07 14:23:22 UTC
Red Hat Product Errata	RHSA-2022:0947	None	None	None	2022-03-16 16:07:19 UTC

Description Debarati Basu-Nag 2022-02-24 12:21:20 UTC

Created attachment 1863175
ssp log

Description of problem:
After deploying the bare-metal (BM) cluster bm02-cnvqe2-rdu2, I noticed that HCO is in a degraded state because SSP is not available. According to the SSP operator log, SSP is continuously attempting to reconcile and failing.

Version-Release number of selected component (if applicable):
4.10.0 - 686 

How reproducible:
Not sure.

Steps to Reproduce:
1. Not sure.

Actual results:
HCO Status condition:
====================
{
      "lastTransitionTime": "2022-02-23T16:59:54Z",
      "message": "Reconcile completed successfully",
      "observedGeneration": 73,
      "reason": "ReconcileCompleted",
      "status": "True",
      "type": "ReconcileComplete"
    },
    {
      "lastTransitionTime": "2022-02-24T00:24:04Z",
      "message": "SSP is not available: Reconciling SSP resources",
      "observedGeneration": 73,
      "reason": "SSPNotAvailable",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2022-02-24T00:24:04Z",
      "message": "SSP is progressing: Reconciling SSP resources",
      "observedGeneration": 73,
      "reason": "SSPProgressing",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2022-02-24T00:46:30Z",
      "message": "SSP is degraded: Reconciling SSP resources",
      "observedGeneration": 73,
      "reason": "SSPDegraded",
      "status": "True",
      "type": "Degraded"
    },
    {
      "lastTransitionTime": "2022-02-24T00:24:04Z",
      "message": "SSP is progressing: Reconciling SSP resources",
      "observedGeneration": 73,
      "reason": "SSPProgressing",
      "status": "False",
      "type": "Upgradeable"
    }
===================
SSP status:
===================
{
  "conditions": [
    {
      "lastHeartbeatTime": "2022-02-24T00:49:32Z",
      "lastTransitionTime": "2022-02-24T00:49:32Z",
      "message": "Reconciling SSP resources",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastHeartbeatTime": "2022-02-24T00:49:32Z",
      "lastTransitionTime": "2022-02-24T00:49:32Z",
      "message": "Reconciling SSP resources",
      "reason": "Progressing",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastHeartbeatTime": "2022-02-24T00:49:32Z",
      "lastTransitionTime": "2022-02-24T00:49:32Z",
      "message": "Reconciling SSP resources",
      "reason": "Degraded",
      "status": "True",
      "type": "Degraded"
    }
  ],
  "observedGeneration": 6,
  "observedVersion": "4.10.0",
  "operatorVersion": "4.10.0",
  "phase": "Deploying",
  "targetVersion": "4.10.0"
}
In the SSP operator log, this message shows up again and again:
=================
{"level":"error","ts":1645655734.9615152,"logger":"controller-runtime.manager.controller.ssp","msg":"Reconciler error","reconciler group":"ssp.kubevirt.io","reconciler kind":"SSP","name":"ssp-kubevirt-hyperconverged","namespace":"openshift-cnv","error":"Operation cannot be fulfilled on ssps.ssp.kubevirt.io \"ssp-kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
================
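
To gauge how often this conflict repeats, the attached log can be scanned for the message. The following is a minimal sketch, assuming the log is one JSON object per line with the same fields as the entry above ("msg", "error", "name", "namespace"); the function name count_conflict_errors and the file name ssp.log are hypothetical.

```python
import json
from collections import Counter

# Text the API server returns on an optimistic-concurrency conflict,
# as seen in the "error" field of the log entry above.
CONFLICT_MARKER = "the object has been modified"


def count_conflict_errors(lines):
    """Count 'Reconciler error' entries caused by update conflicts.

    Returns a Counter keyed by (namespace, name) of the reconciled object.
    Lines that are not JSON objects are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "Reconciler error":
            continue
        if CONFLICT_MARKER not in entry.get("error", ""):
            continue
        counts[(entry.get("namespace"), entry.get("name"))] += 1
    return counts
```

For example, with the attachment saved locally as ssp.log (hypothetical path), `count_conflict_errors(open("ssp.log"))` would return the number of conflict errors per SSP object. A steadily growing count for the same object suggests that something else keeps modifying it between SSP's read and write.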

The HCO operator log and the SSP operator log are attached.

Expected results:


Additional info:

Comment 1 Debarati Basu-Nag 2022-02-24 12:36:16 UTC

Moving to Storage, as triage by Oren indicated this is a CDI issue.

Comment 2 Arnon Gilboa 2022-02-24 14:09:13 UTC

Moved to SSP after a debug session with @akrejcir.

Comment 3 Andrej Krejcir 2022-02-24 15:19:01 UTC

I reproduced this on my development cluster. It is not related to bare metal.

The problem is that SSP and CDI modify the same labels in a loop.
This is the update done by SSP:

@ ["metadata","labels","app.kubernetes.io/component"]
- "storage"
+ "templating"
@ ["metadata","labels","app.kubernetes.io/managed-by"]
- "cdi-controller"
+ "ssp-operator"

And CDI reverts it.

I will post a PR to SSP to break the loop.
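
The loop can be illustrated with a small simulation. This is only a sketch, not the operator code: the label values come from the diff above, the guard mirrors the title of the linked PR ("Set app labels only when auto-update is disabled"), and the assumption that CDI manages the labels only when auto-update is enabled is mine. All function names are hypothetical.

```python
# Label values taken from the diff in this comment.
SSP_LABELS = {
    "app.kubernetes.io/component": "templating",
    "app.kubernetes.io/managed-by": "ssp-operator",
}
CDI_LABELS = {
    "app.kubernetes.io/component": "storage",
    "app.kubernetes.io/managed-by": "cdi-controller",
}


def apply_labels(obj, desired):
    """Overwrite labels on obj; return True if anything changed."""
    changed = False
    for key, value in desired.items():
        if obj["labels"].get(key) != value:
            obj["labels"][key] = value
            changed = True
    return changed


def ssp_reconcile(obj, auto_update, guard):
    # With the guard, SSP sets its app labels only when auto-update is disabled.
    if guard and auto_update:
        return False
    return apply_labels(obj, SSP_LABELS)


def cdi_reconcile(obj, auto_update):
    # Assumption: CDI only manages the object when auto-update is enabled.
    if not auto_update:
        return False
    return apply_labels(obj, CDI_LABELS)


def count_updates(rounds, auto_update, guard):
    """Run both reconcilers in turn and count how many writes they make."""
    obj = {"labels": {}}
    updates = 0
    for _ in range(rounds):
        updates += int(ssp_reconcile(obj, auto_update, guard))
        updates += int(cdi_reconcile(obj, auto_update))
    return updates
```

Without the guard and with auto-update enabled, every round produces two writes, so `count_updates(5, True, False)` keeps growing with the number of rounds. With the guard, only one controller owns the labels in each mode and the writes stop after the first round.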

Comment 5 Roni Kishner 2022-03-03 10:46:57 UTC

Verified on kubevirt-ssp-operator-container-v4.10.0-50

Note: The fix mentions that the labels are now set only when auto-update is disabled. This could mean the bug will occur again when auto-update is disabled, but I didn't manage to reproduce it even then, so I can't say for sure.

Comment 8 errata-xmlrpc 2022-03-16 16:07:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947
