Bug 2033579

Summary: SRO cannot update the special-resource-lifecycle ConfigMap if the data field is undefined
Product: OpenShift Container Platform
Component: Special Resource Operator
Version: 4.10
Target Release: 4.10.0
Hardware: All
OS: Unspecified
Reporter: Quentin Barrand <quba>
Assignee: Quentin Barrand <quba>
QA Contact: liqcui
CC: aos-bugs, bthurber
Severity: high
Priority: high
Status: CLOSED ERRATA
Last Closed: 2022-03-10 16:34:34 UTC
Type: Bug

Description Quentin Barrand 2021-12-17 10:26:47 UTC
Description of problem:

Because of a change in [1], when the special-resource-lifecycle ConfigMap's data field is undefined, SRO cannot update it, and the reconciliation fails.

[1]: https://github.com/openshift/special-resource-operator/pull/84/files#diff-8b98b331c5d4acbeb7274c68973d20900daaed47c8d8f3e62ba39284379166bbL48-R41
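For context, here is a minimal sketch of the failure mode, not the actual SRO code path (updateLifecycleEntry is a hypothetical helper): a ConfigMap created without a data field deserializes in Go with a nil Data map, and assigning into a nil map panics, so any update path writing into it has to initialize the map first.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// updateLifecycleEntry is a hypothetical helper, not the actual SRO code.
// A ConfigMap whose data field is undefined has cm.Data == nil, and
// assigning into a nil map panics ("assignment to entry in nil map"),
// so the map must be initialized before writing.
func updateLifecycleEntry(cm *corev1.ConfigMap, key, value string) {
	if cm.Data == nil {
		cm.Data = make(map[string]string)
	}
	cm.Data[key] = value
}

func main() {
	cm := &corev1.ConfigMap{} // data field undefined: cm.Data is nil
	updateLifecycleEntry(cm, "example-pod-hash", "*v1.Pod")
	fmt.Println(cm.Data)
}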


Steps to Reproduce:
0. Have SRO running
1. Install the infoscale recipe https://github.com/openshift/special-resource-operator/tree/master/charts/infoscale

Actual results:

 2021-12-17T02:56:07.849Z        INFO    infoscale-vtas     RECONCILE REQUEUE: Could not reconcile chart    {"error": "cannot reconcile hardware states: failed to create state templates/3000-driver-container.yaml: after CRUD hooks failed: could not wait for resource: Waiting too long for resource: error or data not found: <nil> "}
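The "Waiting too long for resource" message suggests a poll loop that repeatedly checks the ConfigMap and times out because its condition on the data field can never be satisfied while that field is undefined. A rough sketch of that shape, assuming client-go polling; waitForConfigMapData and the exact condition are assumptions, not the SRO implementation:

package sketch

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForConfigMapData is an assumed name; it polls until the ConfigMap
// has a non-nil data field. If the field is undefined and nothing ever
// writes to it, the poll can only time out, matching the "Waiting too
// long for resource" error seen in the reconcile loop above.
func waitForConfigMapData(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	err := wait.PollImmediate(time.Second, 30*time.Second, func() (bool, error) {
		cm, getErr := c.CoreV1().ConfigMaps(ns).Get(ctx, name, metav1.GetOptions{})
		if getErr != nil {
			return false, nil // retry on transient errors
		}
		return cm.Data != nil, nil
	})
	if err != nil {
		return fmt.Errorf("waiting too long for resource: %w", err)
	}
	return nil
}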

Expected results:

The ConfigMap is updated and the reconciliation proceeds as expected.

Comment 3 liqcui 2021-12-30 02:50:16 UTC
We don't have an InfoScale environment or a Veritas image registry account, so we ran into an image pull issue and could not run the e2e test ourselves; we need to ask the partner to help with the e2e testing.

Detailed Testing Results:

[ocpadmin@ec2-18-217-45-133 infoscale-vtas-0.0.1]$ oc get pods -n infoscale-vtas
NAME                                                   READY   STATUS         RESTARTS   AGE
infoscale-vtas-licensing-controller-69787566f7-74jjs   0/1     ErrImagePull   0          2m29s
[ocpadmin@ec2-18-217-45-133 infoscale-vtas-0.0.1]$ oc describe pod infoscale-vtas-licensing-controller-69787566f7-74jjs -n infoscale-vtas
Name:         infoscale-vtas-licensing-controller-69787566f7-74jjs
Namespace:    infoscale-vtas
Priority:     0
Node:         ip-10-0-139-35.us-east-2.compute.internal/10.0.139.35
Start Time:   Thu, 30 Dec 2021 02:12:24 +0000
Labels:       app=infoscale-vtas-licensing-controller
              pod-template-hash=69787566f7
              specialresource.openshift.io/owned=true
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.128.2.12"
                    ],
Volumes:
  kube-api-access-xfd49:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              IS-cluster1=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m37s  default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling  92s    default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   Scheduled         15s    default-scheduler  Successfully assigned infoscale-vtas/infoscale-vtas-licensing-controller-69787566f7-74jjs to ip-10-0-139-35.us-east-2.compute.internal
  Normal   AddedInterface    13s    multus             Add eth0 [10.128.2.12/23] from openshift-sdn
  Normal   Pulling           13s    kubelet            Pulling image "veritas/infoscale-license:8.0.0.0000-rhel8"
  Warning  Failed            12s    kubelet            Failed to pull image "veritas/infoscale-license:8.0.0.0000-rhel8": rpc error: code = Unknown desc = reading manifest 8.0.0.0000-rhel8 in docker.io/veritas/infoscale-license: errors:
denied: requested access to the resource is denied
unauthorized: authentication required
  Warning  Failed   12s  kubelet  Error: ErrImagePull
  Normal   BackOff  12s  kubelet  Back-off pulling image "veritas/infoscale-license:8.0.0.0000-rhel8"
  Warning  Failed   12s  kubelet  Error: ImagePullBackOff
[ocpadmin@ec2-18-217-45-133 infoscale-vtas-0.0.1]$ oc get configmap -n infoscale-vtas
NAME                                   DATA   AGE
kube-root-ca.crt                       1      3m19s
openshift-service-ca.crt               1      3m19s
sh.helm.hooks.pre-install              0      3m13s
sh.helm.release.v1.infoscale-vtas.v1   1      3m13s

Comment 4 Quentin Barrand 2022-01-07 13:11:53 UTC
Veritas confirmed that the fix solves the problem: https://coreos.slack.com/archives/C02358PSC03/p1641554833000100?thread_ts=1639735346.216700&cid=C02358PSC03

Comment 7 errata-xmlrpc 2022-03-10 16:34:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056