Description of problem:
If the CNO pod is recreated, resources that are no longer rendered are not deleted. The relatedObjects field of the ClusterOperator status is wiped out upon CNO recreation, which breaks the deletion of related objects that are no longer rendered, since no objects are saved in the status manager. In the outputs below, relatedObjects is first removed, then updated without the admission controller, yet the admission controller DaemonSet is still present on the cluster.

(shiftstack) [stack@undercloud-0 ~]$ oc get co network -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[]}'
  creationTimestamp: "2020-03-03T20:50:20Z"
  generation: 1
  name: network
  resourceVersion: "2579694"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/network
  uid: 014888b0-a1bf-4c1a-b427-d5467f33ba76
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-03-09T15:57:56Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-03-03T20:50:20Z"
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2020-03-09T16:10:31Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2020-03-03T20:57:53Z"
    status: "True"
    type: Available
  extension: null
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2020-03-03-110909

(shiftstack) [stack@undercloud-0 ~]$ oc get co network -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    network.operator.openshift.io/last-seen-state: '{"DaemonsetStates":[],"DeploymentStates":[{"Namespace":"openshift-kuryr","Name":"kuryr-controller","LastSeenStatus":{"observedGeneration":26,"replicas":1,"updatedReplicas":1,"unavailableReplicas":1,"conditions":[{"type":"Progressing","status":"True","lastUpdateTime":"2020-03-08T19:37:53Z","lastTransitionTime":"2020-03-03T20:52:57Z","reason":"NewReplicaSetAvailable","message":"ReplicaSet \"kuryr-controller-57c7f8d95f\" has successfully progressed."},{"type":"Available","status":"False","lastUpdateTime":"2020-03-09T16:11:33Z","lastTransitionTime":"2020-03-09T16:11:33Z","reason":"MinimumReplicasUnavailable","message":"Deployment does not have minimum availability."}]},"LastChangeTime":"2020-03-09T16:12:04.3674935Z"}]}'
  creationTimestamp: "2020-03-03T20:50:20Z"
  generation: 1
  name: network
  resourceVersion: "2579785"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/network
  uid: 014888b0-a1bf-4c1a-b427-d5467f33ba76
spec: {}
status:
  conditions:
  - lastTransitionTime: "2020-03-09T15:57:56Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-03-03T20:50:20Z"
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2020-03-09T16:12:04Z"
    message: Deployment "openshift-kuryr/kuryr-controller" is not available (awaiting 1 nodes)
    reason: Deploying
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-03-03T20:57:53Z"
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: ""
    name: applied-cluster
    namespace: openshift-network-operator
    resource: configmaps
  - group: apiextensions.k8s.io
    name: network-attachment-definitions.k8s.cni.cncf.io
    resource: customresourcedefinitions
  - group: ""
    name: openshift-multus
    resource: namespaces
  - group: rbac.authorization.k8s.io
    name: multus
    resource: clusterroles
  - group: ""
    name: multus
    namespace: openshift-multus
    resource: serviceaccounts
  - group: rbac.authorization.k8s.io
    name: multus
    resource: clusterrolebindings
  - group: apps
    name: multus
    namespace: openshift-multus
    resource: daemonsets
  - group: ""
    name: multus-admission-controller
    namespace: openshift-multus
    resource: services
  - group: rbac.authorization.k8s.io
    name: multus-admission-controller-webhook
    resource: clusterroles
  - group: rbac.authorization.k8s.io
    name: multus-admission-controller-webhook
    resource: clusterrolebindings
  - group: admissionregistration.k8s.io
    name: multus.openshift.io
    resource: validatingwebhookconfigurations
  - group: ""
    name: openshift-service-ca
    namespace: openshift-network-operator
    resource: configmaps
  - group: apps
    name: multus-admission-controller
    namespace: openshift-multus
    resource: daemonsets
  - group: monitoring.coreos.com
    name: monitor-multus-admission-controller
    namespace: openshift-multus
    resource: servicemonitors
  - group: ""
    name: multus-admission-controller-monitor-service
    namespace: openshift-multus
    resource: services
  - group: rbac.authorization.k8s.io
    name: prometheus-k8s
    namespace: openshift-multus
    resource: roles
  - group: rbac.authorization.k8s.io
    name: prometheus-k8s
    namespace: openshift-multus
    resource: rolebindings
  - group: monitoring.coreos.com
    name: prometheus-k8s-rules
    namespace: openshift-multus
    resource: prometheusrules
  - group: ""
    name: openshift-kuryr
    resource: namespaces
  - group: rbac.authorization.k8s.io
    name: kuryr
    resource: clusterroles
  - group: ""
    name: kuryr
    namespace: openshift-kuryr
    resource: serviceaccounts
  - group: rbac.authorization.k8s.io
    name: kuryr
    resource: clusterrolebindings
  - group: apiextensions.k8s.io
    name: kuryrnets.openstack.org
    resource: customresourcedefinitions
  - group: apiextensions.k8s.io
    name: kuryrnetpolicies.openstack.org
    resource: customresourcedefinitions
  - group: ""
    name: kuryr-config
    namespace: openshift-kuryr
    resource: configmaps
  - group: apps
    name: kuryr-cni
    namespace: openshift-kuryr
    resource: daemonsets
  - group: apps
    name: kuryr-controller
    namespace: openshift-kuryr
    resource: deployments
  - group: ""
    name: openshift-network-operator
    resource: namespaces
  versions:
  - name: operator
    version: 4.4.0-0.nightly-2020-03-03-110909

(shiftstack) [stack@undercloud-0 ~]$ oc get po -n openshift-kuryr
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-4plvz                        1/1     Running   0          4m59s
kuryr-cni-68bkt                        1/1     Running   0          5m58s
kuryr-cni-6k2x2                        1/1     Running   0          6m29s
kuryr-cni-msbtk                        1/1     Running   0          7m2s
kuryr-cni-qlnrk                        1/1     Running   0          4m25s
kuryr-cni-rgl6w                        1/1     Running   0          5m25s
kuryr-controller-59d7fcf5fd-p5n8l      1/1     Running   3          7m6s
kuryr-dns-admission-controller-dzlpl   1/1     Running   0          14m
kuryr-dns-admission-controller-lmx2s   1/1     Running   0          14m
kuryr-dns-admission-controller-w97jb   1/1     Running   0          14m

Version-Release number of selected component (if applicable):
Tested with OCP 4.4, but also applicable to other releases.

How reproducible:

Steps to Reproduce:
1. Recreate the CNO with some new configuration
2. The new config causes a Kubernetes resource to no longer be rendered
3. Notice the resource is still present even though it is not rendered

Actual results:

Expected results:

Additional info:
Unable to reproduce orphaned resources on 4.5.0-0.nightly-2020-03-16-101116 with OpenShiftSDN.

The behaviour of the clusteroperator network relatedObjects on OpenShiftSDN seems to be slightly different: 'relatedObjects' never seems to be nil. When I delete the network-operator pod I do not see relatedObjects changing to nil. The only way I was able to reproduce the original issue on 4.4 SDN was to scale the CNO Deployment to 0, then oc edit and change the network config.

Orphaned resources reproduction steps on 4.4 OpenShiftSDN:

1. Add a new Multus network:
   oc edit networks.operator.openshift.io cluster
   spec:
     additionalNetworks:
     - name: bridge-ipam-dhcp
       namespace: openshift-multus
       rawCNIConfig: '{ "name": "bridge-ipam-dhcp", "cniVersion": "0.3.1", "type": "bridge", "master": "ens5", "ipam": { "type": "dhcp" } }'
       type: Raw
2. Verify the dhcp daemon pods are created in the Multus namespace:
   oc get -n openshift-multus pods -l app=dhcp-daemon
3. Scale the CNO to 0 and verify the pod is deleted:
   oc -n openshift-network-operator scale deployment network-operator --replicas=0
4. oc edit networks.operator.openshift.io cluster and delete the additional network added in step 1
5. oc -n openshift-network-operator scale deployment network-operator --replicas=1
6. Verify the dhcp pods are still alive and have not been terminated:
   oc get -n openshift-multus pods -l app=dhcp-daemon

With these steps the dhcp pods are not terminated on 4.4. On 4.5.0-0.nightly-2020-03-16-101116 the dhcp-daemon pods are terminated, which suggests something has been resolved on 4.5.

With OpenShiftSDN I have never seen clusteroperator network 'relatedObjects' be nil. @anusaxen reports that with OVNKubernetes he also has not seen 'relatedObjects' be nil.

Can the Kuryr team also look and see if the root cause for the 'relatedObjects' nil state can be identified as well?
The fix was already in place in the 4.5.0-0.nightly-2020-03-16-101116 release image. I could see the 'relatedObjects' field missing also when using OpenShiftSDN, by constantly watching the field value with 'oc get co network -o yaml -w' (which keeps a record of the changes that happened to the object). The same issue can be seen with Kuryr, as the population of relatedObjects only happens after the update of the ClusterOperator has taken place. The fix solves the issue with Kuryr as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409