Bug 1956607

Summary:	Multiple manifests lack cluster profile annotations
Product:	OpenShift Container Platform	Reporter:	W. Trevor King <wking>
Component:	Bare Metal Hardware Provisioning	Assignee:	sdasu
Bare Metal Hardware Provisioning sub component:	cluster-baremetal-operator	QA Contact:	Shelly Miron <smiron>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, beth.white, rbartal, sdasu
Version:	4.8	Keywords:	Triaged
Target Milestone:	---	Flags:	sdasu: needinfo-
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 23:05:53 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description W. Trevor King 2021-05-04 04:18:33 UTC

Many baremetal manifests lack cluster profile annotations [1], e.g. [2].  Checking fc.2:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.8.0-fc.2-x86_64
$ for X in manifests/*.yaml; do ENTRIES="$(yaml2json < "${X}" | jq -r '.[].metadata | select((.annotations // {}) | keys | tostring | contains("include.release.openshift.io/") | not) | .namespace + " " + .name + " " + (.annotations | tostring)')"; if test -
n "${ENTRIES}"; then echo "${X}"; echo "${ENTRIES}"; fi done
...
manifests/0000_30_baremetal-operator_01_baremetalhost.crd.yaml
baremetal-operator-system baremetal-operator-leader-election-role null
 baremetal-operator-manager-role null
 baremetal-operator-proxy-role null
 baremetal-operator-metrics-reader null
baremetal-operator-system baremetal-operator-leader-election-rolebinding null
 baremetal-operator-manager-rolebinding null
 baremetal-operator-proxy-rolebinding null
baremetal-operator-system baremetal-operator-ironic null
baremetal-operator-system baremetal-operator-controller-manager-metrics-service null
baremetal-operator-system baremetal-operator-controller-manager null
manifests/0000_31_cluster-baremetal-operator_05_prometheus_rbac.yaml
  null
manifests/0000_31_cluster-baremetal-operator_05_rbac.yaml
  null
...
manifests/0000_90_cluster-baremetal-operator_03_servicemonitor.yaml
openshift-machine-api cluster-baremetal-operator-servicemonitor {"exclude.release.openshift.io/internal-openshift-hosted":"true"}

Without profile annotations, the manifests will not be pushed into clusters, and any functionality which depends on them will be broken.  You probably want to add at least one cluster-profile annotation to each manifest or drop the manifest altogether.

[1]: https://github.com/openshift/enhancements/blob/0e2f2474ee858350c4dec0033439036426a5663d/enhancements/update/cluster-profiles.md
[2]: https://github.com/openshift/baremetal-operator/blob/d22c5f710cdce4a30f3d8a8457d45ad53dd3f3a2/config/render/capm3.yaml#L5

Comment 1 sdasu 2021-05-05 17:42:13 UTC

We actually use [1] to add annotations to manifests/0000_30_baremetal-operator_01_baremetalhost.crd.yaml

[1] - https://github.com/openshift/baremetal-operator/blob/master/config/crd/ocp/ocp_kustomization.yaml

[stack@localhost dev-scripts]$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.8.0-fc.2-x86_64
Extracted release payload from digest sha256:76fcf7d935c9acc8e3bef6b46a94360e61a5f56958316313d5a7d6a8b5c6484e created at 2021-04-30T14:14:56Z

[stack@localhost dev-scripts]$ head manifests/0000_30_baremetal-operator_01_baremetalhost.crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.4.1
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: null
  name: baremetalhosts.metal3.io
spec:

Are we missing some annotation? If, so which one?

Comment 2 W. Trevor King 2021-05-05 19:19:10 UTC

Should be the ones from comment 0.  Double checking the one you call out in comment 1:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.8.0-fc.2-x86_64
$ yaml2json <manifests/0000_30_baremetal-operator_01_baremetalhost.crd.yaml | jq -r '.[] | .kind + " " + .metadata.name + " " + (.metadata.annotations | tostring)'
CustomResourceDefinition baremetalhosts.metal3.io {"controller-gen.kubebuilder.io/version":"v0.4.1","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}
Role baremetal-operator-leader-election-role null
ClusterRole baremetal-operator-manager-role null
ClusterRole baremetal-operator-proxy-role null
ClusterRole baremetal-operator-metrics-reader null
RoleBinding baremetal-operator-leader-election-rolebinding null
ClusterRoleBinding baremetal-operator-manager-rolebinding null
ClusterRoleBinding baremetal-operator-proxy-rolebinding null
ConfigMap baremetal-operator-ironic null
Service baremetal-operator-controller-manager-metrics-service null
Deployment baremetal-operator-controller-manager null

So you set the annotations on the CRD itself, but not on the other resources Role, ClusterRoles, etcd. that you include in that file.

Comment 3 sdasu 2021-05-11 14:32:30 UTC

While adding the required annotations, ran into the following CI test failure:
INFO[2021-05-10T22:10:26Z] May 10 21:46:03.565: FAIL: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted: 
INFO[2021-05-10T22:10:26Z]   apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/kube-rbac-proxy does not have a cpu request (rule: "apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/kube-rbac-proxy/request[cpu]") 
INFO[2021-05-10T22:10:26Z]   apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/kube-rbac-proxy does not have a memory request (rule: "apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/kube-rbac-proxy/request[memory]") 
INFO[2021-05-10T22:10:26Z]   apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/manager does not have a cpu request (rule: "apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/manager/request[cpu]") 
INFO[2021-05-10T22:10:26Z]   apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/manager does not have a memory request (rule: "apps/v1/Deployment/openshift-machine-api/baremetal-operator-controller-manager/container/manager/request[memory]") 


While we do some profiling on our end to figure what these values need to be, could we get an exception for openshift-machine-api/baremetal-operator-controller-manager/container/kube-rbac-proxy and openshift-machine-api/baremetal-operator-controller-manager/container/manager?

Comment 4 sdasu 2021-05-11 16:12:39 UTC

Ignore previous comment because we were able to get past it.

Comment 7 W. Trevor King 2021-06-03 19:06:00 UTC

I came back to this issue while revisiting in bug 1956606.  From [1], which repeats the procedure from comment 0 on 4.8.0-0.nightly-2021-06-03-084005:

manifests/0000_31_cluster-baremetal-operator_05_prometheus_rbac.yaml
  null
manifests/0000_31_cluster-baremetal-operator_05_rbac.yaml
  null
manifests/0000_90_cluster-baremetal-operator_03_servicemonitor.yaml
openshift-machine-api cluster-baremetal-operator-servicemonitor {"exclude.release.openshift.io/internal-openshift-hosted":"true"}

I think the first two are just from trailing --- in the YAML.  The final entry is because of [2], so the unused manifest is intentional.  I'm a bit fuzzy on why we removed the annotation instead of removing the whole manifest.  If it's been months since making the manifest a no-op, maybe we should just drop it?  Or is there a ticket tracking work to make the ServiceMonitor functional again?

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1956606#c4
[2]: https://github.com/openshift/cluster-baremetal-operator/pull/111/commits/4b78e012b6fa438f5ca597461d9597462bb15dc7

Comment 10 errata-xmlrpc 2021-07-27 23:05:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438