Bug 2187277

Summary: [Fusion-aaS] managed-fusion-agent.v2.0.11 CSV failed in deployment
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: odf-managed-service
Reporter: suchita <sgatfane>
Assignee: Ohad <omitrani>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: unspecified
Priority: unspecified
Version: 4.12
CC: dbindra, ocs-bugs, odf-bz-bot
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-04-19 06:05:25 UTC
Type: Bug

Description suchita 2023-04-17 10:35:08 UTC
Description of problem:
The managed-fusion-agent.v2.0.11 CSV failed during deployment with the following error in the operator log:
-----------------------
2023-04-17T06:45:14.621Z    INFO    controllers.ManagedFusion    reconciling PrometheusProxyNetworkPolicy resources
2023-04-17T06:45:14.622Z    ERROR    controllers.ManagedFusion    An error was encountered during reconcilePhases    {"error": "failed to update egressFirewall: unable to get AWS IMDS ConfigMap: ConfigMap \"aws-data\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime.5/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime.5/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime.5/pkg/internal/controller/controller.go:214
2023-04-17T06:45:14.622Z    ERROR    controller-runtime.manager.controller.secret    Reconciler error    {"reconciler group": "", "reconciler kind": "Secret", "name": "builder-token-dntdr", "namespace": "openshift-logging", "error": "failed to update egressFirewall: unable to get AWS IMDS ConfigMap: ConfigMap \"aws-data\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime.5/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime.5/pkg/internal/controller/controller.go:214
-----------------------

$ oc get csv -n openshift-storage
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent          2.0.11                                                      Failed

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.12   True        False         6h15m

$ oc get csv
NAME                                      DISPLAY                  VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent     2.0.11                                                      Succeeded
observability-operator.v0.0.20            Observability Operator   0.0.20            observability-operator.v0.0.19            Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator      4.10.0                                                      Succeeded
route-monitor-operator.v0.1.494-a973226   Route Monitor Operator   0.1.494-a973226   route-monitor-operator.v0.1.493-a866e7c   Succeeded

$ oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.11.6                      NooBaa Operator               4.11.6            mcg-operator.v4.11.5                      Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.11.6                      OpenShift Container Storage   4.11.6            ocs-operator.v4.11.5                      Succeeded
ocs-osd-deployer.v2.0.12                  OCS OSD Deployer              2.0.12            ocs-osd-deployer.v2.0.11                  Succeeded
odf-csi-addons-operator.v4.11.6           CSI Addons                    4.11.6            odf-csi-addons-operator.v4.11.5           Succeeded
odf-operator.v4.11.6                      OpenShift Data Foundation     4.11.6            odf-operator.v4.11.5                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.494-a973226   Route Monitor Operator        0.1.494-a973226   route-monitor-operator.v0.1.493-a866e7c   Succeeded


How reproducible:
4/4

Steps to Reproduce:
1. Deploy a new Managed Fusion agent following the deployment document: https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit?usp=sharing

Actual results:
The managed-fusion-agent.v2.0.11 CSV is in Failed status.

Expected results:
The managed-fusion-agent.v2.0.11 CSV should reach the Succeeded phase.

Additional info:
Workaround:
The issue occurs because a set of pod-security labels is added to the namespace, and because the AWS data-gather pod requires host networking, the pod is not allowed to come up.
For now, as a workaround, apply the following labels to the managed-fusion namespace if you see this issue:

labels:
    kubernetes.io/metadata.name: managed-fusion
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: v1.24
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: v1.24
    security.openshift.io/scc.podSecurityLabelSync: "false"
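
For reference, this is a minimal sketch of what the fully labeled namespace would look like as a manifest (the manifest wrapper is an illustration; the label keys and values are taken verbatim from the workaround above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: managed-fusion
  labels:
    # kubernetes.io/metadata.name is normally set automatically by the API server
    kubernetes.io/metadata.name: managed-fusion
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: v1.24
    # "privileged" enforcement is what allows the host-network data-gather pod to start
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: v1.24
    # disable pod-security label syncing so the labels above are not overwritten
    security.openshift.io/scc.podSecurityLabelSync: "false"
```

On an existing namespace the same labels can be applied in place, e.g. `oc label ns managed-fusion pod-security.kubernetes.io/enforce=privileged --overwrite` for each label.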

Reference Threadlink: https://chat.google.com/room/AAAANBK1onY/fGOB7s8Or6E

Comment 1 Dhruv Bindra 2023-04-19 06:05:25 UTC
The pod-security.kubernetes.io/enforce label was added by the automation that the QE team uses. The automation has been updated to stop adding the label and to create the namespace with the oc new-project command, since that is the command the lambda will use to create the namespace. Closing the BZ as not a bug.