Bug 1868646
| Summary: | With pre-configured pv-pools before OCS upgrade, noobaa-operator pod reports panic and is in CLBO post upgrade to 4.5-rc1 | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Neha Berry <nberry> |
| Component: | Multi-Cloud Object Gateway | Assignee: | Jacky Albo <jalbo> |
| Status: | CLOSED ERRATA | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.5 | CC: | assingh, ebenahar, etamir, jalbo, muagarwa, nbecker, ocs-bugs |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | OCS 4.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.5.0-64.ci | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-15 10:18:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Neha Berry
2020-08-13 11:33:08 UTC
Comment 3
Jacky Albo

Looks like ImagePullSecrets was changed between before and after the upgrade. Can you share `oc get noobaa -o yaml` from before and after the upgrade so we can be sure that this is the cause of the issue? And if it was indeed changed, can you please verify that a normal upgrade without a change to the secrets works? Thanks.

Neha Berry

(In reply to Jacky Albo from comment #3)
> Looks like ImagePullSecrets was changed between before and after the upgrade.
> Can you share oc get noobaa -o yaml from before and after the upgrade so we
> can be sure that this is the cause of the issue? And if it was indeed changed,
> can you please verify that a normal upgrade without a change to the secrets
> works? Thanks.

@Jacky, I do not have `oc get noobaa -o yaml` from before the upgrade. Does it get collected as part of must-gather? Moreover, I am not sure how the secret got changed, as we had only created a few pv-pools and OBCs. We do not even know how to change the secret.

`oc get noobaa -o yaml` after the upgrade:

```yaml
apiVersion: v1
items:
- apiVersion: noobaa.io/v1alpha1
  kind: NooBaa
  metadata:
    creationTimestamp: "2020-08-10T11:39:51Z"
    finalizers:
    - noobaa.io/graceful_finalizer
    generation: 3
    labels:
      app: noobaa
    managedFields:
    - apiVersion: noobaa.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:app: {}
          f:ownerReferences: {}
        f:spec:
          .: {}
          f:affinity:
            .: {}
            f:nodeAffinity:
              .: {}
              f:requiredDuringSchedulingIgnoredDuringExecution:
                .: {}
                f:nodeSelectorTerms: {}
          f:coreResources:
            .: {}
            f:limits:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:requests:
              .: {}
              f:cpu: {}
              f:memory: {}
          f:dbImage: {}
          f:dbResources:
            .: {}
            f:limits:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:requests:
              .: {}
              f:cpu: {}
              f:memory: {}
          f:dbStorageClass: {}
          f:dbVolumeResources:
            .: {}
            f:requests:
              .: {}
              f:storage: {}
          f:endpoints:
            .: {}
            f:maxCount: {}
            f:minCount: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
          f:image: {}
          f:pvPoolDefaultStorageClass: {}
          f:tolerations: {}
      manager: ocs-operator
      operation: Update
      time: "2020-08-13T08:54:09Z"
    - apiVersion: noobaa.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers: {}
        f:spec:
          f:cleanupPolicy: {}
        f:status:
          .: {}
          f:accounts:
            .: {}
            f:admin:
              .: {}
              f:secretRef:
                .: {}
                f:name: {}
                f:namespace: {}
          f:actualImage: {}
          f:conditions: {}
          f:endpoints:
            .: {}
            f:readyCount: {}
            f:virtualHosts: {}
          f:observedGeneration: {}
          f:phase: {}
          f:readme: {}
          f:services:
            .: {}
            f:serviceMgmt:
              .: {}
              f:externalDNS: {}
              f:internalDNS: {}
              f:internalIP: {}
              f:nodePorts: {}
              f:podPorts: {}
            f:serviceS3:
              .: {}
              f:externalDNS: {}
              f:internalDNS: {}
              f:internalIP: {}
              f:nodePorts: {}
              f:podPorts: {}
      manager: noobaa-operator
      operation: Update
      time: "2020-08-13T12:42:24Z"
    name: noobaa
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ocs.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: StorageCluster
      name: ocs-storagecluster
      uid: a8c09b98-373e-4e11-9b5a-08f241de2bc8
    resourceVersion: "33882083"
    selfLink: /apis/noobaa.io/v1alpha1/namespaces/openshift-storage/noobaas/noobaa
    uid: 84ccf3cf-753c-433f-8ece-41bdaa53405a
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: cluster.ocs.openshift.io/openshift-storage
              operator: Exists
    coreResources:
      limits:
        cpu: "1"
        memory: 4Gi
      requests:
        cpu: "1"
        memory: 4Gi
    dbImage: registry.redhat.io/rhscl/mongodb-36-rhel7@sha256:ba74027bb4b244df0b0823ee29aa927d729da33edaa20ebdf51a2430cc6b4e95
    dbResources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
    dbStorageClass: ocs-storagecluster-ceph-rbd
    dbVolumeResources:
      requests:
        storage: 50Gi
    endpoints:
      maxCount: 1
      minCount: 1
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 2Gi
    image: quay.io/rhceph-dev/mcg-core@sha256:d2e4edc717533ae0bdede3d8ada917cec06a946e0662b560ffd4493fa1b51f27
    pvPoolDefaultStorageClass: ocs-storagecluster-ceph-rbd
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  status:
    accounts:
      admin:
        secretRef:
          name: noobaa-admin
          namespace: openshift-storage
    actualImage: quay.io/rhceph-dev/mcg-core@sha256:d2e4edc717533ae0bdede3d8ada917cec06a946e0662b560ffd4493fa1b51f27
    conditions:
    - lastHeartbeatTime: "2020-08-10T11:39:52Z"
      lastTransitionTime: "2020-08-13T12:42:24Z"
      message: noobaa operator completed reconcile - system is ready
      reason: SystemPhaseReady
      status: "True"
      type: Available
    - lastHeartbeatTime: "2020-08-10T11:39:52Z"
      lastTransitionTime: "2020-08-13T12:42:24Z"
      message: noobaa operator completed reconcile - system is ready
      reason: SystemPhaseReady
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2020-08-10T11:39:52Z"
      lastTransitionTime: "2020-08-10T11:39:52Z"
      message: noobaa operator completed reconcile - system is ready
      reason: SystemPhaseReady
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2020-08-10T11:39:52Z"
      lastTransitionTime: "2020-08-13T12:42:24Z"
      message: noobaa operator completed reconcile - system is ready
      reason: SystemPhaseReady
      status: "True"
      type: Upgradeable
    endpoints:
      readyCount: 1
      virtualHosts:
      - s3.openshift-storage.svc
    observedGeneration: 3
    phase: Ready
    readme: "\n\n\tWelcome to NooBaa!\n\t-----------------\n\tNooBaa Core Version: 5.5.0-3ff3e13\n\tNooBaa Operator Version: 2.3.0\n\n\tLets get started:\n\n\t1. Connect to Management console:\n\n\t\tRead your mgmt console login information (email & password) from secret: \"noobaa-admin\".\n\n\t\t\tkubectl get secret noobaa-admin -n openshift-storage -o json | jq '.data|map_values(@base64d)'\n\n\t\tOpen the management console service - take External IP/DNS or Node Port or use port forwarding:\n\n\t\t\tkubectl port-forward -n openshift-storage service/noobaa-mgmt 11443:443 &\n\t\t\topen https://localhost:11443\n\n\t2. Test S3 client:\n\n\t\tkubectl port-forward -n openshift-storage service/s3 10443:443 &\n\t\tNOOBAA_ACCESS_KEY=$(kubectl get secret noobaa-admin -n openshift-storage -o json | jq -r '.data.AWS_ACCESS_KEY_ID|@base64d')\n\t\tNOOBAA_SECRET_KEY=$(kubectl get secret noobaa-admin -n openshift-storage -o json | jq -r '.data.AWS_SECRET_ACCESS_KEY|@base64d')\n\t\talias s3='AWS_ACCESS_KEY_ID=$NOOBAA_ACCESS_KEY AWS_SECRET_ACCESS_KEY=$NOOBAA_SECRET_KEY aws --endpoint https://localhost:10443 --no-verify-ssl s3'\n\t\ts3 ls\n\n"
    services:
      serviceMgmt:
        externalDNS:
        - https://noobaa-mgmt-openshift-storage.apps.sagrawal-dc3-ind.qe.rh-ocs.com
        internalDNS:
        - https://noobaa-mgmt.openshift-storage.svc:443
        internalIP:
        - https://172.30.158.249:443
        nodePorts:
        - https://10.70.60.44:30117
        podPorts:
        - https://10.129.2.40:8443
      serviceS3:
        externalDNS:
        - https://s3-openshift-storage.apps.sagrawal-dc3-ind.qe.rh-ocs.com
        internalDNS:
        - https://s3.openshift-storage.svc:443
        internalIP:
        - https://172.30.96.165:443
        nodePorts:
        - https://10.70.60.44:31431
        podPorts:
        - https://10.129.2.38:6443
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

Jacky Albo

OK, so @nberry provided me with the cluster creds, thank you. As I suspected, before the upgrade the system was using a secret in order to reach the quay registry: default-dockercfg-sd5pn. After the upgrade it was changed to not using a secret at all. Removing a secret after an upgrade does not seem to be handled correctly; we will need to think about the right way to attack this.

Old image: quay.io/rhceph-dev/mcg-core@sha256:f5fa382c8bcf832d079692e1980b0560ba5a12e155e8bc0715cfd6acc314f602
New image: quay.io/rhceph-dev/mcg-core@sha256:d2e4edc717533ae0bdede3d8ada917cec06a946e0662b560ffd4493fa1b51f27

For now it is recommended to test the upgrade without changing secrets.

Jacky Albo

Fixed an issue with changing the imagePullSecret in the wrong way.

Comment 14
Jacky Albo

1. There will be I/O disruptions on the pods, as they will get restarted to use the new image.
2. They are restarted as part of the upgrade: the pod running the old image is deleted and a new one is started running the new version.
3. This is great :) This is important info for us.

In short, this is the expected behaviour and you can go ahead and close it.

Neha Berry

(In reply to Jacky Albo from comment #14)
> 1. There will be I/O disruptions on the pods, as they will get restarted to
> use the new image.
> 2. They are restarted as part of the upgrade: the pod running the old image
> is deleted and a new one is started running the new version.
> 3. This is great :) This is important info for us.
>
> In short, this is the expected behaviour and you can go ahead and close it.

Thank you, Jacky, for all the confirmations. Moving the BZ to verified state based on Comment#13 and Comment#14. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754
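The pull-secret change discussed in this thread can be spotted by comparing the NooBaa CR's spec from before and after the upgrade. Below is a minimal sketch of that check, not the operator's own logic: `before.json` and `after.json` are hypothetical filenames for saved `oc get noobaa -o json` dumps, and the inline samples only illustrate the shape of the data (assuming `spec.imagePullSecret` is the field holding the registry secret reference, as suggested by the comments above).

```shell
# Hypothetical saved dumps; on a live cluster they would come from e.g.:
#   oc get noobaa noobaa -n openshift-storage -o json > before.json   (pre-upgrade)
#   oc get noobaa noobaa -n openshift-storage -o json > after.json    (post-upgrade)
cat > before.json <<'EOF'
{ "spec": { "imagePullSecret": { "name": "default-dockercfg-sd5pn" } } }
EOF
cat > after.json <<'EOF'
{ "spec": { } }
EOF

# Print the pull secret each dump references; "<none>" means the field was
# dropped, which is the state this report associates with the operator panic.
for f in before.json after.json; do
  printf '%s: %s\n' "$f" "$(jq -r '.spec.imagePullSecret.name // "<none>"' "$f")"
done
# → before.json: default-dockercfg-sd5pn
# → after.json: <none>
```

A difference between the two values is the signal to investigate before filing an upgrade bug, per the recommendation above to test upgrades without changing secrets.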