Bug 1967435

Summary: noobaa-core-0 core image didn't get upgraded when upgrading OCS from 4.7 to 4.8
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Petr Balogh <pbalogh>
Component: Multi-Cloud Object Gateway
Assignee: Liran Mauda <lmauda>
Status: CLOSED WORKSFORME
QA Contact: Petr Balogh <pbalogh>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.8
CC: ebenahar, etamir, nbecker, ocs-bugs, odf-bz-bot, ratamir
Target Milestone: ---
Keywords: Automation
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-12 14:03:15 UTC
Type: Bug

Description Petr Balogh 2021-06-03 07:18:05 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In this execution of the AWS IPI FIPS ENCRYPTION 3AZ RHCOS 3M 3W 3I cluster job:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1003/consoleFull

I see that the noobaa-core pod image didn't get upgraded.

03:29:55 - MainThread - ocs_ci.ocs.ocs_upgrade - INFO - Old images which are going to be upgraded: ['registry.redhat.io/ocs4/cephcsi-rhel8@sha256:eb8922464a2f5b8a78f0b003d00f208fb319b462b866a18d1e393fffa84a5a34', 'registry.redhat.io/ocs4/mcg-core-rhel8@sha256:1496a3e823db8536380e01c58e39670e9fa2cc3d15229b2edc300acc56282c8c', 'registry.redhat.io/ocs4/mcg-rhel8-operator@sha256:5c9ebda7eb82db9b20d3cbac472e2cc284e099a099e2e8a8db11994e61e17e19',

In must gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j011aife3c333-ua/j011aife3c333-ua_20210602T151755/logs/failed_testcase_ocs_logs_1622655138/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-4cf9b04bc34bccb6fd801e42867308aee3dec18987d8507f2b58552d6d45dc19/namespaces/openshift-storage/oc_output/pods

you can see that the noobaa-core pod still has the old image, while I expect it to have one of these new ones:
03:29:55 - MainThread - ocs_ci.ocs.ocs_upgrade - INFO - New images for upgrade: ['quay.io/rhceph-dev/cephcsi@sha256:2296774ae82d85b93cef91dbfe6897a6a40dcc1cf3d9cff589b283313474f747', 'quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f', 'quay.io/rhceph-dev/mcg-operator@sha256:f73d206c0e206ca9d83bd90d0a9c37a580bf94aca5819cba431876ad8f549e6c',
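
For anyone re-checking this from the must-gather alone, a quick sketch (the local directory name is a placeholder; the path under it follows the layout linked above) is to grep the dumped pod output for the core image:

  grep -r 'mcg-core' <must-gather-dir>/namespaces/openshift-storage/oc_output/pods

Any hit on the old sha256 from the "Old images" list above means the pod was not rolled to the new image.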


The new image is defined in the noobaa-operator-54c886c97c-md4jd pod:
NOOBAA_CORE_IMAGE:        quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f
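
On a live cluster, a minimal way to compare what the operator is told to deploy with what is actually running (a sketch, assuming the default noobaa-core StatefulSet and noobaa-operator Deployment names) would be:

  oc -n openshift-storage set env deployment/noobaa-operator --list | grep NOOBAA_CORE_IMAGE
  oc -n openshift-storage get statefulset noobaa-core -o jsonpath='{.spec.template.spec.containers[*].image}'
  oc -n openshift-storage get pod noobaa-core-0 -o jsonpath='{.spec.containers[*].image}'

If the StatefulSet template already carries the new digest but the pod does not, the pod was simply never restarted; if the StatefulSet still has the old digest, the operator did not reconcile it.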


Version of all relevant components (if applicable):
Upgrade from 4.7.0 to 4.8.0-406.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

CSV:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j011aife3c333-ua/j011aife3c333-ua_20210602T151755/logs/failed_testcase_ocs_logs_1622655138/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-4cf9b04bc34bccb6fd801e42867308aee3dec18987d8507f2b58552d6d45dc19/namespaces/openshift-storage/oc_output/csv

NAME                         DISPLAY                       VERSION        REPLACES              PHASE
ocs-operator.v4.8.0-406.ci   OpenShift Container Storage   4.8.0-406.ci   ocs-operator.v4.7.0   Succeeded

The CSV is already at 4.8, so at this stage the noobaa-core pod should already have the new image.
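
On a live cluster the same can be confirmed with (CSV name taken from the output above):

  oc -n openshift-storage get csv
  oc -n openshift-storage get csv ocs-operator.v4.8.0-406.ci -o jsonpath='{.status.phase}'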


Is there any workaround available to the best of your knowledge?
Don't know.


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1


Is this issue reproducible?
Not sure yet; this is the first job I started looking at today. If I find more occurrences, I will link them in a follow-up comment.

Can this issue be reproduced from the UI?
Haven't tried.

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCS 4.7.0 on the platform mentioned above
2. Upgrade to a 4.8.0 internal build
3. Observe that the noobaa-core pod still runs the old image (a quick check is sketched below)
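
A quick post-upgrade check (a sketch; pod and image names will differ per build) that lists every container image in the namespace, to spot anything left on an old digest:

  oc -n openshift-storage get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'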


Actual results:
The noobaa-core pod still runs the old image.

Expected results:
The noobaa-core pod runs the new image.


Additional info:
Must gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j011aife3c333-ua/j011aife3c333-ua_20210602T151755/logs/failed_testcase_ocs_logs_1622655138/test_upgrade_ocs_logs/

Comment 5 Petr Balogh 2021-06-03 08:59:20 UTC
Trying to reproduce it here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1011/console

This job will pause before teardown.

Comment 6 Petr Balogh 2021-06-04 08:11:13 UTC
In the execution:

https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1011/consoleFull

it wasn't reproduced: the upgrade succeeded and tier1 is now running after the upgrade, and in the console output I see:

18:44:11 - MainThread - ocs_ci.ocs.ocp - INFO - All the images: {'core': 'quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f'} were successfully upgraded in: noobaa-core-0!
18:44:11 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-db-pg-0 -n openshift-storage -o yaml
18:44:16 - MainThread - ocs_ci.ocs.ocp - INFO - All the images: {'db': 'registry.redhat.io/rhel8/postgresql-12@sha256:03a1e02a1b3245f9aa0ddd3f7507b915a8f7387a1674969f6ef039a5d7fd8bf0', 'init': 'quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f'} were successfully upgraded in: noobaa-db-pg-0!
18:44:16 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-endpoint-74c8949cf8-pwf8r -n openshift-storage -o yaml
18:44:21 - MainThread - ocs_ci.ocs.ocp - INFO - All the images: {'endpoint': 'quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f'} were successfully upgraded in: noobaa-endpoint-74c8949cf8-pwf8r!
18:44:21 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod noobaa-operator-864dbcf7bb-w922v -n openshift-storage -o yaml
18:44:27 - MainThread - ocs_ci.ocs.ocp - INFO - All the images: {'noobaa_cor': 'quay.io/rhceph-dev/mcg-core@sha256:68832b8afaf01e49f418e67cec1e3def3a86cd967f8ea6fa4728045484cfd69f', 'noobaa_db': 'registry.redhat.io/rhel8/postgresql-12@sha256:03a1e02a1b3245f9aa0ddd3f7507b915a8f7387a1674969f6ef039a5d7fd8bf0', 'noobaa-operator': 'quay.io/rhceph-dev/mcg-operator@sha256:f73d206c0e206ca9d83bd90d0a9c37a580bf94aca5819cba431876ad8f549e6c'} were successfully upgraded in: noobaa-operator-864dbcf7bb-w922v!

Comment 11 Petr Balogh 2021-07-12 13:18:12 UTC
We had more executions, for example this one:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1372/

From this production job:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-aws-ipi-fips-encryption-3az-rhcos-3m-3w-3i-upgrade-ocs-auto/18/

This one passed the upgrade stage and is now running tier1 after the upgrade, which is still in progress.

So we have not hit this issue again yet.