Description of problem (please be as detailed as possible and provide log snippets):

This job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/649/consoleFull
failed the OCS upgrade test because it did not have the expected number of pods. When I looked at the must gather, I saw these pods:

noobaa-db-0                      1/1   Running   0   24m     10.129.2.33   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>
noobaa-db-pg-0                   1/1   Running   0   27m     10.129.2.31   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>
noobaa-operator-b8cd8767-mn7w5   1/1   Running   0   27m     10.131.0.94   ip-10-0-137-77.us-east-2.compute.internal    <none>   <none>
noobaa-upgrade-job-2p2tn         0/1   Error     0   6m21s   10.129.2.50   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-2x9n4         0/1   Error     0   12m     10.129.2.43   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-8j94d         0/1   Error     0   17m     10.129.2.38   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-p7dcl         0/1   Error     0   22m     10.129.2.34   ip-10-0-190-240.us-east-2.compute.internal   <none>   <none>

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j018ai3c33-ua/j018ai3c33-ua_20210429T163029/logs/failed_testcase_ocs_logs_1619717324/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0e929cb3857e60e2f154be3ce2f4a2aa2924b2e660e5cf96b9a5f64897a0d072/namespaces/openshift-storage/oc_output/pods_-owide

Version of all relevant components (if applicable):
OCS 4.7.0-364.ci
OCP 4.7.0-0.nightly-2021-04-29-115807

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Not sure yet; this is the first time I have seen it.

Can this issue be reproduced from the UI?
Haven't tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCS 4.6.4
2. Upgrade to the latest RC OCS 4.7 build
3.

Actual results:
The noobaa DB fails to be upgraded.

Expected results:
The noobaa DB is upgraded.

Additional info:
Job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/649
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j018ai3c33-ua/j018ai3c33-ua_20210429T163029/logs/failed_testcase_ocs_logs_1619717324/test_upgrade_ocs_logs/
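In case it helps with triage, here is a minimal sketch of the oc commands I would use to pull the same information from a live cluster hitting this. It assumes the default openshift-storage namespace, that the Job is named noobaa-upgrade-job (which the pod names suggest), and uses one pod name from the listing above; the exact pod suffixes will differ per run:

    # List the noobaa pods and spot the failing upgrade-job attempts
    oc get pods -n openshift-storage -o wide | grep noobaa

    # Logs from one of the failed upgrade job pods (name taken from the listing above)
    oc logs noobaa-upgrade-job-p7dcl -n openshift-storage

    # Events and retry status for the upgrade Job itself (assumed Job name)
    oc describe job noobaa-upgrade-job -n openshift-storage

    # Confirm which OCS/NooBaa CSV versions are installed after the upgrade attempt
    oc get csv -n openshift-storage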
We have another occurrence of this bug here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/687/

This time it was hit with build quay.io/rhceph-dev/ocs-registry:4.7.0-377.ci, on this env type: AWS IPI FIPS ENCRYPTION 3AZ RHCOS 3Masters 3Workers 3Infra nodes.
The first occurrence was on: AWS IPI 3AZ RHCOS 3Masters 3Workers.

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j003aife3c333-ua/j003aife3c333-ua_20210505T080105/logs/failed_testcase_ocs_logs_1620205433/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-76da8d529f412bb79d33d99fec3d180953c257b904fbbd49f102d5637b17fc04/namespaces/openshift-storage/oc_output/pods_-owide

Here I see:

noobaa-db-0                       1/1   Running   0   14m   10.130.2.24   ip-10-0-219-56.us-east-2.compute.internal    <none>   <none>
noobaa-db-pg-0                    1/1   Running   0   15m   10.130.2.23   ip-10-0-219-56.us-east-2.compute.internal    <none>   <none>
noobaa-operator-7c64ddbcb-pd7mn   1/1   Running   0   15m   10.129.2.23   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-5wjbk          0/1   Error     0   12m   10.129.2.25   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-crlrl          0/1   Error     0   10m   10.129.2.29   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-rz2cr          0/1   Error     0   11m   10.129.2.27   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-s8c6j          0/1   Error     0   12m   10.129.2.26   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>
noobaa-upgrade-job-wdfhq          0/1   Error     0   11m   10.129.2.28   ip-10-0-150-221.us-east-2.compute.internal   <none>   <none>

Full must gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j003aife3c333-ua/j003aife3c333-ua_20210505T080105/logs/failed_testcase_ocs_logs_1620205433/test_upgrade_ocs_logs/
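The second occurrence shows the same pattern: both the old noobaa-db-0 pod and the new noobaa-db-pg-0 pod are Running while the upgrade job pods keep erroring. A hedged sketch of how one might check whether the DB migration actually completed on such a cluster, assuming the NooBaa CR has the default name noobaa in openshift-storage:

    # Phase and conditions reported by the NooBaa operator for the CR
    oc get noobaa noobaa -n openshift-storage -o yaml

    # List the DB StatefulSets (old noobaa-db and new noobaa-db-pg) that coexist here
    oc get statefulset -n openshift-storage | grep noobaa-db

    # Logs of the noobaa operator, which drives the upgrade/migration job
    oc logs deployment/noobaa-operator -n openshift-storage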
OK, I opened a new BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1957639. I will run a few more verification runs before marking this as verified.
We haven't seen this issue in the last two RC builds, during which we ran a lot of upgrade testing. Here I am adding just one of the upgrade jobs from the same combination where we originally hit this issue: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/707
Log path: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j021ai3c33-ua/j021ai3c33-ua_20210506T231908
Hence marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041