Bug 1967244

Summary: Upgrade from 4.6.4 to 4.7.0 fails during noobaa-upgrade-job
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Michael McNeill <mmcneill>
Component: Multi-Cloud Object Gateway Assignee: Romy Ayalon <rayalon>
Status: CLOSED ERRATA QA Contact: Raz Tamir <ratamir>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.7 CC: ebenahar, etamir, muagarwa, nbecker, nberry, ocs-bugs, rayalon
Target Milestone: --- Keywords: AutomationBackLog, ZStream
Target Release: OCS 4.7.1   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-15 16:50:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael McNeill 2021-06-02 16:50:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After upgrade from 4.6.4 to 4.7.0, the noobaa-upgrade-job dies consistently with an error:
Jun-2 16:02:45.582 [/13]    [L0] core.util.postgres_client:: _connect: connected { host: 'noobaa-db-pg-0.noobaa-db-pg', user: 'noobaa', password: 'REDACTED', database: 'nbcore', port: 5432 }
Jun-2 16:02:45.582 [/13]    [L0] core.util.postgres_client:: connected
Jun-2 16:02:45.602 [/13]   [LOG] CONSOLE:: migrating system_history
Jun-2 16:03:16.246 [/13] [ERROR] core.util.postgres_client:: postgres_client: T00000000|Q00000066: failed with error: Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/root/node_modules/noobaa-core/node_modules/pg/lib/client.js:132:73)
    at Object.onceWrapper (events.js:421:28)
    at Connection.emit (events.js:315:20)
    at Connection.EventEmitter.emit (domain.js:467:12)
    at Socket.<anonymous> (/root/node_modules/noobaa-core/node_modules/pg/lib/connection.js:108:12)
    at Socket.emit (events.js:327:22)
    at Socket.EventEmitter.emit (domain.js:467:12)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
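
To make the timing in the excerpt above easier to read, here is a minimal Python sketch that parses the job's log lines and reports how long the migration step ran before the connection dropped. The regular expression is inferred only from the lines quoted in this report, the function names are hypothetical, and the year is an assumption (the log lines carry none; 2021 is taken from the bug's dates).

```python
import re
from datetime import datetime

# Shape inferred from the quoted lines only, e.g.
#   Jun-2 16:02:45.602 [/13]   [LOG] CONSOLE:: migrating system_history
LINE_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})-(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[[^\]]*\]\s+\[\s*(?P<level>\w+)\]\s+"
    r"(?P<module>\S+?):: (?P<msg>.*)$"
)


def parse_line(line, year=2021):
    """Return (timestamp, level, module, message), or None if the line
    does not look like a log record (stack-trace frames, blank lines)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(
        f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S.%f"
    )
    return ts, m["level"], m["module"], m["msg"]


def migration_failure_gap(lines, year=2021):
    """Return (collection, seconds) for the last 'migrating <collection>'
    message seen before the first 'Connection terminated unexpectedly'
    error, or None if that pattern does not occur."""
    started = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        ts, level, _module, msg = parsed
        if msg.startswith("migrating "):
            started = (ts, msg.split(" ", 1)[1])
        elif (
            started is not None
            and level == "ERROR"
            and "Connection terminated unexpectedly" in msg
        ):
            return started[1], (ts - started[0]).total_seconds()
    return None
```

Run against the excerpt above, this reports system_history with a gap of 30.644 seconds between the start of the migration and the terminated connection. The sketch only measures the interval; it does not say why the connection was closed.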

This prevents the upgrade from completing, as shown in the output of oc get csv:
NAME                                 DISPLAY                            VERSION    REPLACES                             PHASE
cert-utils-operator.v1.0.6           Cert Utils Operator                1.0.6                                           Succeeded
container-security-operator.v3.5.1   Quay Container Security            3.5.1      container-security-operator.v3.5.0   Succeeded
elasticsearch-operator.5.0.2-18      OpenShift Elasticsearch Operator   5.0.2-18                                        Succeeded
ocs-operator.v4.7.0                  OpenShift Container Storage        4.7.0      ocs-operator.v4.6.4                  Installing
quay-operator.v3.5.1                 Red Hat Quay                       3.5.1      quay-operator.v3.5.0                 Succeeded
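
For checking a listing like the one above without reading it by eye, here is a small Python sketch that picks out every CSV whose PHASE is not Succeeded. It assumes the fixed-width column layout exactly as printed here and slices each row at the header's column offsets, since DISPLAY values contain spaces and REPLACES may be empty. The function names are hypothetical; parsing the output of oc get csv -o json would likely be more robust than parsing the table text.

```python
import re


def parse_table(text):
    """Parse fixed-width tabular output into a list of dicts keyed by
    the header names, slicing each row at the header's column offsets."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # (column name, starting character offset) taken from the header row
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for ln in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = ln[start:end].strip()
        rows.append(row)
    return rows


def unfinished(rows):
    """Return (NAME, PHASE) for every row whose phase is not Succeeded."""
    return [(r["NAME"], r["PHASE"]) for r in rows if r["PHASE"] != "Succeeded"]
```

Applied to the listing above, only ocs-operator.v4.7.0 (phase Installing) is returned.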

Version of all relevant components (if applicable):

OpenShift 4.7.9
OpenShift Container Storage 4.7.0 (previously 4.6.4)
VMware 6.7U3
Local Storage Operator 4.7.0-202105210300.p0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, NooBaa is entirely non-functional. The noobaa-core and noobaa-endpoint pods are missing, and the NooBaa operator does not report as ready.

Is there any workaround available to the best of your knowledge?

No.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1: Simply followed the documentation to upgrade OCS from 4.6 to 4.7


Is this issue reproducible?

Unsure

Can this issue reproduce from the UI?

Unsure

Steps to Reproduce:
1. Upgrade OCS from 4.6.4 to 4.7.0

Actual results:

Upgrade never completes, noobaa-core and noobaa-endpoint pods do not exist, noobaa-upgrade-job fails.

Expected results:

Upgrade completes and OCS returns to available. 

Additional info: 
Must gather attached.

Comment 13 Elad 2021-06-14 17:00:24 UTC
Moving to VERIFIED based on regression testing results with v4.7.1-410.ci

Comment 17 errata-xmlrpc 2021-06-15 16:50:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.7.1 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2449