Bug 2246334 - [4.12.z clone][MCG] RPC method "list_objects" fails with "RPC: object.list_objects() Call failed: failed to WebSocket dial"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.12
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Target Release: ODF 4.12.10
Assignee: Nimrod Becker
QA Contact: Uday kurundwade
URL:
Whiteboard:
Duplicates: 2238933
Depends On: 2227835 2246333
Blocks: 2238925 2238933 2246336
 
Reported: 2023-10-26 10:16 UTC by krishnaram Karthick
Modified: 2023-12-14 06:09 UTC (History)

Fixed In Version: 4.12.10-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2246333
Clones: 2246336
Environment:
Last Closed: 2023-12-14 06:09:22 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-operator pull 1236 0 None open [Backport to 5.12] BugFix - [MCG] RPC method "list_objects" fails with "RPC: object.list_objects() Call failed: failed t... 2023-10-26 19:05:13 UTC
Red Hat Product Errata RHSA-2023:7820 0 None None None 2023-12-14 06:09:42 UTC

Description krishnaram Karthick 2023-10-26 10:16:39 UTC
The bug blocks a lot of automation test runs and needs to be backported to all z streams. This bug is to track the backport for 4.12.z


+++ This bug was initially created as a clone of Bug #2246333 +++

The bug blocks a lot of automation test runs and needs to be backported to all z streams. This bug is to track the backport for 4.13.z

+++ This bug was initially created as a clone of Bug #2227835 +++

Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------------
In certain OCS-CI tests we still use the RPC API to list the objects in a bucket, and recent 4.13 regression analysis showed that they failed with the same error I got when trying to reproduce it on 4.14:

$ ~/ocs-ci/data/mcg-cli api object_api list_objects '{"bucket": "first.bucket"}' -ojson -n openshift-storage                                                                            
INFO[0001] ✅ Exists: NooBaa "noobaa"                    
INFO[0001] ✅ Exists: Service "noobaa-mgmt"              
INFO[0002] ✅ Exists: Secret "noobaa-operator"           
INFO[0002] ✅ Exists: Secret "noobaa-admin"              
INFO[0002] ✈️  RPC: object.list_objects() Request: map[bucket:first.bucket] 
WARN[0002] RPC: GetConnection creating connection to wss://localhost:0/rpc/ 0xc000f37b60 
INFO[0002] RPC: Connecting websocket (0xc000f37b60) &{RPC:0xc0002f86e0 Address:wss://localhost:0/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s cancelPings:<nil>} 
ERRO[0002] RPC: closing connection (0xc000f37b60) &{RPC:0xc0002f86e0 Address:wss://localhost:0/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s cancelPings:<nil>} 
WARN[0002] RPC: RemoveConnection wss://localhost:0/rpc/ current=0xc000f37b60 conn=0xc000f37b60 
ERRO[0002] ⚠️  RPC: object.list_objects() Call failed: failed to WebSocket dial: failed to send handshake request: Get "https://localhost:0/rpc/": dial tcp [::1]:0: connect: can't assign requested address 
FATA[0002] ❌ failed to WebSocket dial: failed to send handshake request: Get "https://localhost:0/rpc/": dial tcp [::1]:0: connect: can't assign requested address

Other RPC queries such as create_auth and read_system still work as expected in both 4.13 and 4.14.
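
The log above shows the CLI dialing wss://localhost:0/rpc/. Port 0 is not a connectable destination port, so the dial fails before any RPC request is sent. The following is a minimal Python sketch for flagging that pattern in captured CLI output. The function name find_unroutable_rpc_addresses and the regular expression are illustrative assumptions, not part of the MCG-CLI.

```python
import re
from urllib.parse import urlsplit

# Matches ws:// or wss:// URLs as they appear in the CLI log lines
# (assumed format, based only on the log excerpt in this report).
_WS_URL = re.compile(r"wss?://[^\s\"']+")


def find_unroutable_rpc_addresses(log_text):
    """Return the distinct WebSocket URLs in log_text whose port is 0."""
    found = []
    for match in _WS_URL.finditer(log_text):
        url = match.group(0)
        # urlsplit(...).port is None when no port is given, so only an
        # explicit ":0" is reported.
        if urlsplit(url).port == 0 and url not in found:
            found.append(url)
    return found


sample = (
    "WARN[0002] RPC: GetConnection creating connection to "
    "wss://localhost:0/rpc/ 0xc000f37b60\n"
    "WARN[0002] RPC: RemoveConnection wss://localhost:0/rpc/ "
    "current=0xc000f37b60 conn=0xc000f37b60\n"
)
print(find_unroutable_rpc_addresses(sample))
```

A non-empty result only indicates that the CLI attempted to dial port 0. It does not explain why the address was left unset.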


Version of all relevant components (if applicable):
----------------------------------------------------------------------------
OC version:
Client Version: 4.12.0-ec.5
Kustomize Version: v4.5.7
Server Version: 4.14.0-0.nightly-2023-07-30-191504
Kubernetes Version: v1.27.3+4aaeaec

OCS version:
ocs-operator.v4.14.0-90.stable              OpenShift Container Storage   4.14.0-90.stable              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-07-30-191504   True        False         5h49m   Cluster version is 4.14.0-0.nightly-2023-07-30-191504

Rook version:
rook: v4.14.0-0.a2658b13fd55bc922f3e2c00eb45fc03735ce8c2
go: go1.20.5

Ceph version:
ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------------
Yes, it's failing OCS-CI tests.


Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------------
Yes: other methods of listing the bucket, such as "aws s3 ls s3://first.bucket"
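
For automation, the S3 workaround can be assembled as an argument list rather than a shell string. This is a minimal sketch, assuming the AWS CLI is installed and credentials for the MCG S3 endpoint are already configured in the environment. The helper name build_s3_ls_command and the endpoint URL are placeholders, not values taken from this report.

```python
import shlex


def build_s3_ls_command(bucket, endpoint_url=None, verify_tls=True):
    """Build the argv for `aws s3 ls s3://<bucket>`."""
    argv = ["aws", "s3", "ls", f"s3://{bucket}"]
    if endpoint_url:
        # --endpoint-url points the AWS CLI at a non-AWS S3 endpoint.
        argv += ["--endpoint-url", endpoint_url]
    if not verify_tls:
        # --no-verify-ssl skips certificate verification; test clusters only.
        argv.append("--no-verify-ssl")
    return argv


# Placeholder endpoint; substitute the S3 route of the cluster under test.
cmd = build_s3_ls_command("first.bucket", endpoint_url="https://s3.example.invalid")
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run without shell quoting.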


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------------
1


Is this issue reproducible?
----------------------------------------------------------------------------
Yes


Can this issue reproduce from the UI?
----------------------------------------------------------------------------
No


If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------------
Yes, the same error was not present in 4.12 regression runs.


Steps to Reproduce:
1. Run the following command via the MCG-CLI:
$ ~/ocs-ci/data/mcg-cli api object_api list_objects '{"bucket": "first.bucket"}' -ojson -n openshift-storage  
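
The reproduction step can also be scripted. The sketch below only builds the argument list for the command shown above, with the JSON parameters serialized from a dictionary. The helper name build_rpc_command and its defaults are assumptions made for illustration; the argument order simply mirrors the command line in this report.

```python
import json


def build_rpc_command(api, method, params,
                      namespace="openshift-storage", cli="noobaa"):
    """Build the argv for `<cli> api <api> <method> '<json>' -ojson -n <namespace>`."""
    # json.dumps produces the same JSON text that the report passes in quotes.
    return [cli, "api", api, method, json.dumps(params), "-ojson", "-n", namespace]


argv = build_rpc_command("object_api", "list_objects", {"bucket": "first.bucket"})
print(argv)
```

Passing argv to subprocess.run(argv, capture_output=True, text=True) would capture the log lines for inspection. The report does not show the exit status that accompanies the FATA line, so checking the captured output is safer than relying on the return code alone.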


Actual results:
----------------------------------------------------------------------------
The query fails and the objects are not listed


Expected results:
----------------------------------------------------------------------------
The bucket's list of objects


Additional info:
----------------------------------------------------------------------------
One of the TCs that show this issue (specifically in check_if_mirroring_is_done):
https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/mcg/test_multi_region.py

Example RP links:
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/465/13108/599033/599034/599035/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/465/13085/598040/598041/598042/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/465/12808/587239/587252/587253/log

Example ocs-must-gather logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-001vu1cms33-t4a/j-001vu1cms33-t4a_20230702T223827/logs/failed_testcase_ocs_logs_1688341290/test_multiregion_mirror_ocs_logs/j-001vu1cms33-t4a/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-027aikt1c33-t4a/j-027aikt1c33-t4a_20230628T101648/logs/failed_testcase_ocs_logs_1687950522/test_multiregion_mirror_ocs_logs/j-027aikt1c33-t4a/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-223ai3c33-uo/j-223ai3c33-uo_20230706T071042/logs/failed_testcase_ocs_logs_1688630684/test_fill_bucket_ocs_logs/j-223ai3c33-uo/

--- Additional comment from RHEL Program Management on 2023-07-31 15:52:34 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.14.0' to '?', and so is being proposed to be fixed at the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-07-31 15:53:32 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:30:40 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Sagi Hirshfeld on 2023-08-14 13:22:15 UTC ---

Apparently, this is also failing a couple of tier1 tests that cover noobaa caching scenarios:

- https://github.com/red-hat-storage/ocs-ci/blob/5ef38b97a6c2f594ea7f07b64414c2f44eb83491/tests/manage/mcg/test_namespace_crd.py#L425-L498
- https://github.com/red-hat-storage/ocs-ci/blob/5ef38b97a6c2f594ea7f07b64414c2f44eb83491/tests/manage/mcg/test_namespace_crd.py#L500-L584

Raising the priority to "high" since this is blocking the coverage of the scenarios.

--- Additional comment from Elad on 2023-08-21 09:00:18 UTC ---

Multiple test scenarios are currently blocked and lack coverage. Therefore, setting TestBlocker keyword

--- Additional comment from Danny on 2023-08-21 09:55:42 UTC ---

hi @

--- Additional comment from Danny on 2023-08-21 09:59:48 UTC ---

hi @shirshfe 

What is the API that is failing with the same error? We have currently identified a few APIs (like list_objects) that are not supported by the "noobaa api" CLI. We are trying to resolve it, but this is not a regression. Existing tests should not fail with the same issue.

--- Additional comment from RHEL Program Management on 2023-08-22 07:40:37 UTC ---

This BZ is being approved for ODF 4.14.0 release, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'odf‑4.14.0'

--- Additional comment from RHEL Program Management on 2023-08-22 07:40:37 UTC ---

Since this bug has been approved for ODF 4.14.0 release, through release flag 'odf-4.14.0+', the Target Release is being set to 'ODF 4.14.0'

--- Additional comment from errata-xmlrpc on 2023-08-23 06:20:16 UTC ---

This bug has been added to advisory RHBA-2023:115514 by ceph-build service account (ceph-build.COM)

--- Additional comment from Sagi Hirshfeld on 2023-08-23 09:06:32 UTC ---

Hi dzaken, the API that is failing in the additional TCs that I added in my previous comment is the same: object_api::list_objects.

At the start of this quarter we had to change the way we make RPC queries so that they use the MCG-CLI instead of HTTP calls, due to the deprecation of the old route, which would explain why existing tests have failed. I falsely assumed that both methods ultimately interact with the underlying API in the same manner, hence the Regression keyword. I'll remove it now that I better understand the difference.

--- Additional comment from Sagi Hirshfeld on 2023-08-23 17:05:00 UTC ---

Verified on 4.14.0-114: all the above TCs passed when run locally.

--- Additional comment from Sunil Kumar Acharya on 2023-09-21 05:54:14 UTC ---

Please update the requires_doc_text(RDT) flag/text appropriately.

--- Additional comment from Mahesh Shetty on 2023-10-04 13:14:23 UTC ---

As discussed, this issue still exists in the ODF 4.14-139 build

$ noobaa api object_api list_objects '{"bucket": "oc-bucket-25c5ab874d9043aea4fc2a41a5fb16"}' -ojson -n openshift-storage
INFO[0003] ✅ Exists: NooBaa "noobaa"                    
INFO[0003] ✅ Exists: Service "noobaa-mgmt"              
INFO[0004] ✅ Exists: Secret "noobaa-operator"           
INFO[0004] ✅ Exists: Secret "noobaa-admin"              
INFO[0006] ✈️  RPC: object.list_objects() Request: map[bucket:oc-bucket-25c5ab874d9043aea4fc2a41a5fb16] 
WARN[0006] RPC: GetConnection creating connection to wss://localhost:0/rpc/ 0xc00067efc0 
INFO[0006] RPC: Connecting websocket (0xc00067efc0) &{RPC:0xc0000b1950 Address:wss://localhost:0/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s cancelPings:<nil>} 
ERRO[0006] RPC: closing connection (0xc00067efc0) &{RPC:0xc0000b1950 Address:wss://localhost:0/rpc/ State:init WS:<nil> PendingRequests:map[] NextRequestID:0 Lock:{state:1 sema:0} ReconnectDelay:0s cancelPings:<nil>} 
WARN[0006] RPC: RemoveConnection wss://localhost:0/rpc/ current=0xc00067efc0 conn=0xc00067efc0 
ERRO[0006] ⚠️  RPC: object.list_objects() Call failed: failed to websocket dial: failed to send handshake request: Get "https://localhost:0/rpc/": dial tcp [::1]:0: connect: connection refused 
FATA[0006] ❌ failed to websocket dial: failed to send handshake request: Get "https://localhost:0/rpc/": dial tcp [::1]:0: connect: connection refused

--- Additional comment from Mahesh Shetty on 2023-10-04 14:20:57 UTC ---

Logs here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz2227835/

--- Additional comment from Aayush Chouhan on 2023-10-11 07:51:04 UTC ---

The above issue (raised by Mahesh) occurred because of a CLI version mismatch. Closing the bug now. Thanks

--- Additional comment from RHEL Program Management on 2023-10-26 10:15:49 UTC ---

This bug having no release flag set previously, is now set with release flag 'odf‑4.14.0' to '?', and so is being proposed to be fixed at the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-10-26 10:15:49 UTC ---

The 'Target Release' is not to be set manually at the Red Hat OpenShift Data Foundation product.

The 'Target Release' will be auto set appropriately, after the 3 Acks (pm,devel,qa) are set to "+" for a specific release flag and that release flag gets auto set to "+".

--- Additional comment from RHEL Program Management on 2023-10-26 10:15:49 UTC ---

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

Comment 6 krishnaram Karthick 2023-10-27 10:30:49 UTC
*** Bug 2238933 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2023-12-14 06:09:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.12.10 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7820

