Bug 2122785 - Redhat-operators CatalogSource is not present after few reboots [NEEDINFO]
Summary: Redhat-operators CatalogSource is not present after few reboots
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: distribution
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Boris Ranto
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-08-30 19:07 UTC by Shay Rozen
Modified: 2023-08-09 16:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-26 06:16:54 UTC
Embargoed:
muagarwa: needinfo? (srozen)



Description Shay Rozen 2022-08-30 19:07:34 UTC
Description of problem:
On SNO (not tested on an ODF cluster), the CatalogSource is sometimes missing after multiple reboots. A new automation test performs 10 reboots and counts how many times the CatalogSource was not present afterwards. The failure is not consistent, but it does happen: in the last run it occurred in 2 out of 10 reboots.
This may be the wrong component. Please advise.

Version-Release number of selected component (if applicable):
OCP 4.11.0-0.nightly-2022-08-04-081314
Any ODF CatalogSource.

How reproducible:
Intermittent; rebooting multiple times reproduces it.

Steps to Reproduce:
1. Install OCP and add the CatalogSource.
2. In my case I also install LVM, but I'm not sure that is related.
3. Reboot the cluster and check for the CatalogSource.
4. Repeat step 3.

Actual results:
The test that performs 10 reboots reports:
ocs_ci.ocs.exceptions.CatalogSourceNotFoundAfterReboot: Catalogsource redhat-operators not found in 2 out of 10 reboots
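
For readers unfamiliar with the test, the check it performs can be sketched roughly as below. This is a minimal illustration and not the actual ocs-ci code: the `reboot` and `is_present` callables, both function names, and the locally defined exception class are assumptions standing in for whatever the real test uses to reboot the node and query the CatalogSource.

```python
from typing import Callable


class CatalogSourceNotFoundAfterReboot(Exception):
    """Local stand-in for the ocs-ci exception named in this report."""


def count_missing_after_reboots(
    reboot: Callable[[], None],
    is_present: Callable[[], bool],
    reboots: int = 10,
) -> int:
    """Reboot `reboots` times and return how often the CatalogSource was missing."""
    missing = 0
    for _ in range(reboots):
        reboot()  # hypothetical hook: reboot the node and wait for it to come back
        if not is_present():  # hypothetical hook: look up the CatalogSource
            missing += 1
    return missing


def check_catalogsource(
    reboot: Callable[[], None],
    is_present: Callable[[], bool],
    name: str = "redhat-operators",
    reboots: int = 10,
) -> None:
    """Raise if the CatalogSource was missing after any of the reboots."""
    missing = count_missing_after_reboots(reboot, is_present, reboots)
    if missing:
        raise CatalogSourceNotFoundAfterReboot(
            f"Catalogsource {name} not found in {missing} out of {reboots} reboots"
        )
```

Counting across all reboots, rather than stopping at the first miss, is what lets the test report a failure rate such as 2 out of 10.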

Expected results:
CatalogSource should be present after all reboots.

Additional info:
The test gathers events related to redhat-operators:
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            4h22m       Warning   ResolutionFailed                             namespace/openshift-storage                                           constraints not satisfiable: no operators found from catalog redhat-operators in namespace openshift-marketplace referenced by subscription odf-lvm-operator, subscription odf-lvm-operator exists
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** openshift-marketplace                              4h16m       Warning   Unhealthy                                    pod/redhat-operators-scr7p                                            Startup probe failed: command timed out
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** openshift-marketplace                              4h12m       Warning   Unhealthy                                    pod/redhat-operators-scr7p                                            Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of 6c5824f03e50a4cc4991171685e67bf9803448bf496963aaab5458bfde7d6eda is running failed: container process not found
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** openshift-marketplace                              4h11m       Warning   Unhealthy                                    pod/redhat-operators-rbstl                                            Startup probe failed: timeout: failed to connect service ":50051" within 1s
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** openshift-marketplace                              170m        Warning   Unhealthy                                    pod/redhat-operators-nwq42                                            Startup probe failed: timeout: failed to connect service ":50051" within 1s
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            158m        Warning   ResolutionFailed                             namespace/openshift-storage                                           [failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.105.221:50051: connect: connection refused"]
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            152m        Warning   ResolutionFailed                             namespace/openshift-storage                                           [failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.228.141:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.105.221:50051: connect: connection refused"]
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            148m        Warning   ResolutionFailed                             namespace/openshift-storage                                           [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.105.221:50051: connect: connection refused", failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused", failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.120.80:50051: connect: connection refused"]
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            148m        Warning   ResolutionFailed                             namespace/openshift-storage                                           failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused"
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            141m        Warning   ResolutionFailed                             namespace/openshift-storage                                           [failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.105.221:50051: connect: connection refused"]
[2022-08-30T17:35:56.214Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** openshift-marketplace                              141m        Warning   Unhealthy                                    pod/redhat-operators-nwq42                                            Startup probe failed: timeout: failed to connect service ":50051" within 1s
[2022-08-30T17:35:56.471Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            2m36s       Warning   ResolutionFailed                             namespace/openshift-storage                                           [failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.228.141:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.105.221:50051: connect: connection refused", failed to populate resolver cache from source redhat-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.178.95:50051: connect: connection refused"]
[2022-08-30T17:35:56.471Z] 17:35:56 - MainThread - test_lvm_node_reboot_catalogsource - INFO  - ** default                                            2m9s        Warning   ResolutionFailed                             namespace/openshift-storage                                           constraints not satisfiable: no operators found from catalog redhat-operators in namespace openshift-marketplace referenced by subscription odf-lvm-operator, subscription odf-lvm-operator exists
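
To make an event dump like the one above easier to triage, the lines can be grouped by namespace and reason. The following is only a sketch: `summarize_events` is a hypothetical helper, and it assumes each event follows the ` - ** ` marker and has six whitespace-separated columns (namespace, last seen, type, reason, object, message), as in the log lines shown here.

```python
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str], catalog: str = "redhat-operators") -> Counter:
    """Count events that mention `catalog`, keyed by (namespace, reason)."""
    counts: Counter = Counter()
    for line in lines:
        # Everything after the first " - ** " marker is the event row itself.
        _, marker, event = line.partition(" - ** ")
        if not marker or catalog not in event:
            continue
        # Split into the five leading columns plus the free-form message.
        fields = event.split(None, 5)
        if len(fields) < 6:
            continue
        namespace, _last_seen, _type, reason, _obj, _message = fields
        counts[(namespace, reason)] += 1
    return counts
```

Applied to the log above, this would separate the `Unhealthy` probe failures of the catalog pods in `openshift-marketplace` from the `ResolutionFailed` events that follow from them.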

