Created attachment 1966022 [details]
topology_not_visible

Description of problem (please be detailed as possible and provide log snippets):

The Topology view stops rendering and "Oh no! Something went wrong." is shown instead. The trigger is unclear: no actions were performed other than clicking deployment elements in the Topology and opening them via the management console. Immediately after deployment the Topology worked normally; it stopped rendering after roughly 5 hours.

Stack trace: http://pastebin.test.redhat.com/1100573

The odf-console deployment shows no alerts. The following warnings are visible in the cluster overview:

- KubeHpaMaxedOut (May 21, 2023, 4:31 PM): HPA openshift-storage/noobaa-endpoint has been running at max replicas for longer than 15 minutes.
- OVNKubernetesNorthboundDatabaseClusterMemberError (May 21, 2023, 3:08 PM): OVN northbound database server(s) has not been a RAFT cluster member for a period of time which may indicate degraded OVN database high availability cluster.
- OVNKubernetesNorthboundDatabaseInboundConnectionMissing (May 21, 2023, 3:08 PM): OVN northbound database server(s) do not have expected number of inbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
- OVNKubernetesNorthboundDatabaseOutboundConnectionMissing (May 21, 2023, 3:08 PM): OVN northbound database server(s) do not have expected number of outbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
- OVNKubernetesSouthboundDatabaseClusterMemberError (May 21, 2023, 3:08 PM): OVN southbound database server(s) has not been a RAFT cluster member for a period of time which may indicate degraded OVN database high availability.
- OVNKubernetesSouthboundDatabaseInboundConnectionMissing (May 21, 2023, 3:08 PM): OVN southbound database server(s) do not have expected number of inbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
- OVNKubernetesSouthboundDatabaseOutboundConnectionMissing (May 21, 2023, 3:08 PM): OVN southbound database server(s) do not have expected number of outbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
- CannotRetrieveUpdates (May 21, 2023, 3:08 PM): Failure to retrieve updates means that cluster administrators will need to monitor for available updates on their own or risk falling behind on security or other bugfixes. If the failure is expected, you can clear spec.channel in the ClusterVersion object to tell the cluster-version operator to not retrieve updates. Failure reason: VersionNotFound. For more information refer to https://console-openshift-console.apps.dosypenk-215.qe.rh-ocs.com/settings/cluster/.
- AlertmanagerReceiversNotConfigured (May 21, 2023, 11:17 AM): Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.

No user actions were performed other than using the Topology view to open the deployments of the nodes. Opening the Topology from another browser (Safari) did not help.
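For reference, a few oc commands that may help check the state of the components flagged above when the error appears; a minimal sketch, where the app=odf-console label selector is an assumption and may differ on a given build:

# Check the HPA from the KubeHpaMaxedOut alert
$ oc get hpa noobaa-endpoint -n openshift-storage

# Check that the odf-console plugin pods are running (label selector is an assumption)
$ oc get pods -n openshift-storage -l app=odf-console

# Inspect the ConsolePlugin resource the web console loads the Topology view from
$ oc get consoleplugin odf-console -o yaml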
must-gather logs: https://drive.google.com/drive/folders/1qQMRP2kWfyZFRNqdwfT_NtaXMnEucVMH?usp=share_link

Version of all relevant components (if applicable):

OC version:
Client Version: 4.12.0-202208031327
Kustomize Version: v4.5.4
Server Version: 4.13.0-0.nightly-2023-05-20-014943
Kubernetes Version: v1.26.3+b404935

OCS version:
ocs-operator.v4.13.0-203.stable   OpenShift Container Storage   4.13.0-203.stable   Succeeded

Cluster version:
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-05-20-014943   True        False         6h30m   Cluster version is 4.13.0-0.nightly-2023-05-20-014943

Rook version:
rook: v4.13.0-0.e5648f0a2577b9bfd2aa256d4853dc3e8d94862a
go: go1.19.6

Ceph version:
ceph version 17.2.6-50.el9cp (c202ddb5589554af0ce43432ff07cd7ce8f35243) quincy (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
No

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP 4.13 and ODF 4.13
2. Log in to the management console
3. Open the Storage / Data Foundation / Topology tab

Actual results:
"Oh no! Something went wrong." is shown instead of the Topology

Expected results:
The topology of the cluster is presented

Additional info:
We had a similar-looking issue in the past, but with a different stack trace: https://bugzilla.redhat.com/show_bug.cgi?id=2192670
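For anyone re-collecting the version information above, these standard oc commands produce the same output; the Ceph version check assumes the rook-ceph-tools toolbox deployment is enabled:

# Client, server and Kubernetes versions
$ oc version

# Cluster version / update status
$ oc get clusterversion

# ODF/OCS operator CSV and phase
$ oc get csv -n openshift-storage

# Ceph version (assumes the rook-ceph-tools deployment is enabled)
$ oc rsh -n openshift-storage deploy/rook-ceph-tools ceph version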
My hunch is that there was a network connectivity issue which caused the topology to fail. I will investigate this issue further.
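If it is a connectivity problem between the console and the plugin backend, something along these lines could confirm it; a rough sketch, where the odf-console-service name and port are assumptions based on the default odf-console plugin setup, and curl may not be present in the console image:

# Verify the plugin service and its endpoints exist (name is an assumption)
$ oc get svc,endpoints -n openshift-storage | grep odf-console

# Try fetching the plugin manifest from inside the cluster (assumes curl is available in the console image)
$ oc rsh -n openshift-console deployment/console \
    curl -sk https://odf-console-service.openshift-storage.svc:9001/plugin-manifest.json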
Adding console logs for new occurrences:
1. http://pastebin.test.redhat.com/1101763
2. Full log -> https://drive.google.com/file/d/1oxnJVVxK-tcAsY30gesif_88E7EWjPc1/view?usp=share_link