Bug 2208861

Summary: [UI] ODF Topology fails to render with a TypeError
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Daniel Osypenko <dosypenk>
Component: management-console
Assignee: Bipul Adhikari <badhikar>
Status: ASSIGNED
QA Contact: Daniel Osypenko <dosypenk>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.13
CC: amagrawa, badhikar, muagarwa, nthomas, odf-bz-bot, skatiyar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:

Description Daniel Osypenko 2023-05-21 16:41:08 UTC
Created attachment 1966022 [details]
topology_not_visible

Description of problem (please be as detailed as possible and provide log
snippets):


The Topology stops rendering and "Oh no! Something went wrong.." becomes visible.
It is not clear what the trigger might be, since no actions were performed other than clicking deployment elements to open them in the management console.
Immediately after deployment the Topology worked normally; it stopped rendering after roughly 5 hours.

Stack trace:
http://pastebin.test.redhat.com/1100573

The odf-console deployment reports no alerts.

The following warnings are visible in the cluster overview:

KubeHpaMaxedOut
May 21, 2023, 4:31 PM
HPA openshift-storage/noobaa-endpoint has been running at max replicas for longer than 15 minutes.
View details
OVNKubernetesNorthboundDatabaseClusterMemberError
May 21, 2023, 3:08 PM
OVN northbound database server(s) has not been a RAFT cluster member for a period of time which may indicate degraded OVN database high availability cluster.
View details
OVNKubernetesNorthboundDatabaseInboundConnectionMissing
May 21, 2023, 3:08 PM
OVN northbound database server(s) do not have expected number of inbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
View details
OVNKubernetesNorthboundDatabaseOutboundConnectionMissing
May 21, 2023, 3:08 PM
OVN northbound database server(s) do not have expected number of outbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
View details
OVNKubernetesSouthboundDatabaseClusterMemberError
May 21, 2023, 3:08 PM
OVN southbound database server(s) has not been a RAFT cluster member for a period of time which may indicate degraded OVN database high availability.
View details
OVNKubernetesSouthboundDatabaseInboundConnectionMissing
May 21, 2023, 3:08 PM
OVN southbound database server(s) do not have expected number of inbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
View details
OVNKubernetesSouthboundDatabaseOutboundConnectionMissing
May 21, 2023, 3:08 PM
OVN southbound database server(s) do not have expected number of outbound connections for a RAFT cluster which may indicate degraded OVN database high availability.
View details
CannotRetrieveUpdates
May 21, 2023, 3:08 PM
Failure to retrieve updates means that cluster administrators will need to monitor for available updates on their own or risk falling behind on security or other bugfixes. If the failure is expected, you can clear spec.channel in the ClusterVersion object to tell the cluster-version operator to not retrieve updates. Failure reason VersionNotFound . For more information refer to https://console-openshift-console.apps.dosypenk-215.qe.rh-ocs.com/settings/cluster/.
View details
AlertmanagerReceiversNotConfigured
May 21, 2023, 11:17 AM
Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.
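The KubeHpaMaxedOut alert above names a concrete HPA, so it can be checked directly against the cluster. A minimal sketch (resource and namespace names are taken from the alert text; the `grep` filter is an assumption):

```shell
# Inspect the HPA named in the KubeHpaMaxedOut alert
oc get hpa noobaa-endpoint -n openshift-storage
oc describe hpa noobaa-endpoint -n openshift-storage

# List the endpoint pods the HPA scales (pod name prefix assumed)
oc get pods -n openshift-storage | grep noobaa-endpoint
```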


No user actions were performed other than using the Topology view (opening deployments of the nodes).
Opening the Topology from another browser (Safari) did not help.

must-gather logs: https://drive.google.com/drive/folders/1qQMRP2kWfyZFRNqdwfT_NtaXMnEucVMH?usp=share_link
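For reference, ODF must-gather logs such as the ones linked above are typically collected with `oc adm must-gather`; the exact image name below is an assumption for an ODF 4.13 cluster:

```shell
# Collect ODF must-gather logs (image tag assumed for ODF 4.13)
oc adm must-gather --image=registry.redhat.io/odf4/odf-must-gather-rhel9:v4.13
```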

Version of all relevant components (if applicable):

OC version:
Client Version: 4.12.0-202208031327
Kustomize Version: v4.5.4
Server Version: 4.13.0-0.nightly-2023-05-20-014943
Kubernetes Version: v1.26.3+b404935

OCS version:
ocs-operator.v4.13.0-203.stable              OpenShift Container Storage   4.13.0-203.stable              Succeeded

Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-05-20-014943   True        False         6h30m   Cluster version is 4.13.0-0.nightly-2023-05-20-014943

Rook version:
rook: v4.13.0-0.e5648f0a2577b9bfd2aa256d4853dc3e8d94862a
go: go1.19.6

Ceph version:
ceph version 17.2.6-50.el9cp (c202ddb5589554af0ce43432ff07cd7ce8f35243) quincy (stable)
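The version details above can be re-collected with standard commands; a sketch, assuming the rook operator runs under its default deployment name:

```shell
# Client, server and Kubernetes versions
oc version

# Installed ODF/OCS operator CSV
oc get csv -n openshift-storage

# Cluster version
oc get clusterversion

# Rook and Ceph versions, queried from the rook-ceph operator pod
oc -n openshift-storage exec deploy/rook-ceph-operator -- rook version
oc -n openshift-storage exec deploy/rook-ceph-operator -- ceph --version
```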


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
No

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.13 and ODF 4.13 
2. Login to management console
3. Open Storage / Data Foundation / Topology tab


Actual results:
The "Oh no! Something went wrong.." error screen is shown instead of the Topology

Expected results:
The topology of the cluster is presented

Additional info:
A similar-looking issue occurred in the past, but with a different stack trace:
https://bugzilla.redhat.com/show_bug.cgi?id=2192670

Comment 7 Bipul Adhikari 2023-05-24 07:23:44 UTC
My hunch is that there was a network connectivity issue which caused the topology to fail. I will investigate this issue further.

Comment 9 Daniel Osypenko 2023-06-08 06:44:41 UTC
Adding console logs for new occurrences:
1. http://pastebin.test.redhat.com/1101763
2. Full log -> https://drive.google.com/file/d/1oxnJVVxK-tcAsY30gesif_88E7EWjPc1/view?usp=share_link