Bug 2037914 - [vSphere] TargetDown alerts for submariner components in managed clusters
Summary: [vSphere] TargetDown alerts for submariner components in managed clusters
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Submariner
Version: rhacm-2.4
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: ---
Assignee: Maayan Friedman
QA Contact: Noam Manos
Christopher Dawson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-06 20:02 UTC by Sidhant Agrawal
Modified: 2022-06-16 08:24 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-16 08:24:37 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshots_and_submariner_logs (1.37 MB, application/zip)
2022-01-06 20:02 UTC, Sidhant Agrawal
no flags


Links
System: Github stolostron backlog issues | ID: 18762 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2022-01-06 21:54:01 UTC

Description Sidhant Agrawal 2022-01-06 20:02:01 UTC
Created attachment 1849338
screenshots_and_submariner_logs

**What happened**:

Two TargetDown alerts are being reported for Submariner components in both managed clusters. These alerts are misleading because the ACM console shows the status as Healthy.

Alert details:
```
100% of the submariner-lighthouse-coredns/submariner-lighthouse-coredns targets in submariner-operator namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.

50% of the submariner-gateway-metrics/submariner-gateway-metrics targets in submariner-operator namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
```
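The percentages in these alerts are the share of scrape targets in each group that Prometheus cannot reach. The sketch below is a minimal Python model of that arithmetic, assuming the alert follows the common OpenShift TargetDown shape (fire when more than 10% of a job/namespace/service group has `up == 0` for 15 minutes). The 10% threshold, the one-sample-per-minute model of the 15-minute window, the function names, and the target counts in the usage example are all assumptions, not values taken from these clusters.

```python
from collections import defaultdict

# Assumed threshold and window, modelled on the common OpenShift TargetDown rule:
# fire when more than 10% of a (job, namespace, service) group is down for 15 minutes.
THRESHOLD_PERCENT = 10.0
FOR_MINUTES = 15


def percent_down(targets):
    """Map (job, namespace, service) -> percentage of targets whose `up` value is 0.

    `targets` is an iterable of (job, namespace, service, up) tuples, where
    up == 1 means the last scrape succeeded and up == 0 means it failed.
    """
    totals = defaultdict(int)
    down = defaultdict(int)
    for job, namespace, service, up in targets:
        key = (job, namespace, service)
        totals[key] += 1
        if up == 0:
            down[key] += 1
    return {key: 100.0 * down[key] / totals[key] for key in totals}


def firing(history, threshold=THRESHOLD_PERCENT, window=FOR_MINUTES):
    """Return the groups above `threshold` in every one of the last `window` samples.

    `history` is a list of percent_down() results, assumed to be one per minute,
    so `window` consecutive samples stand in for the rule's 15-minute `for` clause.
    """
    if window <= 0 or len(history) < window:
        return set()
    recent = history[-window:]
    candidates = set(recent[0])
    for sample in recent[1:]:
        candidates &= set(sample)
    return {key for key in candidates if all(s[key] > threshold for s in recent)}


if __name__ == "__main__":
    ns = "submariner-operator"
    # Hypothetical target counts chosen to reproduce the 100% and 50% figures.
    sample = percent_down([
        ("submariner-lighthouse-coredns", ns, "submariner-lighthouse-coredns", 0),
        ("submariner-gateway-metrics", ns, "submariner-gateway-metrics", 1),
        ("submariner-gateway-metrics", ns, "submariner-gateway-metrics", 0),
    ])
    for (job, _, _), pct in sorted(sample.items()):
        print(f"{job}: {pct:.0f}% down")
    # Fifteen identical one-minute samples: both groups stay above 10%, so both fire.
    print(sorted(job for job, _, _ in firing([sample] * FOR_MINUTES)))
```

With one of one CoreDNS target and one of two gateway targets down (hypothetical counts), the model reports 100% and 50%, the same shape as the two alerts above. A 50% figure means only some of that service's targets are failing to be scraped.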

**What you expected to happen**:

No false TargetDown alerts in the managed clusters.


**How to reproduce it (as minimally and precisely as possible)**:

1. Install ACM.
2. Import two managed clusters with non-overlapping networks.
3. Connect the managed clusters using the Submariner add-on via the console.
4. Observe the console for alerts related to Submariner components in the managed clusters.


**Anything else we need to know?**:

In the ACM console everything looks fine: Connection and Agent status are Healthy for both managed clusters.
The results below from `subctl verify kubeconfig.c1 kubeconfig.c2 --only connectivity,service-discovery` also look good.
```
Ran 23 of 41 Specs in 641.705 seconds
SUCCESS! -- 23 Passed | 0 Failed | 0 Pending | 18 Skipped
```
Only the alerts in the managed clusters indicate issues with Submariner.
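The two summary lines are internally consistent, which supports reading them as a clean run. A minimal Python sketch of that check follows, assuming `subctl verify` prints a standard Ginkgo summary and that Ginkgo's accounting is `passed + failed == ran` and `ran + pending + skipped == total`. The regular expression and the function names are illustrative only and are not part of `subctl`.

```python
import re

# Illustrative pattern for a Ginkgo-style two-line summary (assumed format).
SUMMARY_RE = re.compile(
    r"Ran (?P<ran>\d+) of (?P<total>\d+) Specs in (?P<seconds>[\d.]+) seconds\s+"
    r"(?P<verdict>SUCCESS!|FAIL!) -- (?P<passed>\d+) Passed \| (?P<failed>\d+) Failed \| "
    r"(?P<pending>\d+) Pending \| (?P<skipped>\d+) Skipped"
)


def parse_summary(text):
    """Extract spec counts, duration and verdict from a Ginkgo summary."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no Ginkgo summary found")
    counts = {
        name: int(match.group(name))
        for name in ("ran", "total", "passed", "failed", "pending", "skipped")
    }
    counts["seconds"] = float(match.group("seconds"))
    counts["success"] = match.group("verdict") == "SUCCESS!"
    return counts


def is_consistent(counts):
    """Check the assumed accounting: every spec that ran passed or failed, and
    specs that did not run were pending or skipped."""
    return (
        counts["passed"] + counts["failed"] == counts["ran"]
        and counts["ran"] + counts["pending"] + counts["skipped"] == counts["total"]
    )


if __name__ == "__main__":
    output = (
        "Ran 23 of 41 Specs in 641.705 seconds\n"
        "SUCCESS! -- 23 Passed | 0 Failed | 0 Pending | 18 Skipped\n"
    )
    summary = parse_summary(output)
    print(summary["success"], is_consistent(summary))
```

For the output in this report both checks hold: 23 + 0 = 23 specs ran, and 23 + 0 + 18 = 41 specs in total.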


**Environment**:
- Platform: VMware vSphere
- Versions:
    OCP: 4.9.0-0.nightly-2021-12-23-045233
    RHACM: 2.4.1
    Submariner: 0.11.0
- The Submariner version and image repository, plus `subctl diagnose` and `subctl gather` output from both managed clusters, are in the attachment.

Comment 4 Daniel Farrell 2022-05-26 12:50:09 UTC
Submariner had no support for vSphere in 0.11, and in the upcoming 0.12 it is a tech preview. I think there have been some relevant changes in the UI and in how we pass statuses. It's possible this was fixed, but it would be good to re-test with ACM 2.5 and SubM 0.12.*.

Comment 6 Nir Yechiel 2022-06-16 08:24:37 UTC
This was retested with ACM 2.5/Submariner 0.12.1 and seems to be working fine now.

