Bug 2037605

Summary: Openshift Virtualization alert 50% of the hyperconverged-cluster-operator-metrics/hyperconverged-cluster-operator-metrics targets in openshift-cnv namespace have been unreachable for more than 15 minutes on port 8686
Product: Container Native Virtualization (CNV)
Reporter: Yash <ymotiyel>
Component: Installation
Assignee: João Vilaça <jvilaca>
Status: CLOSED ERRATA
QA Contact: Satyajit Bulage <sbulage>
Severity: medium
Docs Contact:
Priority: high
Version: 4.8.3
CC: cnv-qe-bugs, dbasunag, fdeutsch, kmajcher, maugarci, stirabos
Target Milestone: ---   
Target Release: 4.10.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.10.2-5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-14 17:42:17 UTC
Type: Bug

Description Yash 2022-01-06 04:45:29 UTC
Description of problem:
The alert below is generated because of services that still expose port 8686. These services were still present after the cluster was upgraded.

~~~
alertname = TargetDown
cluster = XX
datacenter = XX
job = hyperconverged-cluster-operator-metrics
namespace = openshift-cnv
prometheus = openshift-monitoring/k8s
service = hyperconverged-cluster-operator-metrics
severity = warning
Annotations
description = 50% of the hyperconverged-cluster-operator-metrics/hyperconverged-cluster-operator-metrics targets in openshift-cnv namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
summary = Some targets were not reachable from the monitoring server for an extended period of time.
~~~

[Hostname]$ oc get svc |grep metrics|grep hypercon
hyperconverged-cluster-operator-metrics                          ClusterIP  10.255.112.238  <none>       8383/TCP,8686/TCP  324d       <<<<<<
hyperconverged-cluster-webhook-metrics                           ClusterIP  10.255.96.250   <none>       8383/TCP,8686/TCP  324d
kubevirt-hyperconverged-operator-metrics                         ClusterIP  10.255.139.248  <none>       8383/TCP           7d        <<<<<<<<

[Hostname]$ oc get endpoints |grep operator |grep hyperconverged
hyperconverged-cluster-operator-metrics                          10.254.13.93:8383,10.254.13.93:8686                               324d  
kubevirt-hyperconverged-operator-metrics                         10.254.13.93:8383                                                 7d

Version-Release number of selected component (if applicable):
Tested in OCP 4.6 and OpenShift Virtualization 2.5
Tested in OCP 4.7 and OpenShift Virtualization 2.6
Tested in OCP 4.8 and OpenShift Virtualization 4.8

How reproducible:
Every time, after upgrading from 4.6 to 4.8

Steps to Reproduce:
1. Install OpenShift Virtualization on OCP 4.6
2. Upgrade OCP 4.6 to 4.8

Actual results:
Leftover OpenShift Virtualization services are still present after the upgrade.

Expected results:
There should not be any service exposing port 8686, because the hco-operator no longer listens on port 8686 and only listens on port 8383.
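
A minimal sketch of how the expected result could be checked, assuming the column layout of the `oc get svc` output shown above (service name in column 1, ports in column 5). The helper name `svc_with_port` is hypothetical and not part of any product tooling; it only filters text on stdin.

```shell
# Hypothetical helper: print the names of services whose PORT(S) column
# contains the given port number. Expects `oc get svc`-style rows on stdin
# (NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE), without the header row.
svc_with_port() {
  awk -v p="$1" '{
    # PORT(S) looks like "8383/TCP,8686/TCP": split on commas first
    n = split($5, ports, ",")
    for (i = 1; i <= n; i++) {
      # then drop the "/TCP" (or other protocol) suffix before comparing
      split(ports[i], parts, "/")
      if (parts[1] == p) { print $1; break }
    }
  }'
}
```

It could be fed with `oc get svc -n openshift-cnv --no-headers | svc_with_port 8686`. On a cluster matching the expected result, that pipeline should print nothing.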


Additional info:

Comment 4 Simone Tiraboschi 2022-05-02 14:21:22 UTC
Postponing to 4.10.2; not so urgent.

Comment 12 errata-xmlrpc 2022-06-14 17:42:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.2 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5026