Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2069705

Summary: prometheus target "serviceMonitor/openshift-metallb-system/monitor-metallb-controller/0" has a failure with "server returned HTTP status 502 Bad Gateway"
Product: OpenShift Container Platform
Reporter: Sunil Gurnale <sgurnale>
Component: Networking
Assignee: Federico Paolinelli <fpaoline>
Networking sub component: Metal LB
QA Contact: Arti Sood <asood>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: elevin, fpaoline
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2089179 (view as bug list)
Environment:
Last Closed: 2022-08-10 11:02:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2089179

Description Sunil Gurnale 2022-03-29 13:57:36 UTC
Description of problem:
MetalLB upstream metrics endpoint is incorrect for the controller.
[1] A default MetalLB Operator setup has no issue; the Prometheus target for the controller is UP.
[2] The issue is observed when trying to host the speakers on specific infra nodes. The desired outcome is achieved (the speakers run on the infra nodes with the specific taints applied), but the Prometheus target fails with a "server returned HTTP status 502 Bad Gateway" error.
[3] The kube-rbac-proxy container in the controller pod logs "http: proxy error: dial tcp 10.249.80.164:7472: connect: connection refused".
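
The symptom in [3] can be checked from the CLI. A minimal sketch, assuming the controller Deployment is named "controller" and lives in the metallb-system namespace (adjust to your install, e.g. openshift-metallb-system):

# Show the proxy error from the kube-rbac-proxy container of the controller pod
oc logs deployment/controller -n metallb-system -c kube-rbac-proxy --tail=20

# The controller pod's own IP, to compare with the host IP the proxy dials
oc get pods -n metallb-system -o wide | grep controller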

Version-Release number of selected component (if applicable): OCPv4.10.5


How reproducible: Always


Steps to Reproduce:

1. Install IPI cluster

2. Install MetalLB operator

3. Configure a MetalLB instance and verify that monitoring finds the metrics targets using Prometheus target discovery

4. Add infra nodes with a taint using an infra machine set:
https://docs.openshift.com/container-platform/4.10/networking/metallb/metallb-operator-install.html#nw-metallb-operator-limit-speaker-to-nodes_metallb-operator-install

5. Configure MetalLB so that the MetalLB speakers are placed on these infra nodes (a sketch of such a configuration follows this list).
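
For step 5, speaker placement is controlled through the MetalLB custom resource. A minimal sketch, assuming the metallb-system namespace and infra nodes tainted with node-role.kubernetes.io/infra; the spec.nodeSelector and spec.speakerTolerations fields follow the MetalLB Operator documentation linked above, but verify them against the CRD of your installed version:

oc apply -f - <<EOF
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  # schedule speakers only on the infra nodes (label/taint key is an assumption)
  nodeSelector:
    node-role.kubernetes.io/infra: ""
  # tolerate the taint carried by those nodes
  speakerTolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists
    effect: NoSchedule
EOF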


Actual results:
The metrics endpoint for the MetalLB controller is shown as down on https://prometheus-k8s-openshift-monitoring.apps../targets
After a while an alert is raised in the console UI as well.
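
The same state can be read from the CLI through the Prometheus targets API. A sketch; the prometheus-k8s-0 pod name mirrors the command used later in this bug, and the local python3 filter is just one way to pick out the MetalLB targets:

oc exec prometheus-k8s-0 -n openshift-monitoring -- \
  curl -s 'http://localhost:9090/api/v1/targets?state=active' \
  | python3 -c 'import json,sys; [print(t["scrapePool"], t["health"], t["lastError"]) for t in json.load(sys.stdin)["data"]["activeTargets"] if "metallb" in t["scrapePool"]]'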

Expected results:
The MetalLB controller target should be UP in Prometheus after the speaker pods are restricted to the tainted infra nodes

Additional info:

The customer suspects this happens because the "kube-rbac-proxy" container in the "controller" deployment points to the upstream
"--upstream=http://$(METALLB_HOST):7472/", and METALLB_HOST is populated from "status.hostIP".

We believe this is a bug and that the controller deployment from MetalLB should point the kube-rbac-proxy container at the upstream status.podIP:7472 instead.
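
The suspected configuration can be checked directly on the deployment. A sketch, assuming the metallb-system namespace (the summary above shows openshift-metallb-system); the grep context width is arbitrary:

# Show the proxy's --upstream argument and where METALLB_HOST comes from
oc get deployment controller -n metallb-system -o yaml | grep -E -A3 'upstream|METALLB_HOST'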

Comment 5 Federico Paolinelli 2022-04-27 14:18:34 UTC
We found the issue: the ports currently used for metrics are not in OpenShift's reserved range. Working on a fix.

Comment 6 Federico Paolinelli 2022-04-27 14:20:57 UTC
Just an extra note: the reserved range in question is the one for pods that run with hostNetwork: true.
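
For anyone checking their own cluster, the metrics ports in use can be read off the MetalLB workloads. A sketch, assuming the metallb-system namespace and the upstream resource names "controller" (Deployment) and "speaker" (host-networked DaemonSet):

# Container ports declared by the controller and by the speaker
oc get deployment controller -n metallb-system \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.ports}{"\n"}{end}'
oc get daemonset speaker -n metallb-system \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.ports}{"\n"}{end}'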

Comment 9 Federico Paolinelli 2022-05-23 07:33:30 UTC
I just filed https://bugzilla.redhat.com/show_bug.cgi?id=2089179 for tracking the backport.

Comment 10 elevin 2022-05-24 09:00:56 UTC
4.11.0-0.nightly-2022-05-18-171831
metallb-operator.4.11.0-202205191659
=====================================
Cannot find MetalLB metrics on the Prometheus pods:
******************************************************************
oc exec speaker-kvs66 -n metallb-system -- curl localhost:29151/metrics | grep metallb_bfd_control_packet_output

# HELP metallb_bfd_control_packet_output Number of sent BFD control packets
# TYPE metallb_bfd_control_packet_output counter
metallb_bfd_control_packet_output{peer="10.46.55.34"} 2763
******************************************************************
oc exec prometheus-k8s-0 -n openshift-monitoring -- curl http://localhost:9090/api/v1/query?query=metallb_bfd_control_packet_output

{"status":"success","data":{"resultType":"vector","result":[]}}
******************************************************************

MetalLB Prometheus targets (ports 9120 & 9121) have status "down".

******************************************************************

Scrape failed
server returned HTTP status 401 Unauthorized

******************************************************************
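
Note: a 401 at this stage usually means the scrape reaches kube-rbac-proxy but is not sent with credentials the proxy accepts. One quick check is to dump the ServiceMonitor endpoint configuration (a sketch, assuming the metallb-system namespace used in the commands above):

# Check the authorization / bearer token / TLS settings on the scrape endpoints
oc get servicemonitor -n metallb-system -o yaml | grep -E -B2 -A4 'bearerTokenFile|authorization|tlsConfig'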

Comment 12 elevin 2022-05-30 06:25:29 UTC
metallb-operator.4.11.0-202205242136
OCP 4.11.0-0.nightly-2022-05-18-171831
=======================================

• [SLOW TEST:50.158 seconds]
MetalLB BGP
/home/elevin/projects/cnf-gotestMy/remove/onemore/cnf-gotests/test/network/metallb/tests/bgp-test.go:25
  updates
  /home/elevin/projects/cnf-gotestMy/remove/onemore/cnf-gotests/test/network/metallb/tests/bgp-test.go:106
    metrics
    /home/elevin/projects/cnf-gotestMy/remove/onemore/cnf-gotests/test/network/metallb/tests/bgp-test.go:173
      provides Prometheus BGP metrics
      /home/elevin/projects/cnf-gotestMy/remove/onemore/cnf-gotests/test/network/metallb/tests/bgp-test.go:200

Comment 14 errata-xmlrpc 2022-08-10 11:02:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069