1820580 – [4.4] some targets are down with Kuryr network


Summary: [4.4] some targets are down with Kuryr network

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Michał Dulko
QA Contact:	GenadiC
Docs Contact:
URL:
Whiteboard:
Depends On:	1822861
Blocks:

Reported:	2020-04-03 11:48 UTC by Junqi Zhao
Modified:	2020-05-04 11:48 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Clones:	1822861
Environment:
Last Closed:	2020-05-04 11:48:02 UTC
Target Upstream Version:
Embargoed:

Attachments
504 error for prometheus API in console (187.15 KB, image/png) 2020-04-03 11:48 UTC, Junqi Zhao	no flags
all targets are down on Prometheus UI (168.28 KB, image/png) 2020-04-03 11:49 UTC, Junqi Zhao	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 585	0	None	closed	Bug 1820580: [release-4.4] Kuryr: Open metric endpoint ports from pod subnets	2020-05-29 08:52:45 UTC
Red Hat Product Errata	RHBA-2020:0581	0	None	None	None	2020-05-04 11:48:21 UTC

Description Junqi Zhao 2020-04-03 11:48:04 UTC

Created attachment 1675995
504 error for prometheus API in console

Description of problem:
The console shows a 504 error for the Prometheus API, and all targets are down on a cluster that uses the Kuryr network type (see the attached screenshot). The error does not occur with other network types.
# oc get network/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-04-03T02:32:18Z"
  generation: 2
  name: cluster
  resourceVersion: "2607"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: c12d517e-585d-45bc-bef1-2a566aec0acd
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16

All endpoints are down, as shown in the screenshot. This can also be confirmed from the CLI, for example:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://192.168.0.26:9100/metrics'
curl: (7) Failed connect to 192.168.0.26:9100; Connection timed out
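To check every scrape target at once instead of probing one endpoint, the Prometheus `/api/v1/targets` API can be summarized by health. This is a minimal sketch, not part of the original report: the helper name `count_target_health` is hypothetical, and it assumes the response is Prometheus's compact JSON (`"health":"up"` with no spaces) and that Prometheus answers on `localhost:9090` inside the `prometheus` container.

```shell
# count_target_health (hypothetical helper): read a Prometheus /api/v1/targets
# JSON response on stdin and print how many active targets report each health value.
# Assumes compact JSON, i.e. "health":"up" with no whitespace around the colon.
count_target_health() {
  awk '{
    # gsub returns the number of replacements, so it doubles as a match counter
    up += gsub(/"health":"up"/, "")
    down += gsub(/"health":"down"/, "")
    unknown += gsub(/"health":"unknown"/, "")
  }
  END { printf "up=%d down=%d unknown=%d\n", up, down, unknown }'
}

# Example (needs cluster access; localhost:9090 inside the pod is an assumption):
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#   curl -s http://localhost:9090/api/v1/targets | count_target_health
```

On an affected Kuryr cluster, a non-zero `down` count with `up=0` would match the symptom in this report; after the fix, every target should report `up`.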

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Log in to the admin console as a cluster admin and check the "Home -> Overview" page.
2. Check the targets in the Prometheus UI.

Actual results:
The console shows a 504 error for the Prometheus API, and all targets are down with the Kuryr network type.

Expected results:
No errors; the targets are up.

Additional info:

Comment 1 Junqi Zhao 2020-04-03 11:49:36 UTC

Created attachment 1675996
all targets are down on Prometheus UI

Comment 14 Junqi Zhao 2020-04-20 01:17:16 UTC

Will verify this after Bug 1825215 is fixed.

Comment 15 Junqi Zhao 2020-04-23 07:26:29 UTC

Tested with 4.4.0-0.nightly-2020-04-21-210658; all targets are UP.

Comment 17 errata-xmlrpc 2020-05-04 11:48:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
